One of the great things about sport is its cruel clarity: there is no such thing, for example, as a bad one-hundred-metre runner, or a hopeless centre-half who got lucky; in sport, you get found out. … There are, however, plenty of bad actors or musicians or writers making a decent living, people who happened to be in the right place at the right time, or knew the right people, or whose talents have been misunderstood or overestimated.
Nick Hornby, in whose Fever Pitch that passage appears, is, of course, right. But it’s not just sport in which you get found out. There are other parts of the world of work in which performance can be fairly easily measured.
The salesman, for instance. A good salesman is one who makes more sales. Yes, of course there’s more to business than sales, and no, I’m not saying that ‘a good salesman’ is the same thing as ‘a good person,’ and no, I’m not even saying that being ‘a good salesman’ is equivalent to ‘deserving’ (whatever that means) something. Nor am I saying that there’s no element of chance in being blessed with those skills – and, indeed, particularly for women, an appearance – which can make someone good at sales. But what I am saying, which I don’t think is very controversial, is that measuring the performance of a salesman is straightforward. Most of the time the ‘best’ salesmen are those who accrue the most sales. Easy.
The cutting edge of financial services seems rather like this too. Again, I’m not endorsing the morality of the City of London. I am saying that it’s relatively easy to measure the performance of a hedge fund manager. How much money did he make for his clients? That’s all that matters. Done. Measured. Easy.
When, as a schoolboy, I worked in a warehouse, that was easy to measure too. Did I turn up on time? Did the pallets get loaded onto the lorries? Were the boxes wrapped up properly, so they wouldn’t fall over when the lorry was half-full? Yes? Fine. Then call the agency back and say yeah, we’ll have that kid again. Was he a bit odd, bringing Frank Barlow’s biography of Edward the Confessor to read during lunch, or on overnight shifts where the antisocial hours were compensated for by longer periods of idleness? Sure. Did it matter? No. Nobody cared.
But not all jobs are like this. Some just don’t lend themselves very easily to measurement. Is a good author one who sells more books? Well, yes, obviously, and in my appallingly partial and philosophically naïve view no one deserves to be a multimillionaire more than JK Rowling and Julia Donaldson. But equally the argument that Vincent van Gogh was a bad artist because he only sold one painting in his lifetime is not a particularly persuasive one, is it?
This is, of course, a dangerously seductive argument. We can kid ourselves that we are van Goghs, our talents underrated by a few key decision-makers. Most of us aren’t, of course.
But where does teaching sit on the spectrum of Art to Warehousing?
You already know what I think, don’t you?
Rob Coe doesn’t agree with me. He thinks that using test results – that is, pupils’ test results – is the best way to assess teacher effectiveness. And yes, I’m conceited enough to think that I know better than a Professor of Education.
You just can’t measure the performance of a teacher the way you can measure the performance of a salesman or a hedge fund manager. It doesn’t work. I’ve banged on about this before, as have many teachers more eloquent than I.
Hang on. What if you had two teachers? Both at the same school. Teaching the same classes, because they share them. Every year the same, for several years on end. Then, surely, you can look at the pupils’ results and say ‘aha, look, there’s a pattern here. We’ve got hundreds of data points over several years. We can see that Mr Happy is better than Mr Grumpy.’ Can’t we?
No. We can’t. Look, I understand that accountability is important. Gone are the days, like it or not, at least in the Anglosphere, when the State would simply hand over a pot of money to a school and say ‘there you go, spend it wisely.’ Equally, we’re nowhere near a situation in which the operation of a free market in primary and secondary education would be politically palatable. Someone has to decide whether public money is being wasted or not: to do otherwise would be intolerable. I get it.
But just because there’s a problem, doesn’t mean there’s a solution.
In my first year of teaching, there were two sets of remarkable results. One was internal: the Third Form class (the fourth ‘stream’ of eight overall) which I taught did significantly worse than the set immediately below them. This was embarrassing for me, and I worried about it. Until, that summer, a (Physics-teaching) colleague showed me some dissertation he’d done for some educational qualification.
Now this colleague had taught the fifth ‘stream’. And there, in his portfolio, signed off by some senior figure in the school, was a boast about how much better his class had done than the set above, with figures to prove it. I did not have access to the data which might have revealed whether or not he was particularly good and I was particularly bad, but a couple of discreet inquiries revealed that this particular phenomenon had been replicated in some – but not all – other subjects too. Set Five had overperformed, and Set Four had underperformed.
All right, you might say, but a reasonable person looking at all the data would see that, right? Maybe. But what about the other set of results? I shared an A Level class with my Head of Department. Eight boys. Of the eight, four did better on my paper, and four did better on his.
Now this, in retrospect, was very odd. (At the time it was just a relief.) He had a decade’s teaching experience; I had none. He was, and is, an excellent historian and teacher. Why had he done no better than me? Had he been asked, here’s what he could have said. He’d spent a lot of his time helping me, time which he couldn’t devote to that class. He’d made more ‘comparative’ points than he’d usually do, to help support those pupils who he knew had a novice teacher, and that had detracted from the delivery of his own side of the course. (He taught sixteenth-century Spain, while I taught sixteenth-century England, so his superior explanations of (say) religious doctrine will have helped pupils with my side of the course too.) He gave me the choice of papers to teach. And maybe, for that particular group of pupils, the paper I was teaching them was easier than the paper he was teaching them. It was the OCR ‘synoptic’ paper, the paper which was supposed to cover continuity, change & development over a century. They were loveable but essentially idle young gentlemen for whom getting a C grade in that enterprise was easier – because it involved more ‘blagging’ and less detail – than getting a C grade in the other paper, which was centred on a ‘great man.’
You and I both know that there are many, many school managers who would have considered these to be unconvincing excuses for his underperformance.
Then I moved schools, and taught AS for the first time, again on the OCR A Level History specification. But this time, at the end of my first year, the pupils whom I’d shared with a colleague did significantly worse on my paper than they’d done on his.
Why was this?
Well, this colleague was a cleverer and more talented teacher than me. Yes, he was. That’s not false modesty. I’m amazing. He was better. But there was more to it than that. Because of course there was. He taught the English Revolution, while I taught the French Revolution. (A great pair of topics to teach together, by the way.) But that meant, with the preposterous old system of three AS exams, that there was an uneven timetable split: he saw them twice as often as I did, and taught them for two papers (1629-49 & 1649-60). So what? Well, so he had four lessons a week with them, and I only two: multiply that by two and he had eight lessons a week with those classes while I had four. This meant that I got another two Lower School classes to prepare lessons for, mark work for, and write reports on. (Yes, of course the school policy was that work should be marked once a fortnight at least, and no, of course it wasn’t more nuanced than that.)
Furthermore, the examination was structured thus. You might remember it. There were three separate papers, but they were all sat together, one after the other. Our pupils sat the English papers followed by the French paper, so by the time they came to approach the paper I’d taught them, they’d already been sweating and scribbling for nearly two hours. Anyone who has marked examinations will know that the last questions tend, all other things being equal, to be done worse.
Oh, and did you notice? Better results for British history twice. Because pupils find it easier? I think so. There’s some familiarity with, say, Queen Elizabeth & Henry VIII, or even with the Civil War, which there isn’t with Philip II of Spain or the French Revolution.
I’ll give you one more example. For the last seven years I shared Upper Sixth Politics teaching with one other colleague. We did the Edexcel Route A stuff: UK Political Issues (him) and EU Political Issues (me).
My results in the A2 exams were significantly better than his. Because I was a better teacher? Cobblers. His paper was harder. Nowhere did it ever say so. But it was. To do well in EU Political Issues you just needed a far less sophisticated level of understanding than you did for UK Political Issues. Little factoids just went much further than they would on the other side of the course. This wasn’t because of the nature of the assessment: both papers were structurally identical, as were the generic mark schemes. My personal suspicion is that examiners just know much less about EU politics than UK politics, and consequently tend to over-reward the deployment of seemingly obscure detail which is actually not remotely impressive to someone who has studied the subject for an academic year. (This isn’t my own idea though: a former colleague explained his exceptional results in the ‘Britain & Ireland 1798-1922’ A2 History paper in similar terms. The same examiners who marked ‘Russia & its Rulers 1855-1964’ also marked that paper, but – knowing much less about it – tended to over-reward, or to reward structural soundness over academic argument, and the former is rather easier to drill.)
Something which convinces me that this is true is that after my first year I decided to drop the EU and teach American government instead. I thought it’d be more approachable, and there are more textbooks. (No, in those days I wasn’t the inveterate textbook-hater I am now. This whole business helped to persuade me.) Instead, my results fell to just below my colleague’s. Why? Well, I’m not sure, but I think the overall standard expected in that paper was higher; and the existence of a textbook fostered the sort of lazy thinking & approach to the subject which was, in the end, my pupils’ downfall. So, after two years on the dark side, I came back, and the uneven results returned.
Well, if I’m right, can’t we factor this in when holding teachers to account?
Good luck with that.
No, seriously. I think that what I’ve written was true, on aggregate, for the pupils I taught. But of course it wasn’t true for all of them. That A Level ‘synoptic’ paper, which those pupils found relatively easy? Okay, but cleverer, more industrious, more erudite pupils often found it much harder, whereas they’d find the ‘great man’ paper easier to excel in. That wasn’t the class we had that year. But it’d be another class we’d have another year.
Do you think I’m making all this up? I hope not. But if you were a Deputy Head Academic, and I said this to you, after my pupils had got bad results … would you assume I was telling the truth? Or would you assume I was making excuses? The answer, of course, is that most likely you’d be influenced by two things. Did you rate me? (A prejudice you might well have formed without results in public examinations.) And did you have bosses who’d want chapter and verse on how you’d investigated anomalies in results, or not?
And if you think I’m not making this up, and that these influences are real … do you think you can distinguish real reasons for anomalous results from the excuses of substandard teachers? In every subject? Really?
And all this is before we even get to the question of whether you can possibly compare results in coursework with results in examinations. I think most teachers would agree that the two are not the same, though they would find it impossible to produce a realistic, meaningful way to compare outcomes fairly across those two rather different types of assessment.
I know this is inconvenient. But just because there’s a problem doesn’t mean there’s a solution. I don’t want teachers booking computer suites, putting their feet up, telling pupils to research a topic on the internet instead of teaching it, and then saying afterwards, when those pupils get bad results, that those results are meaningless and that they can’t be judged by them. I really don’t.
But that doesn’t mean that I can pretend that exam results can tell us much about the effectiveness or otherwise of teaching. It’s just too complicated and too difficult. It might be satisfying or reassuring to think that, if only we get the right data and interpret it in the right way, we’ll be able to rank teachers from one to four hundred thousand. But we can’t, and pretending that we can is unlikely to make things better.