In her Answer Sheet education blog in the Washington Post, Valerie Strauss has made a practice of bringing in many voices to enrich conversations about a wide range of education topics. I find that a recent column relates to my last post on teacher professionalism.
Adam Jordan and Todd Hawley make the point that the question, “Are teachers professionals?” has been kept alive primarily because of the way that schools are such a focus for political, ideological, and social debate and engineering:
From our perspective, the question of whether teachers are professionals has been allowed to persist primarily due to one simple truth: Lots of folks who are not teachers have plenty to say about teachers and education….When decisions about how to best educate children are made by people who have never been teachers, then we have a problem — one that leads folks to believe teachers aren’t professionals.
A central piece of their argument concerns whether teachers' judgment about education policy is taken seriously (mostly, it is not). Take the example of "growth measures" or "value-added measures" (VAMs), which use student test scores to evaluate teacher quality.
State legislatures keep passing laws that base teacher evaluation on value-added models, or as they are commonly called, “growth models.” Teachers knew from day one that this was a ridiculous idea because they know that these growth models are based only on standardized test scores and solid student evaluation is much more complicated than that.
Even when teachers' arguments were backed, with strong technical evidence, by organizations such as the American Educational Research Association and the American Statistical Association, "value-added models are still used to evaluate their effectiveness."
Misunderstandings of what education is, and therefore how it can best be measured, are supported by a range of ideological commitments that tend to make that measurement as simple and “cost-effective” as possible. Politicians and many policy makers want to be able to tell a simple story: Johnny or Jane can be scored as Successful or Not Successful, based on a simple and reliably computed measurement (usually a test score), and the outcome is a result of a few reliably computed characteristics of their teachers.
Yet there is persistent evidence that this sort of approach, however satisfying in operational management terms, is not particularly appropriate for understanding what is happening in schools. Quite aside from technical weaknesses in the VAM approach (developed by an economist with little understanding of the system he was modeling), there are deep issues with the reliability of standardized tests — see a recent article reinforcing a long-established point about non-school factors that strongly affect test scores. The author, Christopher Tienken, writes:
We decided to see if we could predict standardized test scores based on demographic factors related to the community where a student lived. By looking at three to five community and family demographic variables from U.S. Census data, we have been able to accurately predict the percentages of students who score proficient or above on standardized test scores for grades three through 12. These predictions are made without looking at school district data factors such as school size, teacher experience or per pupil spending.
As he elaborates on a series of studies his team has conducted to explore the impact of non-school factors on test scores, Tienken also reminds us that there is strong evidence that standardized test scores are not as good a predictor of student success in college as is GPA — the average of grades assigned by teachers over the course of students' high school 'careers.' Though there is legitimate debate about how to evaluate teachers' skill and performance, there are good reasons to be skeptical about systems that are derived from other areas of research, are not adequately adjusted to actually apply to education, and are not used with respect for their limitations.
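To see what Tienken's finding amounts to in statistical terms, here is a minimal sketch of the idea: regress district proficiency rates on community demographic variables alone, with no school-level inputs. The data, variable names, and coefficients below are entirely invented for illustration; Tienken's team worked with real U.S. Census and district proficiency data.

```python
# Illustrative sketch only: the demographic variables and numbers here
# are synthetic, not Tienken's actual data or model.
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical school districts

# Three invented, standardized community variables (e.g. adult educational
# attainment, median income, poverty rate).
X = rng.normal(size=(n, 3))

# Generate a "percent proficient" outcome driven almost entirely by the
# demographics plus a little noise -- the pattern Tienken describes.
true_w = np.array([8.0, 6.0, -7.0])
pct_proficient = 60 + X @ true_w + rng.normal(scale=3.0, size=n)

# Ordinary least squares using only the demographic variables --
# no school size, teacher experience, or per-pupil spending.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, pct_proficient, rcond=None)
pred = A @ coef

# How much of the variance do demographics alone explain? (R-squared)
ss_res = np.sum((pct_proficient - pred) ** 2)
ss_tot = np.sum((pct_proficient - pct_proficient.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"R^2 from demographics alone: {r2:.2f}")
```

In a synthetic setup like this, the demographic variables explain nearly all the variance, which is the troubling point: if real proficiency rates behave similarly, the scores are telling us more about communities than about schools.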
Audrey Amrein-Beardsley has a recent discussion of a cascade of measurement assumptions that have shaped debates about VAMs as opposed to other measures, and given rise to the "Widget Effect," the "national failure to acknowledge and act on differences in teacher effectiveness." The claim is that measures of teacher effectiveness are not well enough designed to detect differences that must exist between better and worse teachers. Amrein-Beardsley points out that, though variation in teacher "effectiveness" undoubtedly does exist, most of the "widget" discussion starts from the assumption that a more satisfactory measurement scheme should show teacher variation in something like a normal distribution; therefore, if your measurement system doesn't show this, it should be tweaked until it does. As Amrein-Beardsley writes,
What this means in this case, for example, is that for every teacher who is rated highly effective there should be a teacher rated as highly ineffective, more or less, to yield a symmetrical distribution of teacher observational scores across the spectrum.
In fact, one observational system of which I am aware …is marketing its proprietary system, using as a primary selling point figures illustrating (with text explaining) how clients who use their system will improve their prior “Widget Effect” results…Evidence also suggests that these scores are also (sometimes) being artificially deflated to assist in these attempts
(see the whole post for careful discussion, references, and links to further studies).
I know I am not alone in feeling that we are in the grip of persistent, inappropriate reductionism in policy-making. It's important to make our models as simple as possible, but not so simple that they no longer actually model the system. Since the lives (the life-courses) of actual people are at stake (students, of course, but also teachers, parents, and beyond, since we are all affected by the lived outcomes of education!), we must make sure that our models and systems do not trick us into forgetting the actual nature of things (save the phenomena!).