You can't specify everything in the rules. For example, I know a case where management decided to hand out bonuses to the staff for minimizing inventory, because inventory costs money. The staff minimized inventory, all right, and got their bonuses. Meanwhile, the production line would regularly get halted because they'd run out of things like 5 cent resistors. It was a disaster.
The same company decided to rate programmers based on the bug list. A big chart was posted on the wall showing the daily bug count. It wasn't a week before huge fights erupted over the bug count - over what was and was not a bug. The programmers quickly gamed it. They'd hide bugs, and they'd refuse to fix a bug if the fix would introduce other minor bugs (e.g. if the bug was "feature X missing", adding feature X with a couple of small issues would trade one bug for several, so X never got added). They'd even file "bugs" blamed on other programmers and then "fix" them to get the credit.
Management gave up on that after two weeks and pulled the banner down.
For teachers, the metric is (roughly speaking) "% of students capable of multiplying/dividing numbers up to 4 digits in a standardized test setting". How do you game this metric?
The fact that one company used a couple of bad objective metrics doesn't mean all objective metrics are bad. They are used with a great deal of success in many fields. Sales people are paid on commission, traders are paid proportionally to (risk adjusted) profits, etc. All it means is that if you set the wrong goals for your company, you'll probably succeed at the wrong thing.
So tell us - is "maximize the % of students capable of multiplying/dividing 4 digit numbers" the wrong goal? If so, what is the right goal, and why can't it be measured?
One popular way of gaming the metric is systematic cheating on the tests by the teachers. I say popular because it has happened on a large scale, most recently in Atlanta as I recall.
Another way to manipulate the test results is to manipulate which students are in your class or your school.
The point is, people are endlessly creative in subverting rules to their own benefit - so they conform to the letter of the rules but not the spirit.
Consider also the "work to rule" technique used by unions as a bargaining tactic. It's as simple as workers adhering to the letter of their job descriptions and doing nothing more. It doesn't work out well for the company.
To prevent cheating, test administration should be handled by someone other than teachers.
The fact that the current system has a bunch of cheaters is not an argument against measuring it more carefully and objectively. What next - bankers sometimes engage in rogue trading, so we should monitor their behavior less?
> Another way to manipulate the test results is to manipulate which students are in your class or your school.
This is very difficult with VAM (value-added modeling), since the goal is to increase (actual score - statistically predicted score). You need to reliably identify students who will do better than their statistically predicted score.
I.e., you need to discover students who will improve drastically this year and then pack your student body with them.
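To make the arithmetic concrete, here is a minimal, purely illustrative sketch of that value-added idea. It assumes prior-year scores are the only predictor and all the numbers are made up; real VAM models control for many more covariates, so treat this as a sketch of the definition, not an implementation.

    # Purely illustrative sketch of the value-added definition above.
    # Assumes prior-year scores are the only predictor; all numbers are made up.
    import numpy as np

    # Hypothetical district-wide data used to fit the statistical predictor:
    # current-year score as a simple linear function of prior-year score.
    district_prior = np.array([40.0, 55.0, 65.0, 75.0, 90.0])
    district_actual = np.array([45.0, 58.0, 66.0, 78.0, 88.0])
    slope, intercept = np.polyfit(district_prior, district_actual, 1)

    # One teacher's class: value added is how far students land above their
    # predicted scores, so packing the class with already-strong students does
    # not help by itself; they arrive with correspondingly high predictions.
    class_prior = np.array([52.0, 61.0, 70.0, 85.0])
    class_actual = np.array([58.0, 63.0, 69.0, 90.0])
    class_predicted = slope * class_prior + intercept
    value_added = (class_actual - class_predicted).mean()
    print(round(value_added, 2))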
> "% of students capable of multiplying/dividing numbers up to 4 digits in a standardized test setting". How do you game this metric?
You use the time that you used to use for teaching reading and devote it to teaching methods for multiplying and dividing numbers up to 4 digits in a standardized test setting.
You might say this isn't gaming the test, it's just a stupid way for schools to optimise their results for that metric. But that metric is mechanical; there's no measurement of deep understanding of the principles, just success or failure at applying mechanical rules by rote.
Most US standardised test settings use machine-marked multiple-choice questions. I'm told that there are lessons in 'bubbling in' - lessons on how to fill in the multiple-choice answer bubbles so that fewer answers get mis-marked.
> You use the time that you used to use for teaching reading and devote it to teaching methods for multiplying and dividing numbers up to 4 digits in a standardized test setting.
Clever - you caught me. I got lazy and didn't feel like typing up all the goals of a 3rd grade education system into an HN comment, preferring only to provide a simple example and hoping that a reasonable reader would extrapolate.
Mea culpa - I'll stop assuming reasonable readers in the future.
Or are you actually suggesting that the school system might forget to include reading when defining their goals?
> up all the goals of a 3rd grade education system
Can all the goals of a 3rd grade education system be reduced to a purely mechanical list of stuff?
Let's try applying your example of arithmetic to reading. What are the goals? To get children to read individual words? Or to get children to read a sentence, and obtain meaning from it? If it's just to read words, do you get those words from a defined vocab? Should they all be real words, or do you include nonsense words too?
While schools may not have stopped teaching students to read, they have cut out other parts of the curriculum to focus on what's being tested.
And there's a risk of a cut-off point - there are a bunch of children who will read, and a few children who are struggling to read. Do you spend extra time and money on the few strugglers (some of whom are going to fail whatever you do), or do you concentrate on the majority (and get most of them through the test and thus look good)?
> Can all the goals of a 3rd grade education system be reduced to a purely mechanical list of stuff?
Yes, I would hope that a multi-million dollar enterprise can clearly define their goals.
> What are the goals? To get children to read individual words? Or to get children to read a sentence, and obtain meaning from it?
I don't know off the top of my head whether the latter should be learned by 3rd grade. Ultimately setting the goals of our educational system is up to the various bureaucrats in the school system.
However, regardless of what the goal is, you still haven't given a way to game the system apart from "teach kids to read [words/sentences]".
> While schools may not have stopped teaching students to read, they have cut out other parts of the curriculum to focus on what's being tested.
Indeed - if the school system is not achieving their primary goals, they should cut secondary goals and focus on the primary ones. That's a good thing.
> Do you spend extra time and money on the few strugglers (some of whom are going to fail whatever you do), or do you concentrate on the majority (and get most of them through the test and thus look good)?
This depends on what the goals of the school system are. If you want to ignore the strugglers, set the goal to be maximizing this function:
student_scores.max()
If you want to help the strugglers and ignore the strivers, choose this one:
student_scores.min()
Choose this one if helping a struggler is equally important with helping a striver:
student_scores.mean()
This one is like the previous, but strugglers get a bit of extra weight:
log(student_scores).mean()
Setting a goal merely forces you to acknowledge possible tradeoffs and decide which ones should be made (if the need arises).
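For concreteness, here is a runnable sketch of those objectives, assuming student_scores is a NumPy array of per-student test scores; the array contents are made up for illustration.

    # Runnable sketch of the objective functions above. Assumes student_scores
    # is a NumPy array of per-student scores; the numbers are made up.
    import numpy as np

    student_scores = np.array([35.0, 60.0, 72.0, 88.0, 95.0])

    only_the_top_matters = student_scores.max()
    only_the_bottom_matters = student_scores.min()
    everyone_counts_equally = student_scores.mean()
    # log is concave, so moving a struggler from 35 to 45 raises this objective
    # more than moving a striver from 88 to 98 does.
    strugglers_weighted_extra = np.log(student_scores).mean()

    print(everyone_counts_equally, strugglers_weighted_extra)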
> However, regardless of what the goal is, you still haven't given a way to game the system apart from "teach kids to read [words/sentences]".
Sure I have. You ignore everything that is not tested. This gets you children who pass the tests. But it ignores all the other work that schools should be doing, and it reduces education to the worst, least inspiring, mechanical drudge work.
> I don't know off the top of my head whether the latter should be learned by 3rd grade. Ultimately setting the goals of our educational system is up to the various bureaucrats in the school system.
They can't agree. That's why I listed nonsense words in one of the requirements. There's an argument about whether phonics or whole-word approaches are better, even though we have good research showing that phonics is better. And so when you look at phonics methods (which include nonsense words in the tests), you get disagreement within the phonics camp, and you also have all the non-phonics people piling on.
But this should be easy to discover, right? We have millions of children learning to read each year. We randomise them, set up a control group, and give the other groups different methods. Then we test. (Assuming we can get agreement on what and how to test.)
> But it ignores all the other work that schools should be doing...
Such as?
> But this should be easy to discover, right?
No. Choosing your goals is about subjective value choices. If nonsense words are intrinsically valuable, they should be included, otherwise they should be excluded.
You are conflating the setting of goals with the method used to achieve them. If phonics is superior (I agree with you that it is), it will achieve higher scores. If teaching children nonsense words helps them understand real words, then teachers wishing to maximize their score will teach them.
Regarding stating goals clearly: the problem is that the real goal is something like "maximize future student happiness" or perhaps "maximize future student income", and any set of test goals is therefore necessarily an approximation. I think that if you want to make this argument, and it seems like a reasonable one, what actually needs to be shown is that some specific, easy-to-write-down set of test scores does a reasonable job as an approximation, or at least could in principle if one could just find the correct test.
> ...what actually needs to be shown is that some specific, easy-to-write-down set of test scores does a reasonable job as an approximation...
All we really need is to believe that it's a better approximation than the alternative, which is currently something along the lines of 0.5 x Principal's Opinion + 0.5 x Union Seniority (at best).
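One hedged way to frame that comparison is a simulation: invent a hypothetical "true" long-run outcome, model each evaluation scheme as a noisy proxy for it, and see which proxy correlates better. Everything in the sketch below, especially the noise levels, is an assumption chosen for illustration; it only shows how the comparison would be posed, not what the answer actually is.

    # Illustrative only: compares two hypothetical proxies for a made-up "true"
    # teacher effect. The noise levels are assumptions, not empirical claims,
    # and they drive the outcome; the point is the framing, not the result.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    true_effect = rng.normal(size=n)                 # the real (unmeasurable) goal

    test_score_proxy = true_effect + rng.normal(scale=1.0, size=n)
    principal_opinion = true_effect + rng.normal(scale=1.0, size=n)
    union_seniority = rng.normal(size=n)             # assumed unrelated to the goal
    status_quo_proxy = 0.5 * principal_opinion + 0.5 * union_seniority

    print("test score proxy:", np.corrcoef(true_effect, test_score_proxy)[0, 1])
    print("status quo proxy:", np.corrcoef(true_effect, status_quo_proxy)[0, 1])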
Most of these issues can be addressed simply by completely separating education and evaluation. I.e., teachers take the day off when tests are administered, and some bureaucrat shows up to administer them instead.
Treating damaged test materials/absent students as a score of 0 is the simplest way to prevent hacking the student body.
> vie for the most favorable students by pulling strings with the administration
The most favorable students are those who will perform better than their statistically predicted score. How do teachers know who those will be?