The Average Student Does Not Exist (gradescope.com)
89 points by ibrahima on June 21, 2017 | 82 comments



Somehow, despite all the conversations around education in the US, the education system still sucks. I went to one of the highest-funded (amount spent per child) public schools in my state, and as far as I am aware it was far behind in curriculum strength compared to what my parents were taught in the Soviet Union at the same age.

I mean we didn't read a classic American author till 6th or 7th grade! And if I recall correctly there were still M&M's in math class in grade 4!

The US may have an education problem, but somehow the Soviet Union and China did fine years ago without all the ed-tech snake oil.


> in the US the education system still sucks

citation?

Education is a complex matter. There are many people with OPINIONS on what the best way to teach is. These ideas are in conflict, and only rarely does anyone study what really works (rarely, that is, compared to the number of opinions; there could be a lot of studies that nobody knows about when they state their opinion).

Humans have a limited lifetime: you cannot teach all possible useful knowledge/skills in one. I limited this to useful; there are a lot of useless things that are fun to know anyway, and somehow those who are interested find time to learn them for fun. I didn't define useful either: is Music/French/Algebra/Sports... useful? (I can make either argument for any subject.)

Why is reading a classic American author important? Reading is important in an abstract sense, but if you can understand written instructions it doesn't matter what you happened to read to get that skill.

Likewise, what is wrong with using M&Ms for learning math? A concrete example helps learning. (To be clear, this is an opinion that I was ranting against in the first paragraph - I don't know if I agree with it, but I understand it enough to repeat it.)

One constant in US popular culture is that our education system sucks compared to X. We have done well over the years despite that (or maybe because of it?).


> citation?

http://www.businessinsider.com/pisa-worldwide-ranking-of-mat...

> I didn't define useful either:

> but if you can understand written instructions it doesn't matter what you happened to read to get that skill.

Of course you are free to define useful in a way that makes it impossible to argue or to have a discussion. So let's stick to the way it is defined for the purpose of, say, university admission.

> Why is reading a classic American author important?

Reading difficult work earlier develops higher reading comprehension faster.

> Likewise, what is wrong with using M&Ms for learning math?

I think that if by 4th grade you still need concrete pieces to understand integers or the denominators of a fraction or whatever they were supposed to represent, that is a sign of a weak math education. In general, concrete examples are antithetical to learning advanced math; this leads to the monkey-style ability to solve problems that are similar to ones presented in textbooks, but not the ability to reason effectively about unfamiliar problems.


I don't think PISA ratings are that useful, but they do show the USA ahead of Russia and China for Reading and ahead of Russia for Science, so this is hardly evidence that the USA is worse.


> Reading difficult work earlier develops higher reading comprehension faster.

What makes a classic American author better than a modern author who writes at a high level? (Note that most popular authors don't write at a difficult enough level, but out of the thousands of books published each year some will be high enough - many authors of old did not write at a high enough level either)


Classics are classics for a reason: they've stood the test of time and scrutiny as literature of value.

Reading level of the material aside, I think it's more valuable to read The Catcher in the Rye than The Hunger Games because of the subject matter and impact on popular culture.

Classic literature is genre-defining and gives you an appreciation for the art of the novel.

It's hard to gain an initial appreciation for reading if you don't enjoy the reading you do, which is a good argument for the bestseller list, but it's hard to gain any depth of appreciation without understanding its roots.

You might say you like hip-hop because Lil Yachty made your head bounce on the radio, but without listening to N.W.A. you can't really say you understand it.


> We have done well over the years despite that (or maybe because of it?)

It helps that your graduate schools and corporations are full of people educated in other countries. Immigration is great.


> I went to one of the highest-funded (amount spent per child) public schools in my state, and as far as I am aware it was far behind in curriculum strength compared to what my parents were taught in the Soviet Union at the same age.

This is because just pumping money into failing schools does not magically turn them around. There is little correlation between per capita secondary education spending and student outcomes.


All else being equal, schools with substantial numbers of special needs students will have much higher expenditure per pupil because they are so disproportionately expensive.

Of course funding per pupil isn't correlated to outcomes. Funding per pupil normalized to their levels of needs and preparedness might be.


Funding works strangely in US public education. Schools in any given district seem to have a "hull speed" when it comes to money.

Once a certain number of dollars is actually reaching the classroom, adding more will simply see most of the additional funds absorbed by hiring more administrators, prestige projects like sports facilities, "classroom technology" projects, etc.

To detect this limit, simply check the level at which teachers begin paying for school supplies for their students from their own pockets and then back it off about 10%.


> check the level at which teachers begin paying for school supplies for their students from their own pockets and then back it off about 10%.

Surely "add on about 10%"?


You don't know many teachers. The system consumes a little bit of teachers' altruism as a raw material, converting it to extra money for those other things mentioned. Too much and turnover becomes too high, so they seem to carefully feel out the edge. You can shear sheep over and over but skin them only once.

Every public school teacher I've ever known (including the several in my family), from rich districts to poor, ends up buying basic supplies for their students because they can't get the district to provide them. It's all very 20th-century Soviet. I've known teachers who waited six months for light bulbs before finally going to Home Depot and buying them themselves.


It tends to be a negative correlation.

In the Northeast US, you'll generally see the best performing districts have a lower amount spent per child than the underperforming districts.

The underperforming districts will have higher property taxes (as a result of the higher education cost). This generally leads to parents seeking to move to a different school district for financial and educational reasons.

In education, at least, more money does not equate to better students but instead to more mismanagement.


> It tends to be a negative correlation.

This definitely needs a citation. It might not have significant correlation either way, but I cannot find a reference for the former (some cursory googling [0][1]).

[0] https://www1.udel.edu/johnmack/research/school_funding.pdf [1] https://object.cato.org/sites/cato.org/files/pubs/pdf/pa746....


Special education students are more expensive to educate than bright students.

You give a gifted student a $100 book and let them get after it.

You give a troubled-behavior student with multiple LDs a full-time ed tech at a $30k-per-year salary minimum, or whatever else is required by federal law to fulfill their IEP.


Ugh, I didn't stop to consider the special education component (and its cost). That's my bad.

This reminds me of a similar theory regarding affluent towns with low taxes and minimal social programs that "export" their elderly to nearby cities with higher taxes but programs such as Paratransit and Meals-on-Wheels.


I question whether students are ready to read classic authors before middle school at the earliest. Perhaps one can read Huckleberry Finn as an adventure story before that, but is that more than surface familiarity? Don't know about M&Ms, though.

Anyway, as I say again and again: there isn't one US education system. Within the District of Columbia, a populous but geographically small area, there are, practically if not legally speaking, six or seven at least: magnet public schools; prosperous public schools; shaky-to-desperate public schools; parochial schools; private schools; charter schools. And within the parochial, private, and charter school worlds there are considerable differences.


I see a lot of Russians and Chinese emigrating to America to bring up their children. I don't see any sending their children back to get the "superior" education there.


[flagged]


It's chocolate, not cocaine. Calm down.


What if I as a parent don't want my children exposed to chocolate (so as not to form terrible eating habits)? Not to mention that it's not JUST chocolate - it's a complex mix of chemicals engineered in a lab with the goal of being maximally profitable for the company (while children's wellbeing is nowhere on the list of goals). What's wrong with using vegetables or fruits?


To be honest, I like that this article tries to perform simple analyses, but I find their rationale pretty confusing.

This kind of data is commonly modeled using item response theory (IRT). I suspect that even in data generated by a unidimensional IRT model (which they are arguing against), you might get the results they report, depending on the level of measurement error in the model.

Measurement error is the key here, but it is not considered in the article. That, plus setting an unjustified margin of 20% around the average, is very strange. An analogous situation would be criticizing a simple regression by looking at how many points fall X units above/below the fitted line, without explaining your choice of X.
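
To make the measurement-error point concrete, here is a quick simulation (a sketch with made-up sizes - 1,500 students, 5 question categories, one latent ability per student, a standard Rasch model):

    import numpy as np

    rng = np.random.default_rng(0)
    n_students, n_cats, items = 1500, 5, 10

    theta = rng.normal(0, 1, n_students)             # one latent ability each
    b = rng.normal(0, 1, (n_cats, items))            # item difficulties
    p = 1 / (1 + np.exp(-(theta[:, None, None] - b)))
    x = rng.random(p.shape) < p                      # noisy binary responses

    cat = x.mean(axis=2)                             # per-category percent scores
    band = np.abs(cat - cat.mean(axis=0)) <= 0.20    # within 20 points of the mean

    print("within the band in every category:", band.all(axis=1).mean())
    # Even among students whose OVERALL score is near the class mean:
    overall = cat.mean(axis=1)
    near = np.abs(overall - overall.mean()) <= 0.05
    print("...among overall-average students:", band[near].all(axis=1).mean())

Even in this strictly one-dimensional world, item-level noise alone pushes a sizable fraction of students - including students whose overall score is dead average - outside the band in at least one category, so the article's result can't be read directly as evidence of multiple dimensions.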


Totally agree that this is not a fully rigorous analysis, and we do want to dig deeper and try to extend some IRT models to these types of questions.

The main point of this post is to highlight that the most common metric of student performance may not be that useful. Most of the time, students will get their score, the average score, and sometimes a standard deviation as well. As jimhefferon mentioned in a response to a different comment, the conventional wisdom is that two students with the same grade know roughly the same stuff, and that seems not to be true.

We're hoping to build some tools here to help instructors give students a better experience by helping them cater to the different groups that are present.

disclaimer: I'm one of the founders of Gradescope.


I agree with your point, that the average likely misses important factors (and think the tagging you guys are implementing looks really cool!).

However, I'd say the issue is more than a non-rigorous analysis; it's the wrong analysis for the question your article tries to answer. In the language often used in the analysis of tests, your analyses are essentially examining reliability (how much students' scores vary across test items due to "noise") rather than validity (e.g. how many underlying skills were tested). Or rather, they don't try to separate the two, so they cannot support clear conclusions.

I am definitely with you in terms of the goal of the article, and there is a rich history in psychology examining your question (but they do not use the analyses in the article for the reasons above).


You brought a smile to my face. I came here to post this same point.

The piece is making a basic measurement mistake: assuming that all variability is meaningful variability.

There are ways of making the argument they're trying to make, but they're not doing that.

Also, sometimes a single overall score is useful. A better analogy than the cockpit analogy they use is clothing sizing. Yes, tailored shirts, based on detailed measurements of all your body parts, fit awesome, but for many people, small, medium, large, x-large, and so forth suffice.

I think there's a lesson here about reinventing the wheel.

I appreciate the goals of the company and wish them the best, but they need a psychometrician or assessment psychologist on board.


I do agree that applying psychometrics would be great, but it's not as simple as it sounds -- the vast majority of work is on multiple choice questions, or binary correct/incorrect. There is some on free response, but much less.

We aren't trying to make a rigorous statement here -- we're trying to draw attention to the fact that the most common metrics do not give much insight into what a student has actually shown mastery of. This is especially important when you consider that the weightings of particular questions are often fairly arbitrary.

I certainly agree that all variability is not meaningful variability, but I'd push back a bit and say that there's meaningful variability in what's shown here. We'll go into more depth and hopefully have something interesting to report.

I've also seen a fair number of comments stating that this is not a surprising result. I'd agree (if you've thought about it), but if you look at what's happening in practice, it's clear that either many people would be surprised by this or they are at least unable to act on it. We're hoping to help with the latter.


IRT modeling doesn't care much whether an item is free response or not, just the scale on which it's scored. Binary and polytomous scoring = IRT model. Continuous scoring = Factor analysis.

If by mentioning free response, you mean students are unlikely to guess the correct answer, even when they don't know it, it's a 2 parameter IRT rather than 3.
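
For anyone following along, the item response function in question looks like this (a sketch; a/b/c are the conventional parameter names):

    import math

    def p_correct(theta, a=1.0, b=0.0, c=0.0):
        # 3PL item response function: discrimination a, difficulty b,
        # guessing floor c. Setting c = 0 reduces it to the 2PL above.
        return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

    # A weak student (theta = -1) on a multiple-choice item with a 25%
    # guessing floor, vs. the same item as free response:
    print(p_correct(-1.0, a=1.2, c=0.25))   # guessing props the score up
    print(p_correct(-1.0, a=1.2, c=0.0))    # free response: no floor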

Best of luck! :)


Does this article say anything more profound than, "If you roll 10 dice, you'll expect a score of 35; however, any pair of rolls that sum to 35 is unlikely to be similar"?

All the worst students will be very similar and all the best students will be very similar because the number of available states is low. Average students are all unique in their average-ness.

Am I missing some subtle statistical understanding that the toy example doesn't capture?
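
For what it's worth, here's a quick simulation of my toy example (my numbers):

    import random, itertools

    random.seed(0)
    rolls = [tuple(random.randint(1, 6) for _ in range(10))
             for _ in range(200_000)]
    sum35 = [r for r in rolls if sum(r) == 35]

    # How many of the 10 dice agree between two rolls that both total 35?
    sample = random.sample(sum35, 200)
    agree = [sum(a == b for a, b in zip(x, y))
             for x, y in itertools.combinations(sample, 2)]
    print(len(sum35), sum(agree) / len(agree))

Two rolls with the same total typically agree on fewer than 2 of the 10 dice, on average - equal sums hide very different rolls.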


I think the article's contention is that on-the-ground teachers expect that two people coming out of a high school Algebra II with C+'s are similar. (Certainly that is my working hypothesis.) The article argues that it ain't so.


That's interesting that it's your working hypothesis! I have never thought grades correlated very well with anything at all. It's interesting to hear from someone who does not intuitively view it that way.


The sets of dice which have equal sums will often have different constituent values.


The article's contention is that on-the-ground teachers expect that two people coming out of a high school Algebra II with C+'s are similar.


This is exactly what we think is a fairly common attitude -- thanks for stating it so clearly! It has ramifications both within a single class and when you think about how prerequisite and dependent classes are structured.


How do you think it could be done differently? Students need to be judged to decide who moves ahead. That is, I have people in Calc I and I have to decide who moves on to Calc II. I can't send the next instructor a poset of their competence. I cannot require that everyone be competent at everything. I wonder: what is your proposal?


You are missing that students have multiple dimensions they can be compared on.


The dimensions are the dice.


>Out of 4,063 pilots, not a single one fell within the average 30 percent on all 10 dimensions.

I wondered about a very similar problem some weeks ago. I was bothered by the terms "ectomorph" and "mesomorph" because they seemed useless once you considered height: the vast majority of "ectomorphs" seemed to be taller than average, while the vast majority of "mesomorphs" seemed to be of average height, so there's no point to these words. And so I wondered how shoulder width would change given height (which seems to have some kind of "decreasing returns"), and how the average measures would relate to the actual average build. I mean, is the "average guy" really the guy with the average height and average shoulders? Because it's not as if the scale had just changed, like doubling the size of a cube; there seems to be some deformation going on as well.

Anyway, I didn't get past the wondering phase at the time. But I think it's too important a problem to be casually thrown in as part of a pitch. I don't see an immediate reason why the average tuple should be the tuple of all averages, because some of the variables might be "dislocated" and thus not coincide with the averages of other variables. Some guy might be very close to average height yet still somewhere in the left tail when it comes to body mass, shoulder width, or any other measure. So there might be a typical student, but I don't think this is the way to find him.


As you say, they definitely aren't uncorrelated dimensions - otherwise we would have seen ~50 pilots within one stdev for all 10 dimensions. So this simplified metaphor really isn't telling us anything about how statistics apply to students.


There is an analogy to clustering (an unsupervised learning technique) here.

Take the simple case of 2 dimensions (each observation is plotted in 2D space) with possible values of 0-10. Let's say the extreme (far from average) space is within 5% of the border. The total extreme area is (10x10)-(9x9) = 19 (i.e. 19%). Now add a 3rd dimension. The extreme "volume" in 3d space is now (10x10x10)-(9x9x9) = 271 (i.e. 27%). You can see where this is trending. Add enough dimensions, and every observation is now "extreme." They become so far apart that each observation almost deserves its own cluster, and you lose any idea of similarity.

Back to this particular article: when you _add_ (or average) all of the dimensions -- like you do on an exam -- suddenly they are close again.
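
The trend is easy to tabulate (same within-5%-of-the-border definition as above):

    # Fraction of a hypercube that is "extreme" (within 5% of some border)
    for n in (2, 3, 10, 50):
        print(f"{n:2d} dims: {1 - 0.9 ** n:.0%} extreme")

which prints 19%, 27%, 65%, and 99%.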


Here's another look. If you have variables X_1, ..., X_n that are independent draws from normal distributions, and you want someone to be within 1 standard deviation of the mean in EACH dimension, then the probability of that happening is about 0.68^n, which becomes really small for even a moderate n.
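
Plugging in the article's pilot numbers (0.68 is the standard-normal probability of landing within one standard deviation; independence is the simplifying assumption):

    p = 0.68
    for n in (1, 5, 10):
        print(n, round(p ** n, 4), round(4063 * p ** n))

Even under independence, only ~86 of the 4,063 pilots would be expected within one standard deviation on all 10 dimensions - and with the article's tighter middle-30% band, the expectation is essentially zero (0.3^10 * 4063 is about 0.02).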


This is the most succinct and clearest explanation of what's going on. I see this discussed a lot when people talk about the curse of dimensionality. Another very simple example is an n-hypercube with edge length 1/2 embedded in the unit n-hypercube: as n increases, the volume of the unit hypercube stays constant (1), whereas the volume of the smaller hypercube, (1/2)^n, decreases at an exponential rate.


A silly headline.

According to the article, the average person doesn't exist, either. I don't know many people that are 13% fluent in Mandarin, 13% fluent in English, 9% fluent in Hindi... At the same time, having ~2 hands and ~10 fingers seems about right. Some metrics work with averages, some don't.


I heard this summarized once as "The average person has one breast and one testicle."


Right, but number of hands and fingers doesn't form a bell curve in the first place.


Grades don't either; their composite is at most beta-distributed, and probably not even that.

First of all, it's finite: there is a minimum and a maximum. Second, questions tend to be internally correlated. (After all, they correspond to subjects.)

Third, students are not expected to be average but to pass all the questions.


This question of "what skills are students missing?" reminds me of the new teaching methods they were trying out as I started high school. The new teaching program centered around objectives. The idea was that each objective was a skill that the student needed to learn, but the upshot was that you had to score more than 70% on every single quiz to pass the class, and that you could retake every quiz you failed, repeatedly.

The implementation varied between classes - in my World History class, there were a large number of objectives, and each objective was met by a small quiz that tested ~one skill. (There were a lot of retaken quizzes in that class.) In Biology, there were about 10 objectives for the entire semester, so you could still pass while missing a few small skills, as long as those missing skills were spread out among different units.

My high school used that "objectives" system less and less as I moved up the grades - I assume that most teachers got tired of it pretty quickly and just decided to make their usual teaching material "look like objectives" rather than rebuild their curriculum in later years.
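
The mechanical difference from ordinary grading is small but has big consequences - a sketch, with made-up quiz names and scores:

    quizzes = {"map_skills": 0.65, "ww1_causes": 0.90, "trade_routes": 0.85}

    avg_pass = sum(quizzes.values()) / len(quizzes) >= 0.70       # True: 80% average
    objectives_pass = all(s >= 0.70 for s in quizzes.values())    # False: retake map_skills
    print(avg_pass, objectives_pass)

Under the objectives system, the 80% average doesn't save you; you retake map_skills until it clears 70%.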


This sounds like Outcome Based Education -- one of many American education boondoggles. Good riddance to it.


Outcomes! Right, that's what they were called. Thanks for naming it.


I don't like the way this headline is written to match the article. All they showed is that students with similar average scores over multiple questions differed in their scores on individual questions. That is kind of obvious.


This makes me wonder: what is the "best" way to teach computer science to students? Universities are not trade schools (nor should they be), but it seems apparent that CS graduates in general are unprepared entering the workforce. The other extreme (bootcamps) seems to produce graduates who are more "industry ready", but only at a superficial level; these graduates seem to lack rigor/theory. Makes me wonder if there is a better training path for students.


This is a topic I've spent some time researching and talking with professional educators about. The general consensus is that we have to accept that the skills needed for the majority of professional work (the CRUD apps, the web dev, the infrastructure, etc.) are almost completely disparate from the topics under the umbrella of "computer science" (which is really more a subset of mathematics). The sooner we treat the skill of programming more like writing (you need to be able to write to do all manner of jobs, but very few people go to school for it), the sooner we'll produce students of all disciplines who will excel in the jobs that are most numerous.

Jobs that actually need a strong foundation in CS theory are very rare and will continue to be. The fantasy that you need a computer scientist to manage your CRUD app is resulting in many people being incredibly overqualified for their positions and is, in my opinion, one of the major reasons there's so much mental illness in the technology space.


I am in favor of incremental education (Agile, if you will). Right now education is in waterfall mode: College -> Work.

Both focus on different goals; clearly they are not aligned, and they shouldn't be either. I would be in favor of just getting the fundamentals to enter the workforce, getting my feet wet, getting a sense of how my interests match up with the market, and then pursuing focused education in areas of interest.

This will require a lot of support from academic institutions as well as progressive employers. It provides more arenas for longer and more meaningful relationships that are flexible, less rigid, and can move faster to meet market needs.


Almost certainly an apprenticeship of some sort. You wouldn't hire a mechanical engineer and expect him to do a mechanic's job straight out of university, which is in essence what you're asking people with CS degrees to do.


Shameless plug for UWaterloo's (and many other schools, for that matter) co-op (AKA internship) program. I had two years' work experience (6 x 4 months) by the time I graduated.


Internships/Co-ops basically serve this purpose already. As a mechanical engineer, if you haven't had any internships during your undergrad then you are going to have a very hard time finding a job after graduation. The same goes for software in my experience.


If the problem is already solved, then why are people still saying it's not (e.g. this post)?

I'm an engineering grad (as opposed to a CS grad). Most of the people who graduated from our mech eng course studied thermodynamics, control systems, fluid mechanics, acoustics.

Most of those people are now working jobs where they use those skills (or some of them) day to day. A CS grad studies algorithms, discrete math, fuzzy logic, compilers, possibly some networking/telecoms. And day to day, most CS grads are writing CRUD apps/gluing APIs together.


> Makes me wonder if there is a more optimal training path for training students.

A CS degree with at least 2 summer internships building real software ticks both boxes.


So does a bootcamp grad with 6 months' experience who studies computer science in their free time.


So does a completely self-taught computer scientist.

Why throw your money away on expensive bootcamps when you can just teach yourself everything?!

My first programming gig was before college, and I was completely self-taught (this was before CS was offered in 99% of high schools).

Learning how to program was easy. I was probably a better web developer in middle school than I am now (although jQuery happened at the tail end of my web programming days, so there was a lot less complexity - or at least a different type of complexity).

I needed the formal structure of a degree program to learn CS. Past Linear Algebra or so, the math became too difficult to learn on my own.

I expect I'm pretty typical in that respect.

If someone tells me they taught themselves how to program, I usually don't think twice about it. Just a "me too, aint it grand!" If someone tells me they taught themselves CS, I'm much more impressed (and therefore, in the case of hiring, incredulous).


What's so different about self-teaching programming that makes it possible when self-teaching CS isn't?


Not sure. My hypotheses:

1. Executable code provides a fast feedback loop that doesn't require instruction. That's hard to find especially for mathematics.

2. You don't really have to understand what you're doing in order to build something. So you can do useful stuff -- which is great for motivation etc. -- and then use that stuff to probe and gain a greater understanding.

3. The psychology is favorable to self-study because there aren't long periods of self-study before the material becomes truly useful.

4. A lot of CS is just more difficult. No one asks why Analysis is more difficult than Calculus - it seems like a silly question on its face. Maybe programming vs. CS is similar.

2 and 3 are kind of a function of 1.


WT... wow. You really have CS in 99% of high schools?


Probably a huge overstatement on my part.

4,310 schools had a college-credit CS course as of 2015 (cf. 14,183 for calculus). My anecdotal experience is that this number is increasing pretty rapidly. All the high schools in my area have a CS course.


> studies computer science in their free time.

Let's be realistic. Most people won't do this. People can barely teach themselves the bootcamp programming part.


Ok, then most people won't become software engineers without going to school or attending a bootcamp.

It doesn't have to be a thing that everyone does to be possible.


Internships and rigorous senior projects can help prepare students for real jobs. Having senior projects where you need to spend time documenting your work, scheduling tasks, etc can greatly help students get ready for when they start working at places where this is required from a management standpoint. My senior design project involved scheduling our project milestones, weekly meetings with the professor, daily meetings among group members, documenting the features, and even putting in purchase requests and justifications for hardware. That project plus my internships (once as a help desk tech, once in an intern group project, once working on a contract, and once working on an internal project) made me feel pretty confident to start my actual job and jump into the whole process again.


The "dream" scenario would be to leave training and mentoring to the actual companies. I understand a 5 person startup does not have the bandwidth to teach, but larger companies do yet most reject smart graduates because they don't happen to know Framework X.


I think it is up to the university to teach the students more theoretical topics and up to the students to either learn more technical topics on their own, or learn them quickly during the start of entering the workplace.


Let's not confuse "programmer" with "computer scientist." I am a pretty good programmer; I am in no way a computer scientist. Asking what the "best" way to teach computer science is is akin to asking what the "best" way to teach biology or physics or history is.

A Computer Science degree does not, and should not, be the sole qualifier for whether or not you want to be a programmer.


I agree, hence I wrote "Universities are not trade schools (nor should they be)"


I see them as two sides to a similar coin.

Many strong graduates wind up in roles at major companies - Google, Facebook, Amazon, Microsoft, etc. - where they are working with teams to implement things that do require research, rigor, etc. Their value as contributors is wrapped up in theory; the code is just an implementation detail.

Bootcampers, meanwhile, often find themselves at younger companies that are more focused on shipping features and stamping out bugs - areas where the ability to write and ship code quickly is a priority. The difference between a B-tree and a red-black tree will be moot to them unless they're interviewing; going beyond binary search, hash maps, and Bloom filters sees diminishing returns on investment in the near term for most small companies.


Working with real code in class. Learning what it takes to jump into a repository and start making incremental changes based on need. This could be an apprenticeship sort of thing, or the professor finding some good OSS for students to look at.

This should be done in tandem with theory.


College + Internship / side job.


If my memory and understanding are correct, the way that Mathematics is graded at Cambridge is interesting here.

Questions are scored alpha for a completely correct solution, beta if the examinee demonstrated that they knew what they were doing but maybe made some small mistake, and gamma for a reasonable effort.

The bare minimum pass mark is one alpha.


That sounds very interesting, but I'm curious about the pass-mark criteria. Is there some larger number of betas and gammas that can also pass? Otherwise, it seems like generating beta and gamma scores gives some nice measures for use by teachers/students, but if passing ultimately relies only on alphas, it's a lot like any other math scoring.

It becomes a little like companies saying they value x and y, but taking action aligned only to z.


Each problem in the exam is worth some number of points (20). I think that the aim is to ensure that you can't pass by mediocre performance across many problems: a bare pass indicates that you have retained enough knowledge/ability to get basically correct answers on two problems.

Explicitly the aim is to eliminate students who haven't deeply understood some aspect of the curriculum, so accumulating lots of partial results is exactly what they don't want.

It's worth noting that outright failure is extremely rare and subject to an appeals process, etc. Partly this is because this is a set of exams at the end of each year of instruction with no mechanism for a re-sit, so a student who fails will not graduate (the system isn't totally barbaric; there are mechanisms in place to handle health-related concerns, etc.).


Cambridge the city or the university?


The university.

I looked up details, they can be found at https://www.maths.cam.ac.uk/undergrad/course/schedules.pdf

My memory is a little off: there is no gamma, and the pass mark (in the first year) is 2 alphas and a beta.
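
If I'm reading that right, the first-year pass rule encodes to something like this (my interpretation only - the schedules PDF above has the authoritative thresholds):

    def first_year_pass(alphas: int, betas: int) -> bool:
        # "2 alphas and a beta"; assumes a surplus alpha can stand in for a beta.
        return alphas >= 2 and (betas >= 1 or alphas >= 3)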


When I read that the data was collected from 1500 CS finals, my immediate guess was that the class was CS61A.

---

I suspect that the shape of the distribution depends on the subjectivity of the test and on the grading: whether the questions are ones where you either know it or you don't, and how much partial credit graders are willing to give.


what about the 10X student?


That I'm not sure about but a 10X Engineer is for real :)


Given a large enough sample size, I'm sure you'll find such a student. Additionally, you will have plenty of students who beat the average and plenty who fall below it. Performance below or above average matters because student performance is ranked, while cockpit dimensions are not.


Average is just a statistical concept - in reality there is no average.



