> Working one-on-one with an expert personal tutor is generally regarded as the most efficient form of education [14]
> [14] B. S. Bloom, "The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring," Educational researcher 13, no. 6, 4-16 (1984).
Now quoting https://2ndbreakfast.audreywatters.com/ai-unleashed/ :
> You know what was published 40 years ago (okay in 1986, not 1985)? Benjamin Bloom's "2-Sigma Problem," a paper that many ed-tech and education reform folks still love to cite, even though its findings on the incredible effect of one-on-one tutoring have not been replicated.
For supporting evidence, see https://www.proquest.com/docview/3075409050?fromopenview=tru... :
> An experimental intervention in the 1980s raised certain test scores by two standard deviations. It wasn't just tutoring, and it's never been replicated, but it continues to inspire. ...
> As the computing and telecommunication revolutions advanced, visionaries repeatedly highlighted the potential of technology to answer Bloom's challenge. Starting in the 1980s, researchers and technologists developed and eventually brought to market "cognitive computer tutors," ... Sal Khan, founder of Khan Academy, highlighted this promise in a May 2023 TedX talk, "The Two Sigma Solution," which promoted the launch of his AI-driven Khanmigo tutoring software. ...
> It all sounds great, but if it also sounds a little farfetched to you, you're not alone. In 2020, Matthew Kraft at Brown University suggested that Bloom's claim "helped to anchor education researchers' expectations for unrealistically large effect sizes." Kraft's review found that most educational interventions produce effects of 0.1 standard deviations or less.
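That gap between a 2 SD claim and Kraft's typical ~0.1 SD effects matters for study design. As a rough illustration (my own back-of-the-envelope sketch, not from either paper; the alpha = 0.05 and 80% power choices are my assumptions), a textbook normal-approximation power calculation shows how few students you would need to detect a two-sigma effect versus a realistic one:

```python
import math

def n_per_group(d: float) -> int:
    """Approximate per-group sample size for a two-sample comparison,
    two-sided alpha = 0.05 (z = 1.96), 80% power (z = 0.84), using the
    standard normal approximation n = 2 * ((z_alpha + z_beta) / d)**2."""
    return math.ceil(2 * ((1.96 + 0.84) / d) ** 2)

print(n_per_group(2.0))  # Bloom-sized effect: a handful of students per arm
print(n_per_group(0.1))  # Kraft-typical effect: well over a thousand per arm
```

The flip side is that a small, short study has plenty of power to "detect" a huge effect, but tells you almost nothing about whether a realistically sized effect exists.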
Huh: the same article also reports that 'Matt Kraft reported that effects of educational interventions generally, not just tutoring, are about twice as large when they are evaluated based on narrow as opposed to broad tests', and that 'Burke's and Anania's two-sigma intervention did involve tutoring, but it also had other features. Perhaps the most important was that tutored students received extra testing and feedback.'
That's a pretty interesting read on the limitations of tutoring! Now back to the Harvard paper.
My quick read of the paper: the researchers, who are from physics and engineering, studied a single class over two weeks, computed a battery of statistics, and concluded the effect was significant.
The intro itself concedes that "Despite this recent excitement, previous studies show mixed results on the effectiveness of learning, even with the most advanced AI models2,3.", so the authors know the signal, if any, is not strong.
Given that history, I think it's unwise to make such a strong claim without a longer baseline, and I don't see how this design can rule out other confounding factors.
I would also have liked to see the study pre-registered.
And there's a selection effect to worry about: plenty of experiments like this are probably being run, and the ones that fail are deemed not interesting enough to publish.