
> Obviously this is a single sample but saying 90% seems unlikely.

This is such an anti-intellectual comment to make, can't you see that?

You mention "sample" so you understand what statistics is, then in the same sentence claim 90% seems unlikely with a sample size of 1.

The article has done substantial research




The fact that it has some statistically significant performance is irrelevant and difficult to evaluate for most people.

Here's a much simpler and correct description that almost everyone can understand: it fucks up constantly.

Getting something wrong even once can make it useless for most people. No amount of pedantry will change this reality.


What on earth? The experimental research demonstrates that it doesn't "fuck up constantly"; you're just making things up. The various performance metrics that people around the world use to measure and compare model performance are not irrelevant just because you, some random internet commenter, claim so without any evidence.

This isn't pedantry, it's science.


Also, the article is testing a different task (Needle in a Needlestack, which is kind of similar to Needle in a Haystack) than finding a difference between two documents. It's certainly useful to know that the model does OK in one and really badly in the other, but that does not mean the original test is flawed.



