Every time I see an article like this, it's always missing the key question --- but is it any good, is it correct? They always show you the part that is impressive - "it walked the tricky tightrope of figuring out what might be an interesting topic and how to execute it with the data it had - one of the hardest things to teach."
Then it goes on, "After a couple of vague commands (“build it out more, make it better”) I got a 14 page paper." I hear... "I got 14 pages of words". But is it a good paper, one that another PhD would think is good? Is it even coherent?
When I see the code these systems generate within a complex system, I think okay, well that's kinda close, but this is wrong and this is a security problem, etc etc. But because I'm not a PhD in these subjects, am I supposed to think, "Well of course the 14 pages on a topic I'm not an expert in are good"?
It just doesn't add up... Things I understand, it looks good at first, but isn't shippable. Things I don't understand must be great?
You could trust the expert analysis of people in that field. You might hit personal ideologies or outliers, but asking several people tends to surface a degree of consensus.
You could try a variety of tasks that do something complex but produce results that are easy to test.
When I started trying chatbots for coding, one of my test prompts was
Create a JavaScript function edgeDetect(image) that takes an ImageData object and returns a new ImageData object with all direction Sobel edge detection.
That was about the level where some models would succeed and some would fail.
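For reference, here is a minimal sketch of what a passing answer to that prompt might look like (my own sketch, not any model's output): grayscale the input, convolve with the two Sobel kernels, and write out the gradient magnitude. It accepts any ImageData-like `{width, height, data}` object so it isn't browser-only.

```javascript
// Sobel edge detection sketch. In a browser you'd return
// new ImageData(out, w, h); kept generic here so it runs anywhere.
function edgeDetect(image) {
  const { width: w, height: h, data: src } = image;
  const out = new Uint8ClampedArray(src.length);
  // 1. Luminance grayscale of the RGBA input.
  const gray = new Float32Array(w * h);
  for (let i = 0; i < w * h; i++) {
    gray[i] = 0.299 * src[4 * i] + 0.587 * src[4 * i + 1] + 0.114 * src[4 * i + 2];
  }
  // 2. Both Sobel kernels (horizontal and vertical gradients).
  const gxK = [-1, 0, 1, -2, 0, 2, -1, 0, 1];
  const gyK = [-1, -2, -1, 0, 0, 0, 1, 2, 1];
  for (let y = 1; y < h - 1; y++) {
    for (let x = 1; x < w - 1; x++) {
      let gx = 0, gy = 0, k = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++, k++) {
          const v = gray[(y + dy) * w + (x + dx)];
          gx += gxK[k] * v;
          gy += gyK[k] * v;
        }
      }
      // 3. Gradient magnitude, clamped to one byte, written as gray RGBA.
      const mag = Math.min(255, Math.hypot(gx, gy));
      const o = 4 * (y * w + x);
      out[o] = out[o + 1] = out[o + 2] = mag;
      out[o + 3] = 255;
    }
  }
  return { width: w, height: h, data: out }; // border pixels left black
}
```

The nice property of a prompt like this is that the checker is trivial: feed it a hard vertical edge and flat regions, and the output either lights up the edge or it doesn't.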
Recently I found
Can you create a webgl glow blur shader that takes a 2d canvas as a texture and renders it onscreen with webgl boosting the brightness so that #ffffff is extremely bright white and glowing,
This produced a nice demo with a slider for the parameters. After a few refinements (a hierarchical scaling version), I got it to produce the same interface as a module I had written myself, and it worked as a drop-in replacement.
These things are fairly easy to check because if it is performant and visually correct then it's about good enough to go.
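For the curious, the core of that effect is simple to state. A real implementation does this in a GLSL fragment shader over the canvas texture; below is the same math on the CPU so it can be checked directly (the function name and parameters are my own, not from the actual demo): threshold out the bright pixels, blur them, and add them back with a boost so near-white values bloom.

```javascript
// CPU sketch of a glow/bloom pass on a grayscale image in [0, 1],
// stored as a Float32Array of length w * h.
function glow(pixels, w, h, { threshold = 0.7, radius = 1, boost = 2.0 } = {}) {
  // 1. Keep only the bright pixels (these are what will "glow").
  const bright = pixels.map(v => (v > threshold ? v : 0));
  const out = new Float32Array(pixels.length);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      // 2. Box blur of the thresholded image (a shader would use
      //    a separable Gaussian, but the idea is the same).
      let sum = 0, n = 0;
      for (let dy = -radius; dy <= radius; dy++) {
        for (let dx = -radius; dx <= radius; dx++) {
          const yy = y + dy, xx = x + dx;
          if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
            sum += bright[yy * w + xx];
            n++;
          }
        }
      }
      // 3. Add the blurred brightness back, boosted, and clamp.
      out[y * w + x] = Math.min(1, pixels[y * w + x] + boost * (sum / n));
    }
  }
  return out;
}
```

This is exactly why the task is easy to verify: a single white pixel should stay white and its dark neighbors should pick up a visible halo, which you can check numerically or just look at.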
It's also worth noting that as they attempt more and more ambitious tasks, they are quite probably testing around the limit of capability. There is both marketing and science in this area. When they say they can do X, it might not mean it can do it every time, but it has done it at least once.
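That distinction is easy to make concrete: "can do X" in an announcement is often closer to pass@k than pass@1. A trivial harness makes the difference measurable (hypothetical `attempt` callback standing in for "run the prompt and check the result"; my own sketch):

```javascript
// Run the same task k times against a pass/fail checker and report
// the fraction that pass. attempt(i) should return true on success.
function passRate(attempt, k) {
  let passes = 0;
  for (let i = 0; i < k; i++) {
    if (attempt(i)) passes++;
  }
  return passes / k; // pass@1 is just this with k = 1
}
```

A model with a 50% pass rate has "done it at least once" after a couple of tries, which is enough for a demo but not for every run.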
> You could trust the expert analysis of people in that field
That’s the problem - the experts all promise stuff that can’t be easily replicated. The promises the experts make don’t match the model’s behavior. The same request might succeed or fail, and might fail in such a way that subsequent prompts might recover or might not.
That's how working with junior team members or open source project contributors goes too. Perhaps that's the big disconnect. Reviewing and integrating LLM contributions slotted right into my existing workflow on my open source projects. Not all of them work. They often need fixing, stylistic adjustments, or tweaking to fit a larger architectural goal. That is the norm for all contributions in my experience. So the LLM is just a very fast, very responsive contributor to me. I don't expect it to get things right the first time.
But it seems lots of folks do.
Nevertheless, style tweaks and adjustments are a lot less work than banging out a thousand lines of code by hand. And whether an LLM or a person on the other side of the world did it, I'd still have to review it. So I'm happy to take increasingly common and increasingly sophisticated wins.
It's gotten more and more shippable, especially with the latest generation (Codex 5.1, Sonnet 4.5, now Opus 4.5). My metric is "wtfs per line", and it's been decreasing rapidly.
My current preference is Codex 5.1 (Sonnet 4.5 as a close second, though it got really dumb today for "some reason"). It's been good to the point where I shipped multiple projects with it without a problem (with eg https://pine.town being one I made without me writing any code).
It's very good but it feels kind of off-the-rails in comparison to Sonnet 4.5 - at least with Cursor it does strange things like putting its reasoning in comments that run about 15 lines, deleting 90% of a file for no real reason (especially when context is reaching capacity), and making the same error I just told it not to make.
I think they get to that a couple of paragraphs later:
> The idea was good, as were many elements of the execution, but there were also problems: some of its statistical methods needed more work, some of its approaches were not optimal, some of its theorizing went too far given the evidence, and so on. Again, we have moved past hallucinations and errors to more subtle, and often human-like, concerns.
I think the point is we’re getting there. These models are growing up real fast. Remember, 54% of US adults read at or below the equivalent of a sixth-grade level.
Education is not just a funding issue. Policy choices, like making it impossible for students to fail (which means they have no incentive to learn anything), can be more impactful.
As far as I understand it, the problem isn’t that teachers are shit. Giving more money would bring in better teachers, but I don’t know that they’d be able to overcome the other obstacles.
Because education alone in a vacuum won't fix the issues.
Even if the current model were working, just continuing to invest money in it while ignoring other issues like early childhood nutrition, a good and healthy home environment, environmental impacts, etc. will just continue to fail people.
Schooling alone isn't going to help the kid with a crappy home life, with poor parents who can't afford proper nutrition, and without the proper tools to develop the mindset needed to learn (because these tools were never taught by the parents, and/or they are too focused on simply surviving).
We, as a society, need to stop allowing people to be in a situation where they can't focus on education because they are too focused on working and surviving.
New Mexico (where I live) is dead last in education out of all 50 states. They are currently advertising for elementary school teachers at $65-85K per year. Summers off. Nice pension. In this low cost of living state that is a very good salary, particularly the upper bands.
In WA they always pass levies for education funding at the local and state level, yet the results are not there.
Mississippi is doing better on reading, the biggest difference being that they use a phonics approach to teaching reading, which is proven to work, whereas WA uses whole language theory (https://en.wikipedia.org/wiki/Whole_language), which is a terrible idea; I don't know how it got traction.
So the gist of it: yes, spend on education, but ensure that you are using the right tools; otherwise it's a waste of money.
First time hearing of whole language theory, and man, it sounds ridiculous. Sounds similar to the old theory that kids who aren't taught a language at all will simply speak perfect Hebrew.
In my own social/family circle, there’s no correlation between net worth and how someone leans politically. I’ve never understood why given the pretty obvious pros/cons (amount paid in taxes vs. benefits received)
The people most vociferously for conservative values are middle class, small business owners, or upper class, though the true upper class are libertine (notice who participated in the Epstein affair). The working class is filled with all kinds of very diverse people united by the fact they have to work for a living and often can't afford e.g. expensive weddings. Some of them are religious, a whole bunch aren't. It's easy to be disillusioned with formal institutions that seem to not care at all about you.
Unfortunately, a lot of these people have either concluded it is too difficult to vote, can't vote, or that their votes don't matter (I don't think they're wrong). Their unions were also destroyed. Some of them vote against their interests, but it's not clear that their interests are ever represented, so they vote for change instead.
It's not just investing in education, it's using tools proven to work.
WA spends a ton of money on education, and yet on reading Mississippi, the worst state by almost every metric, has beaten them.
The difference?
Mississippi went hard on supporting students and on phonics, which is proven to work. WA still uses the hippie theory of guessing words from pictures (https://en.wikipedia.org/wiki/Whole_language) for learning how to read.
Unfortunately, people are born with a certain intellectual capacity and can't be improved beyond that with any amount of training or education. We're largely hitting people's capacities already.
We can't educate someone with an 80 IQ to be you; we can't educate you (or me) into being Einstein. The same way we can't just train anyone to be an amazing basketball player.
You don't need an educated workforce if you have machines that can do it reliably. The more important question is: who will buy your crap if your population is too poor due to lack of well paying jobs? A look towards England or Germany has the answer.
For what it's worth I have been using Gemini 2.5/3 extensively for my masters thesis and it has been a tremendous help. It's done a lot of math for me that I couldn't have done on my own (without days of research), suggested many good approaches to problems that weren't on my mind and helped me explore ideas quickly. When I ask it to generate entire chapters they're never up to my standard but that's mostly an issue of style. It seems to me that LLMs are good when you don't know exactly what you want or you don't care too much about the details. Asking it to generate a presentation is an utter crap shoot, even if you merely ask for bullet points without formatting.
Truth is you still need a human to review all of it, fix it where needed, guide it when it hallucinates, and write correct instructions and prompts.
Without knowing how to use this “PROBABILISTIC” slot machine to get better results, you are only wasting the energy those GPUs need to run and answer questions.
The majority of people use LLMs incorrectly.
The majority of people selling LLMs as a panacea for everything are lying.
But we need hype or the bubble will burst, taking the whole market with it, so shush me.
Child Mind Institute | Full-Time | Remote within US or NYC Hybrid | https://childmind.org/about-us/careers/
Join us in transforming the lives of children struggling with mental health or learning disorders.
These positions are focused on the development of MindLogger (soon to be renamed "Curious"), a data collection platform for mental health research. MindLogger is an established platform and we are looking to build out an internal engineering team to support and enhance it. It's a great opportunity to use your engineering skills for a great cause!
Email me with questions jody dot brookover at childmind dot org
Child Mind Institute | New York City or Remote | Full Time | Multiple Roles
The Science and Engineering team at the Child Mind Institute is dedicated to transforming the lives of children with mental health and learning disorders through the power of scientific discovery.
Our product development group is working on a number of products including interventions and data gathering tools.
CareRev (YC S16) | Software Engineers | Fully Remote within the US | Full-Time
We are hiring backend, frontend, and android engineers. CareRev’s mission is to seamlessly connect healthcare facilities and professionals. Through our marketplace platform, we offer efficiency, flexibility, and opportunities for growth. Our stack is currently Ruby/Rails, React and Elm, Swift, Kotlin, and Postgres deployed on Heroku.
CareRev was recently named a Ycombinator "Top Company".
Backend API Engineer (Ruby on Rails) - Mid, Senior, Staff, Sr Staff+ (2+ years exp) Not all levels are posted rn, but just apply to the closest.
Android Engineer (Kotlin) - Senior, Staff (5+ years)
Frontend Engineer (ELM) - Senior+ (5+ years exp) We are committed to using ELM for a significant portion of our web frontend. Come join the fun.
Backend Engineer (Marketing) - Senior (3+ years exp)
Data Engineer (Kafka) - Senior (5+ years exp)
Product Managers - Principal (doesn't manage others), Director (manages others), and Technical PM (7+ years PM exp, 2yrs mgmt exp)
Find our careers page with all our postings here - https://grnh.se/072b12f63us
Feel free to email me if you have questions: jody[at]carerev[dot]com