You're assuming we're saying LLMs can't reason. That's not what we're saying. Th...

kgeist · 2025-06-14T23:55:14 1749945314

GPT-4o isn't considered an "advanced" LLM at this point. It doesn't use reasoning.

I gave your prompt to o3-pro, and this is what I got without any hints:

  Historic shipwrecks (1850 → 1970)
  • ~20 000 deep-water wrecks recorded since the age of steam and steel
  • 10 % were passenger or mail ships likely to carry a cabin-class or saloon piano
  • 1 piano per such vessel: 20 000 × 10 % × 1 ≈ 2 000

  Modern container losses (1970 → today)
  • ~1 500 shipping containers lost at sea each year  
  • 1 in 2 000 containers carries a piano or electric piano   
  • Each piano container holds ≈ 5 units   
  • 50-year window: 1 500 × 50 / 2 000 × 5 ≈ 190

  Coastal disasters (hurricanes, tsunamis, floods)
  • Major coastal disasters each decade destroy ~50 000 houses  
  • 1 house in 50 owns a piano   
  • 25 % of those pianos are swept far enough offshore to sink and remain: (50 000 / 50) × 25 % × 5 decades ≈ 1 250

  Add a little margin for isolated one-offs (yachts, barges, deliberate dumping): ≈ 300

  Best guess range: 3 000 – 5 000 pianos are probably resting on the seafloor worldwide.
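For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes o3's Fermi estimate. Every factor is copied from the breakdown above; the function and variable names are my own, and the inputs are the model's guesses rather than verified data, so this only confirms that the sums add up, not that the assumptions are right.

```python
# Recompute the o3 Fermi estimate using only the factors it stated.
# The inputs are the model's guesses, not verified data; this checks arithmetic only.

def historic_shipwrecks() -> float:
    wrecks = 20_000            # deep-water wrecks, 1850-1970
    passenger_share = 0.10     # passenger or mail ships likely to carry a piano
    pianos_per_ship = 1
    return wrecks * passenger_share * pianos_per_ship


def container_losses() -> float:
    lost_per_year = 1_500      # shipping containers lost at sea each year
    years = 50                 # 1970 to today
    containers_per_piano_box = 2_000  # 1 in 2,000 containers carries pianos
    pianos_per_box = 5         # units per piano-carrying container
    return lost_per_year * years / containers_per_piano_box * pianos_per_box


def coastal_disasters() -> float:
    houses_destroyed_per_decade = 50_000
    houses_per_piano = 50      # 1 house in 50 owns a piano
    share_lost_offshore = 0.25 # swept far enough offshore to sink and remain
    decades = 5
    return (houses_destroyed_per_decade / houses_per_piano
            * share_lost_offshore * decades)


ONE_OFFS = 300  # margin for yachts, barges, deliberate dumping

components = {
    "historic shipwrecks": historic_shipwrecks(),
    "container losses": container_losses(),
    "coastal disasters": coastal_disasters(),
    "one-offs": ONE_OFFS,
}
total = sum(components.values())

for name, count in components.items():
    print(f"{name:>20}: {count:8,.1f}")
print(f"{'total':>20}: {total:8,.1f}")
```

The unrounded total is 3,737.5 pianos (o3's rounded components give ≈ 3,740), which sits inside its stated 3 000 – 5 000 range.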

yen223 · 2025-06-15T00:36:37 1749947797

The difference between o3 and o4-mini is so substantial that I think it's the reason people can't agree on how capable LLMs are nowadays.

theendisney · 2025-06-15T04:25:25 1749961525

The correct answer is: I'm sorry, I don't have time for this.

FINDarkside · 2025-06-14T23:36:54 1749944214

What does "choked on it" mean for you? Gemini 2.5 Pro gives this, even estimating what share of those 3 million ships sank after pianos became a common item. I'm not pasting the full reasoning here since it's rather long.

Combining our estimates:

From Shipwrecks: 12,500
From Dumping: 1,000
From Catastrophes: 500
Total Estimated Pianos at the Bottom of the Sea ≈ 14,000

Also, I have to point out that 4o isn't a reasoning model, and neither is Sonnet 4 unless thinking mode was enabled.

Jabrov · 2025-06-14T22:40:56 1749940856

That seems like a totally reasonable response...?

labrador · 2025-06-14T23:03:57 1749942237

I think you missed the part where I had to give them hints to solve it. On their first try, all three either couldn't solve it or refused, saying it was not a real problem.

ej88 · 2025-06-14T23:16:58 1749943018

Can you share the chats? I tried with o3 and it gave a pretty reasonable answer on the first try.

https://chatgpt.com/share/684e02de-03f0-800a-bfd6-cbf9341f71...

Jabrov · 2025-06-15T00:13:40 1749946420

You must be on the wrong side of an A/B test or very unlucky.

Because I gave your exact prompt to o3, Gemini, and Claude and they all produced reasonable answers like above on the first shot, with no hints, multiple times.

gjm11 · 2025-06-14T23:39:36 1749944376

FWIW I just gave a similar question to Claude Sonnet 4 (I asked about something other than pianos, just in case they're doing some sort of constant fine-tuning on user interactions[1] and to make it less likely that the exact same question is somewhere in its training data[2]) and it gave a very reasonable-looking answer. I haven't tried to double-check any of its specific numbers, some of which don't match my immediate prejudices, but it did the right sort of thing and considered more ways for things to end up on the ocean floor than I instantly thought of. No hints needed or given.

[1] I would bet pretty heavily that they aren't, at least not on the sort of timescale that would be relevant here, but better safe than sorry.

[2] I picked something a bit more obscure than pianos.

dialup_sounds · 2025-06-14T23:51:30 1749945090

How much of that is inability to reason vs. being trained to avoid making things up?