I don't think that's true. What matters to me is the human editorial touch: I don't want to wade through 50 prompts and responses; I want a human author to have resolved that process into a final output that they think is worth sharing with me.
Try reading a manuscript copy of a book before it’s been edited. Yes, I know some people do this out of interest, but for most people it’s not the type of writing they are interested in reading or would get the most out of.
If you're interested in seeing the process behind this piece of writing, you can read through a lot of the details in the 71 commits that went into creating the story in the PR: https://github.com/steipete/steipete.me/pull/106/commits
>You're right - I don't really care if the track playing in my favourite cafe is AI-generated or not. You're not supposed to be emotionally invested into background music
Different strokes, I guess, but some of the best music I've ever been turned on to just happened to be playing in some random cafe or coffee shop. Conversely, if the music is bland and uninspired, I'm much less likely to go back.
Unfortunately, that wouldn't help as much as you think, since talented AI labs can just watch the public leaderboard and note which models move up and down to deduce and target whatever the hidden benchmark is testing.
I had been sleeping on Claude's ability to write books until a couple of days ago, when I had it write a novel set in the Accelerando universe. It whipped up a very convincing, complete, multi-act, 13-chapter side plot about humans learning to interact with Economics 2.0. It was quite good, though I'm sure cstross would be horrified.
I have a T420 I've been using for years. I upgraded it to 16GB of RAM and an SSD, swapped the dual-core i5 for a 4-core/8-thread i7 (yes, the CPU is in a socket!), and swapped the crappy 1600x900 display for a newer 1080p panel that looks much better. I absolutely love this laptop and am not looking forward to the day when it's too old for the modern web.
For the lmarena leaderboard to be really useful, you need to click the "Style Control" button so that it normalizes for LLMs that generate longer answers, etc. Humans may find such answers more stylistically pleasing and upvote them, but the answers often end up being worse. When you do that, o1 comes out on top, followed by o1-preview, then Sonnet 3.5, and in fourth place Gemini Preview 1206.
I like how it cites relevant YouTube videos based on the search and shows thumbnails of the videos in its results. As far as I can tell, ChatGPT doesn't do this.