This is missing the point. If I want to instruct Claude to never write a database query that doesn't hit a preexisting index, where exactly am I supposed to document that? You can either choose:
1. A centralized location, like a README (congrats, you've just invented CLAUDE.md)
2. You add a docs folder (congrats, you've just done exactly what the author suggests under Progressive Disclosure)
Moreover, you can't just do it all in a README, for the exact reasons that the author lays out under "CLAUDE.md file length & applicability".
CLAUDE.md simply isn't about telling Claude what all the parts of your code are and how they work. You're right, that's what documenting your code is for. But even if you have READMEs everywhere, Claude has no idea where to put code when it starts a new task. If it has to read all your documentation every time it starts a new task, you're needlessly burning tokens. The whole point is to give Claude important information up front so it doesn't have to read all your docs and fill up its context window searching for the right information on every task.
Think of it this way: incredibly well documented code has everything a new engineer needs to get started on a task, yes. But this engineer has amnesia and forgets everything they've learned after every task. Do you want them to have to reonboard from scratch every time? No! You structure your docs so they don't have to start from scratch every time. This is an accommodation: humans don't need this, for the most part, because we don't reonboard to the same codebase over and over. And so yes, you do need to go above and beyond the "same old good best practices".
think about how this thing is interacting with your codebase. it can read one file at a time. sections of files.
in this UX, is it ergonomic to go hunting for patterns and conventions? if you have to linearly process every single thing you look at every time you do something, how are you supposed to have “peripheral vision”? if you have amnesia, how do you continue to do good work in a codebase given you’re a skilled engineer?
it is different from you. that is OK. it doesn’t mean it’s stupid. it means it needs different accommodations to perform as well as you do. accommodations IRL exist for a reason: different people work differently and have different strengths and weaknesses. just like humans, you get the most out of them if you meet them where they’re at.
You put a warning where it is most likely to be seen by a human coder.
Besides, no amount of prompting will prevent this situation.
If it is a concern, then you add a linter or unit tests to prevent it altogether, or make a wrapper around the tricky function with a warning in its docstring.
I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.
But they are right: Claude routinely ignores stuff from CLAUDE.md, even with warning bells etc. You need a linter preventing things. Like drizzle's sql`` templates: it just loves them.
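For that specific case a lint rule can hard-ban the pattern outright. A minimal sketch, assuming ESLint flat config and that the tag is imported as `sql` (the rule and selector syntax are standard ESLint/esquery; everything else here is illustrative):

```js
// eslint.config.mjs -- sketch: ban drizzle-style sql`` tagged templates
export default [
  {
    rules: {
      "no-restricted-syntax": [
        "error",
        {
          // Matches any tagged template whose tag is the bare identifier `sql`
          selector: 'TaggedTemplateExpression[tag.name="sql"]',
          message: "Raw sql`` templates are banned; use the query builder instead.",
        },
      ],
    },
  },
];
```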
You can make affordances for agent abilities without deviating from what humans find to be good documentation. Use hyperlinks, organize information, document in layers, use examples, be concise. It's not either/or unless you're being lazy.
> no amount of prompting will prevent this situation.
Again, missing the point. If you don't prompt for it and you document it in a place where the tool won't look first, the tool simply won't do it. "No amount of prompting" couldn't be more wrong; it works for me and all my coworkers.
> If it is a concern then you put a linter or unit tests to prevent it altogether
Sure, and then it'll always do things its own way, run the tests, and have to correct itself. Needlessly burning tokens. But if you want to pay for it to waste its time and yours, go for it.
> I don't see how this is any different from how you typically approach making your code more resilient to accidental mistakes.
It's not about avoiding mistakes! It's about having it follow the norms of your codebase.
- My codebase at work is slowly transitioning from Mocha to Jest. I can't write a linter to ban new Mocha tests, and it would be a pain to keep a list of legacy Mocha test suites. The solution is to simply have a bullet point in the CLAUDE.md file that says "don't write new Mocha test suites, only write new test suites in Jest". A more robust solution isn't necessary and doesn't avoid mistakes, it avoids the extra step of telling the LLM to rewrite the tests.
- We have a bunch of terraform modules for convenience when defining new S3 buckets. No amount of documenting the modules will have Claude magically know they exist. You tell it that there are convenience modules and to consider using them.
- Our ORM has findOne, which returns one record or null. We have a convenience function getOne that returns a record or throws a NotFoundError, which maps to a 404 error (rough sketch below). There's no way to exhaustively detect with a linter that you used findOne, checked the result for null, and threw a NotFoundError. And the hassle of maybe catching some instances isn't necessary, because avoiding it is just one line in CLAUDE.md.
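A minimal sketch of that wrapper convention, assuming a repository-style findOne that resolves to the record or null (the names mirror the comment; the types and signature are made up):

```ts
// Sketch: getOne wraps findOne and throws instead of returning null.
class NotFoundError extends Error {}

interface Repo<T> {
  findOne(where: Partial<T>): Promise<T | null>;
}

// Callers get a record, or a NotFoundError that the HTTP layer maps to a 404.
async function getOne<T>(repo: Repo<T>, where: Partial<T>): Promise<T> {
  const record = await repo.findOne(where);
  if (record === null) {
    throw new NotFoundError(`No record matching ${JSON.stringify(where)}`);
  }
  return record;
}
```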
> Yes there is? Though this is usually better served with a type checker, it’s still totally feasible with a linter too if that’s your bag
It's not, because you would have to implement a full static analyzer that traces where the result of a `findOne` call is checked for `null` and then check that the condition always leads to a `NotFoundError`. At best you've got a linter that only works some of the time, at worst you've just made your linter terribly slow and buggy.
> these tools still ignore that line sometimes so I still have to check for it myself.
1. Create a tool that can check whether a query hits a preexisting index (rough sketch below)
2. Either force Claude to use it (hooks) or suggest it (CLAUDE.md)
3. Profit!
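A minimal sketch of step 1, assuming Postgres and the `pg` client, with the query passed as a CLI argument (everything here is illustrative, not from the comment):

```ts
// check-index.ts -- sketch: fail if a query's EXPLAIN plan contains a sequential scan.
import { Client } from "pg";

async function queryUsesIndex(client: Client, sql: string): Promise<boolean> {
  // EXPLAIN (FORMAT JSON) returns one row whose "QUERY PLAN" column holds the plan tree.
  const { rows } = await client.query(`EXPLAIN (FORMAT JSON) ${sql}`);
  const plan = JSON.stringify(rows[0]["QUERY PLAN"]);
  return !plan.includes('"Seq Scan"');
}

async function main() {
  const client = new Client(); // connection details come from the usual PG* env vars
  await client.connect();
  const ok = await queryUsesIndex(client, process.argv[2]);
  await client.end();
  if (!ok) {
    console.error("Plan includes a sequential scan; add or use an index.");
    process.exit(1);
  }
}

main();
```

Wire it up as a hook and it blocks the change outright; mention it in CLAUDE.md and Claude at least has something concrete to run.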
As for "where stuff is", for anything more complex I have a tree-style graph in CLAUDE.md that shows the rough categories of where stuff is. Like the handler for letterboxd is in cmd/handlerletterboxd/ and internal modules are in internal/
Now it doesn't need to go in blind but can narrow down searches when I tell it to "add director and writer to the letterboxd handler output".
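For illustration, such a CLAUDE.md section might look something like this (the two paths are from the comment above; the exact formatting is assumed):

```markdown
## Where stuff is
- cmd/handlerletterboxd/ -- the letterboxd handler
- internal/              -- internal modules shared across handlers
```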
You can also use your README (and in my own private project, I do!). But for folks who don't want their README clogged up with lots of facts about the project, you have CLAUDE.md
Learned this the hard way. Asked Claude Code to run a database migration. It deleted my production database instead, then immediately apologised and started panicking trying to restore it.
Thankfully Azure keeps deleted SQL databases recoverable, so I got it back in under an hour. But yeah - no amount of CLAUDE.md instructions would have prevented that. It no longer gets prod credentials.
This is a neat idea, but it's extremely light (no pun intended) on real details. Translating a simulation into real hardware that can do real computation in a reliable manner is properly hard. As much as I'd love to be an optimist about this project, I have to say I'll believe it when I see it actually running on a workbench.
If it does work, I think one of the biggest challenges will be adding enough complexity to it for it to do real, useful computation. Running the equivalent of GPT-2 is a cool tech demo, but if there's not an obvious path to scaling it up, it's a bit of a dead end.
Oh absolutely...this is kitchen-table level at this point. There is a clear path to a really huge number of parameters, but a bunch of things need to be proven first. Like...can the detector meaningfully read what comes out the end of the optical chain?
I own three electric motorcycles and respectfully disagree. You can't make tube and canvas that let a passenger survive getting t-boned by a Yukon Denali or an F-250. One high-profile accident with a mother and her child getting peeled off the road with a coal shovel is all it'll take to kill such a form factor forever.
The problem isn't the form factor you're describing, it's that you can't put those on the road with 1000+ horsepower machines that are 50 times heavier. And on top of that, a lot of people just don't want to give up their heated massage seats and connected infotainment and removable third row or whatever crap they pack in minivans these days.
The elements of the form factor implied here are already on the road. Series hybrid bikes exist today. Fully faired bikes exist today. A fully faired tricycle recumbent could get you to work, clean and dry, on a dime's worth of energy. Cities like Barcelona and Taipei that already move on gas scooters would smell immensely better if e-bikes took over.
American pickup trucks with their butch-looking front ends that kill a lot of children are a stupid idea under any circumstances. But evidently we have to live with that death and destruction until they rust out. Kids are already dying because of the stupidity and we have not got what it takes to stop it. It means other places will benefit from better mobility sooner.
A fully faired tricycle recumbent will get you killed in a city with poor bike-ability. I have friends that have been in the hospital for weeks because of bike accidents in SF and NYC, which are arguably the exact kinds of places where you'd want bikes to replace cars. But instead, we have "Vision Zero" projects that still have staggeringly far to go.
I don't disagree with you: it would be great if we could replace more cars with bikes, but the reality is that there's almost nothing serious we can do in the US to undo the omnipresence of massive vehicles in most cities.
I agree with your comment, but I'll be a little pedantic for a minute:
As a Charger Daytona owner, I'd love to call the Mach-E a Mustang, but it's really just borrowing the brand. Ford has said unequivocally that they'll never make an all-electric muscle car, which is a real shame. The Mach-E is a great car if you're turned off by a Model Y, but you wouldn't choose it over a Mustang GT or a Charger Daytona or a Camaro.
> Ford has said unequivocally that they'll never make an all-electric muscle car
What’s the thinking here? Pandering to some market segment? It sounds like they are rearranging the deck chairs on the Titanic.
Edit: I tried looking into the comment. It seems he was referring to Mustangs specifically, which is weird as they do make an electric one (assuming you agree it’s a ‘real’ Mustang).
The Mach-E isn't a muscle car. The comment was specifically around the Mustang sedan, which they do not have an electric version of.
Honestly, it's befuddling to me. There are a lot of folks who could get talked into an electric muscle car; they just have to know how to sell it. I own a Charger Daytona and literally every car guy I show it to has interest; I genuinely think Dodge just doesn't know how to market and sell it. I'm 100% confident that the right marketing agency could sell 100k of these, but the cohort of "it'll never be a Mustang" is far louder than the "wow that thing rips" crowd.
If I take a Ford Focus and call it a Mustang, is it? Arguably, no. Mustangs have a distinctive style, feel, feature set, intended audience. It's a matter of what people expect when they buy the thing.
The Mach-E kind of snuck in. I believe they intended to make more electric Mustang-branded cars, but things changed internally and priorities shifted. Lots of women really like Mustangs, and the Mach-E is positioned to appeal to many of the same people: it makes sense to use it as a kind of Trojan horse to ease folks into EVs with a brand they already like. But if you took a Mach-E and hid the name and asked folks "is this a Mustang?" The answer you'd get is "No".
I don't think what the article writes about matters all that much. Gemini 3 Pro is arguably not even the best model anymore, and it's _weeks_ old, and Google has far more resources than Anthropic does. If the hardware actually was the secret sauce, Google would be wiping the floor with everyone else.
But they're not.
There are a few confounding problems:
1. Actually using that hardware effectively isn't easy. It's not as simple as jacking up some constant values and reaping the benefits; by the time you've optimized for it, you're already working on the next model.
2. This is a problem that, if you're not Google, you can just spend your way out of. A model doesn't take a petabyte of memory to train or run. Regular old H100s still mostly work fine. Faster models are nice, but Gemini 3 Pro having half the latency of Opus 4.5 or GPT 5.1 doesn't add enough value to matter to really anyone.
3. There are still a lot of clever tricks that work as low-hanging fruit to improve almost everything about ML models. You can make stuff remarkably good with novel research without building your own chips.
4. A surprising amount of ML model development is boots on the ground work. Doing evals. Curating datasets. Tweaking system prompts. Having your own Dyson sphere doesn't obviate a lot of the typing and staring at a screen that necessarily has to be done to make a model half decent.
5. Fancy bespoke hardware means fancy bespoke failure modes. You can search stack overflow for CUDA problems, you can't just Bing your way to victory when your fancy TPU cluster isn't doing the thing you want it to do.
I think you are addressing the issue from a developer's perspective. I don't think TPUs are going to be sold to individual users anytime soon. What the article is pointing out is that Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space.
For example, OpenAI has announced trillion-dollar investments in data centers to continue scaling. They need to go through a middle-man (Nvidia), while Google does not, and will be able to use their investment much more efficiently to train and serve their own future models.
> Google is now able to squeeze significantly more performance per dollar than their peer competitors in the LLM space
Performance per dollar doesn't "win" anything though. Performance (as in speed) hardly cracks the top five concerns that most folks have when choosing a model provider, because fast, good models already exist at price points that are acceptable. That might mean slightly better margins for Google, but ultimately isn't going to make them "win"
It's not slightly better margins, we are talking about huge cost reductions on the main expense which is compute. In a context where companies are making trillion dollar investments, it matters a lot.
Also, performance and user choice are definitely impacted by compute. If they ever find a way to replace a job with LLMs, those who can throw more compute at it for a lower price point will win.
Google owns 14% of Anthropic and Anthropic is using Google TPUs, as well as AWS Trainium and of course GPUs. It isn't necessary for one company to create both the winning hardware and the winning software to be part of the solution. In fact with the close race in software hardware seems like the better bet.
But price per token isn't even a directly important concern anymore. Anyone with a brain would pay 5x more per token for a model that uses 10x fewer tokens with the same accuracy. I've gone all in on Opus 4.5 because even though it's more expensive, it solves the problems I care about with far fewer tokens.
Slightly more seriously: what you say makes sense if and only if you're projecting Sam Altman and assuming that a) real legit superhuman AGI is just around the corner, and b) all the spoils will accrue to the first company that finds it, which means you need to be 100% in on building the next model that will finally unlock AGI.
But if this is not the case -- and it's increasingly looking like it's not -- it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply. And the article is arguing that company will be Google.
I think you are missing the point. They are saying "weeks old" isn't very old.
> it's going to continue to be a race of competing AIs, and that race will be won by the company that can deliver AI at scale the most cheaply.
I don't see how that follows at all. Quality and distribution both matter a lot here.
Google has some advantages but some disadvantages here too.
If you are on AWS GovCloud, Anthropic is right there. Same on Azure, and on Oracle.
I believe Gemini will be available on the Oracle Cloud at some point (it has been announced) but they are still behind in the enterprise distribution race.
OpenAI is only available on Azure, although I believe their new contract lets them strike deals elsewhere.
On the consumer side, OpenAI and Google are well ahead of course.
Last week it looked like Google had won (hence the blog post), but now almost nobody is talking about Antigravity and Gemini 3 anymore, so yeah, what OP says is relevant.
It definitely depends on how you're measuring. But the benchmarks don't put it at the top for many ways of measuring, and my own experience doesn't put it at the top. I'm glad if it works for you, but it's not even a month old and there are lots of folks like me who see it as definitely worse for classes of problems that 3 Pro could be the best at.
Which is to say, if Google was set up to win, it shouldn't even be a question that 3 Pro is the best. It should be obvious. But it's definitely not obvious that it's the best, and many benchmarks don't support it as being the best.
On point 5, I think this is the real moat for CUDA. Does Google have tools to optimize kernels on their TPUs? Do they have tools to optimize successive kernel launches on their TPUs? How easy is it to debug on a TPU (arguably CUDA could use work here, but still...)? Does Google help me fully utilize their TPUs? Can I warm up a model on a TPU, checkpoint it, and launch the checkpoints to save time?
I am fairly pro-Google (they invented the LLM, FFS...) and recognize the advantages (price/token, efficiency, vertical integration, established DCs w/ power allocations), but also know they have a habit of slightly sucking at everything but search.
I've really only found benefit on the return type of functions, where you can say that a parameter satisfies a type (with the return type being a boolean). This lets you use `if (isPerson(foo))` and TypeScript will narrow the type appropriately in the conditional.
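A minimal sketch of that pattern, a user-defined type guard (`Person` and the checks inside are illustrative; only `isPerson(foo)` comes from the comment):

```ts
// Sketch: the `foo is Person` return type tells the compiler that a `true`
// result narrows foo to Person.
interface Person {
  name: string;
  age: number;
}

function isPerson(foo: unknown): foo is Person {
  return (
    typeof foo === "object" &&
    foo !== null &&
    typeof (foo as Person).name === "string" &&
    typeof (foo as Person).age === "number"
  );
}

function greet(foo: unknown): string {
  if (isPerson(foo)) {
    return `Hello, ${foo.name}`; // foo is narrowed to Person in this branch
  }
  return "Hello, stranger";
}
```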
I went to the grocery store and bought six meals worth of whole foods for two people two weeks ago. Rice, veggies, two fish meals, two meals based on eggs (I already had the eggs), one meal based on chicken. Thirty grams of protein for each meal. I had staples in my pantry already. I aimed for 2200 calories per person per day. I didn't buy organic because it's more expensive. This wasn't Whole Foods or some bougie store. I didn't buy ANY beverages.
It was $170 with my loyalty card.
Six meals at McDonald's is... Just about $35. Chipotle? $110, maybe less. Chick-fil-A? Under $50. And none of them need to be cooked or taste like wet cardboard.
$35 for 6 meals for two people at McDonalds? What?
Where I live, one meal at McDonalds is about $12. So 6 * 2 * 12 = $144. Not that much of a difference.
Also, if you aimed for 2200 calories per person per day with that $170, then it isn't really fair comparing to a single McDonald's meal, is it? It sounds like buying whole foods is cheaper.
Does McDonalds not have the $5 McValue Meals where you live? In the Bay Area, $5 + tax gets you a McChicken, 4 chicken nuggets, small fries, and small drink. $6 to upgrade to a McDouble cheeseburger instead of the McChicken. Altogether ~1000 calories per meal.
That's darn good value for your money, at least for a prepared hot meal that's convenient in most locales. $5 for ~1000 calories, plus the ingredients are fortified; the lack of fiber notwithstanding, it's not a horrible thing to eat several times a week. I live in SF where McDonalds is not very convenient, and where food prices, including prepared takeout, aren't too bad if you know where to go--my wife sometimes brings empty casserole dishes to one of our friendly neighborhood Chinese restaurants to fill up, without paying extra, though for us it's fortunately more about convenience when raising two kids with a bunch of extracurriculars than it is about penny pinching.
FWIW, I love cooking and cook as much as I can, usually at least 3 times a week, which with leftovers means 4 or 5 dinners. But between cooking, cleaning, and shopping, it can be quite time-consuming, and excepting myself, the rest of the family isn't keen on eating beans 3 nights a week. (I'm only allowed to make Red Beans & Rice a few times a year. Ditto for similar big pot meals :(
Have you used the app? You may need to use the app to get the McValue and similar lower-priced menu items. It's a brilliant price discrimination strategy.
I don't even have a smartphone. Why should I put up with these insane prices, even with a discount in exchange for tracking, when I can buy the same thing for <1€ at the grocery store that is literally in the same building, 10 meters away?
Using the app a small soda at McDonald's is ~$1 ($1.14 near me, but that may include the SF soda tax). Less than $2 for a large for those beckoning diabetes.
But soda is definitely cheaper elsewhere, and drinks, even soda, are usually a profit center for almost any restaurant, but a loss leader at grocery stores. I remember in the mid 1990s when Coca-Cola and then Pepsi were trying to stem the tide of a decline in sales. They drastically lowered prices through certain channels, particularly grocery stores and, most memorably, vending machines outside grocery stores, where the price dropped from $0.50-$0.75 to $0.25 for a 12oz can. Almost overnight poor and working class people switched from cheaper alternatives like Kool-Aid (which was healthier--much less sugar!) to Coca-Cola and Pepsi.
I agree the app is exceptionally inconvenient, unless someone else in the car is ordering, as well as privacy intrusive. But my point is merely that McDonald's is trying to cater to price-sensitive consumers without taking a hit to their revenue, and doing so more effectively than any other fast food chain.
Sorry, I didn't multiply by two; it would have been $70. I checked the nutrition facts on the menu, it was a mix of meal deals and had nearly the same calorie counts that I was aiming for. Still less than half the cost of groceries (minus all the food I already had).
But if we're talking about the balance of macronutrients, I'd love to hear how you manage to beat the cost of fast food with legumes/nuts/yogurt and don't have a huge percentage of your calories from fat. 40g of protein from almonds is nearly a thousand calories and 90g of fat. 40g of protein from black beans is five Chipotle bowl orders worth of beans or four cups of fava beans. 35g of protein from nonfat Greek yogurt is nearly 3/4 of a pound of yogurt. If you can't stand to eat nearly a pound of nonfat yogurt in one sitting like most of the population, you'll only get 30g of protein from a 32oz tub and spend as much as a McDonald's sandwich (and get the same amount of fat).