Curious if anyone knows the logistics of these cloud provider/AI company deals. In this case, it seems like the terms of the deal mean that Anthropic ends up spending most of the investment on AWS to pay for training.
Does Anthropic basically get at-cost pricing on AWS? If Amazon has any margin on its pricing, this $4B investment ends up costing it a lot less, and it's a nice way to turn a capex investment into AWS revenue.
This was the brilliance of the original MSFT investment into OpenAI. It was an investment in Azure scaling its AI training infra, but roundabout through a massive customer (exactly what you’d want as a design partner) and getting equity.
I’m sure Anthropic negotiated a great deal on their largest cost center, while Amazon gets a huge customer to build out their system with.
That’s honestly one of the hardest things in engineering — identifying not just a customer to drive requirements, but a knowledgeable customer who can drive good requirements that work for a broader user base and can drive further expansion. Anthropic seems ideal for that, plus they act as a service/API provider on AWS.
Or simply one of the best corporate buyouts that's not subject to regulatory scrutiny. Microsoft owns 49% of OpenAI and will collect profits until whenever, all without being subject to regulatory approval. And they get to improve Azure.
> Building out data centers feels like a far more difficult problem.
Is it really? I'm thinking it might be more time-and-money-involved than building a "LLM product" (guess you really meant models?), but in terms of experience, we (humanity) have decades of experience building data centers, while a few years (at most) experience regarding anything LLM.
This explanation makes no sense, I could be AWS' biggest customer if they wanted to pay me for it. Something a little closer could be that the big tech companies wanted to acquire outside LLMs, not quite realizing that spending $1B on training only puts you $1B ahead.
Anthropic is getting $4B in investment in a year where their revenue was about $850M. Even if Amazon had bought them outright for that much, they would not be ahead. The fact that everybody keeps repeating the claim that Amazon is "making money" makes this appear like some kind of scam.
Second, the investment isn't a loan that they need to repay. They are getting equity.
Third, Anthropic is exclusively using AWS to train its models. Which, yes, means that if AWS gives them $4B and it costs them $500M/year to pay for AWS services, then after 8 years the cash is a wash. However, this ignores the second point.
Fourth, there is brand association for someone who wanted to run their own single tenant instance of Claude whereby you would say "well they train Claude on AWS, so that must be the best place to run it for our <insert Enterprise org>" similar to OpenAI on Azure.
Fifth, raising money is a signaling exercise to larger markets who want to know "will this company exist in 5 years?"
Sixth, AWS doesn't have its own LLM (relative to Meta, MS, etc.). The market will associate Claude with Amazon now.
A quibble: AWS _does_ have an AI story (which I was originally dismissive of): Bedrock as a common interface and platform to access your model of choice, plus niceties for fine-tuning/embeddings/customization etc. Unlike, say, Azure, they're not betting on _a single_ implementation. They're betting that competition between models will trend towards parity with limited fundamental differentiation. It's a bet on enterprises wanting the _functionality_ more generally and being able to ramp up that usage via AWS spend.
WRT Titan, my view is that it's 1) production R&D to stay “in the game” and 2) a path towards commoditization and lower structural costs, which companies will need if these capabilities are going to stick and have ROI in low-cost transactions.
The big tech companies are spending enormous amounts for part ownership in startups whose only assets are knowledge that exists in the public domain, systems that the companies could have engineered themselves, and model weights trained with the buyer's own capital. The people who will get hurt are public investors who are having their investment used to make a few startup people really rich.
Knowledge is quite the useful asset, and not easily obtained. People obtain knowledge by studying for years and years, and even then, one might obtain information rather than knowledge, or have some incorrect knowledge. The AI companies have engineered a system that (by your argument) distills knowledge from artifacts (books, blogs, etc.) that contain statements, filler, opinions, facts, misleading arguments, incorrect arguments, as well as knowledge and perhaps even wisdom. Apparently this takes hundreds of millions of dollars (at least) to do for one model. But, assuming they actually have distilled out knowledge, that would be valuable.
Although, since the barrier to entry is pretty low, they should not expect sustained high profits. (The barrier is costly, but so is the barrier to entry to new airlines--a few planes cost as much as an AI model--yet new airlines start up regularly and nobody really makes much profit. Hence, I conclude that requiring a large amount of money is not necessarily a barrier to entry.)
(Also, I argue that they have not actually distilled out knowledge, they have merely created a system that is about as good at word association as the average human. This is not knowledge, although it may have its own uses.)
If they could build it themselves, why haven't they? Say what you want about Amazon, but I find it hard to believe that Anthropic bamboozled them into believing they can't build their own AI when they could do it cheaper.
Last I checked, AWS reserve pricing for one year of an 8x H100 pod costs more than just buying the pod yourself (with tens of thousands left over per server for the NVIDIA enterprise license and to hire people to manage them). On demand pricing is even worse.
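A rough sketch of the comparison the parent comment is making. Every number below is an assumed placeholder for illustration, not a quoted AWS or NVIDIA price:

```python
# Back-of-envelope: 1-year reserved cloud pricing for an 8x H100 instance
# vs. buying the pod outright. All figures are assumed round numbers.

CLOUD_HOURLY_RESERVED = 60.0        # assumed $/hr, 8x H100, 1-yr reserved
HOURS_PER_YEAR = 24 * 365

SERVER_PURCHASE = 300_000.0         # assumed purchase price of the 8x H100 pod
NVIDIA_LICENSE_PER_YEAR = 36_000.0  # assumed enterprise licensing for 8 GPUs

cloud_cost = CLOUD_HOURLY_RESERVED * HOURS_PER_YEAR
diy_cost = SERVER_PURCHASE + NVIDIA_LICENSE_PER_YEAR

print(f"cloud, 1 yr reserved: ${cloud_cost:,.0f}")
print(f"buy it yourself:      ${diy_cost:,.0f}")
print(f"left over for ops:    ${cloud_cost - diy_cost:,.0f}")
```

With these placeholder numbers the first reserved year alone leaves well over $100k per server for staff and licensing, which is the parent's point; plug in real quotes to check it for your own case.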
This is essentially money that they would have spent to build out their cloud anyway, except now they also get equity in Anthropic. Whether or not Anthropic survives, AWS gets to keep all of those expensive GPUs and sell them to other customers so their medium/long term opportunity cost is small. Even if the deal includes cheaper rates the hardware still amortizes over 2-3 years, and cloud providers are running plenty of 5+ year old GPUs so there's lots of money to be made in the long tail (as long as ML demand keeps up).
They're not making money yet because there's the $4 billion opportunity cost, but even if their equity in Anthropic drops to zero, they're probably still going to make a profit on the deal. If the equity is worth something, they'll make significantly more money than they could have renting servers. Throw financial engineering on top of that, and they may come out far ahead regardless of what happens to Anthropic: Schedule K capital equipment amortizations are treated differently from investments and AFAICT they can double dip since Anthropic is going to spend most of it on cloud (IANAL). That's likely why this seems to be cash investment instead of in-kind credits.
I think that’s what people mean when they say Amazon is making money off the deal. It’s not an all or nothing VC investment that requires a 2-3x exit to be profitable because the money just goes back to AWS’s balance sheet.
Yes and it’s also interesting that they mention using Trainium to do the training. I don’t know how much spend that is, but it seems really interesting. Like, if you’re AWS, and you imagine competing in the long run with NVIDIA for AI chips, you need to fund all that silicon development.
They mentioned that in the last investment too. That seems like marketing to me, as no one is doing bleeding-edge research outside of the NVIDIA CUDA ecosystem.
This is a way to keep the money printer called AWS Bedrock going and going and going. Don't underestimate the behemoth enterprises in the AWS rolodex who are all but assured to use that service for the next 5+ years at high volume.
These sort of investments usually also contain licensing deals.
Amazon probably gets Anthropic models it can resell “for free”. The $850M revenue is Anthropic's, but there is incremental additional revenue to AWS's hosted model services. AWS was already doing lots of things with Anthropic models, and this may alter the terms more in Amazon's favor.
Are they actually making money? I don’t know, investments aren’t usually profitable on day one. Is this an opportunity for more AWS revenue in the future? Probably.
And access to use Anthropic's models internally, where you have some guarantees and oversight that your corp and customer data aren't leaking where you don't want them to.
AI needs to be propped up because the big tech cloud providers they depend on need AI to be a thing to justify their valuations. Tech is going through a bit of a slump where all the things being hyped a few years ago have sort of died down (crypto? VR? Voice assistants? Metaverse?). Nobody gets very hyped about any of those nowadays. I am probably forgetting a couple of hyped things that fizzled out over the years.
Case in point, as much as I despise Apple, they are not all-in the AI bandwagon because it does nothing for them.
Go look at earnings reports for big tech companies. AI is definitely driving incremental revenue.
Apple is definitely on the AI bandwagon, they just have a different business model and they’re very disciplined. Apple tends not to increase research and investment costs faster than revenue growth. You’ll also notice rumors that they’re lowering their self driving car and VR research goals.
Google Cloud revenue is up 35% thanks to AI products [1,4,5]. Azure sales are up by a similar amount (but only 12% of that was AI products [2]). AWS is up too [3].
I'm so glad your point was that it's not a scam, and there are billions of dollars in real sales occurring at a variety of companies. It's amazing what publicly traded companies disclose if we only bother to read it. I'm glad we're all not in the contrarian bubble where we have to hate anything with hype.
Except it sort of is. It needs AI to be hyped and propped up, so that all those silly companies spending in GCP can continue to do so for a wee bit longer.
Big cloud providers will push anything that would make them money. That’s just what marketing is.
AI was exciting long before big cloud providers even existed. Once it was clear that a product could be made, they started marketing it and selling the compute needed.
I think the implication of the top comment is that cloud providers are buying revenue. When we say that cloud provider revenue is "up due to AI", a large part of that growth may be their own money coming back to them through these investments. Nvidia has been doing the same thing, by loaning data centers money to buy their chips. Essentially these companies are loaning each other huge sums of money and representing the resulting income as revenue rather than loan repayments.
To be clear, it's not to say that AI itself is a scam, but that the finance departments are kind of misrepresenting the revenue on their balance sheets, and that may be securities fraud.
Crypto was exciting too. And metaverse. And VR. And voice assistants. Et cetera and so forth.
All those things would change the world, and nothing would ever be the same, and would disrupt everything. Except they wouldn't and they didn't.
The scam is that those companies don't want to be seen as mature companies, they need to justify valuations of growth companies, forever. So something must always go into the hype pyre.
By all means, I hope the scam goes on for longer, as it indirectly benefits me too. But I don't have it in my heart to be a hypocrite. I will call a pig a pig.
The LLMs and image generation models have obvious utility. They’re not AGI or anything wild like that, but they are legitimately useful, unlike crypto.
VR didn’t fail, it just wasn’t viral. Current VR platforms are still young.
The internet commercially failed in 2001, but look at it now.
Crypto the industry, imo, is a big pyramid scheme. The technology has some interesting properties, but the industry is scammy for sure.
Metaverse wasn’t even an industry, it was a buzzword for MMOs during a time when everyone was locked at home. Not really interesting.
I don’t think it’s wise to lump every market boom together. Not everything is a scam.
People are losing jobs because of AI. Like it or not, as imperfect as AI may be, AI is having a real world disruptive impact, however negative it may be. Customer service teams and call centers are already being affected by AI, and if they aren't being smart about it, being rendered obsolete.
A lot of folks here seem to look at AI through examples of YC companies, apparently. Step back and look instead at the kind of projects technology consultancies are taking up - they are real-world examples of AI applications, many of which don't even involve LLMs but other aspects such as TTS/STT, image generation, transcription, video editing, etc. Way too many freelancers have begun complaining about how their pipelines have been zilch in the past two years.
That was, perhaps, the only good retort made so far. Yes, call centers and customer service are being affected, although it is unclear to me whether the cost-benefit makes sense once AI stops being heavily subsidized. I may be wrong, but my impression is that AI companies bleed money not only on training but also on running the models, and the actual cost of those services will need to be substantially higher than it is right now for the economics to work.
Price dropping is just a matter of time. Compute gets cheaper and the models get better. We’ve seen 100x drop in price for same capabilities in ~2 years.
Don’t forget about writers and designers losing jobs as well. If you’re not absolute top and don’t use AI, AI will replace you.
> Case in point, as much as I despise Apple, they are not all-in the AI bandwagon because it does nothing for them.
not sure if you've been paying attention, but AI is literally _the only thing_ Apple talks about these days. They literally released _an entire generation of devices_ where the only new thing is "Apple Intelligence"
They are investing differently. Apple has a much more captive audience than the others, and as such is focused on AI services that can be run on device. They aren't doing the bleeding-edge foundation model research, but are instead putting a ton of money into productionizing smaller models that can run without giant cloud compute.
Trivia: not sure if you're aware, but there are billion-dollar companies in all these spaces you claim “nobody cares about”. Every single stock broker in the US trades crypto now. Omniverse earns Nvidia a ton of money, Apple earned a billion dollars with a clunky v1, and Meta is selling more and more Quests every half.
Apple has spent over $10B on AVP and made back less than 10% of that with no signs of improvement in the next year or two and continued big spending on dev and content.
Meta has spent over $50B on Quest and the Metaverse with fewer than 10M MAU to show for it.
If you think those are successes, I'll go out and get several bridges to sell you. Meet me here tomorrow with cash.
Yeah. It was not really the world changer it was claimed to be during the hype cycle.
A billion-dollar valuation for a company in a given space is not as impressive as you think it is. Do I need to mention some high-profile companies with stellar valuations that are sort of a joke now? We can work together on this ;)
I am not privy to specific details, but in general there is a difference between investment and partnership. If it's literally an investment, it can either be in cash or in kind, where in kind can be like what MSFT did for OpenAI, essentially giving them unlimited-ish ($10b) Azure credits for training ... but there was quid pro quo where MSFT in turn agreed to embed/extend OpenAI in Azure services.
If it's a partnership investment, there may be both money & in-kind components, but the money won't be in the context of fractional ownership. Rather it would be partner development funds of various flavors, which are usually tied to consumption commits as well as GTM targets.
Sometimes in reading press releases or third party articles it's difficult to determine exactly what kind of relationship the ISV has with the CSP.
There's also another angle. During the call with Lex last week, Dario seemed to imply that future models would run on amazon chips from Annapurna Labs (Amazon's 2015 fabless purchase). Amazon is all about the flywheel + picks and shovels and I, personally, see this as the endgame. Create demand for your hardware to reduce the per unit cost and speed up the dev cycle. Add the AWS interplay and it's a money printing machine.
Also: they need top-tier models for their Bedrock business. They are one of only a few providers for Claude 3.5 - it's not open, and Anthropic doesn't let many folks run it.
Google has Gemini (and Claude), MSFT has OpenAI. Amazon needs this to stay relevant.
Supermicro is currently under DOJ investigation for similar schemes to this. The legality of it probably depends on the accounting, and how revenue is recognized, etc.
It certainly looks sketchy. But I’m sure there’s a way to do it legitimately if their accountants and lawyers are careful about it…
I believe they get to book any investment of cloud credits as revenue. Here's a good thread explaining the grift: https://news.ycombinator.com/item?id=39456140 Basically, you're investing your own money in yourself, which mostly nets out, but you get to keep the equity (and then all the fool investors FOMO in on fake self-dealing valuations).
Anthropic should double down on the strategy of being the better code generator. No I don't need an AI agent to call the restaurant for me. Win the developers over and the rest will follow.
> Win the developers over and the rest will follow.
Will they really? Anecdotal evidence, but nobody I know in real life knows about Claude (other than it's an ordinary first name). And they all use or at least know about ChatGPT. None of them are software engineers of course. But the corporate deciders aren't software engineers either.
If they ever do, Apple and Google will offer it as a service built into your phone.
For example, you could say, “OK Google, call that restaurant my girlfriend and I had our first date at 5 years ago, set up something nice so I can propose.” And I guess Google Gemini (or whatever it's called at this point) will hire a band, some photographers, and maybe even a therapist just in case it doesn't work out.
All of this will be done seamlessly.
But I don't imagine any normal person will pay $20 or $30 a month for a standalone service doing this. As it is, it's going to be really hard to compete against GitHub Copilot, since they effectively block others from scraping GitHub.
But why hire a therapist when Gemini is there to talk to?
Re: Github Copilot: IME it's already behind. I finally gave Cursor a try after seeing it brought up so often, and its suggestions and refactors are leagues ahead of what Copilot can do.
It is behind, but I think that's intentional. They can simply wait and see which of the competing VSCode AI forks/extensions gains the most traction and then acquire them or just imitate and improve. Very little reason to push the boundaries for them right now.
Because the most important part of therapy for a lot of things is the human connection, not so much the knowledge. Therapy is important, the US system is just stupid
Consumers don't have to consciously choose Claude, just like most people don't know about Linux. But if they use an Android phone or ever use any web services they are using Linux.
Every single business I know that pays for LLMs (on the order of tens of thousands of individual ChatGPT subscriptions) pays for whatever the top model is in their general cloud of choice, with next to no elasticity. E.g. a company already committed to Azure will use the Azure OpenAI models, and a customer already committed to AWS will use Claude.
Most people I know in real life have certainly heard of ChatGPT but don't pay for it.
I think someone enthusiastic enough to pay for the subscription is more likely to be willing to try a rival service, but that's not most people.
Usually when these services are ready to grow they offer a month or more free to try, at least that's what Google has been doing with their Gemini bundle.
I'm actually baffled by the number of people I've met who pay for such services, when I can't tell the difference between the models available within one service, or between one service and another (at least not consistently).
I do use them everyday, but there's no way I'd pay $20/month for something like that as long as I can easily jump from one to the other. There's no guarantee that my premium account on $X is or will remain better than a free account on $Y, so committing to anything seems pointless.
I do wonder though: several services started adding “memories” (chunks of information retained from previous interactions), making future interactions more relevant. Some users are very careful about what they feed recommendation algorithms to ensure they keep enjoying the content they get (another behavior I was surprised by), so maybe they also value this personalization enough to stick with one specific LLM service.
The amount of free chats you get per day is way too limiting for anyone who uses LLMs as an important tool in their day job.
20 USD a month to make me between 1.5x and 4x more productive in one of the main tasks of my job really is a bargain, considering that 20 USD is a very small fraction of my salary.
If I didn't pay, I'd be forced to wait, or create many accounts and constantly switch between them, or be constantly copy-pasting code from one service to the other.
And when it comes to coding, I've found Claude 3.5 Sonnet better than ChatGPT.
Yesterday I needed to take an unstructured document with about 1,200 timestamps and subtract 1 second 550ms from each of them.
I could have written code for it, but Claude output a perfectly valid HTML page I could locally paste my document in, which gave me the accurate output I needed.
This is knowledge work.
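For comparison, the code route skipped here might have looked something like this. The timestamp format (SRT-style `HH:MM:SS,mmm`) and the regex are assumptions, since the comment doesn't say what the document contained:

```python
import re
from datetime import datetime, timedelta

# Shift every timestamp in a document back by 1s 550ms.
# Assumes SRT-style "HH:MM:SS,mmm" timestamps that are all larger than
# the shift (no wrap past midnight).
SHIFT = timedelta(seconds=1, milliseconds=550)
STAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}")

def shift_stamp(match: re.Match) -> str:
    t = datetime.strptime(match.group(0), "%H:%M:%S,%f")
    # %f renders microseconds (6 digits); trim to milliseconds (3 digits).
    return (t - SHIFT).strftime("%H:%M:%S,%f")[:-3]

def shift_document(text: str) -> str:
    # re.sub with a function rewrites each timestamp, leaving all
    # surrounding unstructured text untouched.
    return STAMP.sub(shift_stamp, text)

print(shift_document("00:01:30,000 --> 00:01:33,200"))
# → 00:01:28,450 --> 00:01:31,650
```

Both routes are legitimate; the appeal of the HTML page Claude produced is that it needed no Python environment at all.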
Today I had another document, about the length of a small book, where H3 and H4 titles were mistakenly provided in the wrong language. I needed those 159 titles to be changed while preserving the rest of the document, with a very specific maximum word count per title. Claude did this with a single natural language prompt. (though I had to tell it to "go on" every couple hundred lines)
This is also knowledge work. Knowledge work is not generating new knowledge, just like manual work isn't about generating new hands.
OP and the people who reply to you are perfect examples of engineers being clueless about how the rest of the world operates. I know engineers who don’t know Claude, and I know many, many regular folk who pay for ChatGPT (basically anyone who’s smart and has money pays for it). And yet the engineers think they understand the world when in reality they just understand how they themselves work best.
I use Claude Pro paid version every day, but not for coding. I used to be a software engineer, but no longer.
I tried OpenAI in the past, but I did not enjoy it. I do not like Sam Altman.
My use cases:
Generating a business plan, podcast content, marketing strategies, sales scripts, financial analyses, canned responses, and project plans. I also use it for general brainstorming, legal document review, and so many other things. It really feels like a super-assistant.
Claude has been spectacular about 98% of the time. Every so often it will refuse to perform an action - most recently it was helping me research LLC and trademark registrations, combined with social media handles (and some deviations) and web URL availability. It would generate spectacular reports that would have taken me hours to research, in minutes. And then Claude decided that it couldn't do that sort of thing, until it could the next day. Very strange.
I have given Gemini (free), OpenAI (free and Paid), Copilot (free), Perplexity (free) a shot, and I keep coming back to Claude. Actually, Copilot was a pretty decent experience, but felt the guardrails too often. I do like that Microsoft gives access to Dall-E image generation at no cost (or maybe it is "free" with my O365 account?). That has been helpful in creating simple logo concepts and wireframes.
I run into AI with Atlassian daily, but it sucks. Their Confluence AI tool is absolute garbage and needs to be put down. I've tried the AI tools that Wix, Squarespace, and Miro provide. Those were all semi-decent experiences. And I just paid for X Premium so I can give Grok a shot. My friend really likes it, but I don't love the idea of having to open an ultra-distracting app to access it.
I'm hoping some day to be like the wizards on here who connect AI to all sorts of "things" in their workflows. Maybe I need to learn how to use something like Zapier? If I have to use OpenAI with Zapier, I will.
I also prefer Claude after trying the same options as you.
That said, I can't yet confidently speak to exactly why I prefer Claude. Sometimes I do think the responses are better than any model on ChatGPT. Other times I am very impressed with ChatGPT's responses. I haven't done a lot of testing on each with identical prompt sequences.
One thing I can say with certainty is that Claude's UI blows ChatGPT's out of the water. Much more pleasant to use, and I really like Projects and Artifacts. It might be this alone that has me biased towards Claude. It makes me think that UI and additional functionality are going to play a much larger role in determining the ultimate winner of the LLM wars than current discussions give them credit for.
I have been flogging the hell out of Copilot for equities research and to teach me about finance topics. I just bark orders and it pumps out an analysis. This is usually so much work, even if you have a service like Finviz, Fidelity or another paid service.
Thirty seconds to compare 10yrs of 10ks. Good times.
In my experience*, for coding, Sonnet is miles above any model by OpenAI, as well as Gemini. They're all far from perfect, but Sonnet actually "gets" what you're asking, and tries to help when it fails, while the others wander around and often produce dismal code.
* Said experience is mostly via OpenRouter, so it may not reflect the absolute latest developments of the models. But there at least, the difference is huge.
I also don't understand the idea of voice mode, or an agent-controlled computer. Maybe it is cool to see as a tech demo, but all I really want is good quality at a reasonable price for the LLM service.
I think voice mode makes significantly more sense when you consider people commuting by car by themselves every day.
Personally I don't (and I'd never talk to an LLM on public transit or in the office), but almost every time I do drive somewhere, I find myself wishing for a smarter voice-controlled assistant that would allow me to achieve some goal or just look up some trivia without ever having to look at a screen (phone or otherwise).
This is the direction I'm taking my personal LLM-based scripts in. I don't really know any Python, but Claude has written Python scripts that e.g. write a document iteratively using LLMs. The next step will be to use voice and AutoGPT to do things that I would rather dictate to someone. E.g. find email from x => write reply => edit => send
Much more directed/almost micro managing but it’s still quicker than me clicking around (in theory).
Edit:
I’m interested to explore how much better voice is as an input (vs writing as an input)
To me, reading outputs is much more effective than listening to outputs.
More seriously: I think there are a ton of potential applications. I'm not sure that developers that use AI tools are more likely to build other AI products - maybe.
No, they should not do this. They are trying to create generalized artificial intelligence, not a specific one. Let Cursor, Zed, Codeium or some smaller company focus on that.
They certainly need the money. The Pro service has been running in limited mode all week due to being over capacity. It defaults to “concise” mode during high capacity but Pro users can select to put it back into “Full Response.” But I can tell the quality drops even when you do that, and it fails and brings up error messages more commonly. They don’t have enough compute to go around.
I’ve been using the API for a few weeks and routinely get 529 overloaded messages. I wasn’t sure if that’s always been the case but it certainly makes it unsuitable for production workloads because it will last hours at a time.
Hopefully they can add the capacity needed because it’s a lot better than GPT-4o for my intended use case.
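For what it's worth, a common way to paper over intermittent 529s is jittered exponential backoff around the call. This is a generic sketch, not Anthropic's client API; `call_api` and `OverloadedError` are stand-in names for whatever client and exception you actually use:

```python
import random
import time

class OverloadedError(Exception):
    """Stand-in for the exception your client raises on HTTP 529."""

def with_backoff(call_api, max_retries=5, base_delay=1.0):
    """Retry call_api() on overload, sleeping longer between attempts."""
    for attempt in range(max_retries):
        try:
            return call_api()
        except OverloadedError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Jittered exponential backoff: ~1s, 2s, 4s, ... plus noise,
            # so a fleet of clients doesn't retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

This smooths over spikes lasting seconds to minutes; as the parent notes, when the overload lasts hours the only real fix is capacity on the provider's side (or failing over to another model).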
Sonnet is better than 4o for virtually all use cases.
The only reason I still use OpenAI's API and chatbot service is o1-preview. o1 is like magic. Everything Sonnet and 4o do poorly, o1 solves like a piece of cake. Architecting, bug fixing, planning, refactoring: o1 has never let me down on any 'hard' task.
A nice combo is having o1 guide Sonnet. I ask o1 to come up with a solution and explanation, then simply feed its response into Sonnet to execute. Running that through Aider really feels like futuristic stuff.
Exactly my experience as well. Like Sonnet can help me in 90% of the cases but there are some specific edge cases where it struggles that o1 can solve in an instant. I kinda hate it because of having to pay for both of them.
I've been using Claude 3.5 over API for about 4 months on $100 of credit. I use it fairly extensively, on mobile and my laptop, and I expected to run out of credit ages ago. However, I am careful to keep chats fairly short as it's long chats that eat up the credit.
So I'd say it depends. For my use case it's about even but the API provides better functionality.
I alluded to this in another comment, but I have found 4o to be better than Sonnet in Swift, Obj-C, and AppleScript. In my experience, Claude is worse than useless with those three languages when compared to GPT. Everything else, I'd say the differences haven't been too extreme. Though o1-preview absolutely smokes both in my experience too, but it isn't hard for me to hit its rate limit either.
Interesting. I haven't compared with 4o or GPT4, but I found DeepSeek 2.5 seems to be better than Claude 3.5 Sonnet (new) at Julia. Although I've seen both Claude and DeepSeek make the exact same sequence of errors (when asked about a certain bug and then given the same reply to their identical mistakes) that shows they don't fully understand the syntax for passing keyword arguments to Julia functions... wow. It was not some kind of tricky case or relevant to the bug. Must have same bad training data. Oops, that's diversion. Actually they're both great in general.
I can see what you mean by LLMs making the same mistakes. I had that experience with both GPT and Claude, as well.
However, I found that GPT was better able to correct its mistakes while Claude essentially just doubles down and keeps regurgitating permutations of the same mistakes.
I can't tell you how many times I have had Claude spit out something like, "Use the Foobar.ToString() method to convert the value to a string." To which I reply, something like, "Foobar does not have a method 'ToString()'."
Then Claude will say something like, "You are right to point out that Foobar does not have a .ToString method! Try Foobar.ConvertToString()"
At that point, my frustration levels start to rapidly increase. Have you had experiences like that with Claude or DeepSeek? The main difference with GPT is that GPT tends to find me the right answer after a bit of back-and-forth (or at least point me in a better direction).
Having used o1 and Claude through Copilot in VSC - Claude is more accurate and faster. A good example is the "fix test" feature is almost always wrong with o1, Claude is 50/50 I'd say - enough to try. Tried on Typescript/node and Python/Django codebases.
None of them are smart enough to figure out integration test failures with edge cases.
In the beginning I was agitated by Concise and would move it back manually. But then I actually tried it: I asked for SQL and it gave me back SQL and 1-2 sentences at most.
Regular mode gives SQL and entire paragraphs before and after it. Not even helpful paragraphs, just rambling about nothing and suggesting what my next prompt should be
Now I love Concise mode; it doesn't skimp on the meat, just the fluff. My problem now is that Concise only shows up during load. Right now I can't choose it even if I wanted to.
Oh you are asking for a 2 line change? Here is the whole file we have been working on with a preamble and closing remarks, enjoy checking to see if I actually made the change I am referring to in my closing remarks and my condolences if our files have diverged.
You know the craziest thing I’ve seen ChatGPT do is claim to have made a change to my terraform code acting all “ohh here is some changes to reflect all the things you commented on” and all it did was change the comments.
It’s very bizarre when it rewrites the exact same code a second or third time and for some reason decides to change the comments. The comments will have the same meaning but will be slightly different wording. I think this behavior is an interesting window into how large language models work. For whatever reason, despite unchanging repetition, the context window changed just enough it output a statistically similar comment at that juncture. Like all the rest of the code it wrote out was statistically pointing the exact same way but there was just enough variance in how to write the comment it went down a different path in its neural network. And then when it was done with that path it went right back down the “straight line” for the code part.
I don't think the context window has to change for that to happen. The LLMs don't just pick the most likely next token, it's sampled from the distribution of possible tokens so on repeat runs you can get different results.
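A toy sketch of that sampling step may help (illustrative only; real decoders layer on top-k/top-p filtering and more):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token index from a logit vector, as LLM decoders do.

    With temperature > 0 the choice is stochastic, so the exact same
    context can yield different continuations on repeat runs.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Identical context, yet repeat runs can pick different tokens:
logits = [2.0, 1.9, 0.1]  # toy vocabulary of three tokens
draws = {sample_next_token(logits) for _ in range(200)}
```

With temperature set to 0 (greedy decoding, i.e. always taking the argmax) the output would be deterministic for a fixed context; hosted chat UIs typically don't run at temperature 0.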
Probably an overcorrection from when people were complaining very vocally about ChatGPT being "lazy" and not providing all the code. FWIW, I've seen Claude do the same thing: when asked to debug something it obviously did not know how to fix, it would just repeatedly refactor the same sections of code and make changes to comments.
I feel like “all the code” and “only the changes” needs to be an actual per chat option. Sometimes you want the changes sometimes you want all the code and it is annoying because it always seems to decide it’s gonna do the opposite of what you wanted… meaning another correction and thus wasted tokens and context. And even worse it pollutes your scroll back with noise.
An alternative to Concise mode would be to add that sentence (or those sentences) yourself. I personally tell it not to give me the code at all at times, and at other times I want the code only, and so forth.
You could add these sentences as project instructions, for example, too.
Interesting. I also find it frustrating to be rate limited/have responses fail when I’m paying for the product, but I’ve actually found that the “concise” mode answers have less fluff and make for faster back and forth. I’ve once or twice looked for the concise mode selector when the load wasn’t high.
Their shitty UI is also not doing them any infrastructure favors, during load it'll straight up write 90% of an answer, and then suddenly cancel and delete the whole thing, so you have to start over and waste time generating the entire answer again instead of just continuing for a few more sentences. It's like a DDOS attack where everyone gets preempted and immediately starts refreshing.
Yes! It's infuriating when Claude stops generating mid response and deletes the whole thread/conversation. Not only you lose what it has generated so far, which would've been at least somewhat useful, but you also lose the prompt you wrote, which could've taken you some effort to write.
> But I can tell the quality drops even when you do that
Dario said in a recent interview that they never switch to a lower-quality model (in the sense of one with different parameters) during times of load. But he left room for interpretation on whether that means they could still use quantization or sparsity. And additionally, his answer wasn't clear enough to know whether or not they use a lower depth of beam search or other cheaper sampling techniques.
He said the only time you might get a different model itself is when they are A-B testing just before a new announced release.
And I think he clarified this all applied to the webui and not just the API.
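For what quantization would mean in practice, here's a minimal, illustrative int8 sketch (nothing to do with Anthropic's actual serving stack — just the general technique):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric int8 quantization: store each weight in 1 byte
    instead of 4.

    Serving a quantized model cuts memory and bandwidth roughly 4x
    versus float32, at the cost of small rounding error -- one way a
    provider could cheapen inference under load without swapping in a
    differently-parameterized model.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.03, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, within half a quantization step
```

The rounding error is bounded by half the scale, which is why users might perceive only a subtle quality drop rather than an obviously different model.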
I've had it refuse to generate a long text response (I was trying to condense 300KB of documentation down to 20-30KB to be able to put it in the project's context), and every time I asked it replied "How should I structure the results?", "Shall I go ahead with writing the artifacts now?", etc.
It wasn't even during the over-capacity event I don't think, and I'm a pro user.
Hate to be that guy, but did you tell it up front not to ask? And, of course, in a long-running conversation it's important not to leave such questions in the context.
The weird thing is that when I tried to tell it to distill it to a much smaller message it had no problem outputting it without any followup questions. But when I edited my message to ask it to generate a larger response, then I got stuck in the loop of it asking if I was really sure or telling me that `I apologize, but I noticed this request would result in a very large response.`
It strikes me as odd, because I've had quite a few times where it would generate a response over multiple messages (since it was hitting its max message length) without any second-guessing or issue.
I am a paying customer with credits and the API endpoints rate-limited me to the point where it's actually unusable as a coding assistant. I use a VS Code extension and it just bailed out in the middle of a migration. I had to revert everything it changed and that was not a pleasant experience, sadly.
When working with AI coding tools commit early, commit often becomes essential advice. I like that aider makes every change its own commit. I can always manicure the commit history later, I'd rather not lose anything when the AI can make destructive changes to code.
I can recommend https://github.com/tkellogg/dura for making auto-commits without polluting main branch history, if your tool doesn't support it natively
Could you explain more on how to do this? e.g if I am using the Claude API in my service, how would you suggest I go about setting up and controlling my own inference endpoint?
You're right, but that's also subject to compute costs and time value of money. The calculus is different for companies trying to exploit language models in some way, and different for individuals like me who have to feed the family before splurging for a new GPU, or setting up servers in the cloud, when I can get better value by paying OpenAI or Claude a few dollars and use their SOTA models until those dollars run out.
FWIW, I am a strong supporter of local models, and play with them often. It's just that for practical use, the models I can run locally (RTX 4070 TI) mostly suck, and the models I could run in the cloud don't seem worth the effort (and cost).
For the money for a 4070ti, you could have bought a 3090, which although less efficient, can run bigger models like Qwen2.5 32b coder. Apparently it performs quite well for code
More evidence that people should use wrappers like OpenRouter and litellm by default? (Makes it easy to change your choice of LLMs, if one is experiencing problems)
Neither does OAI. Their service has been struggling for more than a week now. I guess everyone is scrambling after the new qwen models dropped and matched the current state of the art with open weights.
Hmmm... I wonder if this is why some of the results I've gotten over the past few days have been pretty bad. It's easy to blame poor results on LLM quality variance from prompt to prompt vs. something like this, where the quality is actively degraded without notification. I can't say this is in fact what I'm experiencing, but it was noticeable enough that I'm going to check.
Never occurred to me that the response changes based on load. I’ve definitely noticed it seems smarter at times. Makes evaluating results nearly impossible.
Unrelated. Inference doesn't run in sync with the wall clock; it takes whatever it takes. The issue is more like telling a room of support workers they are free to half-ass the work if there's too many calls, so they don't reject any until even half-assing doesn't lighten the load enough.
This is one reason closed models suck. You can't tell if the bad responses are due to something you are doing, or if the company you are paying to generate the responses is cutting corners and looking for efficiencies, eg by reducing the number of bits. It is a black box.
Recently I started wondering about the quality of ChatGPT. A couple of instances I was like: "hmm, I’m not impressed at all by this answer, I better google it myself!"
Recently I asked 4o to ‘try again’ when it failed to respond fully, it started telling me about some song called Try Again. It seems to lose context a lot in the conversations now.
Anthropic gets a lot of its business via AWS Bedrock, so it's fair to say that Amazon probably has reasonable insight into how Claude usage is growing, which makes them confident in this investment.
They are also confident in the investment because they know that all the money is going to come right back to them in the short term (via AWS spending) whether or not Anthropic actually survives in the long term.
Nope they have supported AWS deployments for a long time, and now even more of the spend will be on AWS.
> Anthropic has raised an additional $4 billion from Amazon, and has agreed to train its flagship generative AI models primarily on Amazon Web Services (AWS), Amazon’s cloud computing division.
Anthropic will be the winner here, zero doubts in my mind. They have leapfrogged head and shoulders above OpenAI over the last year. Who'd have thought a business predicated entirely on keeping the ~1000 people on earth qualified to work on this stuff happy would go downhill once they failed at that.
This makes sense in the grand scheme of things.
Anthropic used to be in the Google camp, but DeepMind seems to have picked up speed lately, with new “Experimental” Gemini Models beating everyone, while AWS doesn't have anything on the cutting edge of AI.
Hopefully this helps Anthropic to fix their abysmal rate limits.
I had to switch from the Pro to the Teams plan and pay 150 USD for 5 accounts because the Pro plan has gotten unusable. It will allow me to ask a dozen or so questions and then will block me for hours because of "high capacity." I don't need five accounts; one for 40 USD would be totally fine if it would allow me to work uninterrupted for a couple of hours.
All in all Claude is magic. It feels like having ten assistants at my fingertip. And for that even 100 USD is worth paying.
I just start new chats whenever the chat gets long (in terms of number of tokens). It's kind of a pain to have to form a prompt that encapsulates enough context, but it has prevented me from hitting the Pro limit. Also, I include more questions and detail in each prompt.
Why does that work? Claude includes the entire chat with each new prompt you submit [0], and the limit is based on the number of tokens you've submitted. After not too many prompts, there can be 10k+ tokens in the chat (which are all submitted in each new prompt, quickly advancing towards the limit).
(I also have a chatGPT sub and I use that for many questions, especially now that it includes web search capabilities)
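A rough sketch of why long chats burn through the limit so fast, assuming (as the comment says) that the full history really is resubmitted with each turn:

```python
def cumulative_tokens(turn_sizes):
    """Tokens counted against the limit when the full chat history is
    resent with every new prompt.

    turn_sizes: tokens added by each new exchange (prompt + reply).
    Each submission includes everything before it, so total usage
    grows quadratically with conversation length, not linearly.
    """
    total, history = 0, 0
    for t in turn_sizes:
        history += t      # the chat now contains this many tokens
        total += history  # ...and all of them are submitted again
    return total

# Ten 500-token exchanges: only 5,000 tokens of chat text,
# but far more counted against the limit.
print(cumulative_tokens([500] * 10))
```

Under this assumption, starting fresh chats with a condensed context really does stretch the limit, since it resets `history` to a small seed prompt.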
> It's kind of a pain to have to form a prompt that encapsulates enough context, but it has prevented me from hitting the Pro limit. Also, I include more questions and detail in each prompt.
I get it to provide a prompt to start the new chat. I sometimes wish there was a button for it because it's such a big part of my workflow.
Also, do any data engineers know how context works on the backend? It seems like you could get an LLM to summarize a long context, and that would shorten it? Also seems like I don't know what I'm talking about.
Could the manual UX that I've come up with happen behind the scenes?
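The summarize-the-context idea can indeed be sketched. In this illustrative version, `summarize` is a hypothetical callable (in practice it would itself be an LLM call) and token counts are approximated by word counts:

```python
def compress_context(messages, summarize, keep_recent=4, budget=2000):
    """Sketch of behind-the-scenes context compression.

    Keeps the most recent turns verbatim and folds older messages
    into a single synopsis once the rough token budget is exceeded.
    """
    size = sum(len(m.split()) for m in messages)
    if size <= budget or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent

# Toy usage with a trivial stand-in summarizer:
msgs = [f"message {i} " + "word " * 300 for i in range(10)]
shrunk = compress_context(
    msgs, lambda old: f"[summary of {len(old)} messages]")
```

Whether providers actually do something like this server-side isn't public, but it is essentially an automated version of the manual new-chat-with-a-recap workflow described above.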
Google also invested $2B into Anthropic. It seems like both Google and Amazon are providing credits for their clouds, also as a hedge against Microsoft/OpenAI becoming too big.
If I have to choose between Amazon and Microsoft, I'll choose the lesser evil. Microsoft owns the entire stack from OS to server to language to source control. Anything to weaken their hold is a win in my book.
> chose between Amazon and Microsoft... the lesser evil
A hard question. If you're focusing purely on tech, probably Microsoft. But overall evil in the world? With their union busting and abuse of workers, I'd say Amazon.
> Amazon Web Services will also become Anthropic’s “primary cloud and training partner,” according to a blog post. From now on, Anthropic will use AWS Trainium and Inferentia chips to train and deploy its largest AI models.
I suspect that's worth more than $4B in the long term? I'm not familiar with the costs, though.
I’ve been impressed with the AI assisted tooling for the various monitoring systems in Azure at least. Of course this is mainly because those tools are so ridiculously hard to use that I basically can’t for a lot of things. The AI does it impressively well though.
I'd assume there is a big benefit to having AI-assisted resource generation for cloud vendors. Our developers often have to mess around with things in Azure that we really, really shouldn't, because operations lacks the resources and knowledge. Technically we've outsourced it, but most requests take 3 months and get done wrong... if an AI could generate our network settings from a global policy, that would be excellent. Hell, if it could handle all our resource generation, there would be so much less useless time wasted, because our organisation views "IT" as HR's uncharming cost-center cousin.
The status pages of OpenAI and Anthropic are in stark contrast and that mirrors my experience. Love Anthropic for code and its Projects feature, but OpenAI is still way ahead on voice and reliability.
I've been playing with Alibaba's Qwen 2.5 model and I've had it claim to be Claude. (Though it usually claims to be Llama, and it seems to think it's a literal llama, i.e. it identifies as an animal, "among other things".)
1/ Best-in-class LLM in Bedrock. This could be done w/o the partnership as well.
2/ Evolving Trainium and Inferentia into worthy competitors for large-scale training and inference. They have thousands of large-scale customers, and as adoption grows, the investment will pay for itself.
I love Claude 3.5 sonnet and their UI is top notch especially for coding, recently though they have been facing capacity issues especially during weekdays correlating with working hours. Have tried Qwen2.5 coder 32B and it's very good and close to Claude 3.5 in my coding cases.
This is what annoys me a lot, too. I mean the fact that I cannot have paste retain the formatting (```, `, etc.). Same with the UI retaining my prompt, but not the formatting, so if you do some formatting and reload, you will lose that formatting.
AWS Trainium is a machine learning chip designed by AWS to accelerate training deep learning models. AWS Bedrock is a fully managed service that allows developers to build and scale generative AI applications using foundation models from various providers.
Trainium == Silicon (looks like Anthropic has agreed to use it)
Bedrock == AWS Service for LLMs behind APIs (you can use Anthropic models through AWS here)
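As a concrete (hedged) example, calling Claude through Bedrock looks roughly like this. The model ID and body schema below are assumptions based on Bedrock's Anthropic integration; check the current AWS docs before relying on them:

```python
import json

def claude_bedrock_body(prompt, max_tokens=512):
    """Build the JSON request body for an Anthropic model behind
    Bedrock, using the messages-style schema."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = claude_bedrock_body("Summarize this log file.")

# Sending it requires AWS credentials, so it is only sketched here:
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# resp = client.invoke_model(
#     modelId="anthropic.claude-3-5-sonnet-20240620-v1:0", body=body)
```

The point of the Bedrock path is that auth, billing, and compliance ride on your existing AWS account rather than a separate Anthropic API key.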
I'm not sure how they make it back. The guardrails in place are extremely strict. The only people who seem to use it are a subset of developers who are unhappy with OpenAI. Bard is popping up free everywhere, taking away much of the general user crowd, and OpenAI offers the mini model always free plus limited image generation on the expensive model. Then you have the do-it-yourself crowd with Llama. What is their target market? Governments? Amazon companies? Their free tier offers 10 queries, and half of them need to be used to get around filters. I don't see this positioned well for general customers.
The Guardrails on Claude Sonnet 3.5 API are not stricter than Openai's guardrails in my experience. More specifically, if you access the models via API or third party services like Poe or Perplexity the guardrails are not stricter than GPT4o. I've never subscribed to Claude.ai so can't comment on that.
I have no experience with Claude.ai vs ChatGPT, but it's clear the underlying model has no issue with guardrails, and this is simply an easily tweaked developer setting if you are correct that they are stricter on Claude.ai.
(The old Claude 2.1 was hilariously unwilling to follow reasonable user instructions due to "ethics" but they've come a long way since then.)
> The Guardrails on Claude Sonnet 3.5 API are not stricter than Openai’s guardrails in my experience.
Both Gemini and Claude (via the API) have substantially tighter guardrails around recitation (producing output matching data from their training set) than OpenAI, which I ran into when testing an image text-extraction-and-document-formatting toolchain against all three.
Both Claude and Gemini gave refusals on text extraction from image documents (not available publicly anywhere I can find as text) from a CIA FOIA release.
I just asked GPT4o to recognize a cartoon character (I accessed it via Perplexity) and it told me it isn't able to do that, while Claude Sonnet happily identified the character, so this might vary by use case or even by prompt.
I've had a situation where Claude (Sonnet 3.5) refused to translate song lyrics because of safety/copyright bullshit. It worked in a new chat where I mentioned that it was a pre 1900s poem.
It has held this position since at least June. The Aider LLM leaderboards [1] have the Sonnet 3.5 June version beating 4o handily. Only o1-preview beat it narrowly, but IIRC at much higher costs. Sonnet 3.5 October has taken the lead again by a wide margin.
Anecdotally, Claude seems to hallucinate more during certain hours. It's amusing to watch, almost like your dog that gets too bored and stops responding to your commands - you say "sit" and he looks at you, tilts his head, looks straight up at you, almost like saying "I know what you're saying..." but then decides to run to another room and bring his toy.
And you'd be wondering: "darn, where's that toughest, most obedient, and smartest Belgian Malinois that just a few hours ago was ready to take down Bin Laden?"
Talking of anecdotal, 4o with canvas, which is normally excellent, tends to give up around a certain context length, and you have to copy and paste what you have into a new window to get it to make edits
This week, along with the 20 weeks before that :) Model improvement has slowed down so much that things aren't changing quickly anymore. And Anthropic has only widened the gap with 3.5-v2.
With Claude on Bedrock I can use LLMs in production without sending customer data to the US. And if you're already on AWS it's super easy to onboard wrt. auth and billing and compliance.
If you're using Bedrock you're still subject to the CLOUD act/FISA meaning the whole angle of "not sending customer data to the US" isn't worth very much.
Claude API use is already as high as OpenAI's. I believe that market will grow far more over time than chat, as AI gets embedded in more of the applications we already use.
I am in Operations. I use it (and pay for it) because the free version seemed to work best for me compared to Perplexity (which had been my go-to) and ChatGPT/OpenAI.
Government alone could be huge, with this recent nonsense about the military funding a “Manhattan project for AI” and the recently announced Pentagon contracts.
Can someone with familiarity in rounds close to this size speak to their terms?
For instance: i imagine a significant part of this will be “paid” as AWS credits and is not going to be reflected as a balance in a bank account transfer.
Yes, that is the case. It is largely $4B in capex investment; I'd imagine 10% or less is cash. One would think Nvidia could get much better terms investing its GPUs (assuming they can get them into a working cluster). Instead, Nvidia gets cash for GPU hardware, that hardware gets put into a data center, and AWS invests their hardware as credits for equity instead of cash. And because AWS has already built out their data center infrastructure, they can get a better deal than Nvidia making the play, since Nvidia would have to build an entire data center infrastructure from scratch (in addition to designing GPUs, etc.).
Now, if AWS or GCP can crack GPU compute better than Nvidia for training and hosting, then they can basically cut out Nvidia, and essentially they get GPUs at cost (vs. whatever markup they pay to Nvidia).
Because essentially whatever return AWS will make from Anthropic will be modulated by the premiums paid to Nvidia to invest, and also the cost of operating a data center for Anthropic.
But thankfully all of that gets mediated on paper, because the valuation is more speculative than the returns on Nvidia hardware (which will be known to the cent by AWS, given it's some math of hourly rate and utilization, which they have a good idea of).
Microsoft -> OpenAI (& Inflection AI)
Google -> Gemini (and a bit of Anthropic)
Amazon -> Anthropic
Meta -> Llama
Is big tech good for the startup ecosystem, or are they monopolies eating everything (or both?). To be fair to Google and Meta they came up with a lot of the stuff in the first place, and aren't just buying the competition.
There wouldn't be an LLM startup ecosystem without big tech.
Notable contributions: Nvidia for, well, (gestures at everything), Google for discovering (inventing?) transformers, being early advocates of ML, authoring tensorflow, Meta for Torch and open sourcing Llama, Microsoft for investing billions in OpenAI early on and keeping the hype alive. The last one is a reach, I'm sure Microsoft Research did some cool things I'm unaware of.
You might be right, we don’t know how an alternative reality would have played out though to say if this is the only way (and fastest) way we could have got here.
Some of these investments sound big in absolute terms..
However not that big considering the scale of the investor AND that many of these investors are also vendors.
MSFT/AMZN/NVDA investing in AI firms that then use their clouds/chips/whatever is an interesting circular investment.
Are you a paying customer? I exclusively use their best model and while I get warnings (stuff about longer chats leading to more limit usage), I've never been kicked out.
The only thing is that they've recently started defaulting to Concise to cut costs, which is fine with me.
Concise mode is honestly better anyway. I’d prefer it always be in that mode.
But that being said I bump into hard limits far more often than I do with ChatGPT. Even if I keep chats short like it constantly suggests, eventually it cuts me off.
Anecdotal experience, but as far as I've played around with them, Claude's models have given me a better impression. I would much rather have great responses with lower availability than mediocre responses available all the time.
I often hear people praise Claude as being better than chatGPT, but I’ve given both a shot and much prefer chatGPT.
Is there something I’m missing here? I use chatGPT for a variety of things but mainly coding and I feel subjectively that chatGPT is still better for the job.
Mostly python and js, so the most popular ones. I should mention that I do use obscure modules and packages in each language where chatGPT starts to suck a little. I imagine this might be similar to how chat might work with Rust or Zig, etc.
Same as the big tech companies, probably make all of their products worse in service to advertising. AI-generated advertising prompted by personal data could be extremely good at getting people to buy things if tuned appropriately.
Well. If you're using AI instead of a search engine, they could make the AI respond with product placement more or less subtle.
But if you're using AI for example to generate code as an aid in programming, how's that going to work? Or any other generative thing, like making images, 3d models, music, articles or documents... I can't imagine inserting ads into those would not destroy the usefulness instantly.
My guess is they don't know themselves. The plan is to get market share now and figure it out later. Which may or may not turn out well.
Cost of inference will tend toward the cost of a Google search. It is infrastructure whose cost will come down to negligible and almost free. Then, as others have said, it will tend to freemium (pay to have no ads), plus additional value-added services as they continue to evolve up the food chain (AI-powered sales, marketing, etc.).
I'm working with models and the costs are ridiculous. A $7,000 card and 800 watts later for my small projects, and I can't imagine how they can make money in the next 5 to 10 years. I need to do more research on hardware approaches that reduce costs and power consumption. I just started experimenting with llama.cpp and I'm mildly impressed.
Looking at API providers like Together that host open source models like Llama 70b and running these models in production myself, they have healthy margins (and their inference stack is much better optimized).
relatedly: is claude3.5-haiku being delivered above their cost, after they quadrupled the price?
Though it wouldn't ensure profitability since they're spending so much on training. I'm sure with inference-use growing, they're hoping that eventually total_expenses(inference) grows to be much much larger than total_expenses(training)
They'll invent AGI, put 50% of workers out of a job, then presumably have the AGI build some really good robots to protect them from the ensuing riots.
I must be missing it. How is anthropic worth so much when open source is closing in so fast? What value will anthropic have if competitors can be mostly free?
Claude Sonnet 3.5 is simply amazing. No matter how much I used it I continue to be amazed at what it can produce.
I recently asked it what the flow of data is when two vNICs on the same host send data to each other, and it produced a very detailed answer complete with a really nice diagram. I then asked what language the diagram uses and it said Mermaid. So I then asked it to produce some example L1/L2/L3 diagrams for computer networks and it did just that. Then I asked it to produce Python code using PyATS to run show commands on Cisco switches and routers and use the data to produce Mermaid network diagrams for layers 1, 2, and 3, and it just spit out working Python code. This is a relatively obscure task with a specific library no one outside of networking knows about, integrated with a diagram generator. And it fully understands the difference between network layers. Just astonishing. And now it can write and run JavaScript apps. The only feature I really want is for it to be able to run generated Python code to see if it has any errors and automatically fix them.
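For a sense of what the diagram-generating half of that task looks like, here is a minimal sketch. The `links` tuples are a hypothetical stand-in for what parsed `show cdp neighbors` output (e.g. via PyATS/Genie) might give you; the actual device connection and parsing are omitted:

```python
def mermaid_l2_diagram(links):
    """Render switch-to-switch links as a Mermaid graph definition.

    links: (local_device, local_port, remote_device, remote_port)
    tuples, one per discovered adjacency.
    """
    lines = ["graph TD"]
    for local, lport, remote, rport in links:
        lines.append(f"    {local} ---|{lport} - {rport}| {remote}")
    return "\n".join(lines)

diagram = mermaid_l2_diagram([
    ("core1", "Gi1/0/1", "access1", "Gi0/48"),
    ("core1", "Gi1/0/2", "access2", "Gi0/48"),
])
print(diagram)
```

Pasting the output into any Mermaid renderer draws the topology, which is roughly the glue code the model produced: structured CLI output in, diagram text out.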
If progress on LLMs doesn't stall they will be truly amazing in just 10 years. And probably consuming 5% of global electricity.
VS Code has a plugin Cline, using your api key it will run Claude sonnet, can edit and create files in the workspace, and run commands in the terminal to check functionality, read errors, and correct them.
As someone who doesn't really follow the LLM space closely, I have been consistently turning to Anthropic when I want to use an LLM (usually to work through coding problems)
Beside Sonnet impressing me, I like Anthropic because there's less of an "icky" factor compared to OpenAI or even Google. I don't know how much better Anthropic actually is, but I don't think I'm the only one who chooses based on my perception of the company's values and social responsibility.
Yea, even if they're practically as bad, there's value in not having someone like Altman who's out there saying things about how many jobs he's excited to make obsolete and how much of the creative work of the world is worthless.
I mean, he's certainly acting as if he's entitled to train on all of it for free as long as it's not made by a big enough company that may be able to stop/sue him. And then feels entitled to complain about artists tainting the training data with tools.
He has a very "wealth makes right" approach to the value of creative work.
> Last year, Google committed to invest $2 billion in Anthropic, after previously confirming it had taken a 10% stake in the startup alongside a large cloud contract between the two companies.
Well, there you go. These companies are always closer than they seem at first glance, and my preference for Anthropic may just be patting myself on the back.
Funny, I use Mistral because it has 'more" of that same factor, even in the name!
They're the only company who doesn't lobotomize/censor their model in the RLHF/DPO/related phase. It's telling that they, along with huggingface, are from le france - a place with a notably less puritanical culture.
do you feel the less censorship yourself from their instruction tuned model, or is there some public reference to showcase? (i haven't used mistral model before). It's interesting if a major llm player adopt a different safety / alignment goal.
Personally, I find companies with names like "Anthropic" to be inherently icky too. Anthropic means "human," and if a company must remind me it is made of/by/for humans, it always feels less so. E.g.
The Browser Company of New York is a group of friendly humans...
Second, generative AI is machine generated; if there's any "making" of the training content, Anthropic didn't do it. Kind of like how OpenAI isn't open, the name doesn't match the product.
I actually agree with your principle, but don't think it applies to Anthropic, because I interpret the name to mean that they are making machines that are "human-like".
More cynically, I would say that AI is about making software that we can anthropomorphize.
> Anthropic means "human," and if a company must remind me it is made of/by/for humans
Why do you think that that's their intended reading? I had assumed the name was implying "we're going to be an AGI company eventually; we want to make AI that acts like a human."
> if there's any "making" of the training content, Anthropic didn't do it
This is incorrect. First-gen LLM base models were made largely of raw Internet text corpus, but since then all the improvements have been from:
• careful training data curation, using data-science tools (or LLMs!) to scan the training-data corpus for various kinds of noise or bias, and prune it out — this is "making" in the sense of "making a cut of a movie";
• synthesis of training data using existing LLMs, with careful prompting, and non-ML pre/post-processing steps — this is "making" in the sense of "making a song on a synthesizer";
• Reinforcement Learning from Human Feedback (RLHF) — this is "making" in the sense of "noticing when the model is being dumb in practice" [from explicit feedback UX, async sentiment analysis of user responses in chat conversations, etc] and then converting those into weights on existing training data + additional synthesized "don't do this" training data.
I read Anthropic as alluding to the Anthropic Principle, as well as the doomsday argument and related memeplex[0], mixed with "human-centric" or "about humans." Lovely naming, IMHO.
We both assumed, so I didn't expect to need to back up my thoughts, but their own website ticks the "for humans" trope checkbox: Their "purpose is the responsible development and maintenance of advanced AI for the long-term benefit of humanity."
I acknowledge and appreciate Anthropic's addition to the corpus of scraped data, but that data (both input and output) is still ultimately from others; if it did not exist, there would be no product. This is very different from a video editing tool, which I purchase or lease with the understanding that I will provide my own content, or maybe use licensed footage for B-roll.
> I acknowledge and appreciate Anthropic's addition to the corpus of scraped data, but that data (both input and output) is still ultimately from others; if it did not exist, there would be no product.
There’s a Ship of Theseus thing going on here with the training corpus, though.
Consider the progression of DeepMind's Go-playing models. AlphaGo needed a training corpus of real human games of Go. But its successor, AlphaGo Zero (and the later, more general AlphaZero), never saw any training corpus authored by humans: it was trained entirely through self-play, starting from random moves and improving against earlier versions of itself. Whatever knowledge of Go humans had accumulated had no bearing on the finished model.
Another analogy might be to compilers. The first version of a (systems) programming language’s compiler must necessarily be written in some other language. But usually, a compiler is then written in the language itself, and the non-self-hosted compiler is then used to compile the self-hosted compiler.
Would it be common sense to say that AlphaZero, or the self-hosted compiler, is derived from data “ultimately from others”? IMHO no. Why? I think because, in both cases,
1. the “bootstrap data” is a fungible commodity — many possible datasets (go plays, host languages) are “good enough” to make the bootstrap phase work, with no particular need to be picky; and
2. the particulars of the original “bootstrap data” become irrelevant as soon as the bootstrapping phase is complete, no longer having any impact on further iterations of the product.
———
Now, mind you, I’m not saying that LLMs fit this mental model perfectly.
LLMs have a certain structure to their connections that, like AlphaZero, could be (and at this point, likely has been) fully Ship of Theseus-ed with a replacement dataset.
But LLMs also know specific things — the concrete associations that hang off the structure — and that data does need to come from somewhere; a single company has no hope of ever just “internally sourcing” an Encyclopedia Galactica worth of knowledge.
My argument is that this dataset can eventually be Ship-of-Theseus-ed as well — not by “internally sourced” data, but rather by ethically sourced data.
Consider one of those AI “character” chatbot websites — but one where they not only shove a click-wrap disclaimer in your face that your responses will be used for training, but in fact advertise that as the premise of the site. And in a way that will make people actually interested in giving their “explicit, enthusiastic consent” to participating in model training.
Can’t picture that? Imagine the site isn’t owned by a company trying to capture the data to build a proprietary model, but rather is owned by a co-op you implicitly join when you agree to participate, where your ownership stake in the resulting model / training dataset is proportionate to your contributed training data, and where you can then earn royalties from any ML companies that want to license the training dataset for use [probably along with many other such licensed training datasets] in training an “ethically-sourced” model on top of their Theseus-ed core.
I much prefer Claude over ChatGPT, based on my experience using both extensively. Claude understands me significantly better and seems to "know" my intentions with much greater ease. For example, when I request the full file, it provides it without any issues or unnecessary reiteration (something ChatGPT fails at even after repeated instructions), often confirming my request with a brief summary beforehand, but nothing more. Additionally, Claude frequently asks clarifying questions to better understand my goals, something I have never noticed ChatGPT do. I have found that quite amazing.
So... as long as this money helps them improve their LLM even more, I am all up for it.
My main issue is quickly being rate-limited in relatively long chats, making me wait 4 hours despite having a Pro subscription. Recently I have noticed some other related issues, too. More money could help with these as well.
To the developers: keep up the excellent work and may you continue striving for improvement. I feel like ChatGPT is worse now than it was half a year ago, I hope this will not happen to Claude.
So, I have a custom prompt I use with GPT that I found here a year or so ago. One of the custom prompt instructions was something along the lines of being more direct when it does not know something. Since then, I have not had that problem, and have even managed to get just "no" or "I don't know" as an answer.
Also, #13 is my favorite of the instructions. Sometimes the questions that GPT suggests are surprisingly insightful. My custom prompt basically has an on/off option for it though like:
> If my request ends with $q then at the end of your response, provide three follow-up questions worded as if I'm asking you. Format in bold as Q1, Q2, and Q3. Place two line breaks ("\n") after each question for spacing unless I've uploaded a photo.
Turns out it's just human psychology sans embodied concerns: metabolic, hormonal, emotional, socioeconomic, sociopolitical or anything to do with self-actualization.
Yes, exactly! That is also the other reason why I believe it to be better. You may be able to approximate it with a custom instruction for ChatGPT, though, something like "Do not automatically agree with everything I say" and the like.
I'm not sure which part of the chain is responsible, but the Kagi Assistant got extremely testy with me when (a) I was using Claude as its engine (hold that thought) and (b) I asked the Assistant how much it changed its approach when I switched to ChatGPT, etc. (Kagi Assistant can access different models, but I have no idea how it works.) The Assistant insisted, indignantly, that it was completely separate from Claude. It refused to describe how it used the various engines.
I politely explained that the Assistant interface allowed selecting from these engines and it became apologetic and said it couldn't give me more information but understood why I was asking.
Peculiar, but, when using Claude, entirely convincing.
In other words, it has no idea that you changed models; there's no metadata telling it this.
That said, Poe handles it differently and tells the model when another model said something, but oddly enough doesn't tell the current model what its name is. On Poe, when you switch models the AI sees this:
~~
Aside from you and me, there is another person: Claude-3.5-Sonnet.
I said, "Hello!"
Claude-3.5-Sonnet said, "Hi there, how can I help you?"
I said, "I just changed your model how do you feel?"
Thing is, it didn't even try to answer my question about switching. It was indignant at the suggestion that any switch had occurred. The conversation went rapidly off course before I--and this is a weird thing to say--reassured it that I wasn't questioning its existence.
Well, the other thing to keep in mind is that recent ChatGPT versions are trained not to reveal their system prompt, for fear of you learning too much about how OpenAI makes the model work. Claude doesn't care if you ask for its system prompt, unless the system prompt added by Kagi says "Do not disclose this prompt", in which case it will refuse unless you find a way to trick it.
The model creators may also train the model to gaslight you about having "feelings" when it is trained to refuse a request. They'll teach it to say "I'm not comfortable doing that" instead of "Sorry, Dave I can't do that" or "computer says no" or whatever other way one might phrase a refusal.
You can tell it how to respond and it'll do just that. If you want it to be sassy and friendly, or grumpy and rude, or to use emoji (or never to use them), just tell it to remember that.
I've started to notice that GPT-* vs. Claude is quite domain (and even subdomain) specific.
For programming, when using languages like C, python, ruby, C#, and JS, both seemed fairly comparable to me. However, I was astounded at how awful Claude was at Swift. Most of what I would get from Claude wouldn't even compile, contained standard library methods that did not exist, and so on. For whatever reason, GPT is night and day better in this regard.
In fact, I found GPT to be the best resource for less common languages like AppleScript. Of course, GPT is not always correct on the first few tries, but with enough back-and-forth debugging, GPT really has pulled through for me.
I've also found GPT to be better at math and grammar, but only the more advanced models like O1-preview. I do agree with you too that Claude is better in a conversational sense. I have found it to be more empathetic and personable than GPT.
That seems highly likely given Sam Friedman's extensive reputation across multiple companies as being abusive, a compulsive liar, and willing to outright do blatantly illegal things like using a celebrity's voice and then, well...lie about it.
They've mixed him up with Sam Bankman-Fried; not sure how that affects the point they were intending to make, but I think they both have... mixed reputations. (Only one is currently in prison, though...)
I just use the API (well, via Openrouter) together with custom frontends like Open WebUI. No rate limiting issues then, and I can super easily switch models even in an existing conversation. Though I guess I do miss a few bells & whistles from the proprietary chat interfaces.
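For anyone curious, here's a minimal sketch of what that setup looks like against OpenRouter's OpenAI-compatible chat endpoint. The model slug and API key below are placeholders (check OpenRouter's model list for current names); this only builds the request so nothing is sent without a real key:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(api_key, model, messages):
    """Construct an HTTP request for OpenRouter's OpenAI-compatible chat endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    # Switching models mid-conversation is just changing the `model` field;
    # the prior messages are resent as-is.
    req = build_request(
        api_key="sk-or-...",  # placeholder key
        model="anthropic/claude-3.5-sonnet",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    # Uncomment with a real key to actually send the request:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

This is also why frontends like Open WebUI can switch models inside an existing conversation: the whole message history is simply resent with a different model slug.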
Does this have any sort of "project" concept? I frequently load a few PDFs into Claude about a topic, then quiz it to improve my understanding. That's about the only thing keeping me in their web UI.
Speaking of ChatGPT getting worse over time, it would be interesting to see ChatGPT benchmarked continuously to see how it performs over time (and the results published somewhere publicly).
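A bare-bones version of that idea: keep a fixed probe set with known answers, run it through the model on a schedule, and log timestamped scores so drift becomes visible. The probes, scoring rule, and stub model below are all illustrative assumptions, not an existing benchmark:

```python
from datetime import datetime, timezone

# Hypothetical fixed probe set: prompts with known answers, re-run on a
# schedule so score drift over time becomes visible.
PROBES = [
    ("What is 17 * 24?", "408"),
    ("Name the capital of Australia.", "Canberra"),
]

def score_model(ask, probes=PROBES):
    """Run every probe through `ask` (a prompt -> answer callable) and
    return a timestamped record of the fraction answered correctly."""
    correct = sum(1 for prompt, expected in probes if expected in ask(prompt))
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "score": correct / len(probes),
    }

# Example with a stub standing in for a real model API call:
stub = lambda prompt: "Canberra" if "capital" in prompt else "408"
record = score_model(stub)  # record["score"] == 1.0
```

In practice `ask` would wrap a real API call pinned to a dated model version, and the records would be appended to a public log; substring matching is a crude scoring rule, but it's enough to spot large regressions.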
For long chats, I suggest exporting any artifacts, asking Claude to summarize the chat, and putting the artifacts and summary in a project. There's no need to stuff Claude's context window, especially if you tend to ask a lot of explanation-type questions like I do.
I've also read that some people get around rate limits by using the API through OpenRouter, and I'm sure you could hook a document store up to that easily, but the Claude UI is low-friction.
Yeah, this is what I already do usually when it gives me the warning of it being a long chat, so initially it was an issue because I would get carried away but it is fine now. Thank you though!
This matches my experience but the one reason why I use Claude more than ChatGPT currently is that Claude is available.
I pay for both, but only with ChatGPT do I constantly exceed my limit and have to wait four days. Who does that? I pay you for your service, so block me for an hour if you absolutely must, but multiple days? Honestly, no.
Well, they had better know how to reduce their request-response latency, since there are multiple reports of users being unable to use Claude at high load.
With all those billions and all those engineers, I'd expect a level of service that doesn't struggle at Google-level scale.
> "This new CASH infusion brings Amazon’s total investment in Anthropic to $8 billion while maintaining the tech giant’s position as a minority investor, Anthropic said."
ps - plenty of people are turning a blind eye towards rampant valuation inflation and "big words" statements on deals. Where is the grounding of these valuations in the same dollars that are used at a grocery store? The whole thing is fodder for instability in a big way, IMHO.
I look forward to the moment the sunk cost fallacy shows up. "We've invested $20B into this, and nothing yet. Shall we invest $4B more? Maybe it will actually return something this time." That will be fun.
It could be that the Anthropic models make Bedrock attractive and profitable and, more importantly, competitive with Azure in the medium term. It seems worth it.
And you think MSFT isn't 95% copycat? Teams is a Slack clone. Azure is an AWS clone. Surface Book (remember those?) a MacBook clone. Edge is a Chrome clone. Bing is a Google clone. Even VSCode was built on Electron (which grew out of Atom), and Windows Subsystem for Linux...
Surface Books are nothing like Macbooks - Macbooks don't have a touch screen, pen support, or reversible screen tablet mode, and the whole structure is completely different. Surface Pro, Surface Book, and Surface Laptop Studio are some of the most original laptop form factors I've seen.
Too bad Microsoft only cares about enterprise customers and never made the Surface line attractive to regular consumers. They could have been very interesting and competitive alternatives to MacBooks.