Mistral AI Valued at $2B

mark_l_watson · on Dec 10, 2023

There is a lot of hype around LLMs, but (BUT!) Mistral well deserves the hype. I use their original 7B model, as well as some derived models, all the time. I can’t wait to see what they release next (which I expect to be a commercial product, although the MoE model set they just released is free).

Another company worthy of some hype is 01.AI which released their Yi-34B model. I have been running Yi locally on my Mac (use “ ollama run yi:34b”) and it is amazing.

Hype away Mistral and 01.AI, hype away…

p1esk · on Dec 10, 2023

How do these small models compare to gpt4 for coding and technical questions?

I noticed that gpt3.5 is practically useless to me (either wrong or too generic), while gpt4 provides a decent answer 80% of the time.

modeless · on Dec 10, 2023

They are not close to GPT-4. Yet. But the rate of improvement is higher than I expected. I think there will be open source models at GPT-4 level that can run on consumer GPUs within a year or two. Possibly requiring some new techniques that haven't been invented yet. The rate of adoption of new techniques that work is incredibly fast.

Of course, GPT-5 is expected soon, so there's a moving target. And I can't see myself using GPT-4 much after GPT-5 is available, if it represents a significant improvement. We are quite far from "good enough".

vitorgrs · on Dec 10, 2023

I believe one of the problems that OSS models need to solve, is... dataset. All of them lack a good and large dataset.

And this is most noticiable if you ask anything that is not in English-American-ish.

mattigames · on Dec 11, 2023

Maybe it should be an independent model in charge only of converting your question to American English and back, instead of trying to make a single model speak all languages

vitorgrs · on Dec 12, 2023

I don't think this is a good idea. A good model if we are really aiming at anything that resembles AGI (or even a good LLM like GPT4) is a model that have world knowledge. The world is not just English.

jitl · on Dec 11, 2023

There’s a lot of world knowledge that is just not present in an American English corpus. For example knowledge of world cuisine & culture. There’s precious few good English sources on Sichuan cooking.

0xDEF · on Dec 10, 2023

>I think there will be open source models at GPT-4 level that can run on consumer GPUs within a year or two.

There is indeed already open source models rivaling ChatGPT-3.5 but GPT-4 is an order of magnitude better.

The sentiment that GPT-4 is going to be surpassed by open source models soon is something I only notice on HN. Makes me suspect people here haven't really tried the actual GPT-4 but instead the various scammy services like Bing that claim they are using GPT-4 under the hood when they are clearly not.

rmbyrro · on Dec 10, 2023

Makes me suspect you don't follow HN user base very closely.

refulgentis · on Dec 10, 2023

You're 100% right and I apologize that you're getting downvoted, in solidarity I will eat downvotes with you.

HNs funny right now because LLMs are all over the front page constantly, but there's a lot of HN "I am an expert because I read comments sections" type behavior. So many not even wrong comments that start from "I know LLaMa is local and C++ is a programming language and I know LLaMa.cpp is on GitHub and software improves and I've heard of Mistral."

p1esk · on Dec 10, 2023

I’m both excited and scared to think about this “significant improvement” over GPT-4.

It can make our jobs a lot easier or it can take our jobs.

__loam · on Dec 10, 2023

LLMs are going to spit out a lot of broken shit that needs fixing. They're great at small context work but full applications require more than they're capable of imo.

p1esk · on Dec 11, 2023

Even if so, the next gen model will fix it.

__loam · on Dec 11, 2023

Hey, I doubt it.

rmbyrro · on Dec 10, 2023

I expect the demand for SWE to grow faster than productivity gains.

Der_Einzige · on Dec 11, 2023

The idea that demand scales to fill supply doesn’t work when supply becomes effectively infinite. Induction from the past is likely wrong in this case

rmbyrro · on Dec 12, 2023

I don't see the current tech making supply infinite. Not even close.

Maybe a more advanced type of model they'll invent in the next years. Who knows... But GPT-like models? Nah, they won't write useful code applicable in prod without supervision by an experient engineer.

stavros · on Dec 10, 2023

Isn't that the same? At some point, your job becomes so easy that anyone can do it.

Spivak · on Dec 10, 2023

It's weird for programmers to be worried about getting automated out of a job when my job as a programmer is basically to try as hard as I can to automate myself out of a job.

Der_Einzige · on Dec 11, 2023

You’re supposed to automate yourself out but not tell anyone. Didn’t you see that old simpsons episode from the 90s about the self driving trucks? The drivers rightfully STFU about their innovation and cashed in on great work life balance and Homer ruined it by blabbered about it to everyone, causing the drivers to try to go after him.

We are trying to keep SWE salaries up, and lowering the barrier to entry will drop them.

OfSanguineFire · on Dec 10, 2023

Curious thought: at some point a competitor’s AI might become so advanced, you can just ask it to tell you how to create your own, analogous system. Easier than trying to catch up on your own. Corporations will have to include their own trade secrets among the things that AIs aren’t presently allowed to talk about like medical issues or sex.

rmbyrro · on Dec 10, 2023

It might work for fine-tuning an open model to a narrow use case.

But creating a base model is out of reach. You need an order of probably hundreds of millions of $$ (if not billion) to get close to GPT 4.

Der_Einzige · on Dec 11, 2023

Model merging is easy, and a unique model merge may be hard to replicate if you don’t know the original recipe.

Model merging can create truly unique models. Love to see shit from ghost in the shell turn into real life

Yes training a new model from scratch is expensive, but creating a new model that can’t be replicated by fine tuning is easy

Xenoamorphous · on Dec 10, 2023

As someone who doesn’t know much about how these models work or are created I’d love to see some kind of breakdown that shows what % of the power of GPT4 is due to how it’s modelled (layers or whatever) vs training data and the computing resources associated with it.

mensetmanusman · on Dec 11, 2023

This isn't precisely knowable now, but it might be something academics figure out years from now. Of course, first principles of 'garbage in garbage out' would put data integrity very high, the LLM code itself is supposedly not even 100k lines of code, and the HW is crazy advanced.

so the ordering is probably data, HW, LLM model

This also fits the general ordering of

data = all human knowledge HW = integrated complexity of most technologists LLM = small team

Still requires the small team to figure out what to do with the first two, but it only happened now because the HW is good enough.

LLMs would have been invented by Turing and Shannon et al. almost certainly nearly 100 years ago if they had access to the first two.

MacsHeadroom · on Dec 11, 2023

By % of cost it's 99.9% compute cost and 0.01% data costs.

In terms of "secret sauce" it's 95% data quality and 5% architectural choices.

taneq · on Dec 10, 2023

That’s true now, but maybe GPT6 will be able to tell you how to build GPT7 on an old laptop, and you’ll be able to summon GPT8 with a toothpick and three cc’s of mouse blood.

p1esk · on Dec 10, 2023

How to create my own LLM?

Step 1: get a billion dollars.

That’s your main trade secret.

chongli · on Dec 10, 2023

What is inherent about AIs that requires spending a billion dollars?

Humans learn a lot of things from very little input. Seems to me there's no reason, in principle, that AIs could not do the same. We just haven't figured out how to build them yet.

What we have right now, with LLMs, is a very crude brute-force method. That suggests to me that we really don't understand how cognition works, and much of this brute computation is actually unnecessary.

michaelt · on Dec 10, 2023

Maybe not $1 billion, but you'd want quite a few million.

According to [1] a 70B model needs $1.7 million of GPU time.

And when you spend that - you don't know if your model will be a damp squib like Bard's original release. Or if you've scraped the wrong stuff from the internet, and you'll get shitty results because you didn't train on a million pirated ebooks. Or if your competitors have a multimodal model, and you really ought to be training on images too.

So you'd want to be ready to spend $1.7 million more than once.

You'll also probably want $$$$ to pay a bunch of humans to choose between responses for human feedback to fine-tune the results. And you can't use the cheapest workers for that, if you need great english language skills and want them to evaluate long responses.

And if you become successful, maybe you'll also want $$$$ for lawyers after you trained on all those pirated ebooks.

And of course you'll need employees - the kind of employees who are very much in demand right now.

You might not need billions, but $10M would be a shoestring budget.

[1] https://twitter.com/moinnadeem/status/1681371166999707648

chongli · on Dec 10, 2023

And when you spend that - you don't know if your model will be a damp squib like Bard's original release. Or if you've scraped the wrong stuff from the internet, and you'll get shitty results because you didn't train on a million pirated ebooks.

This just screams to me that we don’t have a clue what we’re doing. We know how to build various model architectures and train them, but if we can’t even roughly predict how they’ll perform then that really says a lot about our lack of understanding.

Most of the people replying to my original comment seem to have dropped the “in principle” qualifier when interpreting my remarks. That’s quite frustrating because it changes the whole meaning of my comment. I think the answer is that there isn’t anything in principle stopping us from cheaply training powerful AIs. We just don’t know how to do it at this point.

pixl97 · on Dec 10, 2023

>Humans learn a lot of things from very little input

And also takes 8 hours of sleep per day, and are mostly worthless for the first 18 years. Oh, also they may tell you to fuck off while they go on a 3000 mile nature walk for 2 years because they like the idea of free love better.

Knowing how birds fly ready doesn't make a useful aircraft that can carry 50 tons of supplies, or one that can go over the speed of sound.

This is the power of machines and bacteria. Throwing massive numbers at the problem. Being able to solve problems of cognition by throwing 1GW of power at it will absolutely solve the problem of how our brain does it with 20 watts in a faster period of time.

nemothekid · on Dec 10, 2023

If we knew how to build humans for cheap, then it wouldn't require spending a billion dollars. Your reasoning is circular.

It's precisely because we don't know how to build these LLMs cheaply that one must so spend so much money to build them.

chongli · on Dec 10, 2023

The point is that it's not inherently necessary to spend a billion dollars. We just haven't figured it out yet, and it's not due to trade secrets.

Transistors used to cost a billion times more than they do now [1]. Do you have any reason to suspect AIs to be different?

[1] https://spectrum.ieee.org/how-much-did-early-transistors-cos...

jryle70 · on Dec 10, 2023

> Transistors used to cost a billion times more than they do now

However you would still need billions of dollars if you want state of the art chips today, say 3nm.

Similarly, LLM may at some point not require a billion dollars, you may be able to get one, on par or surpass GPT4, easily for cheap. The state of the art AI will still require substantial investment.

janalsncm · on Dec 10, 2023

Because that billion dollars gets you the R&D to know how to do it?

The original point was that an “AI” might become so advanced that it would be able to describe how to create a brain on a chip. This is flawed for two main reasons.

1. The models we have today aren’t able to do this. We are able to model existing patterns fairly well but making new discoveries is still out of reach.

2. Any company capable of creating a model which had singularity-like properties would discover them first, simply by virtue of the fact that they have first access. Then they would use their superior resources to write the algorithm and train the next-gen model before you even procured your first H100.

jlokier · on Dec 11, 2023

I agree about training time, but bear in mind LLMs like GPT4 and Mistral also have noisy recall of vastly more written knowledge than any human can read in their lifetime, and this is one of the features people like about LLMs.

You can't replace those types of LLM with a human, the same way you can't replace Google Search (or GitHub Search) with a human.

Acquiring and preparing that data may end up being the most expensive part.

janalsncm · on Dec 10, 2023

The limiting factor isn’t knowledge of how to do it, it is GPU access and RLHF training data.

CSMastermind · on Dec 10, 2023

Mistral's latest just released model is well below GPT-3 out of the box. I've seen people speculate that with fine-tuning and RLHF you could get GPT-3 like performance out of it but it's still too early to tell.

I'm in agreement with you, I've been following this field for a decade now and GPT-4 did seem to cross a magical threshold for me where it was finally good enough to not just be a curiosity but a real tool. I try to test every new model I can get my hands on and it remains the only one to cross that admittedly subjective threshold for me.

espadrine · on Dec 10, 2023

> Mistral's latest just released model is well below GPT-3 out of the box

The early information I see implies it is above. Mind you, that is mostly because GPT-3 was comparatively low: for instance its 5-shot MMLU score was 43.9%, while Llama2 70B 5-shot was 68.9%[0]. Early benchmarks[1] give Mixtral scores above Llama2 70B on MMLU (and other benchmarks), thus transitively, it seems likely to be above GPT-3.

Of course, GPT-3.5 has a 5-shot score of 70, and it is unclear yet whether Mixtral is above or below, and clearly it is below GPT-4’s 86.5. The dust needs to settle, and the official inference code needs to be released, before there is certainty on its exact strength.

(It is also a base model, not a chat finetune; I see a lot of people saying it is worse, simply because they interact with it as if it was a chatbot.)

[0]: https://paperswithcode.com/sota/multi-task-language-understa...

[1]: https://github.com/open-compass/MixtralKit#comparison-with-o...

brucethemoose2 · on Dec 10, 2023

Have you played with finetunes, like Cybertron? Augmented in wrappers and retrievers like GPT is?

It's not there yet, but its waaaay closer than the plain Mistral chat release.

rmbyrro · on Dec 10, 2023

Still, for a 7B model, this is quite impressive.

apantel · on Dec 11, 2023

One thing people should keep in mind when reading others’ comments about how good an LLM is at coding, is that the capability of the model will vary depending on the programming language. GPT-4 is phenomenal at Java because it probably ate an absolutely enormous amount of Java in training. Also, Java is a well-managed language with good backwards-compatibility, so patterns in code written at different times are likely to be compatible with each other. Finally, Java has been designed so that it is hard for the programmer to make mistakes. GPT-4 is great for Java because Java is great for GPT-4: it provides what the LLM needs to be great.

valval · on Dec 10, 2023

Open source models will probably catch up at the same rate as open source search engines have caught up to Google search.

idonotknowwhy · on Dec 10, 2023

If you can run yi34b, you can run phind-codellama. It's much better than yi and mistral for code questions. I use it daily. More useful than gpt3 for coding, not as good as gpt4, except that I can copy and paste secrets into it without sending them to openai.

mark_l_watson · on Dec 10, 2023

Thanks, I will give codellama a try.

sharemywin · on Dec 10, 2023

what types of things do you ask ChatGPT to do for you regarding coding?

yodsanklai · on Dec 10, 2023

Typically a few lines snippets that would require me a few minutes of thinking but that ChatGPT will provide immediately. It often works, but there are setbacks. For instance, if I'm lazy and don't very carefully check the code, it can produce bugs and cancel the benefits.

It can be useful, but I can see how it'll generate a class of lazy coders who can't think by themselves and just try to get the answer from ChatGPT. An amplified Stack Overflow syndrome.

dmos62 · on Dec 10, 2023

How do you use these models? If you don't mind sharing. I use GPT-4 as an alternative to googling, haven't yet found a reason to switch to something else. I'll for example use it to learn about the history, architecture, cultural context, etc of a place when I'm visiting. I've found it very ergonomic for that.

davidkunz · on Dec 10, 2023

I use them in my editor with my plugin https://github.com/David-Kunz/gen.nvim

3abiton · on Dec 10, 2023

Interesting use case, but the issue is wasting all this compute energy for prediction?

HorizonXP · on Dec 10, 2023

Can you explain what you mean by this question?

gdiamos · on Dec 10, 2023

I host them here: https://app.lamini.ai/playground

You can play with them, tune them, and download the weights

It isn’t exactly the same as open source because weights != source code, but it is close in the sense that it is editable

IMO we just don’t have great tools for editing LLMs like we do for code, but they are getting better

Prompt engineering, RAG, and finetuning/tuning are effective for editing LLMs. They are getting easier and better tooling is starting to emerge

teaearlgraycold · on Dec 10, 2023

I’ve use lm studio. It’s not reached peak user friendliness, but it’s a nice enough GUI. You’ll need to fiddle with resource allocation settings and select an optimally quantized model for best performance. But you can do all that in the UI.

loufe · on Dec 10, 2023

If you want to experiment Kobold.cpp is a great interface and goes a long distance to guarantee backwards compatibility of outdated model formats.

risho · on Dec 10, 2023

lm studio is an accessible simple way to use them. that said expecting them to be anywhere near as good as gpt-4 is going to lead to disappointment.

jay-barronville · on Dec 10, 2023

You mind sharing what you find so amazing about Yi-34B? I haven’t had a chance to try it.

mark_l_watson · on Dec 10, 2023

I just installed it on my 32B Mac yesterday, first impressions: it does very well reasoning, it does very well answering general common sense world knowledge questions, and so far when it generates Python code, the code works and is well documented. I know this is just subjective, but I have been running a 30B model for a while in my Mac and Yi-34B just feels much better. With 4bit quantization, I can still run Emacs, terminal windows and a web browser with a few tabs without seeing much page faulting. Anyway, please try it and share a second opinion.

brucethemoose2 · on Dec 10, 2023

The 200K finetunes are also quite good at understanding their huge context.

brucethemoose2 · on Dec 10, 2023

I concur, Yi 34B and Mistral 7B are fantastic.

But you need to run the top Yi finetunes instead of the vanilla chat model. They are far better. I would recommend Xaboros/Cybertron, or my own merge of several models on huggingface if you want the long context Yi.

yodsanklai · on Dec 10, 2023

> I use their original 7B model, as well as some derived models, all the time.

How does it compare to other models? and with chatgpt in particular?

valval · on Dec 10, 2023

No comparison to be made.

minimaxir · on Dec 10, 2023

Of course, the reason Mistral AI got a lot of press and publicity in the first place was because they open-sourced Mistral-7B despite the not-making-money-in-the-short-term aspect of it.

It's better for the AI ecosystem as a whole to incentive AI startups to make a business through good and open software instead of building moats and lock-in ecosystems.

sillysaurusx · on Dec 10, 2023

I don’t think that counts as open source. They didn’t share any details about their training, making it basically impossible to replicate.

It’s more akin to a SaaS company releasing a compiled binary that usually runs on their server. Better than nothing, but not exactly in the spirit of open source.

This doesn’t seem like a pedantic distinction, but I suppose it’s up to the community to agree or disagree.

minimaxir · on Dec 10, 2023

It's IMO a pedantic distinction.

A compiled binary is a bad metaphor because it gives the implication that Mistral-7B is an as-is WYSIWIG project that's not easily modifiable. In contrast, there have been a bunch of new powerful new models created by modifying or finetuning Mistral-7B such as Zephyr-7B: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta

The better analogy to Mistral-7B is something like modding Minecraft or Skyrim: although those games are closed source themselves, it has enabled innovations which helps the open-source community directly.

It would be nice to have fully open-source methodologies but lacking them isn't an inherent disqualifier.

hedgehog · on Dec 10, 2023

It's a big distinction, if I want to tinker with the model architecture I essentially can't because the training pipeline is not public.

minimaxir · on Dec 10, 2023

If you want to tinker with the architecture Hugging Face has a FOSS implementation in transformers: https://github.com/huggingface/transformers/blob/main/src/tr...

If you want to reproduce the training pipeline, you couldn't do that even if you wanted to because you don't have access to thousands of A100s.

hedgehog · on Dec 10, 2023

I'm well aware of the many open source architectures, and the point stands. Models like GPT-J have open code and data, and that allows using them as a baseline for architecture experiments in a way that Mistral's models can't be. Mistral publishes weights and code, but not the training procedure or data. Not open.

sillysaurusx · on Dec 10, 2023

We do, via TRC. Eleuther does too. I think it’s a bad idea to have a fatalistic attitude towards model reproduction.

hedgehog · on Dec 10, 2023

Exactly, nice work BTW. And no hate for Mistral, they're doing great work, but let's not confuse weights-available with fully open models.

emadm · on Dec 10, 2023

With all the new national supercomputers scale isn’t really going to be an issue, they all want large language models on 10k GH200s or whatever and the libraries are getting easier to use

mrob · on Dec 10, 2023

According to the Free Software Definition:

"Source code is defined as the preferred form of the program for making changes in. Thus, whatever form a developer changes to develop the program is the source code of that developer's version."

According to the Open Source Definition:

"The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed."

LLM models are usually modified by changing the model weights directly, instead of retraining the model from scratch. LLM weights are poorly understood, but this is an unavoidable side effect of the development methodology, not deliberate obfuscation. "Intermediate" implies a form must undergo further processing before it can be used, but LLM weights are typically used directly. LLMs did not exist when these definitions were written, so they aren't a perfect fit for the terminology used, but there's a reasonable argument to be made that LLM weights can qualify as "source code".

lmm · on Dec 10, 2023

> LLM models are usually modified by changing the model weights directly, instead of retraining the model from scratch. LLM weights are poorly understood, but this is an unavoidable side effect of the development methodology, not deliberate obfuscation.

They're understood based on knowing the training process though, and a developer working on them would want to have the option of doing a partial or full retraining where warranted.

jeron · on Dec 10, 2023

They ought to rename to “ReallyOpenAI”

seydor · on Dec 10, 2023

also because their model is unconstrained/censored. and they are commited to that according to what they say, they build it so others can build on it. GPTs are not finished business and hopefully the open source community with surpass the early successes.

JonChesterfield · on Dec 10, 2023

Anyone else think Nvidia giving companies money to spend on Nvidia hardware at very high profit margin is a dubious valuation scheme?

mcmcmc · on Dec 10, 2023

Kinda like MS giving OpenAI all those Azure credits?

SeanAnderson · on Dec 10, 2023

Why would it be a dubious valuation scheme? I guess if an investor is looking at just revenue, or only looking at one area of their business finances, maybe? Otherwise it seems like the loss in funds would be weighed against the increase in revenue and wouldn't distort earnings.

JonChesterfield · on Dec 10, 2023

Say big green gives a company $100M with the rider that it needs to spend all that on nvidia's hardware in exchange for 10% of the company.

Has Nvidia valued the company at 1B? Say their margin is 80% on the sales. So Nvidia has lost some cashflow and $20M for that 10%. Has Nvidia valued the company at $200M?

SeanAnderson · on Dec 10, 2023

I see :) Thanks for clarifying. I would say that I don't have a strong enough grasp on biz finances to do more than speculate here, but:

1) Is all the money spent up front? Or does it trickle back in over a few years? Cash flow might be impacted more than implied, but I doubt this is much of an issue.

2) I wonder how the 10% ownership at 2B valuation would be interpreted by investors. If it's viewed as a fairly liquid investment with low risk of depreciation then yeah, I could see Nvidia's strategy being quite the way to pad numbers. OTOH, the valuation could be seen as pure marketing fluff and mostly written off by the markets until regulations and profitability are firmly in place.

wongarsu · on Dec 10, 2023

If it was a good valuation scheme, then Nvidia giving them $100 million at a $2 billion valuation would mean that Nvidia thinks the company is worth $2 billion. But if Mistral uses that money to buy GPUs that Nvidia sells with 75% profit margin, the deal is profitable for Nvidia even if they believe the company is worth only $0.5 billion (since they effectively get 75% of the investment back). And if this deal fuels the wider LLM hype and leads other companies to spend just $50 million more at Nvidia, this investment is profitable for Nvidia even if Mistral had negative value.

emadm · on Dec 10, 2023

With convertible debt and many of these rounds investors get the first money out, so the first 450m would go to the investors.

candiddevmike · on Dec 10, 2023

It's the heads I win, tails you lose investment model

raverbashing · on Dec 10, 2023

You'd be surprised how this is much more common than people realize

ThalesX · on Dec 10, 2023

My 1st thought as an European, "YAY! EU startup to the moon". My 2nd thought was "n'aww, American VC". I guess that's the best we can do around here.

jamesblonde · on Dec 10, 2023

The problem is that no European VC has that amount of capital. European VCs typically have a couple of hundred million under mgmt. SV VCs have a few billion under mgmt.

jchonphoenix · on Dec 11, 2023

Index Ventures has the money. But the truth of the matter is that even most US VCs aren't willing to shell out 2B valuations for a company with no revenue.

paulddraper · on Dec 10, 2023

It may feel that there are few EU startups and that's true.

But there are even fewer EU VCs.

ThalesX · on Dec 10, 2023

Was CTO for some European startups. I'll always remember one when by the time the EU VC was mid-way through its due dilligence for 500k seed, we already had some millions lined up from some US VCs no questions asked.

bsaul · on Dec 10, 2023

There were european VCs investing in the very first round, french one in particular. Founders are french. This qualifies as european in my book (let’s not get too demanding)

asim · on Dec 10, 2023

I have realised just how meaningless valuations now are. As much as we use them as a marker of success, you can find someone to write the higher valuation ticket when it suits their agenda too e.g the markup, the status signal, or just getting the deal done ahead of your more rationale competitors in the investment landscape. Now that's not to say Mistral isn't a valuable company or that they aren't doing good work. It's just valuation markers are meaningless and most of this capital raise in the AI space is about offsetting the cloud/GPU spend. Might get downvoted to death but watching valuation news feels like no news.

seydor · on Dec 10, 2023

It's smoke. but where there is smoke, there is some level of fire

jack_riminton · on Dec 10, 2023

Not if it's a smoke machine

z7 · on Dec 10, 2023

Mistral has a lot of potential, but there's the obvious risk that without proper monetization strategies it might not achieve sustainable profitability in the long term.

niemandhier · on Dec 10, 2023

The French have a urge to be independent, the French government will hand them some juicy contract as soon as the can provide any product that justifies that.

emadm · on Dec 10, 2023

Yeah they shouldn't worry, they'll get a big French government deal at worst

lolive · on Dec 10, 2023

One of the French tycoons will eventually buy them.

fy20 · on Dec 11, 2023

I would say most European countries have that desire. That and the fact it can easily by fine tuned to the local language could make these models very popular outside the US.

yodsanklai · on Dec 10, 2023

> The French have a urge to be independent

They lose that fight a long time ago though. It seems they don't even try to pretend anymore.

digitcatphd · on Dec 10, 2023

I was wondering this. What is their business model exactly? Almost seems like Europe’s attempt to say “hey, look, we are relevant too”

rockinghigh · on Dec 11, 2023

They charge for the API, like OpenAI.

https://docs.mistral.ai/platform/pricing/

lolive · on Dec 10, 2023

Being acquired.

polygamous_bat · on Dec 10, 2023

Coupled with the concern that once you’re charging users money for a product, you are also liable for sketchy things they do with it. Not so much when you post a torrent link on twitter that happens to have model weights.

dharma1 · on Dec 10, 2023

On their pitch deck it said they will monetise serving of their models.

While it may feel like a low moat if anyone can spin up a cloud instance with the same model, it's still a reasonable starting point. I think they will also be getting a lot of EU clients who can't/don't want to use US providers.

ukuina · on Dec 11, 2023

People forget the released version is v0.1

If the commerically-served model has improved capability and is exclusive to Mistral's service, there is a possible moat there.

dharma1 · on Dec 11, 2023

they seem pretty committed to open-source AI (from interviews I've heard with the founders) - but maybe if they manage to train models with truly amazing capabilities somewhere down the line, they will keep some closed source

nothrowaways · on Dec 10, 2023

Nothing stops them from launching a chat app.

quickthrower2 · on Dec 10, 2023

The old open source, but we'll host it for you? I think Bezos is going to be in fits of evil laughter about that model in 5 years, as all the open source compute moves to the clouds, with dollars flowing his way.

But one thing Mistral could do is have a free foundational model, and have non-free (as in beer, as in speech) "pro" models. I think they will have to.

dartos · on Dec 10, 2023

Release small, open, foundational models.

Deploy larger, fine tuned variants and charge for them.

There’s a reason we don’t have the data set or original training scripts for mistral

behnamoh · on Dec 10, 2023

it’s a “mistry” ;)

simonw · on Dec 10, 2023

There are huge economy of scale benefits from providing hosted models.

I've been trying out all sorts of open models, and some of them are really impressive - but for my deployed web apps I'm currently sticking with OpenAI, because the performance and price I get from their API is generally much better than I can get for open models.

If Mistral offered a hosted version which didn't have any spin-up time and was price competitive with OpenAI I would be much more likely to build against their models.

quickthrower2 · on Dec 10, 2023

This only is defensible for closed models though.

teekert · on Dec 10, 2023

Here's to hoping such models run on dedicated chips locally, on Phones and PCs etc...

emadm · on Dec 10, 2023

They already do, we just released a model equivalent to most 40-60b base models that runs on a MacBook Air no problem.

It's like 1.6gb, ones coming are better and smaller https://x.com/EMostaque/status/1732912442282312099?s=20

I think the large language model paradigm is pretty much done as we move to satisficing tbh

echelon · on Dec 10, 2023

Zero moat. Everybody's doing it.

I suppose they could be the Google to everyone else's Yahoo and Dogpile, but I expect that to be a hard game to play these days.

skue · on Dec 10, 2023

At this valuation and given the strength of the team, it’s not hard to imagine a future acquisition yielding a significant ROI.

Besides, we don’t know what future opportunities will unfold for these technologies. Clearly there’s no shortage of smart investors happy to place bets on that uncertainty.

jsemrau · on Dec 10, 2023

Model-as-a-service should work just fine.

stillwithit · on Dec 10, 2023

Wait what? If company don’t make $ it don’t survive?

HN could really elevate the discourse if they flagged the submarine ads of VCs

minimaxir · on Dec 10, 2023

It is a relevant question in the AI industry specifically due to new concerns about ROI given the intense compute costs.

lolive · on Dec 10, 2023

Same concern I have regarding Spotify. [Which seems to have insane recurring costs. Plus some risky expansive strategic moves]

I_am_tiberius · on Dec 10, 2023

I really hope that a European startup can successfully compete with the major companies. I do not want to see privacy violations, such as OpenAI's default use of user prompts for training, become standard practice.

quickthrower2 · on Dec 10, 2023

Does Anthropic count as European?

uxp8u61q · on Dec 10, 2023

How on Earth would it count as European? It's a completely American company. Founded in the US, by Americans, headquartered in the US, funded by American VCs... I genuinely don't get how you arrived at the idea that it's European.

quickthrower2 · on Dec 10, 2023

Big office and lots of jobs in UK. And with complex tax setups these days I wasn’t sure.

uxp8u61q · on Dec 10, 2023

By that measure I guess Apple is Irish...?!

totolouis · on Dec 10, 2023

UK is not in the Europe anymore.

denlekke · on Dec 10, 2023

maybe not the distinction you meant but the UK is still in Europe (the continent) and to me, European is a word based on location not membership of the European Union (which the UK left)

baal80spam · on Dec 10, 2023

Interesting, TIL.

quickthrower2 · on Dec 10, 2023

They cut through the continental shelf as part of Brexit.

htrp · on Dec 10, 2023

Dario is italian-american?

pb7 · on Dec 10, 2023

Elon is South African but that doesn't make Tesla a South African company.

quickthrower2 · on Dec 10, 2023

That doesn't matter too much, the corporate structure is more interesting.

nbzso · on Dec 10, 2023

The old Masters have a saying: Never fall in love with your creation. The AI industry is falling into the trap of their own making (marketing). LLM's are nice toys, but implementation is resource/energy expensive and murky at best. There are a lot of real life problems that would be solved trough rational approach. If someone is thirsty, the water is the most important part, not the type of glass:)

TeMPOraL · on Dec 10, 2023

If you compared the efficiency of steam engines during industrial revolution with the ones used today, or power generation from 100 years ago to that of now, or between just about any chemical process, manufacturing method or agricultural technique at its invention and now, you'd be amazed by the difference. In some cases, the activity of today was several orders of magnitude more wasteful just 100 years ago.

Or, I guess look at how size, energy use and speed of computer hardware evolved over the past 70 years. Point is, implementation being, right now, "resource/energy expensive and murky at best" is how many very powerful inventions look at the beginning.

> If someone is thirsty, the water is the most important part, not the type of glass:)

Sure, except here, we're talking about one group selling a glass imbued with breakthrough nanotech, allowing it to keep the water at desired temperature indefinitely, and continuously refill itself by sucking moisture out of the air. Sometimes, the type glass may really matter, and then it's not surprising many groups strive to be able to produce it.

nbzso · on Dec 10, 2023

Don't fall in love with your creation, is not stop creating.

https://www.cell.com/joule/fulltext/S2542-4351(23)00365-3

quickthrower2 · on Dec 10, 2023

What is the business model?

malermeister · on Dec 10, 2023

Get the French government to throw a ton of money at you for sovereignty reasons

hnarayanan · on Dec 10, 2023

quickthrower2 · on Dec 10, 2023

Sorry I forgot, in AI $2Bn is preseed

simonebrunozzi · on Dec 11, 2023

Let me say this. Whoever is going to be able to let "normal" Mac users to install and run a local copy of an LLM, is going to reap tons of commercial benefits. (e.g. DMG, click-install, run. No command line).

It is nuts to me that we have 100M computers capable of running LLMs properly, and yet only a tiny fraction of them does.

Heck, let us do p2p, and lend our computing power to others.

Let us build a personalized LLM.

This is, IMHO, a really interesting path forward. It seems no one is doing it.

abrichr · on Dec 12, 2023

> Whoever is going to be able to let "normal" Mac users to install and run a local copy of an LLM, is going to reap tons of commercial benefits. (e.g. DMG, click-install, run. No command line).

https://gpt4all.io

https://ollama.ai

nojvek · on Dec 10, 2023

Gotta give it to Nvidia and TSMC. In the big AI race, they’re the ones with real moat and no serious competition.

No matter who wins, they’ll need those sweet GPUs and fabs.

Yujf · on Dec 10, 2023

Its the good old "in a gold rush, sell shovels"

gnabgib · on Dec 11, 2023

Previously: OpenAI Rival Mistral Nears $2B Valuation with Andreessen Horowitz Backing (6 days ago, 2 points, 0 comments)[0], OpenAI Rival Mistral Nears $2B Valuation with Andreessen Horowitz Backing (5 days ago, 9 points, 1 comment)[1], French AI startup Mistral secures €2B valuation (2 days ago, 106 points, 74 comments)[2], Mistral, French A.I. Startup, Is Valued at $2B in Funding Round (6 hours ago, 15 points, 1 comment)[3]

[0]: https://news.ycombinator.com/item?id=38522873 [1]: https://news.ycombinator.com/item?id=38533725 [2]: https://news.ycombinator.com/item?id=38580758 [3]: https://news.ycombinator.com/item?id=38593526

yodsanklai · on Dec 10, 2023

Noob questions (I don't know anything about LLM, I'm just a casual user of ChatGPT)

- is what Mistral does better than Meta or OpenAI?

- will LLM become eventually open-source commodities with little room for innovation or shall we expect to see a company with a competitive advantage that will make it the new Google? in other words, how much better can we expect these LLM to be in the future? should we expect significant progress or have we reached to diminished returns (after all, this is only statistical prediction of next word, maybe there's an intrinsic limitation of this method)

- are there some sorts of benchmarks to compare all these new models?

airspresso · on Dec 10, 2023

Too many superlatives and groundbreaking miracles reported. Probably written by AI.

jay-barronville · on Dec 10, 2023

> In a significant development for the European artificial intelligence sector, Paris-based startup Mistral AI has achieved a noteworthy milestone. The company has successfully secured a substantial investment of €450 million, propelling its valuation to an impressive $2 billion.

I’m cracking up. I don’t need to be a rocket scientist to read this and immediately conclude it’s AI-generated. I mean, they didn’t even try to hide that. Haha.

fidotron · on Dec 10, 2023

There is a lot of noise here suggesting it is too much, but relative to the supposed SV unicorns of two years ago this looks like an absolute steal.

yreg · on Dec 10, 2023

The macroeconomic situation 2 years ago and now was wildly different.

transformi · on Dec 10, 2023

Evaluation based on what? what is the business model?

antirez · on Dec 10, 2023

I believe that the rationale is that if you can do an outstanding 7B model, it is likely that you are able to create, in the near future, something that may compete with OpenAI, and something that makes money, too.

jaspa99 · on Dec 10, 2023

Curious to see how this will impact Aleph Alpha

emadm · on Dec 10, 2023

Aleph Alpha raised even more ^_^

https://sifted.eu/articles/ai-startup-aleph-alpha-raises-500...

qeternity · on Dec 10, 2023

I see a lot of comments asking what or how people are using these models for.

The promise of LLMs is not in chatbots (imho). At scale, you will not even realize you are interacting with a language model.

It just happens to be that the first, most boring, lowest hanging fruit products that OAI, Anthropic, et al pump out are chatbots.

Racing0461 · on Dec 10, 2023

With the new AI regulations the EU is going to adopt, how long will mistral be paris based?

rolisz · on Dec 10, 2023

Maybe the regulations will be Mistral shaped.

Barrin92 · on Dec 10, 2023

there's nothing in the new AI regulations hindering Mistral's work. Open Source foundation models are in no way impacted.

https://x.com/ylecun/status/1733481002234679685?s=20

Racing0461 · on Dec 10, 2023

We both know that's not how regulations work. Mistral is going to have to get a legal team to understand the regulations, have a line item for each provision, verify each one doesn't apply to them, get it signed off and continously monitor for changes both to the laws and the code to make sure it stays compliant. This will just be a mandate from HR/Legal/Investors.

Alot of work for a company with no commercial offering off the bat. And possibly an insurmountable amount of work for new players trying to enter.

arlort · on Dec 10, 2023

> Alot of work for a company with no commercial offering off the bat

If you have no commercial offering it doesn't apply to you at all in the first place

bsaul · on Dec 10, 2023

If you never have any commercial offering, you have a 0 valuation.

sofixa · on Dec 11, 2023

Meta didn't have any commercial offering until what, WhatsApp for business a few years ago, around 2018? By your logic they should have never been valued at anything or made any profit, yet they did.

kozikow · on Dec 10, 2023

Or another way to put it - if you are an enterprise based in Europe that needs to stay compliant, future regulation will make it very hard to not use Mistral :P.

andsoitis · on Dec 10, 2023

Regardless of where a company is headquartered, it has to comply with local regulations.

Racing0461 · on Dec 10, 2023

Only if it wants to do business there. If a company is just headquartered there, they have to comply with regulations no matter what.

throwaway09799 · on Dec 11, 2023

$2B is super cheap when ChatGPT wrapper AI startups are worth $500M.

Alifatisk · on Dec 11, 2023

That’s insane how much money is flowing is between these investors and startups

vinni2 · on Dec 11, 2023

Dupe https://news.ycombinator.com/item?id=38580758

mytailorisrich · on Dec 10, 2023

Perhaps someone can answer this: this is a one year old company. Does this mean that barriers to entry are low and replication relatively simple?

cavisne · on Dec 10, 2023

The part of Meta research that worked on LLaMa happened to be based in the Paris office. Then some of the leads left and started Mistral.

Complex/simple is not really the right way to think about training these models, I'd say its more arcane. Every mistake is expensive because it takes a ton of GPU time and/or human fine tuning time. Take a look at the logbooks of some of the open source/research training runs.

So these engineers have some value as they've seen these mistakes (paid for by Meta's budget).

emadm · on Dec 10, 2023

Main barrier right now is access to supercompute and how to run it, everything is standardising quickly in the space

b2bsaas00 · on Dec 10, 2023

Anyone has example of products that made large use of LLM API that could make economics sense to use self-hosted model (Mistral, LLAMA)?

sroussey · on Dec 10, 2023

Im working on embeddings database of my personal information, and ability to query it. Just a privacy reason.

eeasss · on Dec 10, 2023

Some folks on this forum seem to get irritated by the prospect of a successful AI company HQed in the EU. Why the hate?

sofixa · on Dec 11, 2023

Because many around here have a preconceived bias that Europe cannot be innovative, and any proof to the contrary needs to be shat upon as not good or innovative enough/only looking for government contracts, or that they're not the size of Meta or Alphabet or Apple so obviously they aren't really innovative, or some other goal post shifting exercise.

VirusNewbie · on Dec 10, 2023

A competitor to OpenAI in like, benchmarks?

consumer451 · on Dec 10, 2023

At least a competitor to Llama, for now.

https://medium.com/@datadrifters/mistral-7b-beats-llama-v2-1...

Frummy · on Dec 10, 2023

That's fair given it's 50 times more difficult to use their model

firebot · on Dec 10, 2023

Who comes up with these valuations? The Donald?

wholien · on Dec 10, 2023

how does Mistral monetize or plan to monetize? create a chat gpt-like service and charge? license to other businesses?

nojvek · on Dec 10, 2023

Valuation means Jack shit for early stage startup. WeWork was valued at $50B at its peak.

Until a company is consistently showing growth in revenue and a path to sustainable profitability, valuation is essentially wild speculation.

OpenAI is wildly unprofitable right now. The revenue they make is through nice APIs.

What is Mistral’s plan for profitability?

Right now stability AI is in dumps and looking for a buyer.

Only companies I see making money in AI are those who live like cockroaches and very capital efficient. Midjourney and Comma.ai come to mind.

Very much applaud them for open release of models and weights.

emadm · on Dec 10, 2023

It’s kinda weird thinking deep tech companies should be profitable a year in.

Like it takes time to make lots of money and it’s really hard to build state of the art models.

Reality is this market is huge and growing massively as it is so much more efficient to use these models than many (but not all) tasks.

At stability I told team to focus on shipping models as next year is the year for generative media where we are the leader as language models go to the edge.

mpalmer · on Dec 10, 2023

They didn't say that companies should be profitable at a year in.

To my mind they just seemed to be responding to the slightly clickbait-y title, which focuses on the valuation, which has some significance but is still pretty abstract. Still, headlines love the word "billion".

The straight-news version of the headline would probably focus more on a16z's new round.

nojvek · on Dec 10, 2023

I acknowledge it’s easy to be an armchair critic. You are the ones in battlefield doing real work and pushing the edge.

The thing is I don’t want the pro-open-source players to fizzle out and implode because funding dried up and they have no path to self sustainability.

AGI could be 6 months away or 6 decades away.

E.g Cruise has a high probability of imploding. They raised too much and didn’t deliver. Now California has revoked their license for driverless cars.

I’m 100% sure AGI, driverless cars and amazing robots will come. Fairly convinced the ones who get us there will be the cockroaches and not the dinosaurs.

emadm · on Dec 10, 2023

I think its also tough at the early stage of the diffusion (aha) of innovation curve, we are at the point of early adopters and high churn before mass adoption of these technologies over the coming years as they are good enough, fast enough and cheap enough.

AGI is a bit of a canard imo, its not really actionable on a business sense.

segmondy · on Dec 10, 2023

Profitability likewise means jack shit. You just need to be have a successful acquisition by a lazy dinosaur or go make enough income to go public. You can lose money for 10yrs straight while transferring wealth from the public to the investors/owners. With that said, I'm short Mistral for them being French. I have absolute zero faith in EU based orgs.

On profitability, For all the new comers, I don't think anyone can wager that any of them is going to make money. Capital efficiency is overrated so long as they can survive for the next year+, they are all trying to corner the market and OpenAI is the one that seems to have found a way to milk the cow for now. I truly believe that the true hitmakers are yet to enter the scene.

stavros · on Dec 10, 2023

This is just tangential, but I wouldn't call their APIs "nice", I'd be far less charitable. I spent a few hours (because that's how long it took to figure out the API, due to almost zero documentation) and wrote a nicer Python layer:

https://github.com/skorokithakis/ez-openai/

With all that money, I would have thought they'd be able to design more user-friendly APIs. Maybe they could even ask an LLM for help.

evantbyrne · on Dec 10, 2023

Valuation matters quite a bit for continued funding.

toss1 · on Dec 10, 2023

Yes, and it can matter in a very bad way if you need to subsequently have a "down round" (more funding at a lower valuation).

Initial high valuations mean the founders get a lot of initial money giving up little stock. This can be awesome if they become strongly cash-flow positive before they run out of that much runway. But if not, they'll get crammed hard in subsequent rounds.

The more key question is: how much funding did they raise at that great valuation, and is it sufficient runway? Looks like €450 million plus an additional €120 million in convertible debt. Might be enough, depending on their expenses...

evantbyrne · on Dec 10, 2023

I'm not saying that either of your concerns are invalid. The LLM space is just the wrong place to be for investors who are worried about cash-flow positivity this early in the game. These models are crazy expensive to develop _currently_, but they is getting cheaper to train all the time. Meaning Mistral spent a fraction of what OpenAI did on GPT-3 to train their debut model, and that companies started one year from now will be spending a fraction of what both are spending presently to train their debut models.

toss1 · on Dec 11, 2023

YUP. Plus, the points at the end of your post, abt how much faster and cheaper it is getting to train new models indicates that Mistral may have hit a real sweet-spot. They are getting funding at a moment where the expectations are that huge capital is needed to build these models, just when those costs are declining, so the same investment will buy them a lot more runway than it did for previous competitors...

hauget · on Dec 10, 2023

His point is with regards to reaching & maintaining profitability, not revenue spending.

evantbyrne · on Dec 10, 2023

It's too early for Mistral to focus on revenue. These AI companies are best thought of as moonshot projects.

rmbyrro · on Dec 10, 2023

Generally agree.

Instead of "path to profitability", I think path to ROI is more appropriate, though.

WhatsApp never had a path to profitability, but it had a clear path to ROI by building a unique and massive user base that major social networks would fight for.

wslh · on Dec 10, 2023

> OpenAI is wildly unprofitable right now.

Do we know some of its numbers? How many paid subscribers do they have? I pay for two subscriptions.

vagrantJin · on Dec 10, 2023

comma.ai is a great example of a good business.

But I might have a bias because I was following along as the company was built from whiteboard diagrams to what it became.

hn_throwaway_99 · on Dec 10, 2023

Perhaps too much off-topic, but I hate how the press (and often the startups themselves) focuses on the valuation number when a company receives funding. As we've seen in very recent history, those valuation numbers are at best a finger in the wind, and of course a big capital intensive project like AI requires a valuation that is at least a couple multiples of the investment, even if it's all essentially based on hope.

I think it would make much more sense to focus on the "reality side" of the transaction, e.g. "Mistral AI received a €450 million investment from top tech VC firms."

shrimpx · on Dec 10, 2023

The valuation is meaningful in the sense of "Mistral sells 22.5% of company to VC firms."

matmulbro · on Dec 10, 2023

LLM space is so cringe so much excitement from supply side and no excitement/cringe from supposed demand side

rogerkirkness · on Dec 10, 2023

Microsoft Cloud AI revenue went $90M, $900M, $2.7B in three quarters. How much more hard dollar demand growth could there possibly be at this point?

echelon · on Dec 10, 2023

They're selling to startups, not consumers.

The good startups are building, fine tuning, and running models locally.