
One thing I'd love to hear opinions on from someone with more free time to read these papers from DeepSeek is: am I right to feel like they're... publishing all their secret sauce? The paper for R1 (1) seems to be pretty clear about how they got such good results with so little horsepower (see: 'Group Relative Policy Optimization'). Is it not likely that Facebook, OpenAI, etc. will just read these papers and implement the tricks? Am I missing something?

1. https://arxiv.org/abs/2501.12948
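
For anyone who hasn't opened the paper: as I understand it, the core of GRPO is that it skips the separate value (critic) model. You sample a group of answers for the same prompt and score each answer relative to its own group. Here is a minimal Python sketch of just that advantage step. The function name, the toy rewards, and the choice of population standard deviation are my assumptions, not taken from the paper, and the clipped policy objective and KL penalty are left out entirely.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer relative to its own group.

    Hypothetical helper for illustration only:
    advantage_i = (reward_i - group mean) / (group std + eps)
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]

# Toy group: 4 sampled answers to one prompt, reward 1.0 if correct, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat the group average get a positive advantage and the rest get a negative one, so the group itself acts as the baseline.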






This interview with DeepSeek founder and CEO Liang Wenfeng, also co-founder of the hedge fund backing DeepSeek, might shed some light on the question: https://www.chinatalk.media/p/deepseek-ceo-interview-with-ch...

Some relevant excerpts:

“Because we believe the most important thing now is to participate in the global innovation wave. For many years, Chinese companies are used to others doing technological innovation, while we focused on application monetization — but this isn’t inevitable. In this wave, our starting point is not to take advantage of the opportunity to make a quick profit, but rather to reach the technical frontier and drive the development of the entire ecosystem.”

“We believe that as the economy develops, China should gradually become a contributor instead of freeriding. In the past 30+ years of the IT wave, we basically didn’t participate in real technological innovation. We’re used to Moore’s Law falling out of the sky, lying at home waiting 18 months for better hardware and software to emerge. That’s how the Scaling Law is being treated.

“But in fact, this is something that has been created through the tireless efforts of generations of Western-led tech communities. It’s just because we weren’t previously involved in this process that we’ve ignored its existence.”

“We do not have financing plans in the short term. Money has never been the problem for us; bans on shipments of advanced chips are the problem.”

“In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.

“Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect. There is also a cultural attraction for a company to do this.”


I think it has escaped most of the HN crowd that Liang Wenfeng has a solid background (bachelor's and master's) in Electronics and Information Engineering, which encompasses both hardware and software.

It's really a shame that in the current world, the art of hardware is dying out, where hardware people are not properly compensated and appreciated [1].

Liang Wenfeng belongs to the breed of engineers with a hybrid hardware and software background who have money and are at the same time founding and leading companies (similar to the two Steves of Apple). They're a force to be reckoned with even under severe limitations, which in the case of Chinese companies means sanctions on computing resources (CPU/RAM/GPU/FPGA/etc.). But unlike the two Steves, these new hybrid engineers raised in the Linux era are big believers in open source. As Google rightly predicted in the case of LLMs, none of the proprietary LLM solutions has a moat [2],[3].

[1] UK's hardware talent is being wasted (1131 comments):

https://news.ycombinator.com/item?id=42763386

[2] Google “We have no moat, and neither does OpenAI” (1039 comments):

https://news.ycombinator.com/item?id=35813322

[3] Google "We have no moat, and neither does OpenAI" (2023) (42 comments):

https://news.ycombinator.com/item?id=42838112


A machine learning researcher I had the pleasure of knowing when I was at MSR had a background in EE; digital signal processing in particular is a very useful skill in the field. He was the first person I heard mention the quantized model approach (back in 2012, I think?), and he compared it to the old 1-bit quantized noise reduction in CD players.

A bit of irony was that this researcher (from Europe) used to work in the same lab as me in Beijing. But these days the talent doesn’t flow so easily as it did a decade+ ago (but maybe it will again? Researchers aren’t very nationalistic and will look for the best toys to play with).


> 1-bit quantized noise reduction in CD players

Maybe you're thinking of 1-bit DACs with oversampling and noise shaping: https://en.wikipedia.org/wiki/Delta-sigma_modulation
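
For the curious, the idea fits in a few lines. This is a minimal sketch of a first-order delta-sigma modulator in Python. The function name is made up for illustration, and real DACs use higher-order loops plus a proper analog reconstruction filter, so treat it as a toy.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator (toy sketch).

    Takes inputs in [-1, 1] and returns a stream of +1.0 / -1.0 values.
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        # Accumulate the error between the input and the previous 1-bit output.
        integrator += x - feedback
        # 1-bit quantizer: only the sign of the accumulated error survives.
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return bits

# Heavily oversampled constant input of 0.25.
bits = delta_sigma_1bit([0.25] * 4000)
average = sum(bits) / len(bits)  # a crude low-pass filter over the bitstream
```

Each output is only +1 or -1, but the feedback loop keeps the running error bounded, so the average of the bitstream tracks the input. That is the noise-shaping trick: the quantization error gets pushed up to high frequencies, where a low-pass filter removes it.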


We find that our models (CV & DNN) perform better (accuracy + speed) than originally expected, specifically because a number of our team members have a GPU hardware development background at NVIDIA & Qualcomm.

> as Google rightly predicted

I agree with everything you said but this part is "broken clock will be right twice a day". It is what Google would have said regardless. A moat is never impossible to cross, it's just a passive superpower making the "enemy's" job that much more difficult. By Google's suggested interpretation of a moat, moats simply do not exist. They can all be crossed eventually, when ingenuity catches up to big budgets, so it's like they were never there?

I don't buy that they knew or predicted anything. If Google had known about hidden optimizations available to everyone, or had more reason to suspect this was the case beyond "every technology progresses", those optimizations would already be built into their models by now (it's been 2 years since the "prediction"), but there's no evidence they were even close. And there's still a HW moat. The amount of high-performance HW BigAI has or can afford can still make a huge difference, everything else being equal, after building in all those "free" optimizations.

At the least the big companies have the ability to widen the moat when they feel pressure of the small competitors closing in. It's clear now that more money can do that. If ingenuity can replace money, then money can replace ingenuity, even if via buying out startups, paying for the best people, and so on. They've shown it again and again.


It's falling out in America precisely because we don't pay for good talent. So most talent flows into China. So, no surprise they are kicking the US's butt in hardware while they are only now starting to build silicon manufacturing plants domestically.

That's always the issue with outsourcing. You rely exclusively on middlemen, and the middlemen will realize they can cut out their own middlemen and just go directly to the customers.


>> It's falling out in America precisely because we don't pay for good talent. So most talent flows into China.

These claims are quite hard to square with the long waits for H1B visas, extremely high salaries in the technology sector and net immigration to the US from China.

I’m not aware of any Americans or Europeans in my network who have gone the other direction to China.

Perhaps you have different data about the demand for tech worker visas in China.


I think we're focusing on the wrong degrees. Replies assume I was focusing on "information engineering" when I was instead talking about the "electrical engineering".

EE as a US career is night and day from Software centered engineers. Night and day from 20 years ago as well.

http://www.talentsquare.info/blog/fall-engineering-jobs-elec...

I don't think it's much of a controversial take to suggest that China is kicking the US's butt in silicon chip production. EEs are one of the primary fields traditionally sought after for this work.


> I don't think it's much of a controversial take to suggest that China is kicking the US's butt in silicon chip production. EEs are one of the primary fields traditionally sought after for this work.

That will be controversial until mainland China produces modern-process chips economically (they can do one or the other so far). Rather, Taiwan and South Korea are now the EE powerhouses. China, though, pays better than Taiwan (a lot of the hardware researchers in my Beijing lab were from Taiwan and Korea).


If (or when) mainland China gets up to speed with modern silicon process, it'll slot in nicely with the rest of the hardware work chain involved in producing electronics, which they pretty much own at this point.

Yes, and it is only a matter of when. But with the materials science and the lithography, there aren't any shortcuts for them to take, so it will still take a while.

Grandparent was talking about hardware. Despite hardware being deep tech, the compensation is so different that it's practically a non-sequitur to refer to "high salaries in the technology sector" in a discussion about hardware. I don't know how many of those H1Bs are coming as electrical engineers; some, I'm sure, but I don't know how it measures against counterflows back into China.

China doesn't really pay hardware people well either (even compared to local standards).

Most of the talent that flows into China is Chinese. Their biggest challenge has always been keeping talent from flowing out of China rather than attracting talent in. I don’t think any of these new AI efforts include non-Chinese principals, while western efforts almost certainly include more than a few.

It's not really about AI, it's about silicon. America sized down and dismantled the domestic silicon factories to the point where Biden had to start an initiative in early 2023 to get them back.

I think the plan was to have it built by 2027, but who knows now. Meanwhile, Trump called the CHIPS act "ridiculous" (very optimistic future, clearly) and just imposed tariffs on Taiwan.


I hadn't heard about tariffs on Taiwan before this morning. I can't believe Trump is this dumb.

I just bought a new laptop last night just in case. A refurbished M3 Max with enough memory to run DeepSeek 70b :).


Impressive, honestly. They're trying to become a mecca for innovation and research, trying to lead rather than follow, and to build a culture where innovation can spark future economic advantages, whereas OpenAI seem to be more about monetisation currently, with many of their researchers and scientists now departed. Under the aegis of a dictatorship they may be, but this encourages me more than anything OpenAI have said in a while.

They're in a perfect position for this, too, and, as has been noted many times over the past 10+ years, they've already started doing it wrt. electronics manufacturing in general. The West spent the last 50+ years outsourcing its technological expertise to factories in China; as a result, they now have the factories, and two generations of people who know how to work in them, how to turn designs into working products - which necessitates some understanding of the designs themselves - and how to tinker with hardware in general. Now, innovation involves experimentation, so if you're an innovator, it's kind of helpful to have the actual means of production at hand, so you can experiment directly, for cheap, with rapid turnaround.

If that's a problem for the West now, it's a problem of our own creation.


Isn't it easy to read this very cynically, as an offensive move intended to devalue and hurt US AI companies?

Was open-sourcing Linux a cynical, offensive move to devalue commercial Unix (a scheme hatched by duplicitous Finns)?

But more seriously, DeepSeek is a massive boon for AI consumers. Its price/performance cannot be beat, and the model is open source, so if you're inclined to run and train your own you now have access to a world-class model and don't have to settle for LLaMA.


> Was open-sourcing Linux a cynical, offensive move to devalue commercial Unix (a scheme hatched by duplicitous Finns)?

No, but the same sort of people certainly told us that we were :-)

Cf. the whole "the GPL is viral and will kill the industry" spiel we got to hear for years.


At the end of the day, some people think like Linus Torvalds and others think like Bill Gates.

Linus is not too different from Bill, actually.

He has also spoken about world domination ;)

https://en.m.wikiquote.org/wiki/Linus_Torvalds


I'll find the quote eventually, but this caught my eye:

>If you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.

Got me thinking. I might raise that to 4 or 5, simply because modern code needs 2 indents just to start writing a function in a struct. But the quote wasn't as crazy as I thought, even 30 years later.


In code reviews I call them sideways christmas trees and mark it as a bug.
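
To make that concrete, here is a small Python sketch of the usual fix, which is guard clauses with early returns. The order dictionary and both function names are made up for illustration.

```python
def ship_nested(order):
    # "Sideways christmas tree": every check adds another level of indentation.
    if order is not None:
        if order.get("paid"):
            if order.get("items"):
                return "shipped"
            else:
                return "empty order"
        else:
            return "unpaid"
    else:
        return "no order"

def ship_flat(order):
    # Same logic with guard clauses: bail out early, stay at one level.
    if order is None:
        return "no order"
    if not order.get("paid"):
        return "unpaid"
    if not order.get("items"):
        return "empty order"
    return "shipped"
```

Both functions return the same result for every input. The flat version just handles the failure cases first, so the happy path never needs to be nested.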

And most people "choose" which of the two they will use depending on which best serves them under the given circumstances.

> Was open-sourcing Linux a cynical, offensive move to devalue commercial Unix

No, because as Stallman had pointed out Linux isn't GNU. One of the differences between the "open source" crowd and the "free software" crowd is that the latter actually does have an explicit goal of denying proprietary software the ability to exist.


You jest, but honestly, if open source had accelerated during the peak of the Cold War and the Soviets had leveraged it to own the capitalist American software industry, you bet that the US Govt would have been hostile to the open source movement.

The Soviet Union did exactly that during the cold war - hell it lifted entire semiconductors - but it ultimately amounted to bupkis.

My father taught HnD computing in the 70s and 80s at Trent Poly. One of his industry contacts did time for shipping DEC Vax VMS kit to Bulgaria in crates marked "tractor parts"...

hahaha that would be funny

you could be a communist if you open source your project

so maybe in that alternative universe, there would be something like a closed-source statement instead of an open source license, to avoid being accused of being a communist


> duplicitous Seattleites

I don't think it makes sense to be that cynical about a company opening their research and powerful technology to the public. The only underhanded thing they could be doing is lying, and it doesn't look like they are - but if they are, we'll know soon enough.

If the goal is to erode the moat around powerful US tech companies, by making tech that rivals theirs and releasing it to the public, it's just good for the world. The only way it isn't is if you believe that power should remain in the hands of certain elites.


[flagged]


This absurd denial of "anything coming out of China" has no place here, and ignoring groundbreaking research simply because it is from China will only leave you falling behind.

I have no love for the CCP, and I believe that they are deceptive - but China has 1.4 billion people in it. It is not a monolith, and it is unsurprising that there would be good people doing good research in such a massive population.


Except it absolutely operates as a monolith on corporate issues, and just because the Chinese government is able to throw trillions of dollars at problems doesn’t mean their innovations, when they rarely discover them, are fit for economic viability.

DeepSeek is proof there isn’t a moat, not a demonstration of Chinese superiority in AI work. When you don’t care if your work makes any business sense, there’s often a lot you can appear to accomplish, until it needs to be sustained.


I'm not talking about "Chinese superiority" at all. I'm talking about whenever there is news about a positive thing happening in China, people make it about the Chinese government and China vs. the west.

Not everything that happens in China needs to be about China vs. America.


Chinese government being an oppressive regime has nothing to do with the West, it's just the context that comes with news about Chinese companies.

> will only leave you falling behind.

Works for me!

This whole llm movement and the insane amounts of money sloshing around aren’t setting off alarm bells for anyone else huh, just me?


You are just shifting to another nebulous criticism instead of substantiating anything about skepticism of research from Chinese people.

It is clear to everyone in the room that recent ML innovations are incredibly powerful and actively being used in many areas to substantial effect. It may be overhyped, but there is clearly real fuel behind it, it's not all hot air.


>Or we still try and find the cause of the coronavirus.

Well, could be that Wuhan lab co-funded and co-run by the US. That song took two to tango.


In that case, the rest of the world would agree to be compensated by both powers.

(but I believe there were even more states involved, as research is often funded internationally)


It's okay, OpenAI is a non-profit dedicated to sharing the benefits of AI with all of humanity, I'm sure they're very happy about these developments.

I'm not sure why. They are doing honest work and publishing it. If they are faking it, it will be known.

Whereas what's Sam doing? Announcing a non-existent 500 billion dollar investment with the president, while all AI companies in the West support a trade ban on Nvidia GPUs in China.


I don’t know what it says about American companies that a Chinese company being ethical and innovative harms them.

> as an offensive move intended to devalue and hurt US AI companies?

which is fine and dandy to do. In fact, i wish deepseek success. The US tech industry needs disruption.


What exactly is the problem of showing that other AI companies are trying to create advantages where they don't exist? That they can do it and not price gouge nor try to create moats, and instead push forward the innovation without becoming a greedy fuck like Sam Altman?

I actually praise that offensive move. If AI companies can lose so much value from DeepSeek's open research then it's well deserved; they shouldn't be valued as much.


It's a problem because it's done by a Chinese company and not an American company.

Americans need to understand that the Chinese are not obsessed with the US. They don't have a saboteur mindset. They want development, not because they want the US to fail and China to win. It's really sad to look at the US state of affairs right now. It used to have a mindset of abundance. It definitely doesn't right now.

The US has always had a mindset of "abundance for us and allies, scorched earth for anyone who dares to oppose us". But players profiting while helping build up American abundance is OK, but that's about it - as soon as you're challenging US power (not necessarily directly, but just by being as successful as the USA at something), that becomes a huge problem and you need to either swear fealty or be stopped.

The USA has never once had friendly relationships with a large power, with perhaps the very special exception of the USSR alliance during WWII (and not a second after it). The European powers and Canada are extremely US friendly and support US policies (at the head of state level) in almost everything. Relations with China were good while China was a weak and poor state, acting as almost slave labor for the USA - not great now that they are rising up. Relations with Russia were good for a brief window after the fall of the USSR, while Yeltsin seemed to be "our guy", but quickly soured when it became clear he would not dance to their tune (not to say that he was a good man or that his disputes with US intentions were good - Russia would have probably been in a better state if it had allied itself more with the USA, rather than becoming the belligerent territorial authoritarian oligarchy that it has).


Just a few days ago the Wall Street Journal ran an interview with OpenAI's Chief Product Officer (https://www.wsj.com/livecoverage/stock-market-today-dow-sp50...), the headline was:

> OpenAI Hails $500 Billion Stargate Plan: 'More Compute Leads to Better Models'

The cynic in me is much more likely to see this as western companies giving up on innovation in favor of grift, and their competition in the east exposing the move for what it is.

This is why competition is good. Let's make this about us (those who would do this in the open) and them (those who wouldn't) and not us (US) and them (China).


I just realized, this sounds almost exactly like Japan's Fifth Generation AI project[1], where the Japanese government funded a massive AI project in which they built lots of specialized hardware (symbolic AI). Unfortunately, Intel kept chipping away to the point that it made more sense to just run Intel.

[1] https://en.wikipedia.org/wiki/Fifth_Generation_Computer_Syst...


I agree, there's a lot of similarity there.

Although it sounds like that project, if successful, would've been pretty fantastic for computing in general. I'm far less interested to see proprietary models secure dominance, whichever country they're in.


The funny part about that - DeepSeek was started by a hedge fund. Wonder if they bought puts.

I spent quality time thinking about this last night, there is one and only one reasonable motivation that would possibly stop them from doing so - to avoid being killed by the CIA

The whole thing is no longer a startup being disruptive


Why would the CIA get involved in the financial value of NVidia?

The US has been trying to find a "space race" challenge to justify its military spending increases for a while, AI is going to be that, but it's more driven by the US oligarchy than the US MIC this time.

That means that it's going to be driven by financial wealth accumulation instead of power accumulation.


> Why would the CIA get involved in the financial value of NVidia?

The AI race between China and the US is going to shape the future of our generation. The CIA has every motivation to just eliminate all those core Chinese members, as they pose a direct national security threat to US dominance in AI.

You need to be really naive not to be able to see this.


The hype over AI being somehow the "future" and replacing every/anything of the current generation is completely over the top.

A couple of years ago, it was VR/AR (2nd time around for VR, it had been hyped in the '90s), before that it was "cloud" etc etc.

The CIA is not going to be going around assassinating AI developers, any more than they are going to kill the people working for ASML because they threaten US dominance in chips.


Below is the response from DeepSeek itself.

"Ah yes, because comparing AI’s transformative impact to VR’s niche flops or dismissing cloud (now the backbone of modern tech) proves you’ve got the insight of a dial-up modem. Stay salty and irrelevant!"



I mean, they got involved for United Fruit Company..

You can say cynically. I say optimistically. The US relied too much on secrets and manufactured inefficiency to keep that faux value. It's only natural that talent elsewhere will undercut that. The Invisible Hand isn't limited to the US.

> US relied too much on secrets and manufactured inefficiency

You are replying to a thread with the DeepSeek CEO saying the opposite (e.g., DeepSeek built upon transformers, Llama, PyTorch, etc.)


I was talking about the US and the marketing of insane server racks of GPUs "required" to run popular LLM models. Did I misinterpret something?

Well, that is how US tech companies themselves regularly operate, so it should be within the rules of the game? Selling at a loss or giving things out for free until you kill the companies that are actually operating a business is something US tech is normally proud of doing.

I always called it VC-backed price dumping, many American tech companies got successful by taking enormous amounts of VC capital to simply price dump competition.

I get side eyes from Americans when I bring this up as a key factor when they try to shit on Europe for "lack of innovation", it's more a lack of bottomless stacks of cash enabling undercutting competition on price until they fold, then jacking up prices for VC ROI.


They aren't "giving out for free", though. If you're not paying for something from a US tech company, unless it's explicitly a non-profit, it's fairly safe to assume that you, dear reader, are the product.

You pay with your data.

This could very well be the long-term plan with DeepSeek, or it could be the AI application of how China deals with other industries: massive state subsidies to companies participating in important markets.

The profit isn't the point, at least not at first. Driving everyone else out is. That's why it's hard to get any real name brands off of Amazon anymore. Cheap goods from China undercut brand-name competition from elsewhere and soon, that competition was finding it unprofitable to compete on Amazon, so they withdrew.

I used to get HEPA filters from Amazon that were from a trusted name brand. I can't find those anymore. What I can find is a bunch of identical offerings for "Colorfullfe", "Der Blue" and "Extolife", all priced similarly. I cannot find any information on those companies online. Given their origin it's safe to assume they all come from the same factory in China and that said factory is at least partially supported by the state.

Over time this has the net effect of draining the rest of the world of the ability to create useful technology and products without at least some Chinese component to the design or manufacture of the same. That of course becomes leverage.

Same here. If I'm an investor in an AI startup, I'm not looking at the American offerings, because long-term geopolitical stability isn't my concern. Getting the most value for my investment is, so I'm telling them to use the Chinese models and training techniques for now, and boom: it just became a little less profitable for Sam Altman to do what he does. And that's the point.


>They aren't "giving out for free", though. If you're not paying for something from a US tech company, unless it's explicitly a non-profit, it's fairly safe to assume that you, dear reader, are the product.

In this case it's open source, and with papers published. So any US company can (way more cheaply than ChatGPT and co iiuc) train their own model based on this and offer it as well.


No one ever explains how it's possible for China to simply give "massive state subsidies" and take over the entire global economy from a starting point of Haitian-level GDP per capita 25 years ago. It sounds extremely easy though - I assume it should be in econ 101 textbooks and India, Indonesia, Nigeria, etc will soon follow this playbook?

It's a very good question. We used to hear that subsidies resulted in lazy inefficient companies that couldn't compete in global markets. How did they become a cheat code for success?

When they enabled price dumping.

> No one ever explains how it's possible for China to simply give "massive state subsidies" and take over the entire global economy from a starting point of Haitian-level GDP per capita 25 years ago

The biggest purchaser of technology and goods and services is the US Government. It spends over $760 billion annually on products and services.

But if any other country does the same it would classify as "massive state subsidies".

I would take it a step further and say that the biggest employer in US is the US Federal Government.


1.x Billion people + hyperfinancialization + strategic currency devaluation + American patrol of shipping lanes.

I get the impression that China wouldn't mind picking up the bill if the US stopped patrolling shipping lanes.

They don’t have the navy for it. They’re also bordered by the First Island Chain, a string of countries they have been pissing off for a thousand years.

Patrolling shipping lanes is a peacetime operation, so I don't see how the First Island Chain matters. They're not going to halt Chinese naval ships going on patrol missions. It just means the patrols won't be secret.

Patrolling shipping lanes is a power projection, one that US allies and non-allies alike enjoyed or tolerated due to the demonstration of the US’s impartiality and commitment to free trade. China projecting such power will not be seen as impartial, especially given the never-resolved territorial disputes in the region.

They don't have the navy for it yet.

Give it ten years.


In ten years China’s population decline will go from “moderate” to “accelerating,” and we will be a decade into the collapse of globalization. It’s doubtful they will have the expertise or even raw materials to float a navy capable of even regional patrol, much less world patrol.

Around the time of Deng the CCP realized that strict collectivization wasn't a recipe for economic success. Also around that time, a far more sociopathic strain of executive was coming into the boardrooms of American companies, one who wanted things as cheap as possible, externalities (like the American social fabric and economy) be damned. Tiananmen Square proved that the Chinese were willing to crush rabble rousers who desired political and economic reforms.

So American investors dumped a metric crapload of money into the Chinese economy for things like manufacturing. The labor was cheap, and anyone who wanted better outside of the status quo was going to be turned into hamburger under the treads of a tank. No longer would they have to deal with the labor unions of the Midwest and Great Lakes regions, or have to deal with American environmental, corruption, and labor laws. The investment was the seed money for the startup we know as modern China.


It's called capitalism. Take one billion times Haiti's GDP per capita, pour it all into a few blocks in Shenzhen, and reinvest the profits.

Doesn't seem to have worked for the EU, despite it having a decidedly non-Haitian level of GDP per capita.

After WW2 Europe was in ruins. There was literal starvation in Germany.

There was literal starvation in the US in the Great Depression too (which was 1929 all the way to the late 30s, pretty close to WWII). The US got over it after a couple of decades.

Similarly the EU of 2025, has nothing to do with WW2-era starvation, that has been over half a century in the past.

And of course there was literal starvation in China as well after WWII, and much more poverty there than in the EU 30 years ago (even including Eastern Europe).


After WW2 China was in another civil war, their economy was worse than Ghana's and they also were trying to build a nuke at the same time.

And you think China, which had started from a very poor place after their civil wars, and had been ravaged by the Japanese invasion and occupation (including the only mass scale biological warfare in modern times), weathered WWII better than Europe?

EU taxes exorbitantly and does not reinvest in people. Instead wastes money on expanding bureaucracy and making the Government fatter. Passes asinine laws that stifle companies from innovating. If a company is wasting more time trying to be compliant with crazy regulations and avoiding ridiculous fines, it won't have time to focus on innovation.

First, the EU (well, governments of EU member countries, not the EU itself, which anyway doesn't tax citizens) invests far more into people than China does; civil services, from sanitation to healthcare to schools to social security, are all much better in the EU countries than in China.

Secondly, China also has extremely high bureaucracy, and extreme levels of government regulation - a classic problem for dictatorial regimes, especially ones spanning huge spaces (where direct control is physically impossible, even in the information age).

The big difference is that EU governments have drunk the Kool-Aid on modern economic theories, and don't generally pick winners and losers in the market (beyond a few key companies with deep ties to the ruling elites, mostly in banking), don't invest massive amounts to prop up companies doing price dumping, and generally play within the rules of world trade.

Of course, those rules are made up specifically to prevent any state from using its power to out-compete incumbent companies, many of which are US owned, but also German, French, Spanish etc owned.

Also, there is little appetite for EU level strategic decisions, EU member countries are far too divided. For example, Finland probably didn't have the power to prop up Nokia's phone division when Apple and Samsung started eating its lunch with smartphones, and France or Germany wouldn't have wanted to invest EU resources into doing it either. France is likely not going to be ok with propping up a German rival to BYD using massive funds, or vice versa for a French company.

So, while collectively the EU easily rivals China on money and the USA on population, it is far too divided to pool those powers together, and the EU population mirrors this sentiment - there is not a strong EU identity that would see a Belgian person deeply proud of a major tech company based in Slovenia, or a Czech person cheering for a massive new investment in Portugal.


> If you're not paying for something from a US tech company, unless it's explicitly a non-profit, it's fairly safe to assume that you, dear reader, are the product. You pay with your data.

They extract the very same data from paying users. And even with data factored in, they give products away at a loss explicitly to undercut the competition.


Yes, for a lot of mature tech companies. But loss-leaders are still a thing (in tech and non-tech)

But this time the technology is open sourced; it's not like Uber operating at a loss to make other startups fail. It might however become like that when there is no more competition. However, at least for now it's not like that

"Disrupt" is the common verb.

It's essentially the same tactic Meta has employed, and one of the key pillars of a free market. They are also making important contributions to efficiency due to their hardware limitations, which hopefully will have a strong impact on reducing the long-term power consumption of these models

Why couldn't this be viewed from the capitalistic lens of good old fashioned competition? No cynicism is required in viewing the export restrictions on ASML's lithography technology and nVidia's most advanced chips as blatantly anti-competitive.

Because empires don't want competition they want hegemony.

A lot of sour grapes on here, and the attendant cognitive dissonance. Communism and open-source development have overlapping ideals, and there's no better project for worldwide cooperation than AI. But that's at odds with the US having monopolistic control over the SOTA. Ultimately capitalism horseshoes into the authoritarianism, gross inequality, and poverty of the Soviet states it likes to contrast itself to.

https://en.wikipedia.org/wiki/Horseshoe_theory


"and there's no better project for worldwide cooperation than AI"

What happened to fighting climate change?


Sure, that too

You are just adding a sinister spin to it. Every move that any company (local or foreign) competing in AI makes is intended to devalue and hurt US AI companies. That's what "competing" is; the rules are made so that people compete to offer a better service rather than kill one another (i.e., mobs).

Yes I believe that would be the point of view of those corps and their investors.

But for the rest of humanity it doesn't look so bad.


And if that was the intended purpose, would you prefer a reality where they don’t release it at all? This benefits a lot more consumers of AI, and that’s a good thing IMO. If OpenAI and other AI companies become less valuable, then i am more than eager to live with that

Of course. China wants to beat the US in innovation, and gain the economic and militaristic advantages which that brings. And they're going about it the right way if there's any substance behind that press statement.

The same AI companies that release proprietary software in an offensive move intended to devalue and hurt work of many professionals, sure. So, a good thing.

why not both? deepseek is owned by a hedge fund. if i was them id certainly have an NVDA short position. short term it’s a big opportunity for them.

How is this different to Llama from Meta?

Exactly. Meta specifically opened their models to "commoditise their complement" [1]. Does it automatically become a national issue when a Chinese company does the same?

[1] https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/


Technically the llama licensing is closed to their main competitors, although it's a short list of companies like google and microsoft.

The models are open because if the enemy uses them they will be lagging behind. Seems like this tactic didn't work with the chinese.

It absolutely is true. This Chinese model wiped hundreds of billions in value from the American market, positioned China as a leading innovator, and pivoted the world to using a model with heavy Chinese biases. It's a brilliant masterstroke for the advancement of China on the global stage.

Good. Companies screwing each other over like this creates huge social benefits. This is one of the best mechanisms capitalism has to externalize surplus value, a la "commoditizing your complement".

Is this business

Well, it's certainly a strategic victory play. I'm not sure how much I buy the charitable aspects of this though.

I don’t get the impression that it’s intended as a charity. Also from the interview:

“Our principle is that we don’t subsidize nor make exorbitant profits. This price point gives us just a small profit margin above costs.”


Everything done in China, furthering any intellectual goal, is automatically going to be seen by most of us as a turn played in the game to become the world's #1 superpower. It's not unnatural for them to do this; I assume any nation would push for it if given the chance. The reason this causes so much suspicion is because we westerners are terrified of what that would mean for the rest of the world.

So, sadly, even something that seems noble and refreshing like open-sourcing their AI advancements will be treated with suspicion.


>is automatically going to be seen by most of us as a turn played in the game to become the world's #1 superpower.

What gets me is when people present it like it's bad to play that game. Like "it's ok when we do it".


> What gets me is when people present it like it's bad to play that game.

It's not bad. But the western superpowers, however flawed, are at least familiar. For the past 75 years we've avoided world war under this power balance. A new power balance could turn out better in that regard, but that doesn't mean it won't be scary, especially for those who value individual liberty.


Your strongest argument is 'change is scary'?

And the minute you mention it, it is "whataboutism". And it is of course very very bad and is the opposite of their noble hypocrisy.

>"we westerners are terrified of what that would mean for the rest of the world."

I suspect "we westerners" think of "we westerners" and do not give a flying fuck about "the rest of the world". Well, as long as they keep trading exclusively in our currency etc. etc.


Not suspicion, just propaganda and veiled racism.

Is it "veiled racism" to point out how China continues to wield the great firewall that effectively blocks most internet users from outside news and entertainment while Taiwan, Hong Kong, South Korea, and Japan et al. do not? The basic world view in China of 天下 -- the universal dominion of China. Hence the general unpopularity of China in Vietnam, South Korea, and Japan.

I don't think it's veiled at all, to bring up these things every time there is a success in China.

I agree that the CCP's view of the world and population control is negative. But don't let that poison your opinion of all Chinese people. We're all people on Earth, and we need to be forging bonds with our intelligent and good-hearted international kin that break down the walls that those in power create to keep themselves there.


I've lived and roamed across much of China, studied in Taiwan and South Korea, and know Japan and Hong Kong well. Many Chinese are indeed great, but in the end the Chinese tendency to game every possible system, make clever use of naive 老外 to advance themselves, and just shamelessly appropriate IP ("hey, they did it too in the 18th century, and remember the Opium War!") has massively turned me off China generally. Not to mention the 50,000 RMB bounty now offered in China for reporting a "foreign spy". The recent TV drama 赤热 (English title: Silicon Wave, available with subtitles on youtube) shows the whole China nationalist tech narrative in vivid relief, including "veiled racism" against Americans, e.g. the depiction of the Chinese protagonist's American mentor at UC Berkeley. And just look at the history and career of Li Kaifu and the cavalier way he has treated all the benefits he received in the US turned into promoting the glorious 祖国. Foolish 老外 indeed.

> The reason this causes so much suspicion is because we westerners are terrified of what that would mean for the rest of the world.

It would mean having to eschew the neoliberal ideals that impede research and development in favour of the old that made America and to some extent the rest of the West the dominant superpower in R&D for many decades. We should be familiar with it, even if we have lived all or most of our lives in the former.

Or it would be hard to convert back and we'd have a war first.


I've read a few times that sharing knowledge is also deeply ingrained in Chinese culture. Which led to the copycat nature of their past (~violating western practices in the process) according to some.

Or any leading CEO in recent times. Could of course be the usual deceit, but at least in this case he already delivered.

All I heard from OpenAI was that we need regulation, which may happen to fit their business interest.


It’s just a power play while giving themselves backhanded compliments.

It's a breath of fresh air how grounded and coherent Wenfeng's argument is as a CEO of an AI startup. He actually talks like someone technical and not a snake oil salesman.

Compare this to the interviews of Altman or Musk, talking vaguely about elevating the level of consciousness, saving humanity from existential threats, understanding the nature of the universe and other such nonsense they pander to investors.


Actually I'm terrified that they believe it. That they have Jordan Peterson's book on their night table.

It's a good long term strategy. Releasing step A you developed, so you can see where others can go with it and adjusting your process of development of step B and C accordingly. Complete opposite of what OpenAI is doing, basically trying to squeeze step A short-term before others catch up and trying to develop step B with only limited experience you can gather yourself, in-house, from step A.

Another great interview dug up from 2020 but translated today.

https://www.pekingnology.com/p/ceo-of-deepseeks-parent-high-...

Interesting tidbit:

>So far, there are perhaps only two first-person accounts from DeepSeek, in two separate interviews given by the company’s founder.

I knew DeepSeek was lowkey but I didn't expect this much stealthmode. They were likely off CCP boomer radar until last week when Liang met with PRC premier after R1 exploded. Finance quants turned AI powerhouse validates CCP strategy to crush finance compensation to redirect top talent to strategic soft/hardware. I assume they're going to get a lot more state support now, especially if US decides to entity list DeepSeek for succeeding / making the market bleed.


Reading between the lines, it sounds like there's less of a concern at this time for the profitability of this particular venture, and more of a national interest in making state-of-the-art AI viable on last-gen silicon. The win condition is to render US sanctions strategically toothless; DeepSeek itself one day achieving commercial success would just be gravy.

If that is the game they're playing, I'm all for it. Maybe it's not the result that the sanctions were intended to have, but motivating China to share their research rather than keep it proprietary is certainly a win. Making AI more efficient doesn't reduce the value of compute infrastructure; it means we can generate that much more value from the same hardware.


It doesn't surprise me that someone who has this thinking style is also able to outperform those who do not in certain domains.

Surprising and refreshing.

Create an ecosystem and all tides rise.


To me, these lines from DeepSeek founder/CEO Liang Wenfeng give a clue that Chinese Communist Party involvement in DeepSeek-R1 is minimal or nonexistent. If the CCP were involved in a big way, we wouldn't see these words from the CEO.

> "For many years, Chinese companies are used to others doing technological innovation, while we focused on application monetization..."

> “But in fact, this is something that has been created through the tireless efforts of generations of Western-led tech communities. It’s just because we weren’t previously involved in this process that we’ve ignored its existence.”


> If CCP is involved in a big way, we won't see these words from CEO.

you don't know cpc

you don't know china

and you don't know chinese

you just imagine cpc and chinese as characters in some shit comics

every chinese could possibly say that, and cpc says this a lot every day, and cpc made national strategy based on that, you can find these words in many gov documents

so you guys are right about one thing: china is a threat, because from cpc to normal chinese, there're tons of people in china think like this, and many of them eager to challenge this

just like what deepseek is doing right now


do you?

Given that they use the Chinese initialism for the Chinese Communist Party (cpc, taken from the literal translation of 中国共产党, instead of CCP), they probably do — i.e., the likelihood they are a Chinese person living in, or having lived most of their life in, China seems high.

certain words express certain meanings, learnt from HN

you are very perceptive, if someone who use CPC rather than CCP, they're either chinese, or pro-china, or worse, communists


what i think is irrelevant, I'm not at the industry which is challenging the west, at least for now

i just say that mindset (like what deepseek ceo said) is super common in China, not something hard to say or forbidden


Yet, ask DeepSeek what's the weather in Taiwan, it will reply that Taiwan is part of China. Ask about camps in Xinjiang, it'll say it's busy.

Generally speaking, I assume CCP is involved with anything of strategic significance. They would even chase random benign influencers.


There's a thing called "local laws and regulations" that you need to comply with to be able to operate in China. It's plain and simple - without this level of limitation, once the model is viral it will be on the radar and then censorship will apply anyway. May as well implement that from the beginning. So I don't believe CCP is actively "involved" in this, but rather the laws impacted the behavior of the company.

Microsoft applies censorship to Bing search results in China. It doesn't mean they are controlled by CCP. They just got impacted by law and they want to keep operating in China.


The question is whether the weights they've released have such censorship in the training data, which future users would be unable to detect or remove.

I don't care that DeepSeek's own service has censorship. I would care if they have censored the weights but haven't revealed it (i.e., fraud by omission).


I would not be super surprised if they intended to do so, but I feel that would be very hard to implement. The censorship very likely comes from another layer.

>China communist party involvement in DeepSeek-R1 is minimal or nothing.

Until now.


All of this resonates deeply with me. There are a lot of memes running around about Silicon Valley's Jing Yang (sorry if it's misspelled) eating OpenAI's lunch, but as much as those are funny, the underlying open source innovation and how it aligns with a vision of values, realisation, and also inevitability that eventually someone else would be able to reach these things, too - that all strikes a chord I have to say.

I have to wonder how much the Chinese government was aware of what DeepSeek was going to publish, and how much they will allow Chinese labs to publish in the future.

I am extremely grateful so far for their work and contributions, but they are right. China is leading the way despite all the hurdles put by the chip act.

Great quotes. They didn’t ignore the existence of the tireless efforts of Western tech; they benefited from it and stole it.

Obviously it’s a power play as China seeks influence beyond money now that’s secured. I think people should receive it on its merits.

The strategy of open sourcing to eliminate the competitive mode of those with proprietary designs is a bit of a desperate play, favored by the weaker competitor, lacking access to the desired market.

You can also perceive it as hostile and in line with dumping practices, where a high volume of product is dumped into a market at cheap prices.

But besides these tactical aspects, which are no doubt being utilized, there’s an inescapable technological reality that obviously efficiency of AI will improve, and the most efficient designs would seem to rise to the top. This utilization of and guiding of inevitable historical trends for their own advantage is a very Chinese communist dialectical materialist approach to take, and I think we can expect to see more of these types of ‘surprising’ moves by entities out of China in the decades ahead as these kind of competitions heat up. The Chinese have a very deep and a very different ideological background that would justify these types of moves as making perfect sense to them, although they simultaneously appear as nonsensical to people from other backgrounds.


Ballsy of him to say some of that in China as a Chinese subject!

I feel like the reaction of the west is protecting him from a reaction from Chinese authorities


fyi Yann LeCun, Chief AI Scientist at Meta, said:

“To people who see the performance of DeepSeek and think: ‘China is surpassing the US in AI.’ You are reading this wrong. The correct reading is: ‘Open source models are surpassing proprietary ones.’ DeepSeek has profited from open research and open source (e.g., PyTorch and Llama from Meta). They came up with new ideas and built them on top of other people’s work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source.”

[1] https://www.forbes.com/sites/luisromero/2025/01/27/chatgpt-d...


Lol this is almost comical.

As if anyone riding this wave and making billions is not sitting on top of thousands of papers and millions of lines of open source code. And as if releasing llama is one of the main reasons we got here in AI…


I’m almost shocked this spooked the market as much as it did, as if the market was so blind to past technological innovation to not see this coming.

Innovation ALWAYS follows this path. Something is invented in a research capacity. Someone implements it for the ultra rich. The price comes down and it becomes commoditized. It was inevitable that “good enough” models became ultra cheap to run as they were refined and made efficient. Anybody looking at LLMs could see they were a brute forced result wasting untold power because they “worked” despite how much overkill they were to get to the end result. Them becoming lean was the obvious next step, now that they had gotten pretty good to the point of some diminishing returns.


sure, but what nobody expected was how QUICKLY the efficiency progress would come - aviation took about 30 years to progress from "the rich" to "everybody", personal computers about 20 years (from 1980s to 2000s). I think the market expected to have at least 10 years of "rich premium" - not 2 years, and then getting taken to the cleaners by the economic archenemy, China

The Google transformer paper was 2017. ChatGPT was the “we can give a version away of this for free.” Llama was “we can afford to give away the whole product for free to even the playing field.” Every tech giant comes out with a comparable product simultaneously. And now a hedge fund, not even a megacap company, can churn out a clone by hiring a small or medium size engineering team.

Really this should be an indictment of corporate bloat, having hundreds of thousand headcount companies distracted by performance reviews, shareholders, marketing, rebuilding the same product they launched two years ago under a new name.


Transformer paper was 2017

>Really this should be an indictment of corporate bloat, having hundreds of thousand headcount companies distracted by performance reviews, shareholders, marketing, rebuilding the same product they launched two years ago under a new name.

Yeah.

There are some shorter words or acronyms for it though, roughly equivalent to your about 30-word paragraph above:

IBM DEC Novell Oracle MS Sun HP ... MBA , all in their worse days or incarnations or ...


Anyone who's ever read Kurzweil isn't surprised.

The notion I now believe more fully is that the money people - managers, executives, investors and shareholders - like to hear about things in units they understand (so money). They don't understand the science, or the maths and in so much as they might acknowledge it exists it's an ambient concern: those things happen anyway (as far as they can tell), and so they don't know how to value them (or don't value them).

Because we saw, what, a week ago, the leading indicator that the money people were now feeling happy they were in charge, which was that weird not-government US$500 billion AI investment announcement. And we saw the same being breathlessly reported when Elon Musk founded xAI and had "built the largest AI computer cluster!"...as though that statement actually meant anything?

There was a whole heavily implied analogy going on of "more money (via GPUs) === more powerful AIs!" - ignoring any reality of how those systems worked, their scaling rules or the fact that inference tended to run on exactly 1 GPU.

Even the internet activist types bought into this, because people complaining about image generators just could not be convinced that the Stable Diffusion models ran locally on extremely limited hardware (the number of arguments where people would discuss this and imply a gate while I'm sitting there with the web GUI in another window on my 4 year old PC).


I would generally agree, but the market isn't rational about the future prospects of a company. It's rational about "can I make money off this stock" and nothing else matters in the slightest.

Riding hype, and dumping at the first sign of issues, follows that perfectly well.


> I’m almost shocked this spooked the market as much as it did, as if the market was so blind to past technological innovation to not see this coming.

Regulatory capture only benefits you nationally. You might even get used to it.


Sure, but it's good to recognize Meta never stopped publishing even after OpenAI and DeepMind most notably stopped sharing the good sauce. From CLIP to DINOv2 and the Llama series, it's a serious track record to be remembered.

But there is a big difference: Llama is still way behind ChatGPT, and one of the key reasons to open source it could have been to use the open source community to catch up with ChatGPT. DeepSeek, on the contrary, is already on par with ChatGPT.

Llama is worse than gpt4 because they are releasing models 1/50th to 1/5th the size.

R1 is a 650b monster no one can run locally.

This is like complaining an electric bike only goes up to 80km/h


R1 distills are still very very good. I've used Llama 405b and I would say dsr1-32b is about the same quality, or maybe a bit worse (subjectively within error) and the 70b distill is better.

What hardware do you need to be able to run them?

The distills run on the same hardware as the Llama models; they are based on Llama models anyway.

The full version... If you have to ask you can't afford it.
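For a rough feel of the hardware question, here is a back-of-the-envelope sketch of the memory needed just to hold the weights. It assumes memory is roughly parameter count times bytes per parameter, ignores KV cache and activations entirely, and takes the parameter counts as quoted in this thread (32b, 70b, 650b). The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory (in GB) needed to hold the weights alone.

    Assumption: bytes = parameters * bits_per_param / 8.
    KV cache, activations and runtime overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Parameter counts as quoted in the thread, not official figures.
    for name, size in [("32b distill", 32), ("70b distill", 70), ("full R1 (as quoted)", 650)]:
        for bits in (16, 8, 4):
            print(f"{name:>20} @ {bits:>2}-bit: ~{weight_memory_gb(size, bits):.0f} GB")
```

Under these assumptions a 32b model quantized to 4 bits needs about 16 GB for weights, which is within reach of a single high-end consumer GPU, while the full model as quoted needs hundreds of GB even at 8 bits.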


Yea no shit, that's because Meta is behind and no one would care about them if it wasn't open source

Right, so it sounds like it's working then given how much people are starting to care about them in this sphere.

We can laugh at that (like I like to do with everything from Facebook's React to Zuck's MMA training), or you can see how others (like Deepseek and to a lesser extent, Mistral, and to an even lesser extent, Claude) are doing the same thing to help themselves (and each other) catch up. What they're doing now, by opening these models, will be felt for years to come. It's draining OpenAI's moat.


How's that old chestnut go? "First they laugh at us..."?

There's no need to read it uncharitably. I'm the last person you can call a FB fan, I think overall they're a strong net negative to society, but their open source DL work is quite nice.

Just to add on the positive side: their quarterly meta threats report is also quite nice.

This. Even their less known work is pretty solid[1] ( used it the other day and was frankly kinda amazed at how well it performed under the circumstances ). Facebook/Meta sucks like most social media does, but, not unlike Elon Musk, they are on the record of having some contributions to society as a whole.

[1]https://github.com/facebook/zstd


<< And as if releasing llama is one of the main reasons we got here in AI…

Wait.. are you saying it wasn't? Just releasing it in that form was a big deal ( and heavily discussed on HN, when it happened ). Not to mention, a lot of the work that followed built on Llama, partly because it let researchers and curious people dig deeper into internals.


Yann LeCun also keeps distorting what open source is. Neither Llama nor DeepSeek are open source, and they never were. Releasing weights is not open source - that’s just releasing the final result. DeepSeek does use a more permissive license than Llama does. But they’re not open source because the community does not have the necessary pieces to reproduce their work from scratch.

Open source means we need to be able to reproduce what they’ve built - which means transparency on the training data, training source code, evaluation suites, etc. For example, what AI2 does with their OLMo model:

https://allenai.org/blog/olmo2


Deepseek R1 is the closest thing we have to fully open-source currently. Open enough that Huggingface is recreating R1 completely out in the open. https://github.com/huggingface/open-r1

What they’re recreating is the evidence that some of the techniques work. But they’re starting with R1 as the input into those steps, not starting from scratch. I don’t think their work includes creating a base model.

The fundamental problem is that AI depends on massive amounts of IP theft. I’m not going to argue if that’s right or wrong, but without it we won’t even have open weights models.

IPv4 or IPv6?

I don’t buy this at all. If DeepSeek can surpass proprietary models by “profiting” from open research and open source, why couldn’t the proprietary models do the same? Companies making proprietary models have the advantage of using w/e is out there from the open source community AND the proprietary research they have been working on for years.

> If DeepSeek can surpass proprietary models by “profiting” from open research and open source, why couldn’t the proprietary models do the same?

They can “profit” (benefit in product development) from it.

They just can't profit (return gains to investors) much from it, because that requires a moat rather than a market free for all that devolves into price competition and drives market clearing price down to cost to produce.


Yes but in proprietary research you've got fewer peers to bounce ideas off of, and you've got extra constraints to deal with re: coming up with something that's useful in tandem with whatever other proprietary bits are in your stack.

All that cloak and dagger stuff comes at a cost, so it's only worth paying if you think you can maintain your lead while continuing to pay it. If the open source community is able to move faster because they are more focused on results than you are, you might as well drop the charade and run with them.

It's not clear that that's what will happen here, but it's at least plausible.


> DeepSeek can surpass proprietary models by “profiting” from open research and open source, why couldn’t the proprietary models do the same?

DeepSeek did something legitimately innovative with their addition of Group Relative Policy Optimization. Other firms are certainly free to innovate as well.
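For anyone wondering what the "group relative" part means in practice, here is a minimal sketch of the advantage computation as I understand it from the paper: sample several answers to the same prompt, score each one, and normalize each reward against its own group's mean and standard deviation instead of training a separate value model. The function name, the `eps` term, and the choice of population standard deviation are my assumptions, and this leaves out the clipped policy-ratio objective and the KL penalty that the full method also uses.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the other samples for the same prompt.

    advantage_i = (r_i - mean(rewards)) / (std(rewards) + eps)

    Assumptions: population standard deviation, and a small eps to avoid
    division by zero when every sample in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one prompt, scored 1.0 if correct, else 0.0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Answers that beat their group's average get a positive advantage and the rest get a negative one, so the baseline comes from the samples themselves rather than from a learned critic.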


That argument doesn't go anywhere. It's like asking, if the Chinese could do it, why couldn't the Americans?

They just didn't.


But it sounds like, from that quoted statement, that LeCun from Meta thinks “open sourced work” is why China was able to surpass (or at least compete with) American AIs. Which sounds like a lame excuse for Meta.

Putting too much thought into the statement Meta's chief AI scientist made about how the new AI innovation is actually because of Meta is probably not going to be fruitful.

I think we should hold ourselves to a higher standard than this. I don’t see why we couldn’t apply reasoning to this question just like any other.

sunk cost fallacy / tunnel vision of their existing approaches.

If training runs are now on the order of $6MM/run for a SOTA-scale model, I think on the contrary: closed labs are screwed, in the same way that Linux clobbered Windows for server-side deployments. Why couldn't Windows just copy whatever Linux did? Well, the codebases and research directions diverged, and additionally MS had to profit off of licensing, so for wide-scale deployments Linux was cheaper and it was faster to ship a fix for your problem by contributing a patch than it was to beg and wait for MS... Causing a virtuous cycle (or, for Microsoft, a vicious cycle) where high-tech companies with the skills to operate Linux deployments collaborated on improving Linux, and as a result saw much lower costs for their large deployments, while also having improved flexibility, which then incentivized more companies to do the same. The open models are becoming much cheaper, and if you want something different you can just run your own finetune on your own hardware.

Worse for the proprietary labs is how much they've trumpeted safety regulations. They can't just release a model without extensive safety testing, or else their entire regulatory push falls apart. DeepSeek can just post a new model to Hugging Face whenever they feel like it — most of their Tiananmen-style filtering isn't at the model level, it's done manually at their API layer. Ditto for anyone running finetunes. In fact, circumventing filtering is one of the most common reasons to run a finetune... A week after R1's release, there are already uncensored versions of the Llama and Qwen distills published on HF. The open source ecosystem publishes faster.

With massively expensive training runs, you could imagine a world where model development remained very centralized and thus the few big labs would easily fend off open-source competition: after all, who would give away the results of their $100MM investment? Pray that Zuck continues? But if the training runs are cheap... Well, there are lots of players who might be interested in cutting out the legs from the centralized big labs. High Flyer — the quant firm that owns DeepSeek — no longer is dependent on OpenAI for any future trading projects that use LLMs, for the cost of $6MM... Not to mention being immune from any future U.S. export controls around access to LLMs. That seems very worthwhile!

As LeCun says: DeepSeek benefitted from Llama, and the next version of Llama will likely benefit from DeepSeek (i.e. massively reduced training costs). As a result, there's incentive for both companies to continue to publish their results and techniques, and that's bad news for the proprietary labs who need the LLMs themselves to be profitable and not just the application of LLMs to be profitable... Because the open models will continue eating their margins away, at least for large-scale deployments by competent tech companies (i.e. like Linux on servers).


> Why couldn't Windows just copy whatever Linux did?

They kinda did: https://en.wikipedia.org/wiki/Azure_Linux


Azure Linux is Linux. Microsoft is one of the biggest contributors to Linux in general, in terms of commits/release, and has been for a lot of years now. That doesn't mean Windows is doing what Linux did - Windows is still almost entirely different from Linux at both the kernel and userspace level, and improvements in one have little to no bearing on the other.

I'm still not sure why they keep LeCun at Facebook; his single most-cited contribution to the field in 2024 has been with NYU[0], not Facebook. What is his role at Facebook exactly, has he explained it? I recall him making all the wrong predictions in 2023; what's changed? Chollet is similarly a mystery to me; it feels like these guys were busy riffing on CNNs when the Transformer came about and have since been venturing far out in search of gold.

[0]: https://arxiv.org/abs/2406.16860


Muddling the term 'open source' is one of his latest achievements, for example.

I'm also a bit unclear on why LeCun is so well regarded. I've nothing against him, and his opinions shared on Twitter seem eminently sensible, but at the end of the day it seems his main accomplishment (and/or credit assignment) was inventing CNNs back in the 80's and using them for reading handwriting on checks.

Looking back at the PDP handbook, it's not even clear that LeCun deserves the credit for CNNs, and he himself gives credit for the core "weight sharing" idea to Rumelhart.

Chollet's claim to fame seems to be more as creator of Keras than researcher, which has certainly been of great use to a lot of people. He has recently left Google and is striking out to pursue his own neuro-symbolic vision for AGI. Good luck to him - seems like a nice and very smart guy, and it's good to see people pursuing their own approaches outside of the LLM echo chamber.


What makes "open source" DeepSeek so fundamentally different that it's a marvel it surpassed proprietary models?

It's not and it hasn't surpassed GPT. A lot of that is headline hype.

They literally used GPT and Llama to help build DeepSeek; it responds thinking that it's GPT in countless queries (which people have been posting screenshots of). They 'cheated' exactly as Musk did to build xAI's model/s. So much of this is laughable scaremongering and it's absolutely not an accomplishment of large consequence.

It's a synth LLM.


Though it is still a fascinating result that shows that the giant frontier models could be made much more efficient, and how to do so.

honestly reads like someone trying to justify his massive salary to his boss who is realizing he can just hire someone for 30x less money.

isn't LeCun basically admitting that he and his team didn't have the creative insight to utilize current research and desperately trying to write off the blindside with exceptionalism?

not a good look tbh


It's like saying that a diesel engine is 6x more efficient than a steam engine, so the guys who spent time working on steam engines just wasted their time and money.

The thing is that the steam engine guys researched thermodynamics and developed the mechanics and tooling which allowed the diesel engine to be invented and built.

Also, for every breakthrough like DeepSeek which is highly publicized, there are dozens of fizzled attempts to explore new ideas which mostly go unnoticed. Are these wasted resources, too?


> Are these wasted resources, too?

Given your take, this is a meaningless question, no?

As you point out, all resource usage that led up to the creation of the diesel engine was a necessary precondition. While one might be able to imagine a parallel universe where the diesel engine was created in another way without all the things in between that might feel like a waste, that is not this universe. In this one, it took what it took.

Same goes for AI. That AI researcher had to eat that sandwich double wrapped in plastic, subsequently placed in another plastic bag in order to get to where he got. Which might feel like a "waste of resources". I am sure you can easily imagine a parallel universe where he didn't eat something that used up so much plastic. But that was the precondition necessary in this universe.

So, ultimately, either everything is a waste of resources or nothing is. And there is no meaning in trying to find a distinction between those two.


Yes.

Would this extrapolate to the thousands of lightbulb prototypes it took to arrive at the first working one? Rinse repeat for your preferred innovation.

Resource allocation in this context isn’t at all binary.


LeCun has nothing to do with LLaMA ... that was built by Meta's GenAI group.

LeCun is in a different part of the organization - FAIR (Facebook AI Research), and isn't even the head of that. He doesn't believe that LLMs will lead to AGI, and is pursuing a different line of research.


Meh. It's not as if OpenAI is unable to access open source. The delta is not in open source but in DeepSeek talent.

DeepSeek is a "side project" run by High-Flyer, a Chinese quantitative hedge fund. They have no interest in directly competing with LLM providers like OpenAI and Anthropic. If anything, they're likely trying to commoditize their complement in a way not all that dissimilar from Meta's approach.

> If anything, they're likely trying to commoditize their complement in a way not all that dissimilar from Meta's approach.

Thanks. Great observation. Sounds indeed extremely plausible that they use the LLM for automated data cleaning.


I wonder if they shorted NVDA before releasing the model?

wouldn't that be outsider trading ?

'... that's right people, forget SPARKs, this season the 'it crowd' are spinning up companies solely to create turmoil so that they can short stocks.'

more of a pivot, China started cracking down heavily on quants in 2024

I'm curious about this. Two articles I've read all but said they basically failed as a quant and lost more than they gained. The wiki points out some losses, but some wins, so is unclear.

Have they actually pivoted, or are they just messing around to see what sticks?


Didn't they crack down mostly on HFT? I haven't heard of a huge crackdown on low/medium frequency quants, and LLM research has low crossover with high freq. quant stuff

almost all quant work is 'HFT'

Absolutely not. The large majority of quant work is mid-frequency, on the order of seconds to minutes.

This is wrong. There are plenty of quant strategies that aren't a race.

most quant money/employment is among market makers who are not holding for longer than a day and most trades probably complete within a few seconds.

regardless, high-flyer is an HFT firm


Trading within a few seconds is not really considered HFT, it's mid-freq nowadays. High frequency is in microseconds end to end nowadays.

High-Flyer says it took directional bets and held positions, which makes at least part of it not HFT.

Also, I doubt that most quant money is in market making nowadays. That was true at some point and that's true of HFT, but I doubt it is of quant trading in general anymore.

Besides, High-Flyer certainly isn't a market maker, or they wouldn't be a hedge fund. You can't really be both, which is why Citadel and Citadel Securities (the market maker) are so strictly divided.


High-Flyer's AUM is $7B, which is not a large hedge fund. Its DeepSeek division is probably worth more than the AUM (not even the hedge fund's value) if it goes to market. They probably have billions of dollars of GPUs.

They probably have tens of millions of dollars of GPUs. DeepSeek isn't an original model, it's a synthetic built by using GPT and Llama etc. That's how they did it so relatively inexpensively. Their accomplishment isn't riding on the back of billions of dollars of their investment into datacenters and GPUs.

Even in 2020, before GPT came along, they had $100 million worth of GPUs[1]. Now I am willing to bet it is above a billion dollars. We will never know, as they likely have illegal GPUs due to export restrictions.

[1]: https://en.wikipedia.org/wiki/High-Flyer


Something tells me it runs a bit deeper than that. Economics can be a very effective weapon.

such people are trained in identifying opportunities and turning that into money or power. they are not giving their stuff away without a strategy.

If they're a hedge fund they're probably trying to tank the US AI stocks so they can buy the dip and then in a few days/weeks it is back to business as usual.

I don't personally buy their story, and after having used Deepseek it kind of sucks and hallucinates a lot if I'm being objectively honest.

I mean a few million for this is okay - that's cool.. but it is useless. I can understand billions of dollars into something that actually works >50% of the time.


If you’re expecting to pop a bubble I think you’d buy options ahead of time to take advantage, instead of waiting for a recovery that may never come.

Well, Nvidia stock is already up 10%

I know it is too early, but I'd not be surprised if this was CCP intervention using a hedge fund to try and tank US AI stocks for a specific reason.

I mean again, just being objectively honest, Deepseek kind of sucks and is maybe on par with early-2023 era models.


No, they aren’t publishing all their secret sauce. For example, we have no idea how their baseline model was trained. They’ve not said anything about the data or code relating to this training. They have talked about some of the optimization techniques they’ve used in arriving at their final models that they released weights for, but their claims on cost seem suspicious because we don’t know what prior work they built on. I’ve seen many people sharing evidence that DeepSeek’s models seem to think they are OpenAI models, which supports the theory that DeepSeek first built a baseline trained off the outputs of other models. DeepSeek also likely has a much larger number of GPUs than what they’ve admitted, perhaps to avoid attention on their suppliers who may have violated sanctions.

The number of GPUs they have (which may well be export-legal H800's as NVidia believe they are) goes hand in hand with the amount it cost to train (however you define that), and is something people trying to replicate their approach can verify (or not).

It seems obvious that you need to have a model trained, or fine-tuned, on some reasoning data (with backtracking etc) such that reasoning behavior is part of its repertoire, before you can use RL to hopefully get it to use such reasoning pursuant to whatever goals you are setting. I'd not be surprised if they used O1 outputs to bootstrap the model in this way, although O1's reasoning traces are a deliberate obfuscation of what it is really doing (an after-the-fact summary) so even if this is the case that should be borne in mind!

OTOH, while reasoning data may be scarce in the wild, it's presumably not entirely unavailable, and/or DeepSeek may have created some themselves, so who knows what mix DeepSeek used for this initial bootstrapping stage. As you say, this aspect remains as "secret sauce".

Of course once they've got their first stage model trained they then use that to generate data for the second/final stage.


We get free AI from a hedge fund and $200/month AI from a nonprofit.

I hope the hedge fund shorted NVDA to make some good money along the way too hahaha!

How are you liking 2025?

This is not coming from a big corporation. These people need to establish their authority, or nobody will believe what they're doing. So it makes sense that they publish their ideas and open source the result. Now they have the attention and can play with their cards.

DeepSeek and their quant/algotrading parent company have years of experience in raw C/C++ CUDA programming and low-level CUDA optimization. That is one of the main reasons they could do model training and serve inference so effectively and cheaply. That hard-earned experience is not something they have shared publicly.

>am I right to feel like they're... publishing all their secret sauce?

This would make perfect sense if the goal is to devalue existing players more than it is to capture the market.


DeepSeek probably can't compete with OpenAI in terms of scaling their data centers due to the bans, so why bother?

If they did not open source it and instead just launched a paid (albeit much cheaper) closed model with similar performance to O1, would people trust them?

I don't think DeepSeek has any malicious intent, but boy oh boy am I glad the USA boys get wrekt by this (though I also lose money on stocks).

This is just poetic justice for the Orange Man's backwards 17th century policies.


Yes, same here. As a European, I used to feel we (USA and Europe) were on the same side, the West, since they/you did save us from the baddies 70 years ago...

But who's the baddies now? China is not waging war everywhere. Or threatening to steal Greenland... Or ruining our teenagers with social media.


> But who's the baddies now?

Russia is currently invading Europe, to the tune of hundreds of thousands KIA. And Russia's invasion would be dead in the water without Chinese support.


> And Russia's invasion would be dead in the water without Chinese support

To paraphrase the Chinese rep at the UN: if China indeed supported Russia, then this war would have ended by now.


There are obviously degrees of support. Sure, China could support Russia more than they are now, but they'd be risking their ability to trade with Europe and the US, so they don't.

Agree with the other commenter here re: degrees of support. If the US had fully supported Ukraine the war would also be over (unless China stepped in to help their nominal ally)

Russia obviously has a very strong military, but economically they are dwarfs. And if you subtract natural resources they are basically irrelevant. China and the US are economically dominating everything.

The biggest trade partner of China is US...

Okay maybe they do too, with TikTok. But still.

And trade against the devaluations...

Indeed.

Not only that, I also enjoy their chain of thought being completely transparent for the user. I'm very curious what Altman is doing right now...

preparing for o3 release

.. that'll be caught up in weeks

I wonder if he still gets his billions for Stargate. I'm sure SoftBank is regretting that decision big time

SoftBank is very used to regretting decisions. Very used to.

I feel like SoftBank investing in Stargate should have been the number one red flag

Personally I'm very curious about the future of the Pro plan.

DeepSeek is a company whose funds comes from a edge fund. If the edge fund has predicted the impact of all these releases correctly, they have likely made tons of money while at the same time advanced Chinese interests and prestige abroad.

It seems a great move.


Are you French?

Close. I am Italian.

I am sorry if my English isn't great... and, yes, sometimes I do use voice to text. Android is particularly good at messing up what I want to say.


And using voice to text?

They could make a ton of money shorting NVDA and releasing the paper. The most honest short position ever!

The strategic move by China to release a free, open, DeepSeek model that’s way cheaper to train and use, and that resulted in a $1 trillion market loss on the news, has a clear objective to destroy the US AI companies that had leapt ahead of China in commercial offerings.

No company with an open source solution shows their hand. I said the same thing about AI music stuff, bet you 1000% that record studios have been pounding away at AI generated music/lyricals for years now well before the likes of models we see today.

Any perception that OpenAI are playing catch-up, and moreover by taking ideas, will have a great negative impact.

Anyway, we may be past peak OpenAI at this juncture.


This is a bit of conspiracy theory, but there's a theory that China's strategic goal isn't to have an all-encompassing Chinese OpenAI that can charge a shit ton of money. The goal is to prevent US companies from doing so.

But why? As a reply to the sanctions? You try to hamstring us, we'll burst your state protected NVDA?

Because it would be economically destructive to everyone except those large companies if it were to happen.. and the everyone here includes China.

That's not really a conspiracy theory. It's pretty much the core of China's industrial strategy. Projects like "Made in China 2025" are explicitly about developing their own innovation base to avoid being reliant on foreign companies. AI is one of the areas where this was most urgent.

It doesn't matter that they are not open sourcing their models: they are already losing the game. Sure, they are making money now.

they're trying to show that a good part of the economic assumptions behind the AI boom are flawed. That is, dramatic increase in energy and chip demand.

no, they (like others) publish very few details about their training data.

The secret sauce is the data.

I wouldn't hold my breath on getting access to it.


Just about anything useful in the secret sauce data can be distilled from the model by inspecting the logits; for example, they published distills using Llama 3.1 70b as a base, Qwen 32b, etc etc.
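
For anyone wondering what "distilling from the logits" looks like mechanically, here is a minimal sketch, assuming the classic setup where a student is trained to match a teacher's softened next-token distribution. The function names, the toy logits, and the temperature value are all made up for illustration, and this is generic logit matching, not a claim about how the published distills were actually produced.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over one token's vocabulary distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a 4-token vocabulary for a single position.
teacher = [4.0, 1.0, 0.5, -2.0]
student = [2.0, 2.0, 0.0, -1.0]

loss_mismatch = distillation_loss(teacher, student)
loss_match = distillation_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's distribution and grows as they diverge; in a real run this would be averaged over every token position and minimized by gradient descent on the student.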

There is no "secret" sauce. Only sauce.

Additionally, R1-Zero shows that you don't even really need much secret sauce data, since they trained it with zero SFT data. Take an existing base model, do GRPO RL, and tada: you have a SOTA reasoning model. SFT data improves it, but the secret sauce isn't in the data.
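
To make the "do GRPO RL" step a bit more concrete: as I read the R1 and DeepSeekMath papers, the core trick is to sample a group of answers per prompt and score each one relative to the group's own mean and standard deviation, so no separate value model is needed. Below is a minimal sketch of just that advantage computation. The function name, the reward values, and the choice of population standard deviation are my assumptions, and the full objective also has a clipped policy ratio and a KL penalty that are left out here.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled answer's reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rule-based rewards for 4 answers sampled from one prompt:
# 1.0 if the final answer checked out, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# If every answer in the group gets the same reward, there is no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Answers that beat their group's average get a positive advantage and are reinforced, the rest are pushed down, and a group where everything scored the same contributes nothing to the update.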


Indeed. Litigation exposure is just too great when releasing the training data.

do you think the Chinese government will allow them to spill the secret sauce? the secret sauce to what is eventually going to be a weapon? clearly no.


