How is this open vs. what DeepSeek did?



From that article:

> The release of DeepSeek-R1 is an amazing boon for the community, but they didn’t release everything—although the model weights are open, the datasets and code used to train the model are not.

> The goal of Open-R1 is to build these last missing pieces so that the whole research and industry community can build similar or better models using these recipes and datasets.


Genuine question, but how do you replicate the effort exactly without $5M in compute? And can you verify that the published weights etc. are actually what's in the model?

Am I missing something?


The $5.5m in compute wasn't for R1, it was for DeepSeek v3.

The R1 trick looks like it may be a whole lot cheaper than that. R1 apparently used just 800,000 samples - I don't fully understand the processing needed on top of those samples but I get the impression it's a whole lot less compute than the $5.5m used to train v3.
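
For a rough sense of what that step might look like: if the 800,000 samples are just used for supervised fine-tuning (my assumption here, not DeepSeek's published pipeline; the model name and file path below are placeholders), it's an ordinary SFT run rather than a from-scratch pretraining run:

    # Hypothetical sketch of SFT on a file of reasoning samples using
    # HuggingFace's trl. Model name and data path are placeholders,
    # not anything DeepSeek published.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # each row: {"text": prompt + chain-of-thought + final answer}
    dataset = load_dataset("json", data_files="reasoning_samples.jsonl",
                           split="train")

    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-7B",             # any open base model stands in here
        train_dataset=dataset,
        args=SFTConfig(output_dir="sft-out"),
    )
    trainer.train()

Fine-tuning on 800k samples is orders of magnitude less compute than pretraining on trillions of tokens, which is presumably where the cost gap comes from.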


DeepSeek claims they are "open source", but they are not. They are open weight.

IMO a truly "open" AI model should have 3 components publicly available: the weights, the code, and the dataset.

Without all 3, the model is not reproducible. You could make the argument that the code and data alone are sufficient, though.


No one will release the dataset because we all know it is gathered through dodgy means.


And likely massaged by the CCP.


DeepMind has one set of censorship, OpenAI another, and anything Musk does a third.

It’s all “massaged”


I don’t think “Taiwan is China” is the same kind of massaging as not telling people how to make napalm…

What a weird thing to equate, though!


But Taiwan is China. Even the USA acknowledges that.


No, they’ve acknowledged the policy. The US has not “agreed” with the policy, which is different.

Also, as someone in the US I can very safely say that Taiwan is not China and have no concern for my safety.


> Also, as someone in the US I can very safely say that Taiwan is not China and have no concern for my safety.

But you will be arrested if you stage a peaceful pro-Palestine protest asking for an end to the ongoing genocide.

Or even worse, if you say that Palestine is not Israel.


No I won't, in either case.

Watch: Palestine is not Israel.

Look, nobody swooping in from the rafters to lock me up. I have zero worries the government will do anything at all about my making this claim.

And if staging peaceful pro-Palestine protests results in arrests, what happened here?

https://en.wikipedia.org/wiki/National_March_on_Washington:_...

or here?

https://en.wikipedia.org/wiki/March_on_Washington_for_Gaza

Or does that not fit your narrative?


> Watch: Palestine is not Israel.

Yes, that works because you're an anon and nobody really cares. Try to publicly make that statement if you're in any relevant position and you'll very quickly be looking for a new job, if you can ever find one.

> And if staging peaceful pro-Palestine protests result in arrests, what happened here?

Be honest, you can literally google "Palestine protest arrests" and get more results than you could process in a long while. You presenting a couple of examples doesn't negate the many other protests that ended in mass arrests.


https://en.wikipedia.org/wiki/Rashida_Tlaib would like a word.

She would not be a politician (or even alive) if any of what you claim were true. You claimed that the US government censors people who speak out against Israel's occupation of Palestine, and specifically that saying Palestine isn't Israel would not be possible in the United States, in the same way that saying, for example, that Xi Jinping looks like Winnie the Pooh is censored in China.

This is, of course, completely false, and demonstrably so from the protests I just linked (of which there are thousands, not a few) and the statements Rep. Tlaib, a Palestinian American and member of the US government, regularly makes on the national stage.

Equating Chinese censorship with Western censorship simply doesn't work.


Nice, you even have a token irrelevant politician without any power, perfect to use as an example that everything is allowed in the free US of A.

I'll reply with a few actual examples of what I mean:

- https://www.insidehighered.com/news/faculty-issues/academic-...

- https://www.theguardian.com/us-news/2024/oct/24/university-p...

- https://www.thecrimson.com/article/2024/1/3/claudine-gay-res...

- https://hwsherald.com/2024/04/14/jodi-dean-suspended-from-te...

I think western propaganda is overall the cleverest, because it manages to completely marginalize and silence any non-aligned opinion, while at the same time convincing you that you are completely free to have said opinion.


Why do you think anything you've just linked is at all related to this conversation? A system must be perfect to be good? That's an insane bar that is not the actual standard.

And if you think a US representative is powerless then you completely fail to understand how the US government actually works.


It is, though. Western AI tries to hide information like that under the justification of safety, along with things that might be offensive to current popular beliefs. Chinese AI presumably says Taiwan is China to help get more people on side for a possible future invasion. Propaganda does work - look at how many people think Donbas is still Ukraine and Israel is still Palestine.


The difference is that in China the info isn’t available without use of Western content, due to the totalitarian control over media, whereas in the West, information is pretty trivially available, even if the big companies keep it off of their platforms.

And sure ignorance is prevalent, but even GPT4 will tell me Donbas is still Ukraine, for instance. What a strange example to use, though!


"GPT4 will tell me Donbas is still Ukraine"

But is it though? What's really the meaning of which country a region belongs to? Once somewhere has been occupied long enough, it usually becomes de-facto theirs. But how long is long enough? Other countries either do or don't recognize it and usually a consensus is reached, but not always.


If Western governments are so tolerant and permissive with information, I wonder why I can't access RT in Europe.


Because you don’t know how to use a Western VPN?


Isn't it the same in China then? They can also use VPNs


...and where, pray tell, do they VPN into?


That's what the Open Source AI Definition states: https://opensource.org/ai

In any case, DeepSeek, like Llama, fails well before hitting that new definition. Both have licenses containing restrictions on field of use and discrimination against users. Their licenses will never be approved as Open Source.


This nitpicking is pointless.

DeepSeek's gifts to the world (its open weights, public research, and OSS code for its SOTA models) are all any reasonable person should expect, given that no organization is going to release its dataset and open itself up to criticism and legal exposure.

You shouldn't expect to see the datasets behind any SOTA models until they can be synthetically generated from larger models. Models trained only on sanctioned "public" datasets are not going to perform as well, which makes them a lot less interesting and practically useful.

Yes, it would be great for there to be open models containing original datasets and a working pipeline to recreate the models from scratch. But when few people even have the resources to train the models, and the huge training costs just result in worse-performing models, it's only academically interesting to a few research labs.

Open model releases should be celebrated, not criticized with unreasonable nitpicking and expectations that serve no useful purpose other than discouraging future open releases. When the norm is for Open Models to include their datasets, we can start criticizing those that don't, but until then be grateful that they're contributing anything at all.


Terminology exists for a reason. Doubly so for well-established terms of art that pertain to licensing and contract law.

They could have used "open weights", which would have conveyed the company's desired intent just as well as "open source", but without the ambiguity. They deliberately chose to misuse a well-established term instead.

I applaud and thank DeepSeek for opening their weights, but I absolutely condemn them and others (e.g. Facebook) for their deliberate and continued misuse of the term. I and others like me will continue to raise this point as long as we are active in this field, so expect to see this criticism for decades.

Hopefully one of these companies loses a lawsuit due to these shenanigans. Perhaps then they wouldn't misuse these terms so brazenly.


> I absolutely condemn them and others (e.g. Facebook) for their deliberate and continued misuse of the term

This is the kind of inconsequential nitpicking diatribe I'm referring to. When has Open Source ever meant open data?

> They deliberately chose to misuse a well established term instead.

Their model weights, as well as their repositories containing their technical papers and any source code, are published under an OSS MIT license, which is why initiatives like this one, looking to reproduce R1, are even possible.

But no, we have to waste space in every open model release complaining that they must be condemned for describing models released under an OSS license as Open Source, using the same label the rest of the industry uses, instead of whatever preferred unused label you want them to use.


We're talking past each other at this point. I believe both our positions have been adequately presented. Cheers.


“Gifts to the world”?

What a strange thing to say…


The only one I've seen people talk about that shares all the components is OLMo (https://allenai.org/blog/olmo2)


There are more, like the work by Eleuther AI and LLM360.


Only meaningful if code+data deterministically reproduce the weights.

At that point, the weights are just the cached output. Which has value since it's costly to produce from code+data.


I don't think it needs to be deterministic - and if it isn't, having the data and code becomes even more important!

Compilers generally aren't deterministic (see the reproducible builds movement), yet we still use their output binaries.
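
To make the determinism point concrete, here's roughly what you'd have to pin down just for a single-GPU PyTorch run (a sketch under my own assumptions, not anyone's actual training code):

    # Sketch: pinning the obvious sources of randomness in a PyTorch run.
    import random
    import numpy as np
    import torch

    def seed_everything(seed: int = 0) -> None:
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Ask for deterministic kernels; ops with no deterministic
        # implementation will raise instead of silently differing
        # between runs.
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.benchmark = False

    seed_everything(42)
    # Even with all of this, bit-exact weights across different GPUs,
    # driver versions or cluster layouts aren't guaranteed -- hence the
    # comparison to reproducible builds above.

Which is exactly why having the data and code matters more than chasing bit-exact weights.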



