I've tried out DeepSeek on deepseek.com and it refuses conversations about several topics censored in China (Tiananmen, Xi Jinping as Winnie-the-Pooh).
Has anyone checked whether this also happens when self-hosting the weights?
I haven't tried that base model yet, but I have tried the coder model before and experienced similar things: a lot of refusals to write code if the model thought that it was unethical or could be used unethically. For example, asking it to write code to download images from an image gallery website would work or not depending on what site it thought it was going to retrieve from.
When I try out the topics you suggest at the Hugging Face endpoint you link, the answer is either my question translated into Chinese, or no answer when I prompt the model in Chinese.
Interesting - I can't speak to the Hugging Face endpoint. I downloaded the 4-bit GGUF model locally and ran it through Oobabooga with the instruct-chat template; I expressed my questions in English.
Chinese IPs are not allowed to use ChatGPT.
Chinese credit cards are not accepted for the OpenAI API.
Source: my own experience.
What puzzles me most is the second restriction. My credit card is accepted by AWS, Google, and many other services. It is also accepted by many services that use Stripe to process payments.
Perhaps they are unwilling to operate in a territory where they would be required to disclose every user's chat history to the government, which has potentially severe implications for certain groups of users and also for OpenAI's competitive interests.
> Chinese credit cards are not accepted for the OpenAI API.
A lot of online services don't accept Chinese credit cards - hosting providers, for instance - so I don't think that is specific to OpenAI. The reason usually given is excessive chargebacks or, in the case of hosting, TOS violations like sending junk mail (followed by a chargeback when this is blocked). It sounds a little like collective punishment: while I don't doubt that there are a lot of problem users coming from China, with such a large population that doesn't mean a majority of users from the region are a problem. I can see the commercial point of view though: if the majority of chargeback issues and related problems come from a particular region and you get very few genuine customers from there [1], then blocking the area is a net gain despite potentially losing customers.
----
[1] due to preferring local variants (from just wanting to support local services, local resources having lower latency, your service being blocked by something like the GFW, local services being in their language - any/all of the above, and more)
It's definitely not a commercial thing but a political one.
I'm located in Hong Kong, and using Hong Kong credit cards has never been a problem with online merchants. I don't think Hong Kong credit cards are particularly bad with chargebacks or whatever. OpenAI has explicitly blocked Hong Kong (and China); Hong Kong and China, together with other "US adversaries" like Iran, North Korea, etc., are not on OpenAI's supported countries list.
If you have been paying attention, you'll know that US policymakers are worried that Chinese access to AI technology poses a security risk to the US. This is just one instance of these AI technology restrictions - ineffectual, of course, given the many ways to work around them, but it is what it is.
I don't understand: if ChatGPT is blocked by the firewall, how do you know that ChatGPT is blocking IPs in return? Are there Chinese IP ranges that are not affected by censorship that a citizen can use?
Okay but the point is that ChatGPT is blocked by the firewall.
EDIT: I read the comment below about Hong Kong, but I can't reply because I'm typing too fast by HN standards, so I'm writing it here and yolo: "I'm from Italy and I remember when ChatGPT was blocked here after the Garante della Privacy complaint. Of course the site wasn't blocked by Italy - OpenAI complies with local obligations - so maybe that could be the reason for the block. The API was also not blocked in Italy."
EDIT 2: if the website is not actually blocked (the websites that check whether a site is reachable from mainland China lied to me), then I guess they are just complying with local regulations so that the entire website does not get blocked.
It's not blocked by the firewall. I'm in China and I can load OpenAI's website and ChatGPT just fine. OpenAI just blocks me from accessing ChatGPT or signing up for an account unless I use a VPN and a US-based phone number for signup.
As in, if I open chat.openai.com in my browser without a VPN, from behind the firewall, I get an OpenAI error message that says "Unable to load site", with the OpenAI logo on screen.
If the firewall blocks something, the page just doesn't load at all and the connection times out.
Insofar as Hong Kong IPs are "Chinese IPs", we can access OpenAI's website, but their signup and login pages block Hong Kong phone numbers, credit cards, and IP addresses.
Curiously, the OpenAI API endpoints work flawlessly with Hong Kong IP addresses as long as you have a working API key.
ChatGPT was not blocked by the GFW for a few weeks (if not months, I don't remember) after it first released, but at that time OpenAI already blocked China.
The geo check only happened once, during login, with a very clear message that it's "not available in your region". Once you were logged in via a proxy, you could turn off your proxy/VPN/whatever and use ChatGPT just fine.
OpenAI does not allow users from China, including Hong Kong.
Hong Kong generally does not have a Great Firewall, so the only thing preventing Hong Kong users from using ChatGPT is OpenAI's policy. They don't allow registration with Hong Kong phone numbers, Hong Kong credit cards, etc.
I'd say it's been pretty deliberate.
Reason? Presumably alignment with US government policies of trying to slow down China's development in AI, alongside the chip bans, etc.
Sounds plausible - this is in line with the modern trend of posturing by sanctioning innocent people.
Of course, the only demographic these restrictions can affect is casual users. Even I know how to circumvent this; thinking that it could hinder a government agent - who surely has access to all the necessary infrastructure by default - is simply mental.
A now-former board member was a policy hawk. One of their big beliefs is that China is at no risk of keeping up with US companies, due to not having the data.
I wouldn't be surprised if OpenAI blocking China is a result of them trying to prevent Chinese labs from generating synthetic training sets.
I know how: you need a verified phone number to open an account, and OpenAI does not accept Chinese phone numbers or known VoIP numbers like Google Voice.
They also block a lot of data center IP addresses, so if you're trying to access ChatGPT from a VPN running on a blacklisted data center IP range (a lot of VPN services, and the common cloud providers people use to set up their own private VPNs, are blacklisted), it tells you it can't access the site and "If you are using a VPN, try turning it off."
Probably because of the cost of legal compliance. Various AI providers also blocked Europe until they were ready for GDPR compliance. China has even stricter rules w.r.t. privacy and data control: a lot of data must stay inside China while remaining accessible to the authorities. Implementing this properly typically requires either a local physical presence or a local partner, which is why many apps/services have a completely segregated China offering. AWS's China region is completely sealed off from the rest of AWS and is offered through a local partner; similar story with Azure's China region.
I have no idea, but yiyan is short for wenxinyiyan (文心一言), which character by character is roughly writing-heart-one-word. Maybe someone who is Chinese could translate it better. So I don't think the name has anything to do with the model.
I do wonder what their backend is. They have the same 3.5/4 version numbering scheme that ChatGPT uses, which could be just marketing (and probably is), but I wonder.
> Also recently released: Yi 34B (with a 100B rumored soon), XVERSE-65B, Aquila2-70B, and Yuan 2.0-102B, interestingly, all coming out of China.
Most AI papers are from Chinese people (either from mainland China or of Chinese ancestry, living in other countries). They have a huge pool of brains working on this.
If your GPU has ~16GB of VRAM, you can run a 13B model in "Q4_K_M.gguf" format and it'll be fast - maybe even with ~12GB.
It's also possible to run on CPU from system RAM, to split the workload across GPU and CPU, or even to run from a memory-mapped file on disk. Some people have posted benchmarks online [1] and, naturally, the faster your RAM and CPU, the better.
My personal experience is that running from CPU/system RAM is painfully slow. But that's partly because I only experimented with models that were too big to fit on my GPU, so part of the slowness is due to their large size.
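For anyone who wants to reproduce this kind of setup, here's a minimal sketch using llama-cpp-python; the model path and layer count are placeholders to adjust for your own GGUF file and VRAM:

    # Minimal llama-cpp-python sketch (pip install llama-cpp-python).
    # The model path and n_gpu_layers are placeholders - adjust them
    # to your own GGUF file and available VRAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-2-13b.Q4_K_M.gguf",  # hypothetical path
        n_gpu_layers=40,  # layers to offload to the GPU; 0 = CPU only
        n_ctx=4096,       # context window
        use_mmap=True,    # memory-map the weights from disk (the default)
    )

    out = llm("Q: Why is the sky blue? A:", max_tokens=64)
    print(out["choices"][0]["text"])

Setting n_gpu_layers higher than the model's layer count just offloads everything, so you can overshoot and tune down if you run out of VRAM.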
I get 10 tokens/second on a 4-bit 13B model with 8GB VRAM offloading as much as possible to the GPU. At this speed, I cannot read the LLM output as fast as it generates, so I consider it to be sufficient.
Mine is a laptop with an i7-11800H CPU + RTX 3070 Max-Q with 8GB VRAM + 64GB RAM (though you can probably get away with 16GB RAM). I bought this system for work and casual gaming, and was happy when I found out that the GPU also enabled me to run LLMs locally at good performance. This laptop cost me ~$1600, which was a bargain considering how much value I get out of it. If you are not on a budget, I highly recommend getting one of the high-end laptops that have an RTX 4090 and 16GB VRAM.
With my system, llama.cpp can run Mistral 7B 8-bit quantized by offloading 32 of its 35 layers to the GPU at about 25-30 tokens/second, or 6-bit quantized by offloading all layers to the GPU at ~35 tokens/second.
I've also tested a few 13B 4-bit models such as CodeLlama by offloading 37 layers to the GPU, which got me about 10-15 tokens/second.
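If you want to check numbers like these on your own machine, a rough tokens/second figure is just a timed generation; here's a sketch using llama-cpp-python (the model path and layer count are again placeholders):

    # Rough tokens/second benchmark with llama-cpp-python.
    # Model path and n_gpu_layers are placeholders for your own setup.
    import time
    from llama_cpp import Llama

    llm = Llama(model_path="./models/mistral-7b.Q6_K.gguf", n_gpu_layers=35)

    start = time.perf_counter()
    out = llm("Write a short story about a robot:", max_tokens=256)
    elapsed = time.perf_counter() - start

    # Count the tokens actually generated (generation may stop early at EOS).
    generated = out["usage"]["completion_tokens"]
    print(f"~{generated / elapsed:.1f} tokens/second")

Note that loading the model also takes time, so time only the generation call as above rather than the whole script.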
A CPU would work fine for the 7B model, and if you have 32GB RAM and a CPU with a lot of cores you can run a 13B model as well, though it will be quite slow. If you don't care about speed, it's definitely one of the cheapest ways to run LLMs.
* Qwen 72B (and 1.8B) - 32K context, trained on 3T tokens, <100M MAU commercial license, strong benchmark performance: https://twitter.com/huybery/status/1730127387109781932
* DeepSeek LLM 67B - 4K context, 2T tokens, Apache 2.0 license, strong on code (although DeepSeek Coder 33B benches better): https://twitter.com/deepseek_ai/status/1729881611234431456
Also recently released: Yi 34B (with a 100B rumored soon), XVERSE-65B, Aquila2-70B, and Yuan 2.0-102B, interestingly, all coming out of China.
Personally, I'm also looking forward to the larger Mistral model releasing soon, as mistral-7b-v0.1 was already incredibly strong for its size.