Still surprised that the $3000 NVIDIA Digits doesn’t come up more often in that discussion, or in the gung-ho market-cap one.
I was an AI sceptic until 6 months ago, but that’s probably going to be my dev setup from spring onwards - running DeepSeek on it locally, with a nice RAG to pull in local documentation and datasheets, plus a curl plugin.
Call me naive, but I somehow trust them to deliver in time/specs?
It’s also a more general comment on “AI desktop appliance” vs. homebuilts. I’d rather give NVIDIA/AMD $3k for a well-adjusted local box than tinker too much or feed the next tech moloch, and I have a hunch I’m not the only one who feels that way. Once it’s actually possible, of course.
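To make the “local DeepSeek + RAG” part concrete, here’s a minimal sketch of what I have in mind, assuming the box serves the model through an OpenAI-compatible endpoint (llama.cpp and Ollama both offer one); the port, model name and helper function are made up:

    import requests

    def ask(question, retrieved_docs):
        # Stuff locally retrieved datasheet/doc snippets into the prompt,
        # then call whatever model the box itself is serving.
        context = "\n\n".join(retrieved_docs)
        resp = requests.post(
            "http://localhost:8080/v1/chat/completions",   # hypothetical local endpoint
            json={
                "model": "deepseek-r1",   # whichever quant actually fits in 128 GB
                "messages": [
                    {"role": "system", "content": "Answer using only the provided context."},
                    {"role": "user", "content": context + "\n\nQuestion: " + question},
                ],
            },
            timeout=600,
        )
        return resp.json()["choices"][0]["message"]["content"]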
DIGITS isn't that impressive... It's essentially an RTX 5070 Ti laptop GPU (992 TOPS, clocked less than 1% higher to reach 1000 TOPS / 1 PFLOP; for reference, the desktop RTX 5090 has 3352 TOPS, more than 3x that), with 128 GB of unified memory.
Just because Jensen calls it a super computer and gives it a DGX-1 design, doesn't make it one.
In the Cleo Abram interview [1], Jensen said that DIGITS is 6 times more powerful than the first DGX-1.
According to this PDF [2], DGX-1 had 170 TFLOPS of FP16 (half precision). 170 x 6 = 1020 TFLOPS (~1 PFLOP). Yes, DIGITS is supposed to have 1 PFLOP, but according to the presentation, that figure is in FP4...
He also said that it will draw 10,000 times less power. But DGX-1 had a TDP of 3.5 kW [3], and I highly doubt DIGITS will draw 3500/10000 = 0.35 W... the GPU alone will have a peak TDP more like 200 times higher than that.
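Back-of-envelope on both claims (the FP4-to-FP16 conversion factor is my assumption; tensor core throughput roughly doubles per precision step, ignoring sparsity):

    # DIGITS claim vs the first DGX-1, normalised to FP16 (rough, dense numbers)
    digits_fp16_equiv = 1000 / 4       # ~250 TFLOPS, if each precision halving doubles throughput
    print(digits_fp16_equiv / 170)     # ~1.5x the first DGX-1, not 6x

    # The power claim: "10,000x less" than DGX-1's 3.5 kW TDP
    print(3500 / 10000)                # 0.35 W, which is obviously not happening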
I mean, we all know that NVIDIA fudges the numbers in its charts, like comparing last generation's FP8 against this generation's FP4. But this is extreme.
Having said that: do I believe they can deliver a laptop GPU (in another form factor) that performs 1 PFLOP of FP4? Of course! Like I said, it's nothing special. Both Apple and AMD offer unified memory in relatively cheap systems.
Seeing as it's supposed to deliver 1 PFLOP, it will need memory bandwidth similar to its "native" (GDDR) counterpart; otherwise it will only hit that performance as long as all the data sits in cache...
My guess is that they will use the RTX 5070 Ti laptop version (992 TOPS, clocked slightly higher to reach 1000 TOPS / 1 PFLOP).
Their big GB200 chips have 546 GB/s to their LPDDR memory; they could reuse the same memory controller on the GB10 rather than design a new one. That would still be slower than what they currently use on the RTX 5070 Ti laptop GPU, but any slower than that and there is no chance they could argue it gets anywhere near 1 PFLOP of FP4. It would only be possible in extreme edge cases where all the data fits in its 40 MB L2 cache.
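A rough roofline sanity check, taking that 546 GB/s at face value:

    # Arithmetic intensity needed to sustain the headline number from DRAM
    peak_flops = 1e15                # claimed 1 PFLOP of FP4
    bandwidth  = 546e9               # bytes/s, if GB10 reuses the GB200 LPDDR controller
    print(peak_flops / bandwidth)    # ~1830 FLOP per byte fetched, i.e. cache-resident work only

    # Single-stream LLM decoding is roughly bandwidth-bound instead: a 70B model
    # at 4-bit is ~35 GB of weights, so about 546/35 = ~15 tokens/s,
    # whatever the PFLOP headline says
    print(546 / 35)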
I think you have the reasoning backwards; there's no "must" here. Historically, lots and lots of systems have struggled to approach their peak FLOPS in real-world apps due to off-chip bottlenecks.
And people are missing the "Starting at" price. I suspect the advertised specs will end up costing more than $3k. If it comes out at that price, I'm in for two. But I'm not holding my breath, given Nvidia and all.
The CPU (20 ARM cores), GPU (1 PFLOP of FP4) and memory (128 GB) seem fixed, so the only configurable parts would be storage (up to 4 TB) and cabling (if you want to connect two DIGITS).
We kind of know what storage costs in a store, and we know that Apple (on Macs) and every phone manufacturer charge a ton for a small increase. NVIDIA will probably do the same.
I have no idea what their cabling will cost, but it comes in 100G, 200G, 400G and 800G speeds and you seem to need two cables.
If you are only going to use one DIGITS and you can make do with whatever the smallest storage option is, then it's $3000. Many people will have another computer (set up FTP/SMB or a similar solution), a NAS, or a USB thumb drive/external hard drive where they can store extra data, and in that case you get more storage without paying for more.
I'm not sure you can fit a decent quant of R1 in DIGITS. 128 GB of memory is not enough for 8-bit, and I'm not sure about 4-bit either, but I have my doubts. So you might have to go down to around 1 bit, which comes with a significant quality loss.
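Rough weight sizes for R1 (671B total parameters), ignoring KV cache and the fact that real quants aren't a uniform number of bits per weight:

    params = 671e9                   # DeepSeek-R1 total parameter count (MoE)
    for bits in (8, 4, 1.58):
        print(bits, round(params * bits / 8 / 1e9), "GB")
    # -> ~671 GB, ~336 GB, ~133 GB: even 4-bit blows past 128 GB (and past two linked boxes),
    #    and the ~1.58-bit dynamic quants are right at the edge of a single box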
They won't have different models, other than storage options (up to 4 TB; we don't know the smallest they will sell) and the cabling needed to connect two DIGITS (it won't be included in the box).
We already know it is going to be one fixed CPU and GPU with fixed memory. The GPU is most likely the RTX 5070 Ti laptop model (992 TOPS, clocked ~1% higher to get to 1 PFLOP).
https://www.nvidia.com/en-us/project-digits/