
This is amazing. Thanks for posting.


Anyone else found it odd that the article says "For all the improvements evident in contemporary surgical technology, electron microscopic images actually confirm that the edge of Neolithic obsidian blades exceed today’s steel scalpels in sharpness" and then cites a paper about obsidian blades from 1982?

Has there been no improvement in sharpness in 42 years? Or are Neolithic obsidian blades just that much sharper?


There are fractured-glass blades that form the same kind of edge, and diamond scalpels too; all of them show brittleness issues, especially under lateral movement. The physics of steel cutting edges gets very complicated in general, as shown by this example of how human hair can damage steel: https://www.reddit.com/r/EngineeringPorn/s/6xHn8TQEtK

I tend to geek out over the topic of cutting as it’s endlessly complex, from biological matter all the way to CNC.


From a pure sharpness perspective, you can't actually get any sharper, but it turns out there are many other physical properties (toughness, ease of sterilization, repeatability of manufactured tolerances at-scale, etc.) of a practical cutting edge where steel overwhelmingly beats amorphous ceramics.

FWIW, many surgical operations done today that require a very high-precision cut use numerically controlled lasers. Unfortunately, lasers will never work for certain procedures where cauterization would hamper healing or tissue reintegration.


Apparently yes. Broken glass (which obsidian is, being a volcanic glass) has edges a few atoms wide, beating anything man-made.


Yes but it is not very durable and the blade dulls quickly, hence all the effort on fancy steel blades.


They are fragile I've heard, and quite likely they dull quickly (don't know), but I'm not saying they're 'better' than steel; it's not a competition. Horses/courses.


I may be the only person who loves LilyPond but I really do love it. The LaTeX of music notation.


LilyPond is great, but the writing process for a lot of music involves a lot of playback, so integration with a half-decent playback engine is really useful. On top of that, you can do almost everything in Dorico and Sibelius with keyboard shortcuts, so they are very power-user-friendly (which is what I like about LaTeX).


Plus, as professional software used by people making (mostly not very much) money with it, productivity is key. Lilypond loses badly here.

I can enter 200 or 300 bars in Dorico in the time I could do 20 in Lilypond - and that’s at the rate I could manage when I was using lilypond regularly.

I also think the output is nicer, which also matters here.


Lilypond is for music typesetting (they call it "engraving").

Finale, Sibelius, Dorico, MuseScore are primarily for composing (though they have each made their own strides on the engraving front too).


I do "professional" engraving as a side gig, mostly turning scratch people's make on lined paper into actual sheet music, and I exclusively use lilypond


Lilypond makes gorgeous music. However, getting things besides music to look good is painful at best and sometimes impossible. I spent hours trying to figure out how to get a good looking lead sheet setup (music, chord name, lyrics). Especially font sizes and spacing. Ugh. Good luck getting an annotation (such as "intro" or "chorus") anywhere less than about 2em from the top of the staff...

I think they've changed things in the five years since then, so I think I'd have to do it all over again.


I feel like I pay a lot for news. I pay for:

* WSJ
* NYTimes
* Economist
* LATimes
* SJ Mercury News
* Apple News

And yet I constantly run into paywalls (which I circumvent). How much per month does the news industry think is fair for me to pay?

I wish I could just pay a fee per article I read. I think the business model is broken because there are too many individual entities and they all want a subscription. And this makes no sense in the age of the internet.


This is why my next car won’t be a Tesla (after driving one for the past 8 years). It’s sad, they get so many things right but they are dangerously irresponsible.


Yep, agreed. I'm on my third Tesla, and probably my last. The cars are still very structurally safe, but operationally they're getting more dangerous because of changes like the UI, FSD feeling actually worse than it used to (I had it on my Model 3, but I couldn't transfer it to my Model X; it wasn't good enough on the 3, so why would it be good enough on the X, which actually has FEWER sensors?), and the absolute insistence that vision-only is safer than vision plus radar and ultrasonic. I understand that the radar they had was limited, but you improve the radar, you don't drop it entirely.

Also, I haven't been an Elon fan since 2018 or so, and I was still able to look past his shenanigans when I bought this one a few years ago, but he's gone straight over the cliff since then and kept falling, so this is almost definitely the last one I'm getting.


You have a different reaction than most Tesla owners [1].

[1] https://www.marketwatch.com/story/tesla-has-the-most-loyal-b...


Tesla owners who bought a car last year are a very different group than all Tesla owners.


Maybe the best benefit of LLMs as a search engine is that they haven't figured out yet how to serve you 10 ads before they give you the link.


The point is: if you sue claiming this model breaks the law, you lose your license to use it.

Apache 2.0 has a similar restriction: “If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.”


True, although it's unusual to see it for copyright not patents.

That said, the far bigger issue is the end of the same clause 2.1:

> NVIDIA may update this Agreement to comply with legal and regulatory requirements at any time and You agree to either comply with any updated license or cease Your copying, use, and distribution of the Model and any Derivative Model


Oh, I didn't realize that it was a standard term. I'm sure there's a good motivation then, it doesn't seem so bad.


People have been saying that GPUs randomly happened to be good at ML since at least 2014 when Nervana was founded. It was clear 12 years ago to anyone paying attention that DL was revolutionizing the world and that NVIDIA GPUs were at the center of the revolution. Whatever random chance factored into NVIDIA's success is outweighed by the decade+ of pedal-to-the-metal development NVIDIA has undertaken while its competitors decried AI as an overhyped bubble.

I have been there for this history, working on ML on GPUs at NVIDIA for a few years before Jensen decided to productize my little research project, cuDNN.


The history page for CUDA is pretty accessible [1]. It originated from experiments at Stanford in 2000, with Ian Buck taking on leadership of CUDA development in 2004.

> In pushing for CUDA, Jensen Huang aimed for the Nvidia GPUs to become a general hardware for scientific computing. CUDA was released in 2006. Around 2015, the focus of CUDA changed to neural networks.[8]

Credit to Jensen for pivoting, but I recall hearing about neural networks on CUDA from Google tech talks in 2009 and realizing they would be huge. It wasn't anything unique to realize NNs were a huge innovation, but it did take another 5 years for the field to mature enough and for it to become clear that GPUs could be useful for training and whatnot. Additionally, it's important to remember that Google had a huge early lead here & worked closely with Nvidia, since CUDA was much more mature than OpenCL (due to intentional sabotage or otherwise) and Nvidia's chips satisfied the compute needs of that early development.

So it was more like Google leading Nvidia to the drinking well and Nvidia eventually realizing it was potentially an untapped ocean and investing some resources. Remember, they also put resources behind cryptocurrency when that bubble was inflating. They're good at opportunistically taking advantage of those bubbles. It was also around this time period that Google realized they should start investing in dedicated accelerators with their TPUs because Nvidia could not meet their needs due to lack of focus (+ dedicated accelerators could outperform) leading to the first TPU being used internally by 2015 [2].

Pretending that Jensen is some unique visionary who saw something no one else in the industry did is insane. It was a confluence of factors, and Jensen was adept at navigating his resources to take advantage of it. You can appreciate Nvidia's excellence here without pretending Jensen is some kind of AI messiah.

[1] https://en.wikipedia.org/wiki/CUDA

[2] https://en.wikipedia.org/wiki/Tensor_Processing_Unit


I was at NVIDIA at that time working on ML on GPUs. Jensen is indeed a visionary. It’s true as you point out that NVIDIA paid attention to what its customers were doing. It’s also true that Ian Buck published a paper using the GPU for neural networks in 2005 [1], and I published a paper using the GPU for ML in 2008 while I did my first internship at NVIDIA [2]. It’s just not true that NVIDIA’s success in AI is all random chance.

[1] https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1575717

[2] https://dl.acm.org/doi/pdf/10.1145/1390156.1390170


Yeah, you really get this sense when you page through their research history on the topic: https://research.nvidia.com/publications?f%5B0%5D=research_a...

CUDA won big because it made a big bet. Had OpenCL been ubiquitous and at feature parity with CUDA, maybe there would be more than one player dealt in at the table today. But everyone else folded while Nvidia ran the table for 10 long years.


GW or GWh???

I’m guessing they are talking about GW but that is such a frustrating unit to measure storage in. It’s like measuring your gas tank in gallons/minute.

Yes, power capacity matters, but energy is conserved and power is not.

If you have a bank of supercapacitors that can deliver a GW * femtosecond, you do not have significant storage capacity.

I still think journalists should use TJ instead of GWh in these articles in order to clear up this ambiguity.
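
A quick back-of-the-envelope in Python (nothing from the article, just unit arithmetic) to show why the distinction matters:

    # Power (GW) is a rate; energy (GWh or TJ) is what the storage actually holds.
    GW = 1e9       # watts
    hour = 3600.0  # seconds
    fs = 1e-15     # seconds

    print(1 * GW * hour / 1e12)  # 1 GWh = 3.6e12 J = 3.6 TJ
    print(1 * GW * fs)           # 1 GW for a femtosecond = 1e-6 J: huge power, negligible energy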


Innumeracy in this area drives me nuts. Case 17 in https://www.eia.gov/analysis/studies/powerplants/capitalcost... for instance specs out a solar installation that can store a bit more than an hour of its peak output energy in batteries and quotes $2,175 per kWh.

People are going to compare that to Case 9 and its $7,681 per kWh for a dual installation of AP-1000s and declare it game over, but you're going to need a lot more than one hour of storage to get through the night, and when you add that and consider that you'll either need storage or overcapacity to get through the winter, the cost is going to be closer to the AP-1000. Personally I am pretty irked that that paper gives a number for the capital cost of a PV system which is not dependent on the location, because you could see a difference of more than a factor of two in how much energy the same array could produce in different spots.
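
To put rough shape on that, here's an illustrative sketch with entirely made-up round numbers (not the EIA Case 9 / Case 17 figures, and ignoring winter overcapacity): the battery term grows linearly with the hours of firm output you demand, which is what closes the gap.

    pv_capex_per_kw = 1_300  # assumption: PV array cost per kW of output
    battery_per_kwh = 300    # assumption: installed battery cost per kWh
    for hours in (1, 4, 12):
        # rough cost to deliver 1 kW firmly through `hours` without sunshine
        total = pv_capex_per_kw + battery_per_kwh * hours
        print(f"{hours:>2} h of storage -> ~${total:,} per kW of firm capacity")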

It's one of those things that adds to a culture of people just talking past each other.


So many random numbers and units, and so little critical thinking or understanding. Frustrating. Journalists fail basically every time they write.


GPUs have evolved to be AI machines with as little baggage as possible. People have been arguing GPUs were old technology and therefore unsuited for AI since at least 2014 (when Nervana was founded), but what they perhaps didn’t expect is that the GPU would evolve so quickly to be an AI machine.


Bill Dally from Nvidia argues that there is "no gain in building a specialized accelerator", in part because the current overhead on top of the arithmetic is in the ballpark of 20% (16% for IMMA and 22% for HMMA units): https://www.youtube.com/watch?v=gofI47kfD28


There does seem to be a somewhat obvious advantage: If all it has to do is matrix multiplication and not every other thing a general purpose GPU has to be good at then it costs less to design. So now someone other than Nvidia or AMD can do it, and then very easily distinguish themselves by just sticking a ton of VRAM on it. Which is currently reserved for GPUs that are extraordinarily expensive, even though the extra VRAM doesn't cost a fraction of the price difference between those and an ordinary consumer GPU.


Exactly. And that means you not only save the 22% but also a large chunk of the Nvidia margin.


And, sure enough, there's a new AI chip from Intellifusion in China that's supposed to be 90% cheaper. 48 TOPS in int8 training performance for US$140.[1]

[1] https://www.tomshardware.com/tech-industry/artificial-intell...


I wonder what the cost of power to run these chips is. If the power cost ends up being large compared to the hardware cost, it could make sense to buy more chips and run them when power is cheap. They could become a large source of dispatchable demand.
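
A rough sketch of that break-even, with the power draw and electricity price assumed out of thin air (the article gives only the $140 price):

    chip_price_usd = 140
    assumed_watts = 50    # assumption: modest draw for a small accelerator card
    usd_per_kwh = 0.08    # assumption: cheap industrial electricity
    hours_per_year = 24 * 365

    annual_power_cost = assumed_watts / 1000 * hours_per_year * usd_per_kwh
    print(f"~${annual_power_cost:.0f}/year in power vs ${chip_price_usd} of hardware")
    # With these guesses (~$35/year) the hardware still dominates; a hungrier chip or
    # pricier electricity flips it, which is when chasing cheap power starts to pay off.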


Int8 training has very few applications, and int8 ops generally are very easy to implement. Int8 is a decent inference format, but supposedly doesn't work well for LLMs that need a wide dynamic range.
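
A toy numpy example of the dynamic-range problem (invented values, plain symmetric per-tensor int8):

    import numpy as np

    x = np.array([0.001, 0.002, 0.003, 0.004, 50.0], dtype=np.float32)  # one big outlier

    scale = np.abs(x).max() / 127  # the outlier sets the quantization scale
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    x_hat = q.astype(np.float32) * scale

    print(q)          # [0 0 0 0 127] -- all the small activations collapse to zero
    print(x - x_hat)  # the small values lose essentially all of their information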


There are other operations for things like normalization in training, which is why most successful custom stuff has focused on inference, I think. As architectures changed and needed various different things, some custom-built training hardware got obsoleted; Keller talked about that affecting Tesla's Dojo and making it less viable (they bought a huge Nvidia cluster after it was up). I don't know if the TPU ran into this, or whether they iterated fast enough to keep adding what they needed as they needed it.


Designing it is easy and always has been. Programming it is the bottleneck. Otherwise Nvidia wouldn't be in the lead.


but programming it is "import torch" - nothing Nvidia-specific there.

the mass press is very impressed by CUDA, but at least if we're talking AI (and this article is, exclusively), it's not the right interface.

and in fact, Nv's lead, if it exists, is because they pushed tensor hardware earlier.
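
For what it's worth, the user-facing code really is vendor-neutral; a minimal sketch using only standard PyTorch calls:

    import torch
    import torch.nn as nn

    # The same script runs on whichever backend PyTorch finds; nothing here names a vendor.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = nn.Linear(1024, 1024).to(device)
    x = torch.randn(8, 1024, device=device)
    y = model(x)  # dispatched to the backend's kernels underneath
    print(y.shape, y.device)

All the hard, vendor-specific work lives below that line.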


Someone does, in fact, have to implement everything underneath that `import` call, and that work is _very_ hard to do for things that don't closely match Nvidia's SIMT architecture. There's a reason people don't like using dataflow architectures, even though from a pure hardware PoV they're very powerful -- you can't map CUDA's, or Pytorch's, or Tensorflow's model of the world onto them.


I'm talking about adding Pytorch support for your special hardware.

Nv's lead is due to them having Pytorch support.


Eh if you're running in production you'll want something lower level and faster than pytorch.


AI models are not all matrix multiplications, and they tend to involve other operations. Also, they change super fast, much faster than hardware cycles, so if your hardware isn't general-purpose enough, the field will move past you and obsolete your hardware before it comes out.


AI models are mostly matrix multiplications and have been that way for a few years now, which is longer than a hardware cycle. Moreover, if the structure changes then the hardware changes regardless of whether it's general purpose or not, because then it has to be optimized for the new structure.

Everybody cares about VRAM right now yet you can get a P40 with 24GB for 10% of the price of a 24GB RTX 4090. Why? No tensor cores, the things used for matrix multiplication.


I really hope we see an AI-PU (or with some other name, INT16PU, why not) for the consumer market sometime soon. Or being able to expand GPU memory using a PCIe socket (not sure if technically possible).


The whole point of GPU memory is that it's faster to access than going to memory (like your main RAM) through the PCIe bottleneck.


My uninformed question about this is why can't we make the VRAM on GPUs expandable? I know that you need to avoid having the data traverse some kind of bus that trades overhead for wide compatibility like PCIe but if you only want to use it for more RAM then can't you just add more sockets whose traces go directly to where they're needed? Even if it's only compatible with a specific type of chip it would seem worthwhile for the customer to buy a base GPU and add on however much VRAM they need. I've heard of people replacing existing RAM chips on their GPUs[0] so why can't this be built in as a socket like motherboards use for RAM and CPUs?

[0] https://www.tomshardware.com/news/16gb-rtx-3070-mod


Expandable VRAM on GPUs has been tried before - the industry just hates it. It's like Apple devices - want more internal storage? Buy a new computer so we can have the fat margins.

The original Rev A iMac in the late 90s had slotted memory for its ATI card, as one example - shipped with 2 MB, could be upgraded to 6 MB after the fact with a 4 MB SGRAM DIMM. There are also a handful of more recent examples floating around.

While I'm sure there are also packaging advantages to be had by directly soldering memory chips instead of slotting them etc, I strongly suspect the desire to keep buyers upgrading the whole card ($$$) every few years trumps this massively if you are a GPU vendor.

Put another way, what's in it for the GPU vendor to offer memory slots? Possibly reduced revenue, if it became industry norm.


Expansion has to answer one fundamental question: if you're likely to need more X tomorrow, why aren't you just buying it today?

The answer to this question almost has to be "because it will be cheaper to buy it tomorrow." However, GPUs bundle together RAM and compute. If RAM is likely to be cheaper tomorrow, isn't compute also probably going to be cheaper?

If both RAM and compute are likely cheaper tomorrow, then the calculus still probably points towards a wholesale replacement. Why not run/train models twice as quickly alongside the RAM upgrades?

> I strongly suspect the desire to keep buyers upgrading the whole card ($$$) every few years trumps this massively if you are a GPU vendor.

Remember as well that expandable RAM doesn't unlock higher-bandwidth interconnects. If you could take the card from five years ago and load it up with 80 GB of VRAM, you'd still not see the memory bandwidth of a newly-bought H100.

If instead you just need the VRAM and don't care much about bandwidth/latency, then it seems like you'd be better off using unified memory and having system RAM be the ultimate expansion.


> The answer to this question almost has to be "because it will be cheaper to buy it tomorrow."

No, it doesn't. It could just as easily be "because I will have more money tomorrow." If faster compute is $300 and more VRAM is $200 and I have $300 today and will have another $200 two years from now, I might very well like to buy the $300 compute unit and enjoy the faster compute for two years before I buy the extra VRAM, instead of waiting until I have $500 to buy both together.

But for something which is already a modular component like a GPU it's mostly irrelevant. If you have $300 now then you buy the $300 GPU, then in two years when you have another $200 you sell the one you have for $200 and buy the one that costs $400, which is the same one that cost $500 two years ago.

This is a much different situation than fully integrated systems because the latter have components that lose value at different rates, or that make sense to upgrade separately. You buy a $1000 tablet and then the battery goes flat and it doesn't have enough RAM, so you want to replace the battery and upgrade the RAM, but you can't. The battery is proprietary and discontinued and the RAM is soldered. So now even though that machine has a satisfactory CPU, storage, chassis, screen and power supply, which is still $700 worth of components, the machine is only worth $150 because nothing is modular and nobody wants it because it doesn't have enough RAM and the battery dies after 10 minutes.


hmm seems you're replying as a customer, but not as a GPU vendor...

the thing is, there's not enough competition in the AI-GPU space.

Currently the only no-wasting-time option for running some random research project from GitHub? Buy some card from Nvidia. CUDA can run almost anything on GitHub.

AMD GPUs? That really depends...

And gamers often don't need more than 12 GB of GPU RAM for running games at 4K, so most high-VRAM customers are in the AI field.

> If you could take the card from five years ago and load it up with 80 GB of VRAM, you'd still not see the memory bandwidth of a newly-bought H100.

this is exactly what nvidia will fight against tooth-and-nail -- if this is possible, its profit margin could be slashed to 1/2 or even 1/8


Replacing RAM chips on GPUs involves resoldering and similar things - those (for the most part) maintain the signal integrity and performance characteristics of the original RAM. Adding sockets complicates the signal path (iirc), so it's harder for the traces to go where they're needed, and realistically given a trade-off between speed/bandwidth and expandability I think the market goes with the former.


The problem with GPUs is they're designed to be saturated.

If you have a CPU and it has however many cores, the amount of memory or memory bandwidth you need to go with that is totally independent, and memory bandwidth is rarely the bottleneck. So you attach a couple memory channels worth of slots on there and people can decide how much memory they want based on whether they intend to have ten thousand browser tabs open or only one thousand. Neither of which will saturate memory bandwidth or depend on how fast the CPU is, so you don't want the amount of memory and the number of CPU cores tied together.

If you have a device for doing matrix multiplications, the amount of RAM you need is going to depend on how big the matrix you want to multiply is, which for AI things is the size of the model. But the bigger the matrix is, the more memory bandwidth and compute units it needs for the same number of tokens/second. So unlike a CPU, there aren't a lot of use cases for matching a small number of compute units with a large amount of memory. It'd be too slow.
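
A rough way to see the "too slow" point, assuming memory-bandwidth-bound, batch-1 inference and purely hypothetical numbers:

    # Each generated token streams roughly all of the weights through the memory bus once,
    # so tokens/sec is capped by bandwidth / model size.
    def max_tokens_per_sec(model_size_gb, bandwidth_gb_per_s):
        return bandwidth_gb_per_s / model_size_gb

    print(max_tokens_per_sec(70, 1000))  # big model, fast memory  -> ~14 tokens/s
    print(max_tokens_per_sec(70, 100))   # same model, slow memory -> ~1.4 tokens/s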

Meanwhile the memory isn't all that expensive. For example, right now the spot price for 64GB of GDDR6 is less than $200. Against a $1000 GPU which is fast enough for that much, that's not a big number. Just include it to begin with.

Except that they don't. The high end consumer GPUs are heavy on compute and light on memory. For example, you can get the RTX 4060Ti with 16GB of VRAM. The RTX 4090 has four times as much compute but only 50% more VRAM. There would be plenty of demand for a 4090 that cost $200 more and had four times as much VRAM, only they don't make one because of market segmentation.

Obviously if they don't do that then they're not going to give one you can upgrade. But you don't really want to upgrade just the VRAM anyway, what you want is for the high performance cards to come with that much VRAM to begin with. Which somebody other than Nvidia might soon provide.


Technically we definitely can, but are there sufficiently many people willing to pay a sufficiently high premium for that feature? How much more would you be willing to pay for an otherwise identical card that has the option to expand RAM, and do you expect that a significant portion of buyers would want to pay a non-trivial up-front cost for that possibility?


It's a minor technical challenge with no financial benefit for the GPU makers.


Isn't that what NPUs are technically?

https://en.m.wikipedia.org/wiki/AI_accelerator


Isn't this what resizeable BAR and direct storage are for?

