Properly measuring "GPU load" is something I've been wondering about as an architect who's had to deploy ML/DL models but is still relatively new to it. With CPU workloads you can generally tell from %CPU, %Mem and I/O how much load your system is under. But with GPUs I'm not sure how to tell, other than by just measuring your model execution times. That makes it hard to get an idea of whether upgrading to a stronger GPU would help, and by how much. Are there established ways of doing this?
For kernel-level performance tuning you can use the occupancy calculator, as pointed out by jplusqualt, or you can profile your kernel with Nsight Compute, which will give you a ton of info.
But for model-wide performance, you basically have to come up with your own calculation to estimate the FLOPs required by your model, and from that figure out how well your model is maxing out the GPU's capabilities (MFU/HFU, i.e. model/hardware FLOPs utilization).
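To make that concrete, here's a minimal sketch of an MFU estimate. Everything in it (the 7B parameter count, the 1,000 tokens/s throughput, the 300 TFLOP/s peak) is an illustrative assumption, not a measurement, and the "2 FLOPs per parameter per token" rule of thumb only applies to transformer-style models:

```python
# Hedged sketch: estimating model FLOPs utilization (MFU).
# All numbers below are illustrative assumptions, not measurements.

def transformer_forward_flops(n_params: float, n_tokens: float) -> float:
    """Common rule of thumb: ~2 FLOPs per parameter per token for a
    transformer forward pass (~6x per token for training fwd+bwd)."""
    return 2.0 * n_params * n_tokens

def mfu(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs utilization: fraction of the GPU's peak you actually use."""
    return achieved_flops_per_s / peak_flops_per_s

# Example: a hypothetical 7B-parameter model serving 1,000 tokens/s on a
# GPU with an assumed 300 TFLOP/s low-precision peak.
flops_per_s = transformer_forward_flops(7e9, 1_000)  # FLOPs done in 1 s
print(f"MFU: {mfu(flops_per_s, 300e12):.1%}")  # -> MFU: 4.7%
```

Single-digit MFU like this is not unusual for small-batch inference; the useful part is that the same peak-FLOP/s denominator lets you predict what a stronger GPU would buy you.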
It's harder than measuring CPU load, and depends a lot on context. For example, often 90% of a GPU's available flops are exclusively for low-precision matrix multiply-add operations. If you're doing full precision multiply-add operations at full speed, do you count that as 10% or 100% load? If you're doing lots of small operations and your warps are only 50% full, do you count that as 50% or 100% load? Unfortunately, there isn't really a shortcut to understanding how a GPU works and knowing how you're using it.
The CUDA toolkit comes with an occupancy calculator that can help you determine, based on your kernel launch parameters, how busy your GPU will potentially be.
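The core of what that calculator does is a few integer divisions: each SM can hold only so many blocks, limited by threads, registers, and shared memory. Here is a simplified sketch of that computation; the per-SM limits are assumptions typical of recent NVIDIA GPUs, so check your actual compute capability before trusting the numbers:

```python
# Hedged sketch of the calculation behind the CUDA occupancy calculator.
# Per-SM limits below are assumed values, not guaranteed for your GPU.

MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 32
REGISTERS_PER_SM = 65536
SHARED_MEM_PER_SM = 100 * 1024  # bytes
WARP_SIZE = 32

def occupancy(threads_per_block: int, regs_per_thread: int,
              smem_per_block: int) -> float:
    """Theoretical occupancy: active warps per SM / max warps per SM."""
    blocks_by_threads = MAX_THREADS_PER_SM // threads_per_block
    blocks_by_regs = REGISTERS_PER_SM // (regs_per_thread * threads_per_block)
    blocks_by_smem = (SHARED_MEM_PER_SM // smem_per_block
                      if smem_per_block else MAX_BLOCKS_PER_SM)
    blocks = min(blocks_by_threads, blocks_by_regs, blocks_by_smem,
                 MAX_BLOCKS_PER_SM)
    active_warps = blocks * (threads_per_block // WARP_SIZE)
    return active_warps / (MAX_THREADS_PER_SM // WARP_SIZE)

# A register-hungry kernel: 256 threads/block, 64 regs/thread, no shared mem.
# Registers cap the SM at 4 blocks, so occupancy is 50%.
print(f"{occupancy(256, 64, 0):.0%}")  # -> 50%
```

In real code you'd query this from the driver (e.g. `cudaOccupancyMaxActiveBlocksPerMultiprocessor`) rather than hard-coding limits; the sketch is just to show which resource becomes the bottleneck.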
Open source models like Flux Kontext or Qwen image edit wouldn't refuse, but you need to either have a sufficiently strong GPU or rent one in the cloud (neither difficult nor expensive with services like runpod), then set up your own processing pipeline (again, not too difficult if you use ComfyUI). Results won't be SOTA, but they shouldn't be too far off.
I pay for chatgpt because, in my experience, o3 and o4 are currently the best at combining reasoning with information retrieval from web searches. They're the best models I've tried at emulating the way I search for information (evaluating source quality, combining and contrasting information from several sources, refining searches, etc.), and using the results as part of a reasoning process. It's not necessarily significant for coding, but it is for designing.
> Besides protein folding, the canonical example of a scientific breakthrough from AI, a few examples of scientific progress from AI include:
> Weather forecasting, where AI forecasts have had up to 20% higher accuracy (though still lower resolution) compared to traditional physics-based forecasts.
> Drug discovery, where preliminary data suggests that AI-discovered drugs have been more successful in Phase I (but not Phase II) clinical trials. If the trend holds, this would imply a nearly twofold increase in end-to-end drug approval rates.
> It’s insane to me that maybe every bank I use requires SMS 2FA, but random services I use support apps.
It never ceases to surprise me how much American banks seem to lag behind in payment tech. My (European) bank started sending hardware TOTP tokens to whoever requested one about a decade ago. They've since switched to phone-app MFA.
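Incidentally, those hardware tokens and authenticator apps run the exact same algorithm, TOTP (RFC 6238), which is short enough to sketch with just the Python standard library. The printed code below reproduces an official RFC 6238 test vector:

```python
# Sketch of the TOTP algorithm (RFC 6238) used by both hardware tokens
# and authenticator apps; stdlib only.
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, unix_time: float, step: int = 30,
         digits: int = 6) -> str:
    """TOTP: HOTP keyed on the current 30-second time window."""
    return hotp(secret, int(unix_time) // step, digits)

# RFC 6238 test vector: 8-digit SHA-1 code at t=59s with this ASCII secret.
print(totp(b"12345678901234567890", 59, digits=8))  # -> 94287082
```

The only difference between the old tokens and the app is where the shared secret lives: burned into the token at the factory versus provisioned via a QR code.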
I've been working on extracting text from some 20 million PDFs, with just about every type of layout you can imagine. We're using a similar approach (segmentation / OCR), but with PyMuPDF.
The full extract is projected to run for several days on a GPU cluster, at a cost of like 20-30k (can't remember the exact number but it's in that ballpark). When you can afford this kind of compute, text extraction from PDFs isn't quite a fully solved problem, but we're most of the way there.
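For anyone curious how such a projection is put together, it's a back-of-envelope calculation like the one below. Every number in it (document count aside) is a made-up assumption for illustration, not the figures from our actual job:

```python
# Back-of-envelope projection for a large PDF extraction run.
# Page rate, fleet size, and hourly price are assumed round numbers.

def projected_cost(n_docs: int, pages_per_doc: float,
                   pages_per_gpu_hour: float,
                   n_gpus: int, usd_per_gpu_hour: float):
    """Return (wall-clock days, total USD) for the whole corpus."""
    gpu_hours = n_docs * pages_per_doc / pages_per_gpu_hour
    days = gpu_hours / n_gpus / 24
    return days, gpu_hours * usd_per_gpu_hour

# 20M docs, ~10 pages each, an assumed 25k pages/GPU-hour through the
# segmentation + OCR pipeline, 64 GPUs at an assumed $3/GPU-hour.
days, usd = projected_cost(20_000_000, 10, 25_000, 64, 3.0)
print(f"~{days:.1f} days on 64 GPUs, ~${usd:,.0f}")
```

The per-page throughput dominates everything, which is why it pays to benchmark the pipeline on a representative sample before committing the cluster.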
What the article in the OP tries to do is, as far as I understand, somewhat different. It's trying to use much simpler heuristics to get acceptable results cheaper and faster, and this is definitely an open issue.
Primordial black holes are black holes that formed right after the Big Bang: basically regions where gravity caused the extremely dense matter of the universe's first instants to collapse into black holes before expansion could pull them apart. Their existence has been hypothesized but neither confirmed nor definitively rejected so far.
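The standard rough bound on their mass follows from that formation picture: a primordial black hole can be at most as massive as the matter inside the cosmological horizon at the time it forms, which gives the commonly quoted estimate

```latex
% Horizon mass at time t after the Big Bang (order-of-magnitude estimate)
M_H(t) \sim \frac{c^{3}\,t}{G}
        \approx 2\times 10^{5}\, M_{\odot}\,\left(\frac{t}{1\,\mathrm{s}}\right)
```

so black holes forming in the first fractions of a second can span anything from asteroid masses up to many solar masses, which is why they come up as a dark matter candidate.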
It's absolutely not crazy, and I don't know how a car could cost you only 1k/year. This is just one example, but it puts the average cost of motoring in Ireland in 2019 at almost €11k/year: https://www.theaa.ie/motoring-advice/cost-of-motoring/.
One of the issues with trains is that people often severely underestimate the true total cost of car trips.
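A quick sum makes the underestimate obvious. The line items below are deliberately conservative round numbers of my own, not figures from the AA report linked above, and they still land far beyond 1k/year:

```python
# Illustrative total-cost-of-ownership sum; every line item is an
# assumed conservative round number, not data from the AA report.

annual_costs_eur = {
    "depreciation": 3000,
    "insurance": 800,
    "fuel": 1800,
    "maintenance_and_tyres": 900,
    "tax_tolls_parking": 700,
    "financing": 600,
}
total = sum(annual_costs_eur.values())
print(f"Total: €{total:,}/year")  # -> Total: €7,800/year
```

People tend to compare a train ticket against the fuel cost alone, which is usually the second-smallest item in the list.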