trentnelson's comments | Hacker News

I remember my first job in 2000, straight out of 1.5 years of college, getting to play directly with Digital UNIX and Alpha processors! The Alpha 21264 was a beast at the time.


Based on an earlier comment, I think the person you're replying to is the author of aider.


It’s insane how hard hovering is. I had about 35 hours of fixed wing time, and treated myself to a helicopter lesson for my birthday.

Hovering was so humbling! You'd be stable for a few seconds and then, oops, suddenly you're crabbing backwards whilst rolling laterally, exacerbating everything with pilot-induced oscillations in every conceivable axis of movement.

Having to constantly adjust three inputs whenever the external environment changes (i.e. wind, gusts), or whenever any one of the three inputs changes… it absolutely requires some new neural pathways to be forged!

I flew with Patty Wagstaff many years later and even she admitted hovering was so hard that at one point it looked like she wasn't going to be able to proceed with her rotor license (before it all clicked).


> so hard that at one point it looked like she wasn't going to be able to proceed with her rotor license (before it all clicked)

Yeah, I think we were all convinced that we were going to wash out of flight school in the first few weeks. Hovering was not something that you could see yourself gradually getting better at, so it felt impossible right up until it wasn't. It really did just "click" one day. Almost two decades later, I still have a vivid memory of the very moment that I realized I had full control of the aircraft when picking it up from the ground.

When my buddy and I were telling an instructor pilot one day how we felt (like we'd never be able to hover), he wisely pointed out that the flight school syllabus had a certain number of hours for a reason. It had been refined over the past 50 years, so they knew exactly how many hours were needed, and if things "clicked" for us ahead of that schedule it would mean that time and money were being wasted.


Had any exposure to r=2 hypergraph implementations on the GPU? Ideally with an efficient way to determine if the graph is acyclic?

(The usual algorithms for this work great on CPUs but are woeful on GPUs.)


Pretty good - an r=2 hypergraph is just a regular graph afaict, and basically anything that maps to a frontier-based pattern works well, e.g. level-synchronous BFS during a topological sort.

The 'easy' way we do it in gfql is basically vector ops on bulk wavefronts, which lets us do the massive Cypher traversals you're asking about, like 100M edges touched in a result substep, on a tiny GPU. There are other bulk patterns we want to add, such as Pregel-style, which would open up other algorithms here. In practice we can often just call cudf/cugraph as building blocks, so we haven't had the pressure to do so yet.
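
As a rough illustration of that frontier-based pattern, here's a minimal CPU-side sketch in C (hypothetical names, not actual gfql/cugraph code) of a level-synchronous, Kahn-style topological sort: each round processes the whole zero-in-degree frontier as a bulk array operation, which is the shape that maps well to GPU vector/scan primitives, and acyclicity falls out for free, since any vertex left unvisited means a cycle remains.

    /* Level-synchronous Kahn-style topological sort over a CSR graph.
     * Each round processes the whole zero-in-degree frontier; on a GPU
     * the per-frontier loop becomes a bulk vector op (the decrement
     * becomes an atomic).  Returns 1 if acyclic, 0 if a cycle remains. */
    #include <stdlib.h>

    int is_acyclic_csr(int n, const int *row_ptr, const int *col_idx)
    {
        int *indeg    = calloc(n, sizeof *indeg);
        int *frontier = malloc(n * sizeof *frontier);
        int *next     = malloc(n * sizeof *next);
        int visited = 0, fsize = 0;

        for (int v = 0; v < n; v++)              /* bulk in-degree count */
            for (int e = row_ptr[v]; e < row_ptr[v + 1]; e++)
                indeg[col_idx[e]]++;

        for (int v = 0; v < n; v++)              /* initial frontier */
            if (indeg[v] == 0)
                frontier[fsize++] = v;

        while (fsize > 0) {                      /* one level per round */
            int nsize = 0;
            visited += fsize;
            for (int i = 0; i < fsize; i++) {    /* parallel across the frontier */
                int v = frontier[i];
                for (int e = row_ptr[v]; e < row_ptr[v + 1]; e++)
                    if (--indeg[col_idx[e]] == 0)
                        next[nsize++] = col_idx[e];
            }
            int *tmp = frontier; frontier = next; next = tmp;
            fsize = nsize;
        }

        int acyclic = (visited == n);            /* leftovers imply a cycle */
        free(indeg); free(frontier); free(next);
        return acyclic;
    }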

The weak spot I find is more the small OLTP lookups. Ex: imagine a taxi routing service pinging for one car to do a couple of hops out, where you just want a KV store in cheap RAM. But if you are batching those queries, like in a busy city, and going deeper on them, it gets more interesting.


None of the UNIXes have the notion of WriteFile with an OVERLAPPED structure, which is the key to NT's asynchronous I/O.

Nor do they have anything like IOCP, where the kernel is aware of the number of threads servicing a completion port and can make sure you only have as many threads running as there are underlying cores, avoiding context switches. If you write your programs to leverage these facilities (which are unique to NT), you can extract maximum performance from your hardware.
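
To make that concrete, a minimal sketch of the pattern (error handling omitted; the file name is just a placeholder): the OVERLAPPED structure carries the file offset and per-operation state for an asynchronous WriteFile, and the completion is reaped from an I/O completion port whose concurrency value is capped at the core count.

    #include <windows.h>

    int main(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        /* Completion port whose concurrency is capped at the core count,
           so the kernel keeps at most that many threads runnable. */
        HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0,
                                             si.dwNumberOfProcessors);

        /* File handle opened for overlapped (asynchronous) I/O,
           then associated with the port. */
        HANDLE f = CreateFileA("example.dat", GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_FLAG_OVERLAPPED, NULL);
        CreateIoCompletionPort(f, iocp, /* completion key */ 1, 0);

        static const char buf[] = "hello";
        OVERLAPPED ov = {0};   /* Offset/OffsetHigh hold the file position. */

        /* Returns immediately (ERROR_IO_PENDING); the completion is
           posted to the port when the write finishes. */
        WriteFile(f, buf, sizeof buf - 1, NULL, &ov);

        DWORD bytes; ULONG_PTR key; LPOVERLAPPED pov;
        GetQueuedCompletionStatus(iocp, &bytes, &key, &pov, INFINITE);

        CloseHandle(f);
        CloseHandle(iocp);
        return 0;
    }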


I should do an updated version of that deck with io_uring and sans the PyParallel element. I still think it’s a good resource for depicting the differences in I/O between NT & UNIX.

And yeah, IOCP has implicit awareness of concurrency and can automatically schedule the optimal number of threads to service a port. There wasn't a way to do that on UNIX until io_uring.
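
For comparison, a rough liburing sketch of the same write (assuming liburing is installed; the file name and queue depth are placeholders): requests go onto the submission ring and completions are reaped off the completion ring, which is the closest UNIX analogue to posting I/O and draining a completion port.

    /* Build with: cc uring_write.c -luring */
    #include <fcntl.h>
    #include <liburing.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);          /* 8-entry SQ/CQ rings */

        int fd = open("example.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        static const char buf[] = "hello";
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, fd, buf, strlen(buf), 0);  /* offset 0 */

        io_uring_submit(&ring);                    /* hand the SQE to the kernel */

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);            /* reap the completion */
        io_uring_cqe_seen(&ring, cqe);

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
    }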


Yes, please! And if you're interested, RegisteredIO as well, and I assume you'd drop in IoRing.

In a nicely wrapped PDF :-)


Yeah I’d definitely include RegisteredIO and IoRing. When I was interviewing at Microsoft a few years back, I was actually interviewed by the chap that wrote RegisteredIO! Thought that was neat.


FWIW, on Windows, the ETW event instrumentation that captures dispatch (i.e. thread scheduling) and loader info (I think it's literally the DISPATCH+LOADER flags to xperf) solves this problem, which, inherently, is: at any arbitrary point in time, given an IP/PC, what module/function am I in?

If you have timestamped module load/unload info with base address + range, plus context switch times that allow you to figure out which specific thread & address space was running at any given CPU node ID + point in time, you can always answer that question. (Assuming the debug infrastructure is robust enough to map any given IP to one specific function, which it should be able to do, even if the optimizer has hoisted out cold paths into separate, non-contiguous areas.)
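
A toy illustration of that lookup (the data structures here are hypothetical, not the actual ETW record layout): keep the module load events as base-address-sorted ranges per address space and binary-search any sampled instruction pointer into them; resolving the function within the module is then a separate, per-module symbol step.

    #include <stddef.h>
    #include <stdint.h>

    /* One record per module load event: base address, size, name. */
    typedef struct {
        uint64_t    base;
        uint64_t    size;
        const char *name;
    } module_range_t;

    /* Binary search a base-sorted array of loaded-module ranges for the
     * module containing `ip`; NULL if the IP falls outside all of them
     * (e.g. JIT code, or a module whose load event was missed). */
    const module_range_t *
    module_for_ip(const module_range_t *mods, size_t n, uint64_t ip)
    {
        size_t lo = 0, hi = n;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (ip < mods[mid].base)
                hi = mid;
            else if (ip >= mods[mid].base + mods[mid].size)
                lo = mid + 1;
            else
                return &mods[mid];
        }
        return NULL;
    }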

I realize this isn't very helpful to you on Linux (if it's any consolation I'm on Linux these days too), but, sometimes it's interesting to know how other platforms handle it.


Interesting... I've been lamenting the absence of .pdbs on Linux. It sounds like this would allow disassociating symbol info from the build artifact itself?

(There's no other out-of-the-box solution to this right? i.e. having symbol info live somewhere else other than the .so/exe, that can be loaded on demand when debugging? Like .pdbs basically.)


GDB is happy to deal with debug info files kept separate from the executable code, and has been for approximately "always", as far as I'm aware. But it's not particularly common or well understood how to actually achieve it.

Some info here on how to configure GDB to use it: https://sourceware.org/gdb/current/onlinedocs/gdb.html/Separ...

The old-school way appears to be to extract the debug information from the binaries after compilation, then strip the binaries. As described here: https://stackoverflow.com/questions/866721/how-to-generate-g...

The new way is to use gcc's ability to generate split DWARF directly: https://interrupt.memfault.com/blog/dealing-with-large-symbo...

This will work with debuginfod but you don't have to have that running to use these - you can just supply the symbol directory when you want to debug.


I like the idea of hacking the crap out of `compile_commands.json` and subverting it for your evil machinations outside of the normal build process. Such a hideously pragmatic tip.


Duh, just alias clang to a script that replaces its debug option and invokes clang, but only for the files you want.


That's neat. The modern equivalent on Windows is to leverage ETW and Windows Performance Analyzer. Potentially with a custom plugin that can visualize your specific perf data as a first-class WPA citizen (i.e. indistinguishable from any other perf data being analyzed, which means you can group/query/filter etc. just like anything else).

I wrote a plugin for a past employer to visualize our internal product event hierarchy performance as if it were a normal C/C++ call stack; it was pretty cool. ETW and WPA are phenomenal tools. I miss them both dearly when on Linux.

