
1. What languages do you think are good ones for HPC?

2. Re GAs, Koza, and cost:

In Koza's book on GAs, in one of the editions, he mentions that they scaled the performance of their research system by five orders of magnitude in a decade. What he didn't mention was that they did it by going from one Lisp Machine to 1000 PCs. They only got two orders of magnitude from per-unit performance; the rest came from growing the number of units.

Of course they couldn't keep scaling that way for cost reasons. Cost, and space, and power, and several other limiting factors. They weren't going to have a million machines a decade later, or a billion machines a decade after that. To the degree that GAs (or at least their approach to them) required that in order to keep working, to that degree their approach was not workable.




> 2. Re GAs, Koza, and cost:

Yeah, that's a really good point about the linear scaling limits of genetic algorithms, so let me provide some context:

Where I disagree with the chip industry is that I grew up on a Mac Plus, whose 1979 Motorola 68000 processor had 68,000 transistors (not a typo) and ran at 8 MHz, yet could get real work done - spreadsheets and desktop publishing - arguably more easily than today's machines. So to me, computers waste most of their potential now:

https://en.wikipedia.org/wiki/Transistor_count

As of 2023, Apple's M2 Max had 67 billion transistors running at 3.7 GHz, and it's not even the biggest on the list - although it is bigger than Nvidia's A100 (GA100 Ampere) at 54 billion, which is actually pretty impressive.

If we assume this is all roughly linear, we have:

  year  transistors  clock (Hz)
  1979  6.8e4        8e6
  2023  6.7e10       3.7e9
So the M2 should have (~1 million) * (~500) = 500 million times the computing power of the 68000 over 44 years, or Moore's law applied 29 times (44 years/18 months ~= 29).
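
For anyone who wants to sanity-check that arithmetic, here's a rough back-of-the-envelope sketch in Python using the figures from the table above (nothing here is exact, and clock speed alone is a crude proxy for performance):

  # Rough Moore's-law sanity check using the table above
  transistors_68000, clock_68000 = 6.8e4, 8e6      # 1979 Motorola 68000
  transistors_m2max, clock_m2max = 6.7e10, 3.7e9   # 2023 Apple M2 Max

  transistor_ratio = transistors_m2max / transistors_68000   # ~1 million
  clock_ratio = clock_m2max / clock_68000                    # ~460
  naive_speedup = transistor_ratio * clock_ratio             # ~4.6e8, the "500 million times"

  doublings = 44 * 12 / 18                                   # ~29 doublings in 44 years
  print(naive_speedup, 2 ** doublings)                       # both land in the ~5e8 ballpark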

Are computers today 500 million times faster than a Mac Plus? It's a ridiculous question and the answer is self-evidently no. Without the video card, they are more like 500 times faster in practice, and barely any faster in real-world tasks like surfing the web. This leads to some inconvenient questions:

  - Where did the factor of 1 million+ speedup go?
  - Why have CPUs seemingly not even tried to keep up with GPUs?
  - If GPUs are so much better able to recruit their transistor counts, why can't they even run general purpose C-style code and mainstream software like Docker yet?
Like you said and Koza found, per-unit speed only increased about 100 times during the decade of the 2000s under Moore's law, because 2^(10 years/18 months) ~= 100. If it had gone up that much again in the 2010s (it didn't), that would have been a 10,000x speedup by 2020, and we could extrapolate that each unit today would run about 2^(24 years/18 months) ~= 100,000 times faster than that original Lisp machine. A 1000-unit cluster would then run 100 million (10^8) times faster, so research on genetic algorithms could have continued.
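
The same kind of rough arithmetic for the cluster extrapolation, again assuming an 18-month doubling period (the exact figures round differently but land in the same ballpark):

  # Per-unit growth under an 18-month doubling period
  per_decade = 2 ** (10 * 12 / 18)     # ~100x per decade, matching Koza's per-unit gain
  per_unit_24y = 2 ** (24 * 12 / 18)   # 2^16 = 65,536, i.e. roughly the ~100,000x above
  cluster_1000 = 1000 * per_unit_24y   # ~6.5e7, i.e. roughly the 10^8 figure above
  print(per_decade, per_unit_24y, cluster_1000)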

But say we wanted to get back into genetic algorithms and the industry gave computer engineers access to real resources. We would design a 100 billion transistor CPU with around 1 million cores having 100,000 transistors each, costing $1000. Or if you like, 10,000 Pentium Pro or PowerPC 604 cores on a chip at 10 million transistors each.

It would also have a content-addressable memory so core-local memory appears as a contiguous address space. It would share data between cores like how BitTorrent works. So no changes to code would be needed to process data in-cluster or distributed around the web.

So that's my answer: computers today run thousands or even millions of times slower than they should for the price, but nobody cares because there's no market for real multicore. Computers were "good enough" by 2007 when smartphones arrived, so that's when their evolution ended. Had their evolution continued, then the linear scaling limits would have fallen away under the exponential growth of Moore's law, and we wouldn't be having this conversation.

> 1. What languages do you think are good ones for HPC?

That's still an open question. IMHO little or no research has happened there, because we didn't have the real multicore CPUs mentioned above. For a little context:

Here are some aspects of good programming languages:

https://www.linkedin.com/pulse/key-qualities-good-programmin...

https://www.chakray.com/programming-languages-types-and-feat...

  Abstractability
  Approachability
  Brevity
  Capability
  Consistency
  Correctness
  Efficiency
  Interactivity
  Locality
  Maintainability
  Performance
  Portability
  Productivity
  Readability
  Reliability
  Reusability
  Scalability
  Security
  Simplicity
  Testability
Most languages only shine for a handful of these. And some don't even seem to try. For example:

OpenCL is trying to address parallelization in all of the wrong ways, by copying the wrong approach (CUDA). Here's a particularly poor example, the top hit on Google:

Calculate X[i] = pow(X[i],2) in 200-ish lines of code:

https://github.com/rsnemmen/OpenCL-examples/blob/master/Hell...

Octave/MATLAB addresses parallelization in all of the right ways, through the language of spreadsheets and matrices:

Calculate X[i] = pow(X[i],2) in 1 line of code as x .^ 2 or power(x, 2):

https://docs.octave.org/v4.4.0/Arithmetic-Ops.html#XREFpower
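
For what it's worth, the same element-wise square is also a one-liner in Python/NumPy, which dispatches to vectorized kernels under the hood (a rough equivalent for comparison, not a claim that NumPy solves the multicore or cluster problem either):

  import numpy as np

  x = np.arange(1.0, 11.0)   # sample data: 1.0 through 10.0
  x_squared = x ** 2         # element-wise square, same spirit as Octave's x .^ 2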

But unfortunately neither OpenCL nor Octave/MATLAB currently recruit multicore or cluster computers effectively. I think there was research in the 80s and 90s on that, but the languages were esoteric or hand-rolled. Basically hand out a bunch of shell scripts to run and aggregate the results. It all died out around the same time as Beowulf cluster jokes did:

https://en.wikipedia.org/wiki/Computer_cluster

Here's one called Emerald:

https://medium.com/@mwendakelvinblog/emerald-the-language-of...

http://www.emeraldprogramminglanguage.org

https://emeraldlang.github.io/emerald/

After 35 years of programming experience, here's what I want:

  - Multi-element operations by a single operator like in Octave/MATLAB or shader languages like HLSL
  - All variables const (no mutability) with get/set between one-shot executions like the suspend-the-world I/O of ClojureScript and Redux state
  - No monads/futures/async or borrow checker like in Rust (const negates their need, just use 2x memory rather than in-place mutation)
  - Pass-by-value copy-on-write semantics for all arguments like with PHP arrays (pass-by-reference classes broke PHP 5+)
  - Auto-parallelization of flow control and loops via static analysis of intermediate code (no pragmas/intrinsics, const avoids side effects)
  - Functional and focused on higher-order methods like map/reduce/filter, de-emphasize Java-style object-oriented classes
  - Smart collections like JavaScript classes "x.y <=> x[y]" and PHP arrays "x[y] <=> x[n]", instead of "pure" set/map/array
  - No ban on multiple inheritance, no final keyword, let the compiler solve inheritance constraints
  - No imports, give us everything and the kitchen sink like PHP, let the compiler strip unused code
  - Parse infix/prefix/postfix notation equally with a converter like gofmt, with import/export to spreadsheet and (graph) database
It would be a pure functional language (impurity negates the benefits of cryptic languages like Haskell), easier to read and more consistent than JavaScript, with simple fork/join like Go/Erlang and single-operator math on array/matrix elements like Octave/Matlab. Kind of like standalone spreadsheets connected by shell pipes over the web, but as code in a file. Akin to Jupyter notebooks I guess, or Wolfram Alpha, or literate programming.
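
To give a rough flavor of that style with today's tools (this is not the imagined language, just an approximation in Python, and the function names and data are made up for illustration): immutable inputs, higher-order map/reduce instead of explicit loops, and fork/join parallelism handled by the runtime rather than by hand.

  from concurrent.futures import ProcessPoolExecutor
  from functools import reduce

  def square(x):
      # Pure function: no mutation or side effects, so any core can run it
      return x * x

  def total(values):
      # Higher-order reduce instead of an explicit accumulator loop
      return reduce(lambda a, b: a + b, values, 0)

  if __name__ == "__main__":
      data = tuple(range(1_000_000))       # immutable input, "all variables const"
      with ProcessPoolExecutor() as pool:  # fork/join handled by the runtime
          squares = pool.map(square, data, chunksize=10_000)
      print(total(squares))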

Note that this list flies in the face of many best practices today. That's because I'm coming from using languages like HyperTalk that cater to the user rather than the computer.

And honestly I'm trying to get away from languages. I mostly use #nocode spreadsheets and SQL now. I wish I could make cross-platform apps with spreadsheets (a bit like Airtable and Zapier) and had a functional or algebra of sets interface for databases.

It would take me at least a year or two and $100,000-250,000 minimum to write an MVP of this language. It's simply too ambitious to make in my spare time.

Sorry about length and delay on this!


After sleeping on this, I realized that I forgot to connect why those aspects of good programming languages matter for parallel programming. It's because parallel programming is already so difficult - why would we spend unnecessary time working around friction in the language too? If we have 1000 or 1 million times the performance, let the language be higher level so the compiler can worry about optimization. I-code can be simplified using math approaches like invariance and equivalence - basically turning long sequences of instructions into a result and reusing that result with memoization. That's how functional programming lazily evaluates code on demand: by treating the unknown result as unsolved and working up the tree algebraically, even out of order. Dependent steps can be farmed out to cores that wait until the knowns are solved, then substitute those and solve further. So even non-embarrassingly-parallel code can be parallelized with a divide-and-conquer strategy, limited by Amdahl's law of course.
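
The memoization half of that is roughly what functools.lru_cache already does in Python; the parallel limit is Amdahl's law, which is worth writing out because the serial fraction dominates surprisingly fast (the 95% parallel figure below is made up for illustration):

  def amdahl_speedup(parallel_fraction, n_cores):
      # Amdahl's law: the serial fraction caps the speedup no matter how many cores exist
      return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

  # Even with a million cores, a 5% serial portion caps the speedup near 20x
  print(amdahl_speedup(0.95, 1_000_000))   # ~20.0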

I'm concerned that this solver is not being researched enough before machine learning and AI arrive. We'll just gloss over it like we did by jumping from CPU to GPU without the HPC step between them.

At this point I've all but given up on real resources being dedicated to this. I'm basically living on another timeline that never happened, because I keep seeing obvious missed steps that nobody seems to be talking about, and that makes living with mediocre tools painful. I spend at least 90% of my time working around friction that doesn't need to be there. In a very real way it negatively impacts my life, costing me years of writing code by hand for things that would have been point-and-click back in the Microsoft Access and FileMaker days - things people don't even think about when they get real work done with spreadsheets.

TL;DR: I want a human-readable language like HyperTalk that's automagically fully optimized across potentially infinite cores, between where we are now with C-style languages and the AI-generated code that will come from robotic assistants like J.A.R.V.I.S.



