Compiling Java to native code via GraalVM is really amazing. The Quarkus framewo...

willsmith72 · 2024-06-11T00:04:09 1718064249

what are the cold starts like?

metadat · 2024-06-11T08:07:21 1718093241

Faster, because it's a native executable.

willsmith72 · 2024-06-11T16:01:48 1718121708

Right, but I mean there's faster than spring boot (30 seconds), then there's fast enough for a typical web experience (<500ms)

My experience on lambda is with node and python, and I've always needed to pay for concurrency. Cold starts were 1-3s, which is not good enough for a web api

shitlord · 2024-06-12T00:44:48 1718153088

Depending on how your program is written, you might be able to use Lambda SnapStart and eliminate most of the cold start overhead.

mike_hearn · 2024-06-11T16:58:09 1718125089

A GraalVM native executable can start faster than a C program. Think tens of milliseconds.

willsmith72 · 2024-06-11T18:17:54 1718129874

i know what graalvm is, but that doesn't mean the cold start response time of a lambda from the perspective of a user will be anything like that.

mike_hearn · 2024-06-12T07:47:20 1718178440

It does. This is one of those times where nothing is hidden from you. You don't have to believe me. Believe AWS:

https://aws.amazon.com/blogs/opensource/improving-developer-...

> The same Lambda function with 3008 MB of memory that took 3.6 seconds to start with the JVM, started in under 100 milliseconds once compiled to a native executable using GraalVMs native-image tool. We were able to achieve sub second response times as shown in the graph below, and as an added bonus were also able to reduce the memory size and cost.

They go on to describe the main caveat - you have to predeclare what will be accessed via reflection - and how some frameworks like Micronaut do work up-front at source compile time to ensure the needed metadata is generated. So if your app is compatible with native image the benefits are really there.

There are some other caveats:

• In some cases you may need config files to make libraries compatible with the process. There's a central collection of them, and libraries are increasingly including their own metadata. The biggest app compat problems are with apps built using old versions of frameworks like Spring where you can't afford to update them to the newest versions of things.

• Out of the box the native executable runs a bit slower. To get throughput that's competitive with HotSpot you'll need to do a C++ style workflow with profile guided optimization, which is obviously more runtime efficient but less devops-time efficient than what HotSpot does.

• The actual compile process is slow, so you'll be developing on HotSpot.

Disclosure: I work part time with the GraalVM team.

degaart · 2024-06-11T17:35:19 1718127319

Source and benchmarks please

mike_hearn · 2024-06-12T07:45:40 1718178340

There are lots. Everyone gets those sorts of results so you can just try it, or here are some experience reports:

https://debijenkorf.tech/speed-up-application-launch-time-wi...

> The app went from starting in 463ms to a whopping 7ms, awesome!

> As you can see the memory usage went from 215.924kB to 18.104kB

Or for Lambdas (this result is reported by the GraalVM team):

> The same Lambda function with 3008 MB of memory that took 3.6 seconds to start with the JVM, started in under 100 milliseconds once compiled using GraalVM

https://aws.amazon.com/blogs/opensource/improving-developer-...

Native Image is a fully independent JVM and compiler implementation that was written from day one for startup time and memory footprint as the only goals that mattered. What it sacrifices to get that is some semantic compatibility. The big differences are:

- It compiles all code ahead of time. As machine code is much bigger than the equivalent bytecode, it uses a dead code ("tree shaking") analysis to only compile code that's statically reachable or declared via config files. It's like a mandatory WebPack or ProGuard step if you're familiar with those.

- It runs (some) class initializers at compile time, not startup time. So if you do something like "public static final Thread thread = ...." then you'll need to exclude that class from build-time init, including if it's in libraries etc.

- It snapshots the post-compile heap into the binary.

So this is changing the normal Java semantics and that means some apps won't run on native image without some up front work. It's not an entirely free capability. You have to "port" your app to it. Fortunately, because the startup and memory footprint wins are so huge and definitive the JVM ecosystem is rallying around this approach and making frameworks and such compatible with it. For instance if you use the latest versions of any of the modern Java web frameworks (Spring, Micronaut, Quarks, etc) then you can easily run a single build system target to get a Docker container with a native executable inside, that has those startup times you're seeing here.

At this point the startup time bottleneck for (compatible) Java apps has shifted to the kernel; the container infrastructure itself takes longer to start than the Java program does.

mwcampbell · 2024-06-12T14:55:02 1718204102

Sorry if this is too far off topic for this thread, but I'm curious if you've done any work on packaging JVM-based desktop apps, whether using JavaFX, Compose, or something else, using GraalVM Native Image. The idea of bringing Native Image's minimal startup time to desktop apps is really appealing to me.

mike_hearn · 2024-06-12T16:55:21 1718211321

Yes there have been some experiments with that.

https://github.com/hydraulic-software/conveyor/discussions/6...

Gluon has a version of GraalVM that can compile JavaFX apps. They do indeed start impressively fast and use much less memory. It's still a road somewhat less travelled though. Someone also tried it with Compose but it didn't get further than a demo repo and a few comments on our Discord.

There are a few issues left to resolve:

1. General developer usability.

2. Native images aren't deterministic, which reduces the effectiveness of delta updates.

3. Native images can quickly get larger than the JVM+bytecode equivalent, as bytecode is quite compact compared to machine code. So you trade off startup time against download time.

mwcampbell · 2024-06-12T22:03:38 1718229818

Is bytecode still more compact than native code when you factor in the ProGuard-like optimizations that Native Image does as you said in an earlier comment? Also, how does native code compare to bytecode once you compress it?

mike_hearn · 2024-06-13T08:01:45 1718265705

A small native image will be smaller than a jlinked JDK+JARs, but it doesn't take long for the curve to cross and the native image to become bigger. ProGuard doesn't fundamentally change that.

The native code produced by native image compresses very well indeed. UPX makes the binaries much smaller. But then you're hurting startup time, so it's not a good trade.

The best way would be to heavily compress downloads, then keep the programs uncompressed on disk. Unfortunately most download / update systems don't support modern codecs, so you're very limited in how much you can reduce download times. Also codecs like LZMA often result in much slower decompression, so on fast internet connections it can actually be better to use less compression rather than more. Really modern codecs like Brotli or zstd are much better, but browsers don't have good support for downloads.

None of this is especially hard to fix but it's a quiet area of development. I think it'll need a bit of a paradigm shift to become a more popular way to do things on the desktop/cli space.

mwcampbell · 2024-06-13T08:33:33 1718267613

Interesting observations on compression. As a young programmer, I used to compress executables, and maybe some DLLs as well, with UPX without a second thought. Later I understood that executable compressors prevented the OS's memory-mapped file I/O and demand paging from working as designed, and moved to only compressing the installer and update packages (another of my misadventures as a young programmer was doing my own updater with its own package file format).

I guess the ideal solution would be if the download server offered a few compression options negotiable at download time, via Content-Transfer-Encoding or some other form of HTTP content negotiation, trading CPU time against bandwidth (the server would have to pre-compress or at least cache the compressed versions to scale), and then the download was stored as some kind of archive that could be mounted as a filesystem (this implies random access and therefore not "solid" compression). Then delta updates would be done against that filesystem image. That way, you wouldn't have the "installing" process of uncompressing and copying files. Of course, that would require platform support that we don't have on Windows and macOS. At least I can dream about desktop Linux.

mike_hearn · 2024-06-13T08:54:18 1718268858

macOS actually has the best support for that. DMG files are mountable disk images and the contents can be compressed with LZMA or some Apple-specific codecs that are quite good. Opening them mounts them into the kernel and then there's random access. Even code signatures are checked JIT during page faults.

The main problem with DMGs is the poor UX, and very slow mount/verification times. Users can start the app from the DMG and it will seem to work, but be unable to update. They forget to unmount the "drive" or don't know how. The format is also undocumented and a PITA to work with as it's basically a full filesystem, which also has to be signed and notarized independently and that's super slow too. So it makes the whole build process a lot slower.

There's quite some low hanging fruit here that I might experiment with soon. I have a design in mind already.