So I've been messing with Fn + Clojure + Graal Native Image and I'm seeing cold ...

jacques_chester · on Aug 15, 2018

There are a lot of low- and medium-hanging fruit in this space.

I take the view that there is a "warmth hierarchy" (an idea I spitballed with other folks in Knative before Matt Moore took it and filled it out a lot).

You have a request or message. You need to process it. The hierarchy runs CPU > Memory > Disk > Cold.

CPU-warm: immediately schedulable by the OS. Order of nanoseconds.

Memory-warm: in-memory but can't be immediately scheduled. Our reference case is processes that have been cgroup frozen. Order of tens of milliseconds.

Disk-warm: the necessary layers are all present on the node which is assigned to launch the process. Order of hundreds of milliseconds.

Cold start: nothing is present on the node which is assigned to launch the process. It will need to fetch some or all layers from a registry. Order of seconds to minutes.

Some of these problems can't be solved by Kubernetes (it doesn't know how to freeze a process), some of them can't be solved by Docker (it doesn't know about other nodes that might have warmer disk caches). Then there's the business of deciding where to route traffic, how to prewarm caches and based on what predictive approach, avoiding siloisation of processes due to feedback loops.

As these are knocked down they will pay dividends for everyone.

rdallman · on Aug 14, 2018

Hey there, fn dev here. Cool to hear about folks testing the graal images! We have done some experimental work and made some discoveries in the container cold start time dept. which we've documented https://blogs.oracle.com/developers/building-a-container-run... (see: Fast Container Startup).

The main things are optimizations around pre-allocating network namespaces, we've built some experimental code to skirt around this https://github.com/fnproject/fn/blob/master/api/agent/driver... (which can be configured & tested) and we're testing approaches like this to attempt to get start times down to as low as we can. <100ms still seems like a lofty goal at this point, but this is becoming a pain point for various FaaS platforms that are container based and some optimization work should come out of all this. A few layers up, it's a little easier to speed things when throughput increases for a given function, but devs do seem to want the 0->1 case to be almost as fast as the 'hot' case so we FaaS people need to answer this better in general, it isn't something we want users to have to think about!