*the speed you gain by being able to whip up a feature in 10 minutes is worth th...

sillysaurusx · on May 1, 2020

Not at all. I share your frustration.

Closures on a server are a powerful way of representing data flows. But they come at a cost: the links expire after some time. How do you strike a balance?

The simplest way is to put in the extra time to make everything into a persistent link. But, that's equivalent to removing all the benefits of lexical scoping. If you've ever created an inner function before, you know how powerful that technique can be. You can encode the entire state of the program into closures -- no need to store things in databases. Want a password reset link? Spin up a closure, email the link, done. Literally identical to storing a reset token in a database, except there's no database.

Another solution is to fix the root problem. Does the closure really need a random ID every time the page refreshes? The closure links die because they have to be GC'd periodically, to keep memory usage down. Even if you cache the page for 90 seconds, that's still 960 refreshes per day for logged-out users. Then if you have a few hundred regular users, that's at least another factor of two. And certain pages might create hundreds of closure links each refresh, so it quickly gets out of hand.

Ironically, the solution was emacs -- in emacs, they store closures in a printable way. A closure is literally a list of variable values plus the function's source code. That got me thinking -- why not use that as the key, instead of making a random key each time the page refreshes? After all, if the function's lexical variables are identical, then it should produce identical results each time it's run. No need to create another one.

That's what I did. It took a week or so, which is a week I'll never get back for building new features. But at least users won't have to deal with dead links anymore.

Clever readers will note a theoretical security flaw: an attacker might be able to guess your function IDs if they knew the entire state of the closure + the closure's source code (which is the default case for an open source project). That might give them access to e.g. your admin links. But that's not an indictment of the technique; it's easily solved by concatenating the key with a random ID generated at startup, and hashing that. I'm just making a note of it here in case some reader wants to try implementing this idea in their own framework.

The closure technique has nontrivial productivity speedups (that I think someone will rediscover some years from now). I hope the idea becomes more popular over time.

kazinator · on May 1, 2020

> How do you strike a balance?

How about a keepalive from the client side: little bit of JavaScript that somehow tells the server that the session is still alive, so don't blow away the closure or continuation.

sillysaurusx · on May 1, 2020

Since this is getting a surprising amount of interest, let me sum up the technique here. It's really not hard to implement it in Javascript using Express.

1. inside of your express endpoint, create a closure that captures some state. For example, the user's IP address.

(This is a dumb example, but the point is that you can capture whatever state you want. Some cases are mentioned here: http://paulgraham.com/road.html and here: http://ep.yimg.com/ty/cdn/paulgraham/bbnexcerpts.txt)

EDIT: I updated this to capture the date + time the original page was loaded, which is slightly more compelling than a mere IP address.

  app.get('/', function (req, res) {
    let ip = req.headers['x-forwarded-for'] || req.connection.remoteAddress;

    let now = new Date();
    let date = now.getFullYear()+'-'+(now.getMonth()+1)+'-'+now.getDate();
    let time = now.getHours() + ":" + now.getMinutes() + ":" + now.getSeconds();

    let fn = (req, res) => {
      res.send(`hello ${req.query.name}. On ${date} at {time}, your IP address was ${ip}`)
    }
    ... see below ...
  })

2. insert that closure into a global hash table, keyed by a random ID.

  g_fnids = {};

  app.get('/', function (req, res) {
    let fn = ...
    let id = <generate random ID>
    g_fnids[id] = fn;
    res.send(`<a href="/x?fnid=${id}&name=bob">Say hello</a>`);
  })

3. create an endpoint called /x which works like `/x?fnid=<function id>&foo=1&bar=2`. Use <function id> to look up the closure. Call the closure, passing the request to it:

  app.get('/x', function (req, res) {
    let id = req.query.fnid;
    let fn = g_fnids[id];
    fn(req, res)
  }

Done.

Congratulations, your closure is now an express endpoint. Except you didn't have to name it. You can link users to it like `<a href="/x?fnid=<function id>&name=bob">Say hello</a>`.

The reason this is a powerful technique is that you can use it with forms. The form target can be /x, and the query params are whatever the user types into the form fields.

I bet you already see a few interesting use cases. And you might notice that this makes scaling the server a little more difficult, since incoming requests have to be routed to the server containing the actual closure. But in the meantime, you now have "inner functions" in your web framework. It makes implementing password reset functionality completely trivial, and no database required.

If it seems slightly annoying to use – "I thought you said this was a productivity boost. But it's annoying to type all of that!" – lisp macros hide all of this boilerplate code, so there's zero extra typing. You can get lisp macros for Javascript using Lumen lisp: https://github.com/sctb/lumen

Even without macros, though, I bet this technique is shorter. Suppose you had to store the date + time + IP address somewhere. Where would you put it? I assume some sort of nosql database like firebase. But wouldn’t that code be much longer and more annoying to write? So this technique has tremendous value, and I’m amazed no one is using it circa 2020.

bollu · on May 1, 2020

This is really funny; Back in the warcraft 3 days, we used to do the same thing inside its scripting language --- to attach some data to a timer, we would exploit the fact that a timer is in fact just a 'void *' underneath: so the pointer address gave us the unique ID. We would stash data associated with the timer in a global hash table. Then, in the callback of the timer, we would read the data back from the global hash table!

Your exposition took me a trip down memory lane to middle/high school. Thank you for this :)

- References: https://www.thehelper.net/threads/jass-timers-and-how-to-pas...

awild · on May 1, 2020

H2I was such a cool function.

I completely forgot how terrible of a language JASS actually was. Have mostly fond memories. But oh my is it verbose

nivethan · on May 1, 2020

Well this is certainly the coolest thing I've read today. I've been trying to grok closures and this helps a bit. Is there a reason to not use the global hash table itself to store the state instead of a closure? This seems to be trading a database with memory. This also seems to be harder to interrogate, what if I want to go in and see what's currently outstanding, instead of going to Firebase, I'll need to go through the hash table and check the content of the function.

I think I may be missing something someone who's actually worked with lisp can see, to me closure, recursion and functional programming is cool but I can do everything its showing off using the standard fare of loops and databases.

lstamour · on May 1, 2020

The biggest difference is the “...” assignment to fn. The idea is similar to AWS Lambda — write functions, store those functions, and then call them later when you need them. I’ve minimal Lisp experience, but from my perspective, a closure is a function you can store in a variable that’s defined with a scope, or a set of arguments/variables used in your function, that often (but not always) includes variables from the parent scope it was defined in, (often only) if referenced by the function. Because closures need to have independent scopes, by default values (or variables) that can mutate need to be copied — alternatively, you can use immutable data structures which copy more cheaply. The big differences then between closures and other types of code often comes down to how frequently immutability is used, and whether you call functions that assume state or share state (more OO, or non-FP), or transfer functions with state to other functions (FP, though composability and other properties matter too when defining FP, this is a simplification). This is a bit of a vague answer, perhaps others can chime in with a better one. And if you’re not careful, FP can introduce problems too, though that happens more often with distributed, multi-threaded or recursive programs which can themselves be hard to write using non-FP also.

throwanem · on May 1, 2020

It's not all that similar to AWS Lambdas in concept or in execution. Those are stateless; to a very good first approximation, they're just a single-route web server with all the boilerplate abstracted away, and that starts up a fresh instance to handle each request and is shut down again immediately after.

What 'sillysaurusx describes is much more similar to what, in Scheme and elsewhere but these days mainly there, is called "continuation-passing style". It's a way of pausing a partially completed computation indefinitely by wrapping it up in a function that closes over the state of the computation when you create it, and calling that function later to pick up from where you left off when you're ready to proceed again.

I suppose you could maybe do that with an AWS Lambda, but because the technique relies strongly on the runtime instance staying around until the computation finishes, it would probably get expensive. Lambdas aren't priced to stay running, after all.

As a side note, it's worth mentioning that the "AWS Lambda" product, which whatever its virtues isn't actually a lambda, derives its name from the lambda calculus, where I believe the concept of anonymous first-class functions originates. I don't recommend reading about the lambda calculus itself unless you're up for a lot of very heavy theory, but it's worth knowing that, especially in the Lisp world and realms adjacent, you'll often see the term 'lambda' used in a sense which has nothing to do with the AWS product, but rather refers to a form of abstraction that relies on defining functions which retain access to the variable ("lexical") scopes in which they were created, even when called from outside those scopes. Javascript functions have this property, which is why they're capable of expressing the technique 'sillysaurusx describes, and it gives them a lot of other useful capabilities as well.

lstamour · on May 1, 2020

True. Good distinctions. To re-iterate the above, the approximation to AWS Lambda would require dynamic AWS Lambda functions -- as in code that creates a Lambda with specific state embedded in it -- then tracks each of those by their unique Lambda identifier and ... yeah, that's where this breaks down because it's not all that similar to Lambda if the best use for a Lambda is repeated invocations of the same code. And Lambda IDs presumably aren't based on a hash of their contents and variables the way this is. But dynamic AWS Lambda functions are possible, so there's that. You could write this in Lambda, it just might be expensive if API calls to create and destroy one-time Lambdas are expensive enough. It's a lot cheaper and faster to build functions and store references to them in a hash table in memory.

Another similarity to this use of hashing the scope of a function would be in memoization of a function, to cache the output based on the input, such that you hash a function's inputs and assign to that hash a copy of the output of the function when run with those inputs. Then you can hash the inputs and skip re-running the function. You have to be sure the function has no side-effects nor any changes in behaviour or inputs not specified in the memoization hash, though. "Pure" functions are best for this use case.

throwanem · on May 1, 2020

Memoization is usually preferable if you can do it, sure. But you can't memoize a continuation, because what it expresses is a computation that has yet to complete and produce the result you'd need in order to memoize. And the use of the g_fnid hash table doesn't qualify as memoization, either, because the keys aren't arguments to the function that produced the values; what it actually is is a jump table, cf. https://en.m.wikipedia.org/wiki/Branch_table#Jump_table_exam...

lstamour · on May 2, 2020

Thanks for your reply. I ended up looking for a bit more on continuations from the perspective of JS Promises and found https://dev.to/homam/composability-from-callbacks-to-categor... which was a pretty easy to follow read on this if you take the time to understand the JS, though there might be better references to continuations elsewhere, this was just one of the first I found.

throwanem · on May 1, 2020

It works a lot better in a proper Lisp, where the REPL and debugger are first-class citizens. In Javascript, you can do it, but it's a dancing bear at best; as you note, the observability is poor to nil without heroic effort, and scalability's a problem too.

throwanem · on May 1, 2020

I mean, I can tell you right now why I'm not using it circa 2020, nor do I expect I shall in future. For sure, it's clever and it's elegant, a brilliant hack - but it's not durable, and in my line of work that counts for more.

On the one hand, as you note, this can't scale horizontally without the load balancer knowing where to route a request based on the fnid, which means my load balancer now has to know things it shouldn't - and that knowledge has to be persisted somewhere, or every session dies with the load balancer.

On the other hand, even if I teach nginx to do that and hang a database or something off it so that it can, with all the headaches that entails - this still can't scale horizontally, because when one of my containers dies for any reason - evicted, reaped, crashed, oomkilled because somebody who doesn't like me figured out how to construct a request that allocates pathologically before I figured out how to prevent it, any number of other causes - every session it had dies with it, because all that state is internal to the runtime instance and can't be offloaded anywhere else.

So now my cattle are pets again, which I don't want, because from a reliability standpoint shooting a sick cow and replacing it with a fresh one turns out to be very much preferable to having to do surgery on a sick or dying pet. Which I will have to do, because, again, all the persisted state is wrapped up tight inside a given pod's JS runtime, so I can't find out anything I didn't know ahead of time to log without figuring out how to attach a debugger and inspect the guts of state. Which, yes, is doable - but it's far from trivial, the way Lisps make it, and if the pod dies before I can find out what's wrong or before I'm done in the debugger, I've got a lot less to autopsy than a conventional approach would give me. And that's no less a problem than the rest of it.

Yes, granted, the sort of software you describe is incredibly elegant, a beautifully faceted gem. It's the sort of thing to which as a child I aspired. But as it turns out, here thirty years on, I'm not a jeweler, and the sort of machine my team and I build has precious little need for that sort of beauty - and less still for the brittleness that comes with it. Durability counts for much more, because if our machines break and stay broken long enough, the cost is measured in thousands or millions of dollars.

That's not hyperbole, either! Early one morning last November, I ran two SQL queries, off the top of my head, in the space of two thirds of a minute. When all was eventually said and done, the real value of each of those forty seconds, in terms of revenue saved, worked out to about $35,000 - about $1.4 million, all told, or seven hundred thousand dollars per line of SQL. And not one of the people who gave us all that money ever even knew anything had been wrong.

Granted that a couple of unprecedented SQL queries like the ones I describe, written on nothing but raw reflex and years of being elbow deep in the grease and guts of that machine and others like it, constitute a large and blunt hammer indeed. But - because we built that machine, as well as we knew how, to be durable and maintainable above all else - in a moment which demanded a hammer and where to swing it, both were instantly to hand. In a system built as you describe, all gleaming impenetrable surfaces between me and the problem that needed solving right then, how could I have hoped to do so well?

Only through genius, I think. And don't get me wrong! Genius is a wonderful thing. I wish I had any of it, but I don't. All I know how to be is an engineer. It's taken me a long time to see the beauty in that, but I think I'm finally getting a handle on it, these days. It's a rougher sort of beauty than that to which I once aspired, that I freely concede, and the art that's in it is very much akin to something my grandfathers, both machinists and one a damned fine engineer in his own right, would have recognized and I hope might have respected, had they lived to see it.

Do you know, one of those grandfathers developed a part that went on to be used in every Space Shuttle orbiter that ever flew? It wasn't a large part or a terribly critical one. You wouldn't think much of it, to look at it. But he was the man who designed it, drew it out, and drew it forth from a sheet metal brake and a Bridgeport mill. He was the man who taught other men how to make more of them. And he was a man who knew how to pick up a hammer and swing it, when the moment called for one. He was possessed of no more genius than am I, and his work had no more place in it for the beauty of perfectly cut gemstones than does mine. But he was a smart man, and a knowledgeable man, and not least he was a dogged man. And because he was all those things, my legacy includes a very small, but very real, part in one of the most tangible expressions of aspiration to greater, grander things that our species has ever yet produced. Sure, the Space Shuttle was in every sense a dog, a hangar queen's hangar queen. But, by God, it flew anyway. It 'slipped the surly bonds of Earth, and touched the face of God' - and next time, we'll do better, however long it takes us. And, thanks to my grandfather's skill and effort, that's part of who and what I am - and there's a part of me in that, as well.

No gemstone that, for sure! It has its own kind of beauty, nonetheless - the kind that leaves me feeling no lack in my paucity of genius, so long as I have an engineer's skill to know when and how to swing a hammer, and an engineer's good sense to leave myself a place to land it. If that was ever in doubt, I think it can only have been so until that morning last November, when I saved ten years' worth of my own pay in the space of forty seconds and two perfect swings of exactly the right hammer.

There's a place for the beauty of gemstones, no doubt - for one thing, in seeing to it this very long comment of mine isn't lost to the vagaries of a closure cache. And I appreciate that, for sure! It'd be a shame to have wasted the effort, to say nothing of any small value that may cling to these words.

But there's a place for the beauty of hammers, too.

kragen · on May 2, 2020

The vast majority of websites don't need to scale beyond what a single computer can do, especially with an efficient runtime. You're right that if you're building Wikipedia or Amazon you need to scale horizontally. But most sites aren't Wikipedia or Amazon.

It's true that JS systems like Node aren't really designed for this kind of thing, although they could have been. Arc is.

specialist · on May 1, 2020

Yup. I somehow became a graybeard. I really didn't fit in at my last three gigs doing "backend" work.

I always play to win, so try to understand why & how I failed.

My current theory:

I had good successes doing product development. Shipping software that had to be pretty close to correct.

Today's "product development" is really IT, data processing. Way more forgiving of stuff that's not quite right. Often not even close to right. (Guessing that about 1/3rd of the stuff I supported didn't actually do what the original author thought it did, and no one was the wiser, until something didn't seem quite right.)

One insightful coworker said it best: "I learned to do everything to 80% completion."

My observation is that most team mates created more bugs than they closed. Maybe incentivized by the "agile" methods notions of "velocity". And they were praised for their poor results.

Whereas my tortoise strategies nominally took longer. So I had fewer, larger commits. Way fewer "points" on the kanban board. Created much fewer lines of code.

(When fixing [rewriting] other people's code, mine was often 50% to 80% smaller. Mostly by removing dead code and deduplication.)

I was able to bang out new stuff and beat deadlines when I was working solo.

I think the difference between solo and team play is mostly due to style mismatches. It's very hard for me to collaborate with teammates who are committing smaller, more frequent, often broken, code changes.

Any way. That's my current best guess at what's happening to this graybeard.

More optimistically...

I'm very interested in the "Test Into Prod" strategies advocated by the CTO from Gilt (?). It's the first QA/Test strategy (for an "agile" world") that makes any kind of sense to me. So I think I could adapt to that work style.

(I served as SQA Manager for a while. It's hard to let go of those expectations. It's been maybe 20 years since I've seen anyone doing actual QA/Test. I feel bad for today's business analysts (BAs) who get stuck doing requirements and monkey style button pushing. Like how most orgs functioned in the 80s.)

throwanem · on May 1, 2020

I see more gray in my beard every morning. And the thing about "...to 80% completion" is that the first 80% of value is captured in the first 80% of effort, and the last 20% of value in the other 80% of effort. It's important to know when to follow that ROI graph past the knee, for sure. But it's just as important to know when not to.

(I mind me of a time a few years back when I was surprised to learn that the right method of exception handling, for a specific case in some work I was doing on a distributed system, was none - just letting it crash and try again when the orchestrator stood up a fresh container. It felt wrong at first; ever before I'd have instead gone to a lot of painstaking effort to reset the relevant state by hand, and my first instinct was to do the same here. But crashing turned out to be the right thing to do, because it worked just as well and took no time at all to implement.)

specialist · on May 2, 2020

Ya. Sorry, I'm still struggling on how to phrase this.

I'm totally on board with 80% feature complete, MVP, biz case, etc.

What I'm trying to articulate, here and elsewhere:

implementations that apparently work, but not really

coders who don't check their own work

doing the bare minimum to get awarded the agile velocity victory points

futility of arguing definitions of "done"

lack of feedback loops

that cliche about managing to metrics leads to undesired outcomes

the personal cost, burden of craftsmanship in a "break things and move fast" world