Algorithmic Complexity of Left-Pad (2016)

kccqzy · on July 31, 2021

The fact that the Safari result appears to have two linear fits doesn't surprise me at all. Apple has a history of implementing multiple concrete classes for different sizes for the same abstract class. It has been known that NSArray and CFArray will switch the underlying data structure once it grows past a certain size: https://ridiculousfish.com/blog/posts/array.html Now I'm not saying this NSArray trick was what happened here: it clearly is not because otherwise we'd see a quadratic curve followed by a linear curve. I'm just saying the data structure seems to have changed at a certain size cutoff.

throwanem · on July 31, 2021

An optimized implementation (as "padStart") is in the standard library now: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

pwdisswordfish8 · on July 31, 2021

The C-influenced nomenclature is not needed here. By the nature of what you're referring to, it is not a library. "In the standard" will do.

mbrubeck · on July 31, 2021

Okay, let's see what “the standard” has to say…

“Clauses 18 through 28 define the ECMAScript standard library.”

—ECMA-262, 12th edition, June 2021, ECMAScript® 2021 Language Specification, §4.5

pwdisswordfish8 · on Aug 1, 2021

... so? The current incarnation of TC-39 has produced something that's a mess in more ways than one. The spec somehow managed to make it all the way to 5.1 and with that term nowhere to be found. With all the people who've got their hands in the spec nowadays, that someone playing it fast and loose with language also used the wrong terminology is not particularly surprising, nor does it make it any less grating to hear.

(Does the standard provide for a way not to run with the "standard library"? To opt for a different library of your choosing, for example, or no library at all? No? So it's a set of built-ins that are just an inescapable part of the language, then.)

bqa65 · on July 31, 2021

Er, C did not invent libraries by any measure whatsoever, and is completely appropriate to call a function available to code without imports “in the standard library”. It is a library function. You did not write it. Simple as that.

This is one of the more unnecessary pedantries I’ve ever seen, and that’s saying something on HN.

skrtskrt · on July 31, 2021

I always assumed just from how people used the terms that available without imports = “core language” and available via import without installing external packages = “standard library”

wk_end · on July 31, 2021

FWIW my intuition is that any functions, classes, constants, whatever - things that I could write myself if I had to - are “standard library”. “Core language” refers to syntactic or semantic features of the language that would require modifying the compiler or interpreter to introduce.

throwanem · on July 31, 2021

That may indeed be a more useful distinction to draw here. In it, Node could be said to provide a "standard library" (although not actually standardized, and I'm not sure how much it overlaps with eg Deno), while padStart and such would be core language features inasmuch as a compliant JS implementation is guaranteed to include them.

edit: It is not, however, an accurate distinction. https://news.ycombinator.com/item?id=28021135

pcthrowaway · on July 31, 2021

Even if you call the node built-in libraries 'the standard library' (for node), from context I think it was clear that GGP was referring to the ECMAScript standard library

throwanem · on Aug 1, 2021

Pretty sure GGP in your usage here was also me. I was so referring, but had been unaware that ECMA-262 actually defines "standard library" in a way with which my intuitive usage happened to correspond.

brundolf · on July 31, 2021

> But the principle remains, that it was actually completely impossible for us to analyze left-pad’s performance without a deep understanding of the specific underlying Javascript VM we cared about (or, in my case, resorting to brute experiment!).

And this is why we benchmark before optimizing

jonathrg · on July 31, 2021

It had been optimized (and deprecated) since then. See https://github.com/left-pad/left-pad/blob/master/index.js

ghj · on Aug 1, 2021

I've commented about this before, but their supposedly optimized implementation is still garbage: https://news.ycombinator.com/item?id=24125312

There's no reason to build the repeats up by doubling since the string concatenation of two same string is still linear time.

iamcreasy · on July 31, 2021

I am wondering if the graph that shows Safari performance is actually quadratic. I am referring to the the 2nd linear segment.

On that graph, 400,000 length required 2500 in time. If it's quadratic, 800,000 would require 10,000 in time. If you imagine the plot goes all the way to 800,000 - it's not hard to see that the line/curve hitting at 10,000.

Thoughts?

lilyball · on Aug 1, 2021

You're cherry-picking your data. If you pick any two points on a line, you can build a quadratic curve that meets those points, but doesn't match the rest of the line. Not to mention that in this case while the line is somewhat close to 2.5k, 400,000 actually has a plotted point at about 3k.

iamcreasy · on Aug 1, 2021

Lets try a different one then. 400,000 needs 2800 unit of time. If it's quadratic 600,000 would need 2800 * 1.5^2 = 6300. On the plot 600,000 takes about ~6300 unit of time.

I think the line is drawn incorrectly. You can always fit a straight line into a curve if you zoom in enough which is what I think is happening here.

I have not tried it myself but I strongly suspect that if you try to fit a linear and quadratic regression line in that part of dataset - the quadratic line would have better fit(coefficient of determination) compared to linear regression line.

Delk · on Aug 1, 2021

Would that quadratic formula generally work if you picked any two random points from the graph instead?

Anyway, the graph doesn't even try to suggest the performance behaviour shows a single linear relationship. The post (and the graph) explicitly says there seem two be two different linear segments. It's true that a single linear relationship would be a wrong model, but does that mean quadratic is right either?

I might be completely off but I wonder if the relationship could be generally linear but with different coefficients before and after the cutover point due to the data becoming larger than fits in the cache at once, or something.

iamcreasy · on Aug 1, 2021

> Would that quadratic formula generally work if you picked any two random points from the graph instead?

You need 3 points for a fit a polynomial of degree 2.

I am not talking about the first linear segment at all. My concern is that the 2nd part is not linear. Drawing a line though eye balling is very error prone. That's point I am trying to make.

Delk · on Aug 1, 2021

> You need 3 points for a fit a polynomial of degree 2.

I'm a bit confused. Yes, you need (at least) three points to try and fit a polynomial of degree two. But both of your earlier examples of why the relationship could be quadratic consisted of two points.

I'd be more or less happy to accept that if the same quadratic formula worked (within a margin of error) for several other other, randomly chosen pairs of points in the graph.

The second segment seems roughly linear to me. There are a couple of points in the graph that are somewhat off, but it's (at least as shown in the graph) only a couple of points, and we don't know the accuracy of the measurements, so there could be some noise.

It's true that you can easily get spurious relationships by trying to see multiple different segments or otherwise zooming to an selected level. I'm just not really sure the points as a whole look like a quadratic curve either, and it's not impossible there could be multiple segments, as e.g. cache behaviour could potentially introduce such non-linearity in what could otherwise have linear performance.

iamcreasy · on Aug 1, 2021

If you are trying to find the best polynomial of degree 2 you need 3 points - as you can not fit a quadratic with only 2 points. I tested it with multiple points. 400,000 to 800,000, and 400,000 to 600,000. I should have done it the first time though.

It does look linear but that's the problems with eye ball testing. I think if you fit a quadratic, you'll start to see the gradual curve. Its will be a very zoomed in area in a parabola.

If I had more time I'd generate the dataset from the graph and try to fit linear and quadratic regression line and compare the models using coefficient of determination. That would provide some strong for/against evidence.

Delk · on Aug 1, 2021

> Its will be a very zoomed in area in a parabola.

That's entirely possible.

Of course someone who knew how the algorithm worked (within the js engine) could just tell based on the code. :)

8bitsrule · on July 31, 2021

After pre-creating a pad-string constant of sufficient length, no loop is needed. For each target$, calculate how long the needed pad is, grab that many characters, and prepend.

keville · on July 31, 2021

(Circa 2016)

jwlake · on July 31, 2021

where to i submit the issue to have it reposted to substack?