> As of 2012, all modern browsers ship a mark-and-sweep garbage-collector. All improvements made in the field of JavaScript garbage collection (generational/incremental/concurrent/parallel garbage collection) over the last years are implementation improvements of this algorithm (mark-and-sweep), but not improvements over the garbage collection algorithm itself, nor its goal of deciding whether an object is reachable or not.
WebKit uses a constraint-based garbage collector that does not rely on reachability alone. This is an improvement over the classical garbage collection algorithm.
Nice article. Minor gripe about the static vs dynamic memory section. The requirement that data sizes be known at compile time for static memory (with the example of an array allocated to a user-inputted size) seems to be based on a past restriction of the C language. C has since removed the requirement that stack-based arrays be sized with a compile-time constant; there is nothing at the hardware/assembly level which prevents such arrays. So these stack-based non-compile-time-sized arrays don't fit into either the static or dynamic memory categories presented here.
I was bothered by saying that static memory is assigned on the stack (implied that it is only on the stack). As an embedded guy, at least in bare metal situations, local function statics, any const variables, and any globals are either in the read-only data section[1] or in the general data section[2], both before the heap and definitely not in the stack section.
[1] - .rodata section in gcc
[2] - I think this is in the .data and .bss sections, where .data is copied from the file and .bss is zero-initialized by the crt before calling main. If you peek into a linker script, or dump one by passing --verbose to ld, you can even see where it puts all the C++ bits and pieces. A dump I did on Debian with g++ is here: https://gist.github.com/Phyllostachys/0682a3bda13ef9c6b49d04...
After looking at that default linker script dump, I noticed it didn't have the stuff I saw in the ARM linker script that I'm used to. So I attached one for a Silicon Labs EFM32 part. It's below the x86-64 linker script and is a little easier to read.
Yeah, of the four things listed, I think it's the one that would trip me up the most. I think the safety net here would be dead code elimination: `unused()` should be detected as never called and removed during compilation/transpilation so that this wouldn't be an issue.
I am sort of surprised that `unused` is not picked up by the garbage collector in the first place though. Since JS functions are objects, shouldn't it detect an object that's not referenced during the mark and sweep?
In general, I really hate having to debug memory leaks in JS or Python. The interpreter for both will randomly allocate additional memory as it runs, so using tools like Valgrind is next to impossible. The only reliable method I've found is to pepper my code with logging statements that show what the current memory usage is, run the code like 1000-10,000 times, and see the points between which the memory usage goes up without coming down on a consistent basis. Python's built in `gc` module seems nearly useless for determining what's actually stuck in memory, and having a billion libraries that can have their own memory leaks is also not fun. These are the times I miss C: when you leak memory in C you know it because it becomes painful fast and it's usually easy to find, if your code is sane.
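For the Node.js case, the logging approach described above could look something like this minimal sketch. It assumes Node's `process.memoryUsage()`; the helper names `heapUsedMB` and `sampleHeap` are made up for illustration, and the readings are noisy because the GC runs on its own schedule.

```javascript
// Hypothetical helper: current JS heap usage in megabytes (Node.js only).
function heapUsedMB() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

// Hypothetical helper: run `fn` repeatedly and record heap usage
// every `every` iterations so that trends become visible.
function sampleHeap(fn, iterations, every) {
  const samples = [];
  for (let i = 1; i <= iterations; i++) {
    fn();
    if (i % every === 0) {
      samples.push({ iteration: i, heapMB: heapUsedMB() });
    }
  }
  return samples;
}
```

A leak tends to show up as `heapMB` climbing across samples without ever dropping back; a single jump proves little.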
The fix is to implement the closures properly, by only closing over individual variable slots. It looks like the engines are implementing closures by closing over entire windows of slots--that is, if two functions have the same scope, they inherit the _union_ of the variables they reference as a single window/block of variable slots.
Yes. The function's frame contains a reference to the object storing local variables of the parent frame. To do better you need to store a list of variables referenced (which you can only do if there's neither direct-eval nor a with statement).
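For anyone following along, here is a minimal sketch of the kind of pattern being discussed, assuming the article's example has roughly this shape. Only `unused` is a name from this thread; `theThing`, `replaceThing`, `originalThing`, `longStr` and `someMethod` are illustrative assumptions.

```javascript
let theThing = null;

function replaceThing() {
  const originalThing = theThing;

  // Never called, but it references originalThing, so that variable
  // ends up in the scope object shared by closures created in this call.
  const unused = function () {
    if (originalThing) console.log("hi");
  };

  theThing = {
    longStr: "*".repeat(1000000),
    // Does not reference originalThing itself, but in engines that keep
    // one shared scope per call it still holds on to that scope, and
    // through it the previous theThing.
    someMethod: function () {
      console.log("someMessage");
    },
  };
}
```

Calling `replaceThing()` repeatedly would then build a chain of old objects, each kept alive by the `someMethod` closure of the next. Whether that chain is actually retained depends on how the engine allocates closure scopes.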
> In general, I really hate having to debug memory leaks in JS or Python. The interpreter for both will randomly allocate additional memory as it runs, so using tools like Valgrind is next to impossible.
I don't know about Python, but for JavaScript can't you use the built-in devtools? Maybe try grabbing a heap snapshot, or recording an allocation profile, and go from there?
Would it be better to have to explicitly declare which variables you want to import into the closure, like PHP or C++ do? C# also captures everything by default, and by reference, which has tripped me up quite a few times.
That might help but it seems like either an implementation or language spec fix is in order. There doesn't seem to be a reason for a function without free variables to turn into a closure at all, thus preventing the issue.
The particular memory issue is new to me (though now I can watch out for it, yay) but I'm not surprised... JS lacking proper lexical scope causes many issues.
Lexical scope is a semantics concern - observable behaviour that is required for correctness. The case under discussion is an implementation concern - there's no necessity for it to leak by creating a linked list of activation records as further analysis could break the chain. The two are not related.
When you say "proper lexical scope", do you mean just block vs function level scoping of variables? If so, I wouldn't say JavaScript is wrong; it's just different.
Who is this article written for? There's a whole section on "What is memory?". If you are optimizing to remove memory leaks I really hope you already know what memory is.
It was helpful for me. I don't have a CS background and I have been coding javascript for around 4 years now. There are a ton of other devs who don't have a proper knowledge of basics. If you are well versed then you can skip over to the next section.
In my experience there are some programmers (I don't know how many) who, given how they were taught or learned, can do productive work but generally don't know computing/programming in the abstract, e.g. memory at the machine level, or type systems.
I'm with you, pal. Memory leaks are the last topic you touch when optimizing a function. Usually you start by lowering the execution time, followed by I/O blocking issues, then proceed to server-related issues and network latency, and only after all that is covered do you start looking for "memory leaks". So explaining memory is usually unnecessary at this point, since junior devs are usually more focused on producing software than on optimizing it.
IMHO memory leaks are important only on embedded software because you start there with a really low memory available for your software to run.
Well, you can run JS on the JVM through Rhino (and now Oracle's Nashorn), but apparently perf is still largely worse than SpiderMonkey or V8. The language doesn't lend itself to optimization as much as Java does for JVM bytecode: https://blogs.oracle.com/nashorn/nashorn-architecture-and-pe...
Probably part of the problem is also the fact that JavaScript is a very dynamic language.
I think even the JVM team would struggle to improve on the state of the art in js vm tech. Their experience in making JVM might not be all that useful in the context of js.
I'm really glad that I use a garbage collected language. Unless I'm doing low level programming that requires controlling allocation and freeing, it's amazing. Yes, I still have to understand the basics of memory but I'm just very glad that most of the time, the basics are far more than enough.
I agree. I strongly dislike "magic" in programming. I prefer to call it "denial" because you need to know about the complexities anyway, and "magic" often means sweeping them all under the rug.
Meh, if you try and split hairs, even assembly has magic in it nowadays. Not all instructions take the same amount of time. Some flush caches and thus cause unexpected memory behavior, etc.
However, I think it is fair that most people learn roughly what the side effects are of each line at a local level.
Ironically, this is an argument against many functional languages. There are no side effects in the logic, per se. However, there are massive implementation side effects that are not necessarily easy to reason about.
The saving grace for the vast majority of people is that typically you can get by without knowing all of this. The people that care, do care. But statistically you are not one of them. :)
That said I think the issues with assembly you mention aren't magic as such, they're just consequences of the commands. They don't really hide much (if anything) behind the scenes that you'd have access to anyhow.
It's just that CPUs do so much more than they used to.
It's not that simple though. Most modern CPUs that support the x86_64 instruction set don't actually run them as instructions on the hardware. They do all sorts of magic to queue operations, increase pipeline throughput, manage register access, make branch predictions, etc...
You can think of assembly on those cpus as a high level language. It has little correlation with what's actually happening in hardware.
This is EXACTLY the same type of "magic" that is getting complained about above. The real implementation details are hidden and unknown, but the abstraction is useful.
I worded my post poorly. The "However, I think it's fair" was me agreeing with you. Pretty much completely.
I was just musing on how the arbitrary line is probably not as difficult to see as many other lines we have out there. I think this would fall into "systems languages" and related things.
It's certainly subjective, but for me, what I pejoratively refer to as "magic" is things that there's just no escaping knowing, yet are abstracted in a way that obfuscates what's going on. Often, it's presented as "it just works" which ends up being a hindrance since there's just no getting around the thing that it's hiding for you.
> To prevent these mistakes from happening, add 'use strict'; at the beginning of your JavaScript files. This enables a stricter mode of parsing JavaScript that prevents accidental global variables.
I don't think using strict will prevent accidental global variables, such as this.var in globally scoped function calls. Strict mode's main goal is to prevent inadvertently misspelled variables from going unnoticed.
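For what it's worth, here is a small sketch of how strict mode treats the two cases mentioned here: assigning to an undeclared name, and `this` inside a plain function call. The function names and `undeclaredCounter` are made up for illustration.

```javascript
// Assigning to an undeclared name: strict mode throws instead of
// silently creating a global variable.
function strictAssign() {
  "use strict";
  try {
    undeclaredCounter = 1; // ReferenceError in strict mode
    return "assigned";
  } catch (e) {
    return e.name;
  }
}

// In a plain (non-method) call, `this` is undefined in strict mode,
// so `this.someVar = ...` would throw a TypeError rather than write
// to the global object.
function strictThis() {
  "use strict";
  return this;
}

console.log(strictAssign()); // prints "ReferenceError"
console.log(strictThis());   // prints undefined
```

Explicit writes such as `globalThis.someVar = ...` still create globals in strict mode, so it only guards against the accidental cases.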