Wow! What a blast from the past. I originally submitted this article back in 2014 when it was hosted on my personal page hosted by my university. You can view the thread here: https://news.ycombinator.com/item?id=8222487
When I migrated to my own personal domain, I decided I would update the article to address the comments on that thread, and I think the end result was much better (but alas, I never had a reason to re-submit the article here).
Obviously I do not recommend using a GC like this in general, but as a learning resource for how memory works and some of the more obscure parts of executables, I think it works pretty well! The compliments I've received from this article have been pretty incredible. Seeing it here, uploaded independently, really warms my heart.
Oh, and this article is so old that I'm now a Rust guy. Crazy how that works!
Thank you. Your article helped me understand memory allocation and garbage collection algorithms. I successfully used this knowledge while developing my own language.
Stupid question, are there standard GC libraries that are used with C? Alternatively, what is the usual memory management practice? Is it typical just to keep track within the confines of the code you're writing and explicitly free memory when you're done with it?
The Boehm collector exists and is mentioned in the article, but is not part of the standard library. Typical memory management practice is to store on the stack as much as possible reclaming memory as it goes out of scope. Larger chunks of memory or those that outlast scope typically uses manual malloc + free and these are often batched to minimize overhead. It is fairly unusual to use a GC in C. This is likely due to multiple reasons: history/culture, language design (C only allows for conservative collectors), less precise control over runtime overhead and pauses, higher memory utilization, etc.
This pattern still exists in C++ in the form of custom containers that implement "small array optimization". The LLVM project is a heavy user of this pattern as far as I know and I have seen it in several proprietary codebases.
"The usual memory management practice" in C depends a lot an what you are doing. Malloc/free scattered through your code is...not uncommon. That does require discipline or ugliness ensues. Other common options are not allocating, which is kind of limiting but the only option under very tight memory constraints, and doing specific domain management options.
Manual malloc() + free() is the way. The language design makes it hard (if not impossible) to generate compilation-time GC routine. Runtime GC will be another thing, and you probably will lose raw pointer access, because we need some higher-level struct wrapping around the raw pointer to feed more information to the garbage collector.
If you want to keep your sanity, you don't want to be manually mallocing and freeing all over the place.
The most common patterns I often see are a "context" object of some kind (typically used by libraries) which handles all memory management internally and must be passed to every API call. So you only ever allocate and free that one object. (Internally they might be doing all sorts of crazy things, I've even seen a basic tracing GC!)
Applications typically use some combination of bump allocators (aka arenas) and pools. You put temporary allocations in a dedicated temp arena and clear it out when appropriate for the application (i.e. every frame)
> Is it typical just to keep track within the confines of the code you're writing and explicitly free memory when you're done with it?
Yes. It's common to write OOP style C by using an opaque pointer in a header file and handle all the allocation and deallocation within the code file. The header file will export a "constructor" and "destructor" function. The constructor and destructor are still called manually, but this method properly encapsulates state to being visible only within a single code file, which prevents some accidental misuses of a type. The destructor should have a free for every allocation in the constructor, but done in reverse order. If you follow this pattern consistently, then `malloc` will only ever appear inside a constructor and `free` will only appear in a destructor. All other object allocation and deallocation is done via the relevant constructor/destructor.
Another technique is to make use of GCC `constructor` and `destructor` attributes, which are called before `main` and after `main` loses scope (or `exit()` is called). For example, you might have a static container type and only expose methods `add`, `remove` and `get`, then all allocation and deallocation happens solely within the code file and consumers of this API don't need to concern themselves with allocating and deallocating. This is only really useful for singleton-like (process global) data structures, but you could perhaps utilize these for implementing a GC, since an allocator is usually global to the process. (Eg, the `GC_init` function in the article could be given the constructor attribute so that the programmer would not need to manually call it).
Useless? These algorithms may be simple but they're not toys. This stuff has been used in production. You can certainly improve it but it's not "useless".
Did you know Ruby has a garbage collector not unlike this one? People are running it in production right now.
> a toy GC
It does its job. These algorithms have been working for decades now.
> without threading support
It can just stop the whole world.
> without register scanning
Add some inline assembly to spill all the registers. I did this on my own language implementation not even a month ago. If this was a real project I'd submit a patch.
The author even acknowledged this limitation and hinted at how to solve it, he just didn't want to get into those gritty details in his article. I'm happy to demonstrate if you want.
> Imprecise also. How do you seperate large ints from pointers?
You don't. You collect conservatively. It will have false positives but not false negatives. In other words, it will mistake some ints for pointers but it won't collect memory that's in use. Correct if suboptimal behavior.
I can see you are annoyed from getting downvotes. Unfortunately, I do not know the point you are trying to make in your message. I would suggest trying to write out full sentences, so we can understand what you are trying to say.
Thank you. All my heart.
This made me understand how I could easily manipulated.
And sorry.
I would like to chat about language-s- but you - the community you represent - will percieve it as arguing.
I would like to to tell my intention but, it will perceived some kind of defense or offence.
What I'm trying to is, simply read a book "How to talk anyone" and practise.
I'm not a native speaker.I'm tired to this language barrier.
I'm just trying to simple enough to make some spark with the words alone.
We all know those words, we all know their meaning. And yes.
I was not aware that I'm being manipulated - say my ego, say what I perceived or offended -
And I understand, this is wrong. ( Not the way I talk, playing karma game and argue )
Again. We - non Eng. speakers - can not communicate with you - I still have hope but sad, -> , bad -> , offended --> defended -
Sorry for the shit.
Sorry for my language.
Sorry for my will to exist or communicate.
I'm sorry.
I'm sad - not sure -
And broken.
I do not really know -deeply- why I'm doing that.
But sorry.
I'll practice to talk anyone somewhere I feel free, accepted, - self petty ( joke lets say) -
Peace.
My heart long left with place, I'm sorry for all of us.
Peace.
And thank for another spark of showing me that, I'm not a pro. programmer, I'm just an ametour.
The man I know, will forget these and want to express itself.
But no.
Lets say good by.
Have a wonderful life, grow, progress, ...
I believe you have good intentions, so with respect I will try to help.
Maybe try write what you would like to say in your native language, into a translation site or app, and then copy/paste the English version here? Then you can do the reverse for replies you receive.
Something that may help the translator app work better, is to simplify your native writing, as if communicating with a young person.
Doing this regularly may also help you improve your English.
Or if you are doing that already, I suggest trying a different translator app/site, as the one you are using is not giving an accurate result.
sorry love, I passed that way long a go.
even further, I tried to use -our- besty LLM
and for my English,
I decided to continue the book I'm reading.
I got that this place is so so serio-U-s to me
And yet, even I -we - put our best intentions.
Yet I'm here, and yet I'm going same path again. -- SO you called --
I do not know why, but, love, I'm happy to talk as it is.
And , I let go my English language love and started to learn a new one.
Sorry.
But you - all - made my day.
Sea is flat. Sky is blue. And air is nice.
I'm happy to have a moment , final look this serious, lovely jungle
And yes, I'm so child not letting the elders final words.
Love
This is for my final carma.
Please, be carrefull what you wish for ?:)
Do downvote me, so I can stop talking without knowing why , how, when , ?
And do not let me waste your attention.
And please offer me babel fish if you have, if it is possible to communicatE
and yet, I know, this is shit.
yeah, ma, please clean my shit.
When I migrated to my own personal domain, I decided I would update the article to address the comments on that thread, and I think the end result was much better (but alas, I never had a reason to re-submit the article here).
Obviously I do not recommend using a GC like this in general, but as a learning resource for how memory works and some of the more obscure parts of executables, I think it works pretty well! The compliments I've received from this article have been pretty incredible. Seeing it here, uploaded independently, really warms my heart.
Oh, and this article is so old that I'm now a Rust guy. Crazy how that works!