
You might as well classify allocating without freeing as a form of garbage collection. If the implementation can fail to collect cycles and the language can create them, thus leaking memory, then lumping reference counting in with real garbage collection is a bad definition, because it smears together a crucial distinction.

I get that some people, or many people, or some book, have defined it that way, but others have used it to mean something distinct from plain refcounting. Many developers have always understood GC and refcounting (in languages with mutation) to be mutually exclusive concepts. That definition is used in practice, and when people say, "I'm going to implement garbage collection for my programming language," they mean one that doesn't leak memory.

One reason people use this other definition may be that plain reference counting and "having a garbage collector" are treated as mutually exclusive, even if you define "garbage collection" to include reference-counted implementations. Making garbage collection a concept distinct from the set of things a garbage collector might do will always be fighting against how language usually works.



>You might as well classify allocating without freeing as a form of garbage collection.

Of course, Java even explicitly supports this garbage collection strategy (Epsilon, the no-op collector):

https://openjdk.java.net/jeps/318

The best definition I've read comes from Raymond Chen: garbage collection is the simulation of a computer with an infinite amount of memory:

https://devblogs.microsoft.com/oldnewthing/20100809-00/?p=13...

This definition is very elegant in that it allows one to reason about program correctness across all garbage collectors without having to know any of the implementation details, be it reference counting or the many flavors of mark and sweep. Basically, garbage collection is not an implementation detail, nor is it a specific algorithm or even class of algorithms; garbage collection is a goal, namely the goal of simulating an infinite amount of memory. Different algorithms reach that goal to different degrees and with different trade-offs.


They get lumped together because when some people say "garbage collection" they mean "whatever Java specifically does", and when other people say "garbage collection" they mean "automatic dynamic lifetime determination" (as a contrast to C-style "manual static lifetime determination", or Rust-style "automatic static lifetime determination").


RC need not leak memory. If it's done right, it doesn't.

The fact that Objective-C exposed the innards of the system badly enough that it was possible to leak is not intrinsic to RC. It's just yet another piece of the awkward legacy puzzle that is Objective-C.

As for cycles... cycle detection in RC has been a solved problem for over two decades. I implemented it once, based on a paper from some IBM researchers, if I recall correctly.


Cycles in RC are a fundamentally unsolvable problem, because there are situations where you in fact want an unreferenced cycle to exist in memory, at least for a while.

Imagine a slow async operation such as a network request, where a GUI object is locked in a cycle with a self reference in a callback. Before the network response arrives, the GUI object may be removed from the view hierarchy (e.g. the user has navigated away); whether you want the cycle to be broken or not, however, depends on your needs. You may want to wait for the results, update stuff and break the cycle manually, or you may choose to completely discard the results by declaring weak self.
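
To make that concrete, here's a rough Swift/ARC sketch (the class and method names are invented for illustration); the only difference between "keep the pair alive until the response arrives" and "silently drop the results" is how self is captured by the callback:

    import Foundation

    final class ProfileView {                      // hypothetical GUI object
        var task: URLSessionDataTask?

        func loadAvatar(from url: URL) {
            // Strong capture: the closure retains self, so the view and the
            // request keep each other alive until the response arrives, even
            // if the view has already been removed from the hierarchy. The
            // cycle breaks on its own once the handler has run.
            task = URLSession.shared.dataTask(with: url) { data, _, _ in
                self.show(data)
            }

            // Weak capture: if the view goes away first, self is nil and the
            // result is silently discarded.
            // task = URLSession.shared.dataTask(with: url) { [weak self] data, _, _ in
            //     self?.show(data)
            // }

            task?.resume()
        }

        func show(_ data: Data?) { /* update the UI */ }
    }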

Therefore, solving the problem of cycles with RC, I think, means giving more manual control. Explicit weak pointers solve part of the problem, but the "I know what I'm doing" scenario requires great care and is obviously error prone. To my knowledge this hasn't been solved in any reasonably safe manner.


I think in the scenario you describe, you always want to hang onto a reference to the unfinished network activity, because you will need to be able to interrupt it and clean it up. You never(?) want to leak actively running code in the background, especially not code with side effects, that isn't described by some kind of handle somewhere: in part because there's probably a bug, and in part because people can't understand how your code works if there isn't a handle to the task stored in a field somewhere, even if you don't use it.
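
For example (a Swift-flavored sketch with made-up names), keeping the task in a field means there is always a way to inspect or stop the background work when its owner goes away:

    import Foundation

    final class AvatarLoader {
        // The in-flight request is always reachable through this handle.
        private var task: URLSessionDataTask?

        func start(_ url: URL, completion: @escaping (Data?) -> Void) {
            task = URLSession.shared.dataTask(with: url) { data, _, _ in
                completion(data)
            }
            task?.resume()
        }

        func cancel() {
            task?.cancel()
        }

        deinit {
            // Nothing keeps running in the background without a handle:
            // when the owner dies, the request is interrupted with it.
            task?.cancel()
        }
    }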


Of course there's a reference to the network resource, and that's the reference that creates a cycle via its callback, which is defined in a bigger object (e.g. a GUI view) that awaits the results. There may be reasons why you want to keep the cross-referencing pair of objects alive until the request completes. Not saying there aren't alternative approaches, but it may be easier to do it that way.

I came across this while coding in Java, when I realized a network request may be cancelled prematurely just because the GC thinks you don't need the network resource anymore, whereas in fact you do. There's no easy solution to this other than reorganizing your whole request cycle and object ownership. A classic abstraction leak.


I mean, the GUI object with its reference to the network request should never be detached like that. The network request handle should be moved to something else that takes care of it, if the GUI object can't, and it should not be calling a callback tied in a strong way to the GUI object's existence. Granted, I wasn't there when you had this, and I'm not saying there's an easy solution. What I am basically saying is that there should be a chain of ownership to some local variable on the stack somewhere, instead of a detached GUI view. I'd view the resource leak you describe as hacky, and as a class of design decision that has caused me a lot of pain in GUI software. Well, maybe I had to be there.


Yes, you can move your references around to find a suitable place, but how does that solve the problem of lifetime control? How do you guarantee a network resource is alive for as long as you need it under GC, and under ARC?


By holding a reference to it that is grounded to a GC root, i.e. an ownership chain to a local variable that will dispose of it, and which can also be used to interrupt the background task. That way the network connection, callback, etc. are not in a detached cycle.
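
Something like this sketch (Swift-ish, names invented): in-flight tasks are owned by a longer-lived object instead of by a detached view, so they stay reachable from a root for exactly as long as they're needed, and there's one obvious place to interrupt them:

    import Foundation

    final class RequestStore {                     // hypothetical long-lived owner
        private var tasks: [UUID: URLSessionDataTask] = [:]

        @discardableResult
        func fetch(_ url: URL, completion: @escaping (Data?) -> Void) -> UUID {
            let id = UUID()
            let task = URLSession.shared.dataTask(with: url) { [weak self] data, _, _ in
                DispatchQueue.main.async {         // touch the store from one queue only
                    completion(data)
                    self?.tasks[id] = nil          // finished: drop the handle
                }
            }
            tasks[id] = task                       // owned here, not by the view
            task.resume()
            return id
        }

        func cancel(_ id: UUID) {
            tasks[id]?.cancel()
            tasks[id] = nil
        }
    }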

With the design you're describing -- without the ability to interrupt the background task -- I would bet you tend to get race conditions of one kind or another.


How come Python works, then? It uses RC with a cycle detector, doesn't it?


Yes it does.


Can you link to some resources/papers detailing how to solve this problem? I'm only aware of the "RC+GC" solution (i.e. adding a full tracing GC in addition to RC to find and collect cycles).


David F. Bacon and V. T. Rajan, Concurrent Cycle Collection in Reference Counted Systems

https://www.cs.purdue.edu/homes/hosking/690M/Bacon01Concurre...
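
For anyone who doesn't want to dig through the whole paper: the core "trial deletion" idea, stripped of the concurrency machinery and written as a toy Swift model with explicit refcount fields (a heavily simplified illustration, not the actual algorithm), looks roughly like this:

    // Toy model: nodes carry a manual refcount instead of relying on ARC.
    enum Mark { case black, gray, white }

    final class Node {
        var rc = 0                   // number of incoming references
        var children: [Node] = []    // outgoing references
        var mark = Mark.black
    }

    // Given candidate cycle roots, find nodes kept alive only by cycles.
    func collectCycles(_ candidates: [Node]) -> [Node] {
        // 1. Trial-delete: subtract every reference internal to the region
        //    reachable from the candidates; what remains is external refs.
        func markGray(_ n: Node) {
            guard n.mark != .gray else { return }
            n.mark = .gray
            for c in n.children { c.rc -= 1; markGray(c) }
        }
        // 2. Anything still externally referenced gets its counts restored...
        func scanBlack(_ n: Node) {
            n.mark = .black
            for c in n.children {
                c.rc += 1
                if c.mark != .black { scanBlack(c) }
            }
        }
        func scan(_ n: Node) {
            guard n.mark == .gray else { return }
            if n.rc > 0 { scanBlack(n) } else {
                n.mark = .white      // ...everything else is cyclic garbage
                for c in n.children { scan(c) }
            }
        }
        // 3. Sweep up the white nodes.
        var garbage: [Node] = []
        func collectWhite(_ n: Node) {
            guard n.mark == .white else { return }
            n.mark = .black
            for c in n.children { collectWhite(c) }
            garbage.append(n)
        }
        candidates.forEach(markGray)
        candidates.forEach(scan)
        candidates.forEach(collectWhite)
        return garbage
    }

    // a <-> b is a dead cycle; c is still referenced from outside.
    let a = Node(), b = Node(), c = Node()
    a.children = [b]; b.children = [a, c]
    a.rc = 1; b.rc = 1; c.rc = 2          // c: one ref from b, one external
    print(collectCycles([a, b]).count)    // prints 2: only the cycle is garbage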


That's the one I was thinking of; I once implemented that one myself in C++ back in, uh, 2001 or 2002, during a bout of unemployment in the dot-com crash.

I guess I just assumed that there's been some advancement in the world since then.


So many words, so little clarity.


Are you referring to the response by SamReidHughes? I understood it perfectly, and indeed, that was my own understanding as well.

Could you elaborate on what you feel should be explained with more clarity?



