Go: Severe memory problems on 32bit Linux

wtallis · on April 6, 2012

Wow. That third message, with suggestions for avoiding the bug, reads like a twisted joke. Highlights:

"avoid struct types which contain both integer and pointer fields"

"avoid data structures which form densely interconnected graphs at run-time"

"avoid integer values which may alias at run-time to an address; make sure most integer values are fairly low (such as: below 10000)"

I understand that this isn't a completely brain-dead garbage collector, but warnings like that really scream "I'm just a toy language". It doesn't seem wise to call such a fragile programming tool production-ready or 1.0; the 32-bit implementation should be tagged as experimental, if only to lessen the damage to Go's reputation.

moilolita · on April 6, 2012

Unfortunately you are right. For me this is a very unpleasant surprise after I put aside D and rewrote my little framework in Go. Everybody said that the gc is not final, that there are some performance issues and they are working on it but I never imagined that such catastrophic bugs are not solved by now.

eternalban · on April 6, 2012

I also looked at D/Go -- I predict the reign of JVM is over (quote me).

I wouldn't bother with the 32 bit hiccups. Go hits the sweet spot pretty well. You have chosen well. Hang in there.

moilolita · on April 6, 2012

D has nice allocation management (I loved the manual+gc approach) and in my benchmarks, optimized D was slightly faster than Go at almost anything and consumed up to 70% less memory (I assumed that was because the Go gc kicked in later and was a bit lazy) but D was weak at threading/synchronization and the documentation of the standard lib was quite messy and lacking. So I decided that in the long run Go will be better (cheap goroutines, channels, good stdlib/documentation + support from google and the prospect of a better GC, all indicated a clear winner). I really hope they fix this cause I can't throw away my atom box and ARM is becoming more and more important.

el_muchacho · on April 6, 2012

D is pretty strong at concurrency today. See for instance http://www.informit.com/articles/article.aspx?p=1609144 and http://ddili.org/ders/d.en/parallelism.html

moilolita · on April 6, 2012

It's strange that with all those threading examples, they didn't notice the need for a WaitGroup primitive. I know that you could implement it yourself. You could simulate the goroutines, implement channels and SCGI/FCGI and so on, but why bother when there is Go ?

el_muchacho · on April 6, 2012

Probably because there is no need for it ? If WaitGroup waits for the end of all tasks to continue, there is map/reduce which should do the trick.

complexmath · on April 6, 2012

Did you mean something like core.sync.barrier?

moilolita · on April 7, 2012

barrier is not a suitable primitive for a WaitGroup. The idea is simple. You accept sockets in a loop and handle the connections in parallel threads. At one point you want to stop this loop and the main thread must wait for all the active threads to finish before exiting, otherwise some clients may receive "connection reset by peer". With a WaitGroup, every starting thread increments a counter, and every finishing thread decrements it; when the counter is zero -> all the threads finished. The main thread calls WaitGroup.Wait and it remains blocked until all the worker threads finish their jobs. I guess you could simulate it with core.sync.condition
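For readers who haven't used it, here is a minimal Go sketch of the counter pattern described above. The `handleAll` helper and the string "connections" are made up for illustration (no real sockets are accepted); only `sync.WaitGroup` and `sync.Mutex` come from the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// handleAll starts one goroutine per job and blocks until all of them finish.
// The jobs stand in for accepted connections; this is a hypothetical helper.
func handleAll(jobs []string) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		handled int
	)
	for _, job := range jobs {
		wg.Add(1) // increment the counter before the worker starts
		go func(name string) {
			defer wg.Done() // decrement the counter when the worker finishes
			_ = name        // a real server would serve the connection here
			mu.Lock()
			handled++
			mu.Unlock()
		}(job)
	}
	wg.Wait() // blocks until the counter is back to zero
	return handled
}

func main() {
	n := handleAll([]string{"conn-1", "conn-2", "conn-3"})
	fmt.Println("handled", n, "connections before exiting")
}
```

Calling `Add` in the accepting loop rather than inside the worker matters: otherwise `Wait` could observe a zero counter before a freshly started worker has registered itself.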

complexmath · on April 7, 2012

At the program level, this is built in (D has daemon and non-daemon threads like Java). Or you could use ThreadGroup in core.thread. It would be pretty trivial to do this at the user level with messaging as well.

he_the_great · on April 6, 2012

std.concurrency "is a low-level messaging API." The language provides low-level concurrency primitives. It is possible a higher level library providing "goroutines" might be made, though there is no effort or plan.

nakkiel · on April 6, 2012

I wrote a couple of D programs long ago and I still have a taste of an unfinished language. If I recall correctly, arrays manipulation is pretty weird and the language feels a bit the same PHP feels: a bunch of different things pieced together with no coherence. Compared to Python or Go, it's a whole different world.

moilolita · on April 6, 2012

The Phobos standard library is quite unfinished indeed and poorly documented in some areas but the D2 language is rather complete. It has sh*tloads of features but this also makes it a bit harder to master. Go is lighter (very easy to take on) and has an impressive library. Long story short: with D2 I needed ~1 month to get a good grasp of the language and the std library (the library was the hardest part) while with Go I needed around a week.

el_muchacho · on April 6, 2012

Agreed, D is a much bigger and complex language than Go.

Comparing the standard libraries, Phobos doesn't look far behind Go in scope. There are big holes though, like crypto, which is entirely missing, and a complete SQL driver (was in development, but we haven't heard from it for a while now), although there is a binding for SQLite3 and several drivers for major RDBMSs (not in the standard lib, though). Logging will be included soon. Most of the rest is included (networking uses libCurl), and Phobos quality is continuously improving, some parts of it being excellent both in terms of functionality and performance, like the new regex library. On some other parts, like containers, Phobos seems much more advanced than Go. OTOH, there seem to be more third-party libraries for Go than for D, but we can't comment on their quality. And of course, both languages allow binding C libraries.

rubashov · on April 6, 2012

> The Phobos standard library is quite unfinished indeed

How so? I think it's got pretty much what belongs in the standard library.

drey08 · on April 6, 2012

What did you find weird about array manipulation in D?

moilolita · on April 6, 2012

If Ceylon/Rust are not ready today it means they will be immature for at least 2 years from their release date. Without a large community behind them, they will slowly fade away like D did. In order to get a community, they have to have a good/clean/pragmatic design and a library as comprehensive as possible. If they don't have a good library they better have excellent interop with other platforms or they will be doomed. IMO Go is a solid step in the right direction and if the authors resist the temptation to complicate it with (too many) new features & paradigms I think it will fare well.

he_the_great · on April 6, 2012

> Without a large community behind them, they will slowly fade away like D did.

This is a rather strange assessment. The history of "successful" languages has been a mixture of "cool jump onboard" and "who can stay alive the longest to get a community." D is in the latter camp.

It seems to me, unreasonable to expect a language to be coming out the gate with guns blazing. People expect Nukes now!

moilolita · on April 7, 2012

The market will decide. As someone said on HN: the best language in the world would be a mix of Go and D and would be called GoD. When there were few options, yes, it was enough to survive long enough to get a community. But when you have plenty of similar options, some of them backed by some major actors (at least in the early stages), I think no one will sacrifice his productivity (other than for hobby projects) for the sake of one language. I really hope D will develop into something, but looking at various sources on the net (abandoned projects for D1 on dsource.org) and google results, it seemed to me that D peaked somewhere around 2007-2008 with a slight revival in 2010 when D2 was released.

el_muchacho · on April 8, 2012

The drop after 2007/2008 probably corresponds to the Phobos/Tango debacle, and the fact that Walter decided to fork with D2.

One of the main problems was, he was almost the sole compiler developer, and could hardly keep up with the tasks of maintaining 2 parallel branches and developing new ideas at the same time. People complained that they couldn't get involved as much as they wanted. It's understandable that many people thought that D didn't have a solid future with such uncertainties.

Nowadays, these problems are mostly overcome with a much better organization: there are several committers for the compiler, and several committers for the standard library. Phobos is the standard library, it's maturing, D2 has shown its strengths over D1 and the community is united again, because not only is it deeply involved with the design of the language and standard lib (through the m-l), it is also involved with the implementation of essential parts of it. 2011 has been a very good year for D, and I think that more than ever, the whole project feels like it's going in the right direction.

edit: I guess another reason D isn't gaining as much traction as it could is, it has been removed from the Alioth computer language shootout. For a language which is aimed at raw speed (and was brilliant at that when it was still on the shootout), it's a severe blow.

silon3 · on April 6, 2012

I strongly prefer ceylon/rust... But they're not ready yet.

BarkMore · on April 6, 2012

Go 1 defines the backwards-compatibility guarantees one can expect as the language matures. It's more a statement about the language specification than it is about the implementations.

I know from experience that the gc 64-bit version is production ready. It's unfortunate that the limitations of the 32-bit version are not called out clearly on the website.

nakkiel · on April 6, 2012

Clearly, there was a need for "Hey, things won't change as much from now on". Now, I'm still afraid little work will appear on the GC, which is IMO Go's biggest weakness to date.

dchest · on April 6, 2012

> Now, I'm still afraid little work will appear on the GC

No need to guess, look at golang-dev https://groups.google.com/forum/?fromgroups#!forum/golang-de...

nakkiel · on April 6, 2012

Oh yeah thanks. I guess I'm doing it wrong :) Glad to see some work on that front.

batista · on April 6, 2012

>I know from experience that the gc 64-bit version is production ready.

Is it really production ready though, or is it the same half-arsed implementation as the 32-bit one, just taking advantage of the larger availability of virtual memory on a 64 bit system?

ghusbands · on April 6, 2012

On 32-bit systems, the problem is that there's a very high chance of non-pointer data looking like pointers to existing data, which, in turn might have its own pointer-like data. This means that you can end up keeping an excessive amount of unreferenced data.

However, a 64-bit address space is so much larger that you can't really suffer the same issue. Unlike on 32-bit systems, neither high entropy data nor text will look like valid pointers.

Therefore, the technique can be validly described as production-ready for 64-bit systems.

jerf · on April 6, 2012

Hmmmmm... I'm smelling memory-based DoS if I, say, upload carefully crafted files to a webserver. What accidentally happens in 32-bit may be deliberately triggered in 64-bit.

agentS · on April 7, 2012

This is impossible. I don't like to make absolute statements, but I'll stand by this one.

In order to pull off an attack, you'd need to know what address range the program in question has been allocated, then figure out the smaller range that the runtime is actively using, then give it data with integers in that range. This is impractical.

If you think you can pull it off, play.golang.org lets you upload text to a Go program on Appengine, then it compiles that program and runs it. This gives you 2 programs to attack, the playground binary, and the one compiled from your source. If you can do it, you'll have a way to kill machines inside Google.

Good luck.

pwf · on April 6, 2012

Is it that you can't suffer the same issue, or that you can't 'really' suffer the same issue?

Even if there's a smaller chance of it happening, any language that has ANY chance of killing your system when running as expected is a language I'll never bother to learn.

Is there any sort of analysis tool they might be able to put into the compiler to tell you if a data structure you created has a high chance of looking like a pointer?

ArbitraryLimits · on April 6, 2012

Can someone help me out here with a TL;DR? This is my first exposure ever to any technical aspect of Go, and this discussion makes it sound like it doesn't use any tag bits in its pointers, but has its garbage collector run heuristics on the data it examines to see whether it "looks like text" to decide whether to collect it? I know that can't possibly be correct.

Arelius · on April 6, 2012

It's much simpler than that: it's called a conservative collector. It looks at every bit of data and just pretends it is a pointer; if anything points to a valid allocated address, that address is retained. Otherwise, just like every collection when there are no references, the object is collected.
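To make that concrete, here is a toy Go sketch of a conservative mark phase. It is not how the Go runtime is implemented: the `block` type, the 32-bit addresses and the tiny "heap" are all invented for illustration, and the sketch assumes interior addresses also count as references.

```go
package main

import "fmt"

// block is a hypothetical heap allocation: an address range plus the raw
// words stored in it. The collector has no idea which words are pointers.
type block struct {
	start, size uint32
	words       []uint32
}

// findBlock returns the index of the block containing addr, or -1.
func findBlock(heap []block, addr uint32) int {
	for i, b := range heap {
		if addr >= b.start && addr-b.start < b.size {
			return i
		}
	}
	return -1
}

// markConservative treats every word reachable from the roots as a possible
// pointer and reports which blocks would be retained.
func markConservative(heap []block, roots []uint32) []bool {
	marked := make([]bool, len(heap))
	work := append([]uint32(nil), roots...)
	for len(work) > 0 {
		w := work[len(work)-1]
		work = work[:len(work)-1]
		i := findBlock(heap, w)
		if i < 0 || marked[i] {
			continue // not an address in the heap, or already visited
		}
		marked[i] = true
		work = append(work, heap[i].words...) // scan everything inside it
	}
	return marked
}

func main() {
	heap := []block{
		{start: 0x1000, size: 16, words: []uint32{0x2000, 7}},  // holds a real pointer to block 1
		{start: 0x2000, size: 16, words: []uint32{0x3008, 42}}, // 0x3008 is only an integer
		{start: 0x3000, size: 16, words: []uint32{0, 0}},       // garbage, pinned by the false pointer
		{start: 0x4000, size: 16, words: []uint32{1, 2}},       // garbage, nothing aliases it
	}
	fmt.Println(markConservative(heap, []uint32{0x1000})) // [true true true false]
}
```

The third block is unreachable as far as the program is concerned, but the integer 0x3008 happens to land inside it, so it survives, along with anything its own words appear to reference.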

petrohi · on April 6, 2012

Good explanation on wikipedia:

http://en.wikipedia.org/wiki/Garbage_collection_(computer_sc...

oskarth · on April 6, 2012

If Google is, and has been for quite some time, using it for large-scale systems in production, I would call it production ready.

thrownaway2424 · on April 6, 2012

Is there any reason to believe that Google uses Go for large-scale production systems?

4ad · on April 6, 2012

Yes, it's used by YouTube, in a very critical path, large scale enough :)?

https://groups.google.com/forum/#!msg/golang-nuts/U5ilNZpXzN...

http://code.google.com/p/vitess/

0xe2-0x9a-0x9b · on April 6, 2012

I am sorry if the post puts Go in a bad light. That wasn't my intention. You shouldn't read the post as "32-bit Go is unusable" - read it as "If you encounter a garbage collection issue in your 32-bit program then you can use the rules mentioned in the post to fix the issue".

sp332 · on April 6, 2012

We know. But that does put Go in a bad light. If your garbage collector falls down if you use "big numbers", it looks bad.

singular · on April 6, 2012

Keep in mind that the comments on that thread are open to all people and thus that doesn't entail an 'official' response to the problem, though it does unfortunately seem to entail sound (but obviously crazy given the restrictions it puts you under) advice for avoiding the issue.

jedbrown · on April 6, 2012

However, it does not address the base issue, which is that Go uses a conservative garbage collector, and more values look like pointers in a 32-bit world.

The only real fix would be to improve the garbage collector's understanding of which values are pointers and which are something else (e.g., floating point numbers that happen to look like pointers). And that is not an easy fix.

How does this GC work? Is it literally just marching through the heap looking for pointer-sized values in the range that has been mapped to the process?

smanek · on April 6, 2012

Yep, that's basically it.

It sounds crazy (and it is!) - but it often works reasonably well in practice. SBCL (one of the most performant Common Lisp implementations) has an 'imprecise gc' that works the same way - and I've seen reasonably heavily stressed processes with uptimes in the weeks/months.

Remember, even a few years ago (before fastthread, etc) a reasonably loaded Ruby on Rails app couldn't stay up for more than ~10 minutes w/o memory leaks forcing a restart (DHH said 37Signals was doing ~400 restarts/day per process, IIRC) because the runtime was such a piece of crap. Yet, many people still used it to solve real problems and make real money.

At least Go's memory leaks are much slower than Ruby's ;-)

jsnell · on April 6, 2012

The SBCL garbage collector doesn't work like described in the grandparent comment. It's precise for the heap, and only conservative for the registers and stacks.

I can't say for sure how the Go GC works, but I would assume it likewise isn't fully conservative. E.g. the "avoid struct types with both integer and pointer fields" advice would be pointless for a fully conservative GC, but does make sense if structs containing no pointers are allocated in a separate memory region that's not scanned for potential pointers.

4ad · on April 6, 2012

I don't know anything about SBCL, but in Go the GC is used only for the heap, data allocated on the stack disappears as soon as the function returns, and registers have nothing to do with any of it.

microtherion · on April 6, 2012

Both data on the stack and registers may potentially contain the sole live pointer to a particular heap object, so unless you only run GC from some sort of main event loop, it's generally necessary to treat the stack and registers as potential roots as well.

jsnell · on April 6, 2012

Yes, but there are going to be pointers to heap objects in registers and on the stack.

joe_the_user · on April 6, 2012

But Ruby is more or less a "rapid development" environment.

You pay for quick development turn-around with machines.

Go is not in that kind of space as far as I know since, as I understand it, Go is sold as low-ish level language competing with C++ and C for system development. For something that would replace those languages, unexplained, systematic leaks would be bad (I mean, my C++ programs might leak but it's reasonably easy to discover why. That matters).

agentS · on April 6, 2012

From experience, development in Go can definitely be classified as rapid. The type system stays out of your way, and compile times are insignificant. Try compiling on play.golang.org, or tour.golang.org to see what I mean.

About this issue: yeah, it's unfortunate, but it's a property of the class of garbage collector that Go uses atm. My servers are all 64-bit, so it doesn't really affect me. But I do feel bad for those who are trying to run Go on ARM, or on 32-bit servers.

mkup · on April 6, 2012

Probably Golang developers/implementers should change their compiler to insert "magic" constant before each 32-bit pointer in the GC memory, so all such pointers could be quickly discovered during full linear scan of the heap during GC, regardless of target address.

This "magic" constant should be non-valid value in 32-bit single precision float point format, far away from usual mmap()'ped address ranges, and far away from small integers.

Maybe this "magic" value should be randomized to prevent DoS attacks on Go runtime library.

rand_r · on April 6, 2012

Would that gc scheme potentially cause the following (pseudo) code to break?

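  char *x = malloc(1);        // allocates block 'a' in memory
  intptr_t i = (intptr_t) x;  // pointer value hidden in an integer (<stdint.h>)
  x = 0;
  i = i - 1;                  // no word in memory equals the address of 'a' any more
  // gc runs here and frees 'a'
  *((char *)(i + 1)) = 123;   // failure: writes to freed memory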

pcwalton · on April 6, 2012

You'd have to insert a call to "malloc" to get the GC to run, but yes, it can break in that situation. (The classic example is an XOR-packed linked list being corrupted by the Boehm GC.)

But note that isn't really fair to conservative GC, as no GC, precise or not, can cope with hidden pointers like that. So your example isn't really a strike against Go.

ianlancetaylor · on April 6, 2012

When Go gets a precise collector, simple implementations of this will work. Doing this kind of thing in Go requires importing the "unsafe" package, and any memory allocations done by code importing "unsafe" could be marked as possibly a pointer.

However, it would probably be possible to write code involving two packages, one of which does not import "unsafe", to lead to dangling pointers and eventual crashes. That is why you should be careful about code that imports "unsafe".

pcwalton · on April 6, 2012

Actually, the code sample in the grandparent comment obfuscates the memory address by subtracting 1. Even a conservative GC will be confused in this case...

ianlancetaylor · on April 6, 2012

Ah, yes, missed that. Nothing a GC can do about code like that. At least it remains true that this can only happen in Go if you explicitly import "unsafe".

smanek · on April 6, 2012

I don't think any GC scheme could work with that. Languages that allow pointer arithmetic like that basically can't be GC'd.

The problem with an imprecise GC is that having an integer that looks like a pointer, could prevent an object from being freed.

pcwalton · on April 6, 2012

Almost — C# has a cool feature whereby you can do pointer arithmetic if you pin the objects in question first, so the GC won't collect or move them. (The language statically enforces this by forbidding you from taking the address of a value until you pin it.)

derleth · on April 6, 2012

That's cool; I think another way of doing it is to allow programs to request a sandbox where the GC doesn't go so they can do all of the pointer-arithmeticking they want as long as all their pointers fall within the sandbox by the time they're written through or dereferenced.

singular · on April 6, 2012

In Go, you'd have to be using the unsafe package and some gnarliness to cast between the allocated pointer and int, so you just wouldn't be able to do this in vanilla Go.
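A small sketch of what that looks like in practice. The `addressOf` helper is hypothetical; the only real API used is `unsafe.Pointer`, which is the sole bridge between Go pointers and `uintptr`.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addressOf exposes a pointer's numeric value. The conversion compiles only
// because this file imports "unsafe"; plain Go has no pointer-to-integer cast.
func addressOf(p *int) uintptr {
	return uintptr(unsafe.Pointer(p))
}

func main() {
	x := new(int)
	addr := addressOf(x)
	// Writing uintptr(x) or int(x) directly is rejected by the compiler.
	fmt.Println(addr != 0, addr%unsafe.Alignof(*x) == 0) // true true
}
```

Once the value lives only in a `uintptr`, it is an ordinary integer as far as the language is concerned, which is why doing arithmetic on it and converting back is the programmer's responsibility.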

eblume · on April 6, 2012

That's really interesting - is there a place where I can read more about these early Rails performance issues?

smanek · on April 6, 2012

There was some back-and-forth between Zed Shaw and DHH that I remember being interested by at the time. Zed deleted all his posts - but here are some places to get started:

- Shaw's opening volley: http://web.archive.org/web/20080103072111/http://www.zedshaw... [lots of stupid personal flames - but his technical points are consistent with what I remember from the time]

- DHH's response: http://david.heinemeierhansson.com/posts/31-myth-2-rails-is-...

- I can't find a copy of shaw's actual response, but here's the relevant HN thread: http://news.ycombinator.com/item?id=364659

The salient point (quoting Zed):

"""

Now, DHH tells me that he’s got 400 restarts a mother fucking day. That’s 1 restart about ever 4 minutes bitches. These restarts went away after I exposed bugs in the GC and Threads which Mentalguy fixed with fastthread (like a Ninja, Mentalguy is awesome).

If anyone had known Rails was that unstable they would have laughed in his face. Think about it further, this means that the creator of Rails in his flagship products could not keep them running for longer than 4 minutes on average.

Repeat that to yourself. “He couldn’t keep his own servers running for longer than 4 minutes on average.”

"""

I've never been much of a Ruby/Rails guy - but my understanding is that everyone had to restart Rails a few times an hour cause the memory leaks were so bad in those days.

flomo · on April 6, 2012

I certainly wasn't anywhere in the loop, but at least on my site, FastCGI didn't work, and Mongrel did.

Zed's rant/flame goes into so much personal detail that he doesn't really articulate the point. Rails was receiving this enormous amount of hype, but there wasn't even a working application server yet.

sanderjd · on April 6, 2012

I started using rails after all of that, but regardless of how it was back then, I think it's worth pointing out for people who aren't aware - there are really good app servers for rails now.

_3u10 · on April 6, 2012

No one cares how the sausage is made. They care that they have sausage. I write code to make money, not for uptime competitions. If something that makes money needs to be restarted every 4 minutes and customers are willing to pay for it why should I give a shit?

flomo · on April 6, 2012

FCGI problems were very customer-visible. Prior to Zed coming along, the issue wasn't being discussed much anywhere, making it unclear if it was your code, ruby issue, framework issue, or appserver. Now it is your problem to fix!

sliverstorm · on April 6, 2012

Sometimes I wonder if the incredible fall in DRAM prices was really a good thing... ;)

krakensden · on April 6, 2012

The heap AND the data section. Although apparently, gcc-go doesn't do that for the data section.

jedbrown · on April 6, 2012

That would seem to also present a DOS vector (even on 64 bit) if a user can get the program to store data (of any type, e.g. char or floating point) that happen to be binary-equivalent to pointers to large allocations.

agentS · on April 6, 2012

This is realistically not an issue. First, on a 64 bit machine, the range of actually mapped addresses is small relative to all the possible values that can fit into 64 bits. Second, from an attacker's perspective, the values corresponding to mapped memory are extremely difficult to predict, and the values binary-equivalent to large allocations are impossible to predict, even with access to the source.

If you think you can still do it, all of *.golang.org and golang.org are running Go on Appengine, with the source code being freely available. This is your opportunity to get a back door into Google's servers.

jedbrown · on April 6, 2012

If you make a huge allocation (many pages), isn't the Go runtime very likely to call malloc()? For large allocations, malloc() is going to get you a bunch of fresh pages and you will generally get the address of the start of a page. The offset of the pointer within the page is then likely to be deterministic, so you probably only need one unit of pointer-equivalent data per page. If you have enabled huge pages (e.g. 2MB, not uncommon), then you have already soaked up 21 bits of the 48 bits of address space that are actually used by x86-64 implementations, leaving only 27 bits for a collision. The stack grows down from 2^46 and typical heap values on x86-64 are still well within 32 bits. Finally, a collision need not be frequent to be a serious DOS concern.
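The bit counting in that argument can be checked with a few lines of Go. This is only the arithmetic, under the comment's own assumptions (48 usable address bits, allocations starting at a page boundary); `candidateBits` is a made-up helper and expects a power-of-two page size.

```go
package main

import (
	"fmt"
	"math/bits"
)

// candidateBits returns how many address bits are left to guess once the
// offset within a page is fixed: usable address bits minus log2(page size).
// pageSize is assumed to be a power of two.
func candidateBits(addressBits int, pageSize uint64) int {
	return addressBits - bits.TrailingZeros64(pageSize)
}

func main() {
	fmt.Println(candidateBits(48, 4<<10)) // 4 KiB pages: 36
	fmt.Println(candidateBits(48, 2<<20)) // 2 MiB huge pages: 27
}
```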

4ad · on April 6, 2012

The Go runtime does not call malloc(3) for heap, it reserves address space at known high locations (over 2^32) with mmap(2) using the MAP_FIXED flag, and it does so in 16GB increments (or is it just one 16GB allocation? can't remember).

I won't comment on the DOS concern until I've investigated further.

ianlancetaylor · on April 6, 2012

You need more than that. You need a server that regularly allocates large amounts of memory and then leaves them unreferenced so that the garbage collector can collect them. Then you also need the program to store data that you control, and to also keep references to that data--after all, if that data is collected, then the faux-pointers no longer pin the other allocations. Overall this does not sound like a common allocation pattern for servers.

_delirium · on April 6, 2012

This is the relevant bug: http://code.google.com/p/go/issues/detail?id=909

As the discussion there and in the thread says, the root problem is that Go uses a conservative garbage collector, and on 32-bit a lot more values look like pointers than on 64-bit, so many more things don't get freed in long-running processes. Seems not to be easy to fix.

papaf · on April 6, 2012

Can anyone knowledgeable about compilers/GCs say why Go went with a conservative garbage collector? From any Go source code it's trivial to pick out pointers from values. Is this information hard to preserve at runtime?

bhurt · on April 6, 2012

Probably ease of implementation. You can just grab the Boehm conservative GC, slap it in, and bang- you've got GC. You can even add GC this way to C/C++ programs.

Unfortunately, Boehm has drawbacks: because it's getting no help from the compiler, it can't tell integers from pointers. So it has to treat everything that looks like it might be a pointer as a pointer, even if it's an integer (or floating point number). Which means that it's possible for garbage to not be collected, because there is an integer that happens to have the same value as the address of the garbage object. And, of course, once you can't collect that object, you can't collect all the objects it refers to (including false pointers), and so on.

The odds of this happening are a function of what percentage of the virtual address space is in use: once some critical threshold is reached, the amount of garbage that can't be collected due to false pointers just explodes. On 32-bit platforms, I've seen this happen with heap sizes of only a few hundred megabytes. And the advice to work around this is exactly what the responder said: use less memory, don't use large ints (which are more likely to be mistaken for pointers), etc. Also, the problem goes away (for the time being) on 64 bits, because the percentage of memory used drops. A terabyte of memory on a 64-bit system is an even smaller fraction of the total address space (2^-24) than a kilobyte of memory is on a 32-bit system (2^-22).
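As a rough back-of-the-envelope check, the Go snippet below prints the fraction of the address space a heap occupies, which is also the chance that a uniformly random word lands inside it. Real program data is not uniformly random, so treat the numbers as a guide only; `aliasChance` and the 256 MiB heap size are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// aliasChance is the probability that a uniformly random word of the given
// width falls inside a heap of heapBytes bytes (a simplifying assumption).
func aliasChance(heapBytes float64, wordBits int) float64 {
	return heapBytes / math.Pow(2, float64(wordBits))
}

func main() {
	const mib = 1 << 20
	fmt.Printf("256 MiB heap, 32-bit words: %.4f\n", aliasChance(256*mib, 32)) // 0.0625
	fmt.Printf("256 MiB heap, 64-bit words: %.3g\n", aliasChance(256*mib, 64))
	fmt.Printf("1 KiB on 32-bit: 2^%.0f\n", math.Log2(aliasChance(1<<10, 32))) // 2^-22
	fmt.Printf("1 TiB on 64-bit: 2^%.0f\n", math.Log2(aliasChance(1<<40, 64))) // 2^-24
}
```

With a few hundred megabytes of heap on 32-bit, roughly one random word in sixteen looks like a valid address, which fits the "just explodes" observation.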

ianlancetaylor · on April 6, 2012

A conservative GC is easier to implement. No other reason.

riffraff · on April 6, 2012

I recall reading something along the lines of it being hard to use a precise GC due to the the "unsafe" package, but I am not knowledgeable at all.

_delirium · on April 6, 2012

Here's an early LtU thread where someone predicted it'd probably have to use a conservative GC due to some of the addressing features: http://lambda-the-ultimate.org/node/3676#comment-52560

PPGualtieri · on April 6, 2012

Isn't this issue also going to be a problem with Linux upcoming X32 ABI?

cpeterso · on April 6, 2012

Yes, because x32 ABI's pointers are still 32 bits, so the address range is small enough that ordinary int values can fool the GC into thinking they are legitimate pointers.

r00tbeer · on April 6, 2012

I know early JVM garbage collectors were also "conservative", but I don't recall JVMs running into these sorts of problems. Maybe folks are just using more of the 4G address space these days than back in the day of conservative GC JVMs?

akeefer · on April 6, 2012

Within the JVM, object references are strictly separate from integers. Object references are stored within a structure they call an OOP, short for "ordinary object pointers", and the gc works by inspecting the contents of oops and potentially changing them if objects are moved around in memory (as happens if the gc is moving things between generations, or doing heap compaction). If you're curious about it, you can get some information about OOPs on the page discussing the CompressedOops flag implementation: https://wikis.oracle.com/display/HotSpotInternals/Compressed...

kevinpet · on April 6, 2012

I've never heard of a JVM that uses this kind of GC. As far as I know, every Sun JVM has used explicit "this is a pointer" to identify things for GC.

r00tbeer · on April 6, 2012

Ah, you're right. It's only the thread stacks that were conservatively scanned, not all of memory (see "Mostly Accurate Stack Scanning" for some indirect evidence of such). It's not clear to me if modern JVMs are still conservatively scanning stacks or not. But clearly it's much less of a problem than conservatively scanning all of memory.

sounds · on April 6, 2012

Publicizing the bug may speed the fix along. Here's the real test: is hitting the front page of HN going to make a difference?

_delirium · on April 6, 2012

I don't think this one is lacking interest, but an obvious fix. It looks like Russ Cox did commit a fix for some portion of the cases. But a better fix would either need to significantly modify the way things are laid out to reduce the odds of false positives, or else move to a precise GC.

(Precise GCs are possible in C-like languages, but trickier to implement. Here's a recent paper on one: http://www.cs.utah.edu/~regehr/papers/ismm15-rafkind.pdf)

eternalban · on April 6, 2012

Von Neumann architecture falsely led us down the path of a singular notion of "memory". We need to distinguish at the hardware architecture level between "working memory" and "persistent memory".

I doodle all the time when working. Pristine set of diagrams emerges from the chaos. I wish my computer would play along with this regime ..

eternalban · on April 6, 2012

I haven't mentioned downvotes to date, but the downvotes on this are puzzling. Would the downvoters care to comment as to why the comment is deemed downvote-worthy? (I would think those who work/grok on memory managers do get the point.)

kevingadd · on April 6, 2012

I didn't downvote it, but it mostly read like rambling nonsense to me, so that might be why. Maybe write it so it's clear to people who don't work on memory managers all day?

scott_s · on April 6, 2012

I've worked on memory allocators, and I can't make sense of it.

FeepingCreature · on April 7, 2012

I think the idea is to differentiate between mutable and immutable data, ie. "worked on" and "final". I don't know what that would give you with regards to GC though.

dchest · on April 6, 2012

"Resist complaining about being downmodded. It never does any good, and it makes boring reading." http://ycombinator.com/newsguidelines.html

scott_s · on April 6, 2012

Brief HN discussion on that paper: http://news.ycombinator.com/item?id=586858

blinkingled · on April 6, 2012

I suspect the issue exists on 64-bit platforms as well, it's just that it doesn't impact as easily. In theory it is possible/common to have a 64-bit machine with less than a ton of physical memory (VPS), and running a 64-bit Go program which triggers this bug would result in similar impact as the 32-bit version.

Why yes - Issue #909 essentially confirms this - running 64-bit doesn't fundamentally change anything - it just buys more time for impact because of larger address space and hope of more physical memory. Which is sad on many levels - just mind blowing that the language designers did not think of this upfront! (Oh and Go's built in packages trigger this problem too - per #909 commenting out the Unicode package makes the program run!)

4ad · on April 6, 2012

Physical memory does not have anything to do with it, the virtual address space is all that matters.

It might be useful to consider that 2^64 is 2^32 * 2^32. This means the problem becomes important on 64 bit only if an application will use 10 orders of magnitude more memory. By considering the historic growth in memory capacity and usage, this will only happen around 2060.
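A back-of-the-envelope version of that arithmetic, as a sketch: it assumes stray integer values are uniformly distributed over the address space, which real program data is not, and the 1 GiB heap size and the `aliasFraction` helper are chosen purely for illustration. The 48-bit row is included because current x86-64 chips implement 48-bit virtual addresses.

```go
package main

import (
	"fmt"
	"math"
)

// aliasFraction returns the fraction of a 2^bits address space covered by a
// heap of heapBytes bytes, i.e. the chance that a uniformly random word of
// that width falls inside the heap. (Hypothetical helper, not a Go API.)
func aliasFraction(heapBytes float64, bits int) float64 {
	return heapBytes / math.Ldexp(1, bits) // Ldexp(1, bits) == 2^bits
}

func main() {
	heap := float64(1 << 30) // 1 GiB heap, an assumed example size
	for _, bits := range []int{32, 48, 64} {
		fmt.Printf("%d-bit: %.3g\n", bits, aliasFraction(heap, bits))
	}
	// 2^32 is a bit under ten orders of magnitude.
	fmt.Printf("log10(2^32) = %.2f\n", 32*math.Log10(2))
}
```

With a 1 GiB heap the 32-bit fraction is a quarter of all possible word values, while the 64-bit fraction is smaller by the factor of 2^32 mentioned above.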

blinkingled · on April 6, 2012

Check out http://code.google.com/p/go/issues/detail?id=909#c32 and the following comment that agrees with it.

This isn't just an address space leak - it is a real memory leak. On 64-bit the GC may not be so easily fooled as on 32-bit but it can still be fooled and that is a fundamental problem that will result in memory leaks - if I have a 2GB RAM VPS - it doesn't help to have 2^64 bytes of address space (actually it is more like 2^48: http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_de... ) - if the GC leaks memory sooner or later my process will be killed by the OS.

4ad · on April 6, 2012

Again, this has nothing to do with physical memory. It only has to do with the virtual address space.

Of course if you bump into this, it's a real leak, who said otherwise? And yes, it's possible to artificially generate the collision on 64 bit as it's the same mechanism as with 32 bit. It's about whether it happens frequently enough under normal usage patterns to be a concern. Youtube, and everybody who tried Go in production say it isn't, and that's because of reasons outlined in my first reply to you.

blinkingled · on April 6, 2012

Care to explain why? You keep insisting without explaining. Have you checked #C32 and how it clearly says it is a memory leak on both 32 and 64 bit platforms?

What is your explanation as to why this is not a memory leak and only an address space leak?

[EDIT POST YOUR UPDATE] Ok - so we are on the same page. I wasn't arguing about the likelihood at all - just the fact that it is possible troubled me as a bad GC design. Sure people use lots of crappy software on servers - doesn't mean it's a sound idea :)

4ad · on April 6, 2012

But it is a real leak, it's just an artificially created leak. These might be interesting to investigate for DOS potential, but they don't happen under regular usage because you are searching a needle in a haystack.

[edit after your update]

Each GC strategy has its drawbacks, for example the one used by Go has the least overhead in extra memory usage, and it's also simple to understand and implement. Mono got its precise GC only last year, it survived 8 years with a conservative GC. Go is only two years old.

luriel · on April 6, 2012

It is possible to artificially fool any GC into leaking memory; the only question is whether it happens in practice.

In practice the Go GC never leaks memory on 64bit systems and for most programs never on 32bit systems either.

luriel · on April 6, 2012

gccgo is better on 32bit systems: https://groups.google.com/d/msg/golang-nuts/qxlxu5RZAl0/NS71...

Note that work on the garbage collector is ongoing post-Go1; a faster parallel GC is in the process of being merged: http://codereview.appspot.com/5279048/

But ultimately 32bit systems don't seem to be a big issue for Google or anyone else using Go in production (and there are quite a few big organizations using it: http://go-lang.cat-v.org/organizations-using-go ).

Most people moved to 64bits a while ago so the amount of attention the 32bit port gets will never be the same.

bad_user · on April 6, 2012

You know, that kind of sucks ... there are many machines around that are still on 32 bits.

I keep my own Ubuntu laptop (dual-boots to Windows) on 32-bit builds (both Linux and Windows), simply because I have fewer problems that way (mostly with hardware drivers, but also with software). I keep the Amazon EC2 instances I maintain on 32-bit images, simply because they are cheaper. My Android phone is also 32-bit and will be so for a long time. My other phone, an older iPhone 3GS, is also 32-bit. My servers, prior to Amazon EC2, built with ARM processors, were also 32-bit.

And when I was playing with MongoDB, do you know what I did when I discovered that the 32-bit build was basically unusable? I ditched it and never looked back.

4ad · on April 6, 2012

I've been exclusively using 64 bit computers and operating systems, both Windows and Linux, for about 7 years now, never had the reported driver problems, well, never had any issue, really.

For me at least, 32 bit is only important for ARM. On the other hand the issue is very much blown out of proportion, most people haven't seen it, even if they run 32 bit servers. Most usual servers written in Go, like web servers, use very little memory. I process 4k requests per second using 7MB of resident memory. There are many memory intensive applications, but you usually don't run those on 32 bit.

sixbrx · on April 6, 2012

I do wish the desktop world would just move wholesale to 64 bit, but do note that Ubuntu still marks the 32 bit install the "recommended" one, so I would guess there must be some sort of problems that are lingering in the 64 bit versions. Maybe just flash support or something like that?

el_muchacho · on April 8, 2012

I don't think there are problems with 64 bit versions, but many computers are still 32 bits. The PC I use right now is 32 bits and I don't feel the urge to buy a new one. So I suppose 32 bit machines will stay around for another decade or so.

stiff · on April 6, 2012

The problem itself seems quite serious and I would seriously reconsider using Go if I was interested in it in the first place, but after seeing the way this is treated by the Go "community", I am pretty sure I will never ever even think about using Go for anything.

dchest · on April 6, 2012

the way this is treated by the Go "community"

What do you mean?

stiff · on April 6, 2012

Well, since the people behind Go did not up front say anything about the language/compiler being specifically targeted at 64-bit platforms, I would expect someone more mature from the Go team to step up and say something like "We are sorry, we didn't foresee the consequences of some design decisions and hence screwed it up." and either "We will fix it ASAP" or "We cannot fix it because XYZ". They might not take money for their work directly from the users, but there is still some moral obligation if you create something, release it to the world, praise its virtues and persuade people to use it. As it can be seen in the thread, lots of people already invested lots of time into building things with Go and now they're in serious trouble. Instead, many people in this thread try to somehow downplay the problem, advocate changing hardware (that's something quite new in the programming language world) or following some pretty absurd guidelines. This might or might not be representative of the whole community around Go, but it surely leaves a bad taste, hence the slight irony.

dchest · on April 6, 2012

There is an "official" response: the post references a bug # where Russ Cox said "the rest of the issue will have to wait until after Go 1".

Note that Go 1 is a "language freeze", not the implementation freeze.

Now that it's known there's a bug that will be worked on later, people proposed possible workarounds, the easiest of which is to switch to a 64-bit platform. I agree that downplaying the issue is wrong, but only one person did that.

There is occasional rudeness, mostly caused by strong opinions, but overall I think the Go community is pretty good.

stiff · on April 6, 2012

The very next sentence in this response from Russ Cox is: "Or maybe all the 32-bit systems will be replaced by 64-bit ones.". So, they do not admit this is a serious problem that needs attention, it is not clear whether they will fix it, when they are going to fix it and whether anyone cares about fixing it at all.

I couldn't find the adequate words to express this so far, but the reason I find this situation worth commenting on at all is that it resembles a very common pattern of denial I observe among many professionals in various professions in cases where a problem appears that is very hard to tackle or even to analyse in the first place. Often a doctor who has trouble identifying a disease will tell you it's probably just something in your head, a programmer who has trouble reproducing a difficult bug will tell you it's you who probably did something wrong at some time, even a guy who I called to repair my washing machine that was stopping the washing at random told me to "keep it under observation" when he wasn't able to tell what's wrong. Many people simply do not want to put in the work needed to solve an unexpected and difficult problem, and thus, perhaps even subconsciously, try to handle it by pretending it doesn't exist. If you want to be a real professional and a leader in what you do, you can not behave like that, you can not repress a problem when someone reports one to you, you have to have the patience to examine the issue, the experience necessary to know when you can be certain that you have the complete picture of it, then sometimes the courage to admit there really is a problem and then finally you have to solve it, or people will not respect you.

agentS · on April 7, 2012

I don't think Russ denied it was a problem, so I don't think this is a "pattern of denial". I think you're incorrectly interpreting that statement as his "fix" for this bug.

The fix is well-known (a precise GC), just implementing it hasn't happened yet.

ezyang · on April 6, 2012

Funnily enough, I understood why they went the conservative GC route. It has to do with the overall Go philosophy, which is that they really do not want features to affect data representation. This has meant no boxing (and no easy polymorphism), and a decision like that has logical consequences for GC too.

Here's to hoping they find a cool solution! It's been a problem for GCs since forever, and if they find a general way of handling the problem I'm sure it will be picked up by many other runtimes.

pcwalton · on April 6, 2012

In Rust we're working on a solution for this problem. Essentially, the plan is to have RTTI on garbage-collected data (this is already completed and is used in the cycle collector) and precise stack information for every root on the stack. The latter is in a fork of LLVM: https://github.com/pcwalton/llvm

But aside from the shameless plug, C# and D (I believe) have had precise garbage collectors for quite a while now, and they have similar memory management to Go. It's well-known how to implement it (but that doesn't make it any less hard; I can totally understand why Google opted for conservative GC in the first version).

ezyang · on April 6, 2012

I know how to do precise GC if you allow me to add a (pointer-size) header to all data living in the heap; i.e. to maintain the RTTI. I don't know how you do that if you're not allowed a header. Do C# and D have headers?

pcwalton · on April 6, 2012

I'm sure they do. There are a few things to note here:

(1) In order for malloc to work, you need a header anyway (at least, unless your allocation fits in one of the fixed-size bins).

(2) You can get around the header to some extent by sorting the fields of your objects so that pointers come first, and then all you need to do is to store the number of pointers (or a sentinel value). This is what Haskell does. Of course, this prevents low-level control over data representation.

(3) You can tag (or NaN box) all your values. This is what most MLs do, as well as JS, many Lisps, etc.

(4) You can use a map on the side from pointer to type info to avoid a header. This is what Rust in its early days did. It's worse than a header for memory consumption though, so it doesn't really buy much.
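As a rough sketch of option (2), the "pointers first" layout: each object records how many of its leading words are references, and a precise marker follows only those. This is a toy model, not how any real runtime stores objects; the `object` type, the use of slice indices as "addresses", and `mark` are all assumptions made for illustration.

```go
package main

import "fmt"

// object models a heap object laid out "pointers first": the first nptrs
// words are references (indices into the heap slice here), the rest is
// raw, non-pointer data.
type object struct {
	nptrs int
	words []int
}

// mark does a precise traversal: it follows only the first nptrs words of
// each object and never interprets the trailing data words as references.
func mark(heap []object, root int, seen []bool) {
	if seen[root] {
		return
	}
	seen[root] = true
	obj := heap[root]
	for _, ref := range obj.words[:obj.nptrs] {
		mark(heap, ref, seen)
	}
}

func main() {
	heap := []object{
		{nptrs: 1, words: []int{1, 2}}, // refers to object 1; the 2 is plain data
		{nptrs: 0, words: []int{7}},
		{nptrs: 0, words: []int{9}}, // unreachable, even though the data word 2 "looks like" its index
	}
	seen := make([]bool, len(heap))
	mark(heap, 0, seen)
	fmt.Println(seen) // [true true false]
}
```

A conservative scan of the same heap would have kept object 2 alive because of the stray data word; the cost of the precise version is that the runtime, not the programmer, dictates field order.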

ezyang · on April 6, 2012

So, the thing that always gets Haskell folks when dealing with an implementation (2) is that you can't get uniform data representation when dealing with things like arrays. It means you have to unbox things. Arguably, the situation is not much better in malloc land; if you malloc a large multiple of your object size, you're explicitly saying, "I want this to be unboxed", but by this point you've wandered into generics land.

(3) is annoying. Who likes 31-bit integers? Not I!

pcwalton · on April 6, 2012

Yeah, I hate 31-bit integers too. It's not the only tagging scheme though; I prefer NaN boxing (used in SpiderMonkey among others). NaN boxing allows unboxed doubles and 32-bit ints, at the cost of increased register pressure and memory usage on 32-bit systems.
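For readers who have not seen NaN boxing, a minimal sketch of the idea follows. The tag value and the `value` type are invented for illustration and do not match SpiderMonkey's actual encoding; a real engine also has to canonicalize NaNs produced by arithmetic so they cannot collide with a tag, which this sketch skips.

```go
package main

import (
	"fmt"
	"math"
)

// Every value is one 64-bit word. A word whose upper 32 bits equal the
// (arbitrary, illustrative) tag below is a boxed int32 stored in the low
// 32 bits; any other bit pattern is read as an ordinary float64.
const (
	tagMask = 0xFFFFFFFF00000000
	intTag  = 0x7FF8000100000000 // a quiet-NaN bit pattern used as the int tag
)

type value uint64

func boxFloat(f float64) value { return value(math.Float64bits(f)) }
func boxInt(i int32) value     { return value(intTag | uint64(uint32(i))) }

func (v value) isInt() bool      { return uint64(v)&tagMask == intTag }
func (v value) asInt() int32     { return int32(uint32(v)) }
func (v value) asFloat() float64 { return math.Float64frombits(uint64(v)) }

func main() {
	for _, v := range []value{boxInt(-7), boxFloat(3.5)} {
		if v.isInt() {
			fmt.Println("int", v.asInt())
		} else {
			fmt.Println("float", v.asFloat())
		}
	}
}
```

Both doubles and full 32-bit integers survive unboxed, at the price of every value occupying 64 bits, which is the memory and register cost on 32-bit systems mentioned above.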

zvrba · on April 6, 2012

Interesting piece of information below in the thread: in reply to "Go being advertised as a systems PL", David Symonds replies with: "It's not. It used to be, but it's not any more."

binarycrusader · on April 6, 2012

Except, it still is advertised as one (or at least implied to be):

  "Go is a general-purpose language designed with systems programming in mind."
  http://golang.org/ref/spec#Introduction

You could nit-pick and say that it says "in mind", but the point still stands that they're trying to imply that it's suitable for systems programming. It is not; certainly not at the level of an operating system kernel, etc.

el_muchacho · on April 6, 2012

What is the definition of a system PL?

zvrba · on April 6, 2012

The ability to access the raw memory underlying any object.

talentdeficit · on April 6, 2012

deterministic memory usage is a good start

ungerik · on April 7, 2012

After switching to 64 bit, we haven't had a single crash. So I would describe the 64 bit version as production ready. But the problems on 32 bit systems should have been documented in the release notes.

Beside this hiccup (and a week without sleep with the website on life support), Go has been a real joy to work with.

ww520 · on April 6, 2012

Are most Androids 32-bit? Doesn't this issue impact the GO Android SDK adoption?

masklinn · on April 6, 2012

> Are most Androids 32-bit?

By virtue of there being no AArch64 implementation announced so far, all androids are 32-bit because there is no 64b ARM core.

> Doesn't this issue impact the GO Android SDK adoption?

Any Go application should be able to run into this issue if it has a similar usage pattern.

dchest · on April 6, 2012

...if there was the Go Android SDK.

0xABADC0DA · on April 6, 2012

It's amazing how nonchalant they are about this 'oh sure we'll fix it in a year or so' when the garbage collector is basically all there needs to be in a Google Go runtime.

Some comments were asking why early Java's GC wasn't this bad even though it was conservative. The reason is that Java can't take references to fields of an object, so the data mistaken as a pointer has to actually point to an object header. In Google Go you can take a reference to a field, locking the whole object, so the faux pointer can point to any field as well (or in this case probably any location in the object). Not exactly the wisest choice in semantics, as they are seeing now that it complicates the GC.
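The field-reference semantics described here look like this in practice. The `record` type and `counterOf` are hypothetical names; the point is only that a pointer into the middle of a struct is an ordinary, valid pointer, so the collector has to treat an address anywhere inside an object as keeping the whole object alive.

```go
package main

import "fmt"

// record is a made-up struct with a large payload between two small fields.
type record struct {
	id      int
	payload [1024]byte
	count   int
}

// counterOf returns a pointer into the middle of r. As long as the returned
// pointer is reachable, the whole record has to stay allocated.
func counterOf(r *record) *int { return &r.count }

func main() {
	r := &record{id: 1}
	c := counterOf(r)
	r = nil // the only remaining reference is the interior pointer c
	*c += 5
	fmt.Println(*c) // 5
}
```

Because interior addresses are legitimate, a conservative scan cannot reject a candidate word just because it does not point at the start of an object, which widens the range of integers that can pin one.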

4ad · on April 6, 2012

It's not Google Go, it's simply Go, there's not a single Google reference on the web page and that's intentional. Looking at your previous posts here and on Reddit I see you are a notorious anti-Google troll that adds "Google" to "Go" so that your negative posts will be associated with both Google and Go.

It's absurd to call it Google Go when there's a first class GNU implementation that has been in development from the beginning, there's at least a closed source implementation and another distinct BSD license implementation in its infancy.

It makes even less sense than saying Apple LLVM, Juniper FreeBSD, or AT&T C.

burgerbrain · on April 6, 2012

The number of times people have needed to tell this guy this is staggering.

rand_r · on April 6, 2012

Any recommendation on how we should google for Go related pages?

4ad · on April 6, 2012

Golang or Go programming works fine. If searching for Google Go produces meaningful results, by all means, use it. I was merely complaining about referring to the project as such.

There's also this: http://go-lang.cat-v.org/go-search

markatto · on April 6, 2012

'golang' seems to be what people are using.

aflott · on April 6, 2012

go lang <thing-I'm-after> works quite well for me

0xABADC0DA · on April 6, 2012

I suppose you fail to see the irony in complaining about me writing "Google Go" because "Go" is an unsearchable name and then answering how to search for it with a custom search engine, needed because the name is unsearchable. Not to mention that ctrl-f doesn't support regular expressions so no ctrl-f "\<Go\>".

I only use that construction with ambiguous or unsearchable names (for instance I would also say Google Maps). I don't think I am alone in using this form and I feel it is appropriate to refer to Google Go this way.

Frankly I fail to understand why this bothers you so much, as it clearly does. I would expect Google Go advocates to be delighted to have Google's good reputation for engineering imparted onto this language.

burgerbrain · on April 6, 2012

The "clearing up ambiguity" thing is bullshit. Nobody is going to fail to use context clues to figure out what is meant by "Go" in this discussion. Where such disambiguation is actually needed, "Go (language)" or simply "go-lang"/"golang" is preferred. You know, since they are not incorrect.

"Frankly I fail to understand why this bothers you so much, as it clearly does."

It is annoying because it is incorrect, because you have been corrected multiple times, and because what you are attempting to do is transparent as hell. I, and I imagine many others, do not have a strong appreciation for botched attempts at subtlety.

0xABADC0DA · on April 6, 2012

For instance:

http://www.cnn.com/2011/TECH/mobile/05/12/google.maps.androi...

Or:

http://en.wikipedia.org/wiki/Google_Maps

> It is annoying because it is incorrect

Maybe some English professor, technical writer, or journalist reading this can chime in and explain how so. It seems to be the standard practice and I intend to use the best grammar and construction that I am capable of, as bad as that may be. This isn't Twitter.

4ad · on April 6, 2012

It's incorrect for the same reason Apple LLVM or Juniper FreeBSD is incorrect, and it's bullshit because in a thread about Go, people know what Go refers to.

FWIW, the Windows, Plan 9, OpenBSD and NetBSD ports were done entirely by the community.

AT&T C? That's even less specific than Go. What about gccgo? Google/GNU Go? What about the commercial implementations?

agentS · on April 7, 2012

An answer, by way of analogy.

When referring to a friend named "Edward" in a text to another friend planning Edward's surprise birthday party, you can probably refer to him as "Ed" or "Edward". You definitely don't need to refer to him as "Edward (Parent's SSN:12345...)" or "Edward (Philip's Son)". While these latter forms are less ambiguous (and more searchable, to boot), the context is more than sufficient to disambiguate.

burgerbrain · on April 6, 2012

> http://en.wikipedia.org/wiki/Google_Maps

Yes, and then we see:

http://en.wikipedia.org/wiki/Go_(programming_language)

Give me a fucking break. You are clearly not worth engaging in discussion; I'm done.

canop_fr · on April 6, 2012

I think the reasons they don't care more are

1) it mainly affects long-running applications on 32-bit machines. Fact is most production servers are (or should be) on 64 bits

2) most people (me for example) never observed such a phenomenon (and all my dev computers are 32 bits)

So, that's an important problem but far from a game stopper.

sunyc · on April 6, 2012

to go back to 32 bit hardware, you will be running very old machines that you can buy off ebay for <200$ and the power cost offset the new hardware cost in several months.

wtallis · on April 6, 2012

Running a 32-bit operating system on 64-bit hardware is still pretty common. And this problem afflicts ARM as well.

jergason · on April 6, 2012

This is really disappointing. I was hoping to do some stuff with Go on ARM, but it looks like they are not too concerned about this bug.

luser001 · on April 6, 2012

I would run a 32-bit OS (even if the hardware supported 64-bit) if I wanted to squeeze every last bit of usable memory out of a VPS.

derleth · on April 6, 2012

This sounds like precisely the use-case for x32: Running userland programs that only use 32-bit pointers on x86-64 bit hardware in 64-bit mode; you only get to address 4GB of RAM, but your pointers fit two to a register and you have all the x86-64 registers, opcodes, and special hardware.

Sadly, it isn't here yet.

https://sites.google.com/site/x32abi/

http://en.wikipedia.org/wiki/X32_ABI

(To forestall the obvious objection: No, there's no way Torvalds will allow this to re-introduce the 2038 Problem into a new ABI. None.)

Aissen · on April 6, 2012

x32 has been accepted(and merged) for Linux 3.4. Kernel support is here, now it's userspace's turn.

riobard · on April 6, 2012

This is what I thought. Go is primarily targeted at server-side applications, which are almost exclusively running on 64bit hardware/OS. Memory is so cheap now that it's silly not to have more than 16GB plugged in the chassis.

Then I realized there are many small VPS/EC2 instances with <1GB memory (I have two right now running on 64bit) and there are people who try to squeeze every last bit out of them by going 32bit...

Avshalom · on April 6, 2012

In addition to ARM and 32bit OSs, you're going to run into a lot of 32 bit hardware if you try to distribute a program written in Go to the masses (or at all really). I'm using some now.

masklinn · on April 6, 2012

Yeah, who's ever heard of those ARM things?