
I still just don't understand why they insist on building their own toolchain. It just doesn't make sense to me.

When you set out to build a programming language, what is your objective? To create a sweet new optimizer? To create a sweet new assembler? A sweet new intermediate representation? AST? Of course not. You set out to change the way programmers tell computers what to do.

So why do they insist on duplicating: (1) An intermediate representation. (2) An optimizer. (3) An assembler. (4) A linker.

And they didn't innovate in any of those areas. All those problems were already solved by LLVM (and, to a more difficult-to-interact-with extent, GCC). So why solve them again?

It's like saying you want to build a new car to get from SF to LA and starting by building your own roads. Why would you not focus on what you bring to the table: A cool new [compiler] front-end language. Leave turning that into bits to someone who brings innovation to that space.

This is meant as a genuine question.



> I still just don't understand why they insist on building their own toolchain. It just doesn't make sense to me.

To quote rsc from https://news.ycombinator.com/item?id=8817990:

"It's a small toolchain that we can keep in our heads and make arbitrary changes to, quickly and easily. Honestly, if we'd built on GCC or LLVM, we'd be moving so slowly I'd probably have left the project years ago."

"For example, no standard ABIs and toolchains supported segmented stacks; we had to build that, so it was going to be incompatible from day one. If step one had been "learn the GCC or LLVM toolchains well enough to add segmented stacks", I'm not sure we'd have gotten to step two."


Which is of course no answer at all.

Their own explanation for wasting hundreds of thousands of man-hours on a "quirky and flawed" separate compiler, linker, assembler, runtime, and tools is that they absolutely needed an implementation detail that is completely invisible to programs and which they are now replacing because it wasn't a good idea in the first place (segmented stacks). And it's apparently worth writing out a 1000-word rationalization that doesn't even bother to mention the reason that implementation was necessary in the first place: to run better on 32-bit machines. In 2010.

Or they say that they had to reinvent the entire wheel, axle, cart, and horse so that five years later they could start working on a decent garbage collector. Never mind that five years later other people did the 'too hard and too slow' work on LLVM that a decent garbage collector needs. What foresight, that.

That's not sense, that's people rationalizing away wasting years of their time doing something foolish and unnecessary.


The replacement to segmented stacks is copying stacks, which as far as my knowledge of LLVM takes me, would be very difficult to add. You need a stack map of pointers to successfully move pointed-to objects on the stack from the old region to the new.
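
To make that concrete, here's a rough, purely illustrative Go sketch (not the runtime's actual code) of the kind of situation a stack map has to handle:

    package main

    func main() {
        x := 42
        p := &x     // p points at x, which (assuming escape analysis keeps it
                    // there) lives in main's stack frame
        grow(10000) // deep recursion forces the stack to be copied to a
                    // larger region, main's frame included
        println(*p) // so the runtime must have known p was a pointer and
                    // rewritten it to x's new address, or it would dangle
    }

    // grow recurses far enough to outgrow the initial goroutine stack.
    func grow(n int) int {
        if n == 0 {
            return 0
        }
        var pad [256]byte
        pad[0] = byte(n)
        return int(pad[0]) + grow(n-1)
    }

Finding and rewriting pointers like p in every frame is what the stack map makes possible, and that is the part that is hard to retrofit onto a toolchain that doesn't already track that information.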

There is a great deal of work going on in LLVM on this issue for precise GC of other languages, and (from the outside) it looks like more hours have been spent on it than on the entire Go toolchain. As Go developers don't have the resources or expertise to make such wide-ranging changes to LLVM, it would have blocked Go development.

GCC is similar. Those working on gccgo are trying to work out how to add precise GC and copying stacks. It is much more complex than it was on the gc toolchain.

There is great value in having a simple toolchain that is completely understood by the developers working on it. In fact, that very idea, that code you depend on should be readable and widely understandable, is one of the goals of Go. Applying the goal to the toolchain is a case of eating our own ideological dogfood.


> The replacement to segmented stacks is copying stacks ...

Which again is not an answer. Why are segmented stacks necessary? Why are copying stacks necessary?

This reasoning, which is apparently their best, amounts to saying that they had to implement their own compiler, linker, assembler, and runtime because they decided they had to implement their own compiler, linker, assembler, and runtime.


> Which again is not an answer. Why are segmented stacks necessary? Why are copying stacks necessary?

Because Go wants to provide very lightweight goroutines for highly concurrent and scalable services.


Lightweight goroutines depend on small stacks. General API design in Go depends on lightweight goroutines.

In particular, Go style is to never write asynchronous APIs. Always write synchronous blocking code, and when you need to work concurrently, create a goroutine.
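
For readers who don't write Go, a minimal sketch of that style (the URLs are just placeholders):

    package main

    import (
        "fmt"
        "net/http"
        "sync"
    )

    // fetch is plain synchronous, blocking code; no callbacks anywhere.
    func fetch(url string, wg *sync.WaitGroup) {
        defer wg.Done()
        resp, err := http.Get(url)
        if err != nil {
            fmt.Println(url, err)
            return
        }
        resp.Body.Close()
        fmt.Println(url, resp.Status)
    }

    func main() {
        var wg sync.WaitGroup
        // Concurrency comes from spawning one goroutine per blocking call,
        // which is only cheap because goroutine stacks start small.
        for _, u := range []string{"https://example.com/", "https://example.org/"} {
            wg.Add(1)
            go fetch(u, &wg)
        }
        wg.Wait()
    }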

You cannot do this in C with pthread, because OS threads are too heavyweight. So you end up in callback-based APIs that are harder to use and harder to debug (no useful stack traces).

This small feature has a surprisingly wide-ranging effect on the use of the language. It is a very big deal.

Go is very much about reinventing these low-level things.


Threads and stacks are orthogonal. You can have coroutines with contiguous stacks, and threads with non-contiguous stacks.

Furthermore, it's extremely easy to use non-contiguous stacks in C just by knowing the amount of stack each function uses, which the compiler already knows.

This is a totally absurd reason to reimplement an entire toolchain.


I'm not sure what about my comment is worth downvoting, but to try one more time:

If you came to me tomorrow and said "I want to build a language just like C but with non-contiguous stacks" I agree, I would use LLVM or GCC. But that's not what happened.

The history is three engineers decided to see if they could do better than C++ for what they did every day. That meant trying lots of things. One of the many was goroutines, but they needed a flexible platform on which to try lots of ideas that didn't make the final cut.

It just so happens, two of them had worked on a toolchain before. Ken's from Plan 9. (Which long predates the existence of LLVM.) And as he knew his compiler well, it was very easy to modify it to try these experiments.

In the end the language stabilized with several unusual features, several of which would be difficult to add to other compiler toolchains they were not familiar with. Is that the point at which they should switch to using LLVM?

Building on a toolchain you know that lets you experiment makes a lot of sense. Knowing a toolchain means you get to work quickly.

The end result still has useful features that LLVM does not. For example, running ./all.bash does three complete builds and runs all the tests. It takes about 60 seconds on my desktop. Last time I tried LLVM, it took minutes. Go programmers love fast compilers.


> If you came to me tomorrow and said "I want to build a language just like C but with non-contiguous stacks" I agree, I would use LLVM or GCC. But that's not what happened.

Except that is exactly what happened. Russ says: "segmented stacks; we had to build that, so it was going to be incompatible from day one."

That's the rationalization, though. It wasn't about features that you can all but implement in plain ANSI C being 'too hard'. We all know what really happened: they were comfortable with their Plan 9 toolchain and made a demo using it... which is fine. Then they continued to develop their demo for 5 years instead of throwing it out and doing it right, and now they are stuck making excuses for why their compiler and assembler and linker and runtime and tools are sub-par.


I don't think the tools are sub-par. My programs compile and run quickly on a variety of platforms. My toolchain builds quickly too.

And now it is written in Go, the preferred language of the compiler engineers.


Most of the toolchain already existed. When Ken Thompson started writing the Go compiler he based it on his Plan 9 C compiler implementation.


LLVM is a C++ monstrosity that takes hours to compile. Other programming language projects have to maintain a "temporary" fork of LLVM to achieve their goals: https://github.com/rust-lang/llvm/tree/master


Rust doesn't do this because of compile times; it's because we occasionally patch LLVM, and then submit the patches upstream.


What's your counter-proposal?

If you're building a new language, you need a new AST. You can't represent Go source code in a C++ AST.
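
As a hedged illustration, Go's standard library ships its own AST in the go/parser and go/ast packages (whether the gc compiler shares this exact representation internally is a separate question); nodes like a go statement simply have no C++ counterpart:

    package main

    import (
        "fmt"
        "go/ast"
        "go/parser"
        "go/token"
    )

    func main() {
        src := "package main\nfunc main() { go work() }\nfunc work() {}\n"
        fset := token.NewFileSet()
        f, err := parser.ParseFile(fset, "example.go", src, 0)
        if err != nil {
            panic(err)
        }
        // Walk the Go-specific AST looking for go statements.
        ast.Inspect(f, func(n ast.Node) bool {
            if g, ok := n.(*ast.GoStmt); ok {
                fmt.Println("go statement at", fset.Position(g.Pos()))
            }
            return true
        })
    }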

There are alternate compilers for Go, in the form of gccgo and llgo. But those are both very slow to build (compared to the Go tree that takes ~30s to build the compiler, linker, assembler and standard library). And the "gc" Go compiler runs a lot faster than gccgo (though it doesn't produce code that's as good), and compilation speed is a big part of Go's value proposition.


> There are alternate compilers for Go, in the form of gccgo and llgo. But those are both very slow to build (compared to the Go tree that takes ~30s to build the compiler, linker, assembler and standard library).

For any non-Gophers reading this: I write Go as my primary language, and have for the past two and a half years. I just timed the respective compilation speeds on a handful of my larger projects using both gc and gccgo (and tested on a separate computer as well just for kicks).

gccgo was marginally slower, though not enough to be appreciable. In the case of two projects, gccgo was actually slightly faster. The Go compiler/linker/assembler/stdlib are probably larger and more complex than the most complex project on my local machine at the moment, but I think my projects are a reasonable barometer of what a typical Go programmer might expect to work with (as opposed to someone working on the Go language itself).

The more pressing issue as far as I'm concerned is that gccgo is on a different release schedule than gc (because it ships with the rest of the gcc collection). That's not to say it's not worth optimizing either compiler further when it comes to compilation speed, but it's important for people considering the two compilers to understand the sense of scale we're talking about: literally less than a second for most of my projects. The time it takes you to type 'go build' is probably more significant.


Thanks. That's good data. I haven't seen any measurements for a few years. It's good to see that gccgo has caught up. Which version of gc did you test?

Yes, the release schedule is another important reason for building our own toolchain. Being in control of one's destiny is often underrated.


On this machine, gc 1.4.1 vs. gcc 4.9.2 (with no extra flags). My other machine has gc's tip, but runs Wheezy so it's probably an older version of gcc... it wasn't much different either way. I would barely have noticed it if I hadn't been timing it.


I would never set out to build a language I wanted people to use and not build it as a front-end for LLVM. I don't want to write an optimizer or assembler.

I don't doubt for one second that llgo takes a longer time to compile. And in exchange for slower compile times you benefit from many PhDs' worth of optimizations in LLVM. And every single target architecture they support.

It's easy to build something faster when it does less. I'll admit there's no blanket right answer to that tradeoff.


Yes, that's why there's both gc and gccgo (llgo came later). Apart from the rigour of having two independent compilers, they are seeking different tradeoffs. gc is very interested in running fast, and gccgo benefits from decades of work that have been put into gcc's various optimisations.

Does that answer your original statement that you didn't understand why we build our own toolchain?


Well, I still don't understand. Russ says it was for segmented stacks, but doesn't explain why those were necessary. You say it was for compile speed, yet gcc and llvm can crank out millions of lines of code a second at similar optimization levels as the Go compiler. Neither of these is a convincing explanation.


Then you will be stuck with the C view of the world regarding what a linker is supposed to do.

Just look at Modula-2 and Object Pascal toolchains as examples of compile speeds and incremental compilation features that could run circles around contemporaneous C compilers.

Or the lack of a proper module system, which requires linker help.


I was impressed by the toolchain when I first peeked at Go because it was dead simple to get up and running on any platform, especially Windows.

For gcc you have to deal with MinGW. Isn't LLVM just now getting to the point where it can build native Windows applications?

This is one area where I hope Rust makes progress. MinGW/Msys2 is just kind of gross stuff to deal with.


Care to explain what's gross about MSYS2/MinGW-w64? I'm genuinely interested in making it less gross.


Want to make it less gross? Make it completely go away.

Installation of Go or Python is just like any other Windows install. You download an installer.exe or .msi, run it, and you're done. Things compile or run immediately, and you don't have to start using a "special" terminal just for it to work.

My experience with MinGW is very different, especially for languages that depend on it. "Step 1: Install MinGW" ... what does that even MEAN?

"Ok, I ran this installer, and it brought up the MinGW Installation Manager. Is it done? What am I supposed to do here? Which one do I choose? How do you even select a package? What even ARE packages? OK, so I select something then go the Package menu and select Mark for Installation. It's not doing anything. Is it done now? Close window. Nope that didn't work. Open it back up. Oh, so after marking a package I have to go to the Installation menu and choose Apply Changes. ..."

This actually happened to someone I was trying to help over the phone. Heaven forbid they get lost in All Packages and get confused by the dozens of packages each with half a dozen versions and each with three different, non-descriptive "classes".

Installation needs to be braindead simple. During installation it should show a list of extra languages that can be installed, where you can't uncheck 'base' (with an "Advanced Options" button in the corner that opens up the standard installation manager instead). It should set up any environment variables, including PATH, (and including restarting explorer to refresh the env) and it shouldn't require the use of any terminal other than cmd.exe (despite it being terrible).

If you're installing something else entirely that depends on MinGW, their installer should be able to bundle the MinGW installer, and it should install without having to make any choices. It should detect if MinGW is already installed and install packages there instead, still completely automated.

Make it go away.


I was asking about "MSYS2/MinGW-w64".

You seem to be confusing mingw.org (http://www.mingw.org/wiki/getting_started) and MinGW-w64 (http://mingw-w64.sourceforge.net) and your rant seems entirely directed at mingw.org.

The software distribution I was asking the parent poster about is MSYS2 (http://msys2.github.io/ and http://sourceforge.net/projects/msys2/), do please come back with constructive criticism on that project if you are interested enough to investigate further.


You're correct, my complaints were towards mingw.org, not MSYS2. My apologies for not reading carefully enough. I may actually take a look, thanks for directing me.


There was a recent discussion on Reddit where some MSYS2 users chimed in, which may cover some useful ground:

http://www.reddit.com/r/cpp/comments/2v6vlg/decission_which_...


Yes, I'd be happy to, but I'm not sure they are "solvable" issues, because they seem to be more a matter of architectural mismatches.

Part of the issue with most software that uses MinGW is that it is written with POSIX-y operating systems in mind. That is, operating systems that can very efficiently fork processes and quickly deal with many small files. Unfortunately, Windows does neither well. Process creation is slower, and NTFS is a very lock-happy filesystem.

Why do I consider this gross as a user? Things like Git that utilize msys are slow on Windows. As in, I notice the UI hanging. Things like autoconf are terribly slow on Windows due to all of the small processes that are created to detect the environment. Antivirus tools will lock files that are created and generally slow things down due to the nature of lots of quick-running processes creating and deleting small files.

These are just realities of most software written for non-Windows platforms. So whenever I see a program that requires MinGW, I'm always very hesitant to use it. The user experience tends to be terrible. I can still remember an issue trying to compile subversion on Windows using gcc and having it take well over an hour. Turns out with all of the processes being forked and temp files being created, the antivirus program was adding a delay to every command. After completely disabling antivirus it compiled in 15 minutes.

So, in one sense, this isn't a problem with MinGW or msys, but it is typical of software that relies on them.

The other issue I have with them is that they don't integrate well with the native tools on Windows. For instance, Pageant is a good, graphical SSH agent on Windows. You have to mess around with environment variables and plink and junk so that you don't end up with multiple formats of SSH keys on your machine. Trying to deal with SSH through bash and msys is not a user-friendly experience. PuTTY is the gold standard of SSH clients on Windows.

Using msys/MinGW is like running X programs on OS X, Windows programs through Wine on Linux, or Java GUIs on any OS. It has enough strange warts and doesn't quite fit the feel of the rest of the OS.

That is where Go was awesome. I downloaded go and there were 3 exes on my machine. I ran "go.exe build source.go" and out popped an exe.


Thanks, sure, POSIX is a round hole and Windows a square peg, but I think that the excellent work done by Cygwin over the last few years has done a great deal to file down the edges of the Windows peg. MSYS2's hacks on top usually function ok.

With Git, I can clone very large projects almost as quickly with MSYS2 as I can on Arch Linux. We did begin to port msysGit (the native, non-MSYS executable, yeah, go figure) to MSYS2 and found very little speed improvement, so we stopped, since the MSYS2 version is much more functional and always up to date.

Using Autotools on MSYS2 isn't significantly slower than on GNU/Linux. You can try building any of the many Autotools-based projects we provide to see this for yourself. Besides, for software which relies on Autotools for its build system, there's no choice but to use it (outside of cross compilation).

That NTFS (and the Windows filesystem layer) isn't fast is independent of MSYS2 vs native Windows anyway.

An anti-virus will slow down all Windows tasks to an unusable crawl; just run your scan overnight and take care about what you click on. MSYS2 isn't hit worse than, say, Visual Studio. Fundamentally, MSYS2 is a software distribution whose end product is native Windows applications aimed at the user. The POSIX stuff exists just to help us get there (this is why we don't provide X Windows; if you want that, use Cygwin), so for example using Qt Creator as supplied by MSYS2 should give an experience that's roughly the same as using Qt Creator supplied by the Qt Project (but much easier to maintain).

Apart from installing and updating packages, you can avoid the MSYS2 console and just run programs in C:\msys64\mingw64\bin.

The security advantages we bring via shared packages (e.g. libraries) are very worthwhile.

> Trying to deal with SSH through bash and msys is not a user friendly experience. PuTTY is the gold standard of SSH clients on Windows.

Since on MSYS2 things are shared, your SSH keys are shared between all applications that use them in ~/.ssh, as you'd expect. I use mklink /D to unify my Windows User folder and my MSYS2 HOME folder (be careful not to use our uninstaller if you do this, though; it follows the symlink :-(). We do have putty but I haven't checked that it doesn't use %AppData% or, worse, the Windows registry to store keys. If it does, that's a bug we'll fix. To install putty:

$ pacman -S mingw-w64-x86_64-putty


The Go team does want to innovate on the toolchain. A key factor in the design of Go is the belief that once a language is "good enough", developers are better served by a superior toolchain (and specifically faster compilation) than by a fancier language. They want to own the toolchain so they can optimize it for Go and make their own tradeoffs about speed versus features.


I read somewhere (but I can't think of the keywords to find it now) that they found the greater flexibility in owning their toolchain was worth the cost. For example, they changed their data layout for GC purposes and changed the segmented stack approach over the course of development, and had they been tied to LLVM or gcc they'd have spent much of their time fighting against those implementations, or politicking to convince the maintainers to add additional complexity to their systems for an unproven language. (My example is weak because I am trying to retell their reasons and my recollection is vague.) I think they still haven't succeeded in bringing gcc up to par with their current approach.


LLVM supports precise GC now via the late safepoint placement infrastructure [1]. This infrastructure should be sufficient to support both copying stacks and a precise GC.

This is a recent addition and did not exist at the time Go was created, however.

[1]: http://llvm.org/docs/Statepoints.html


Are you thinking of this comment?

https://news.ycombinator.com/item?id=8817990


That's the one, thanks!



