Go compiler internals: Adding a new statement to Go (thegreenplace.net)
234 points by soheilpro on July 5, 2019 | 52 comments



The Go compiler is surprisingly easy to modify! I made a modification that allows it to issue warnings instead of errors for unused imports, variables, etc. [1]. It was a fun little exercise :)

[1] https://github.com/kstenerud/go


This is the commit in question: https://github.com/kstenerud/go/commit/334e03f12c9c94fe6aded.... Cool mod!


Now if only this got merged into mainline ;)


Why? The idea that warnings are errors is pretty important actually.

Now, unused imports are mostly harmless (alright, they are not because of init), but they keep unrelated commits clean without needing ad-hoc linting or IDE functionality.


It's explained in the use case: https://github.com/kstenerud/go#use-case

Use Case

The use case for warnings is the exploratory development or debugging phases, where you really don't care about leaving unused things lying around for the time being (for example, temporarily commenting something out), and would rather that the compiler just got out of your way until you've got something ready to compile normally and commit.

Usage

    go build -gcflags=-warnunused somefile.go
    go test -gcflags=-warnunused
When you're done with your exploratory/debugging phase, simply build or test without the flag:

    go build somefile.go
    go test
Compiling without the flags will fail on unused things as normal.


Other comments are missing the other key point: Go packages are sometimes imported for side effects, and this pattern is common enough in the community (albeit slightly discouraged now). So unused imports can still impact your code due to their init() functions.


Unused imports that are imported solely for side effects should be imported this way to avoid that problem:

  import (
    _ "github.com/some/package"
  )
Given that this is the only way to have unused imports with mainline Go — in other words, it's impossible to accidentally remove one once it's been declared this way — I'm not sure I see what the argument is.
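
For anyone wondering what a side-effect import actually does, here is a rough single-file sketch. The `registry`, `register` and `lookup` names are made up for illustration; in real code the `init()` would live in the imported package (database/sql drivers register themselves in a similar way via `sql.Register`), and the blank import would be the only thing pulling it into the build.

```go
package main

import "fmt"

// registry stands in for a shared table that packages add themselves to.
// Hypothetical name, used only for this sketch.
var registry = map[string]func() string{}

// register is what a side-effect package would call from its init().
func register(name string, f func() string) {
	registry[name] = f
}

// In real code this init would sit in the imported package, and
// `import _ "github.com/some/package"` would be what keeps it linked in.
func init() {
	register("demo", func() string { return "hello from demo driver" })
}

// lookup reports whether something was registered under name before main ran.
func lookup(name string) (string, bool) {
	f, ok := registry[name]
	if !ok {
		return "", false
	}
	return f(), true
}

func main() {
	msg, ok := lookup("demo")
	fmt.Println(msg, ok)
}
```

Drop the `init()` (which is what removing the import amounts to) and `lookup("demo")` quietly starts returning false, with no compile error to warn you.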


> I'm not sure I see what the argument is.

That there are other reasons why unused imports are a compilation error in go, and it isn't as simple as turning them into warnings


> go packages are sometimes imported for side effect

Sounds like another source of errors.


Fatal errors would probably be bubbled up via a panic, which you can catch by calling recover() in a deferred function.
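
A minimal sketch of that pattern, with `safeCall` as a made-up helper name:

```go
package main

import "fmt"

// safeCall runs f and turns a panic into an error instead of crashing.
// Hypothetical helper, not part of any library.
func safeCall(f func()) (err error) {
	defer func() {
		// recover only has an effect when called inside a deferred function.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	f()
	return nil
}

func main() {
	fmt.Println(safeCall(func() { panic("something went wrong") }))
	fmt.Println(safeCall(func() {}))
}
```

One caveat: a panic raised inside an imported package's init() happens before main starts, so there is no deferred function of yours on the stack to recover it. The sketch only covers panics raised by code you call yourself.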


"I really need to get this deployed to production. I'll remove the warning debugging flag and build it... AH MAN look at all these unused import errors!! Screw it! I don't have time to fix them all. I'll just add it back and fix it later"


When your devs do these kinds of things, the problem isn't the tools.

I like tools that serve me, not the other way around.


That is a valid position, but imo at odds with the implicit position of Go's creators. Arguably, the Mommy regime of Go's compiler is very much reflective of Go's creators' appraisal of the software maturity of its intended users.

Regarding the point in general, the notion of "broken windows" is applicable.

(Great github showcase, btw!)


It's the problem of "he knows what he's doing" vs "I know what I'm doing".

In the old unix days, tools were built under the assumption that "he knows what he's doing", meaning that if you typed "rm -r /" you probably had a good reason to do so. We've since learned that just blindly trusting potentially fat fingers is probably not the best approach, and so the more dangerous of those tools have been modified to require you to add additional flags to do the most dangerous things.

It was the same in C, where the compiler blindly did exactly as told until we realized that software developers have fat fingers too. Unfortunately, they got it backwards, issuing warnings that by default don't halt compilation ("he knows what he's doing"), which led users and managers to believe that warnings aren't serious enough to deal with.

What they SHOULD have done is made those warnings halt compilation by default, and only allow compilation to continue if the user had invoked an additional opt-in ("I know what I'm doing") flag.

There's also the issue of inconsistent warnings across compilers and the subtleties of UB in C that contributed to the warnings problem, but we don't have that in Go.


People often say this sort of thing about the constraints imposed by various systems. It's a bit of a rhetorical party trick, though, designed to avoid engaging with the design rationale of the constraints by appealing to the operator's vanity.


> Go's creators' appraisal of the software maturity of its intended users.

More like immaturity:

“The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt.”


"I really need to get this deployed to production. I'll build it... AH MAN look at all these compilation errors and failing tests!! Screw it! I don't have time to fix them all. I'll just remove them and fix it later."


I used to want that too, but I've paid pretty close attention over the last couple years and that error has caught a nonzero number of what would otherwise have been runtime bugs.


For production I agree 100% with you, to the point I think it's just sloppy to leave unused variables and imports hanging around.

But during development that's just a needless obstacle. I may comment large blocks of code and I don't want to scan each import to make sure they are no longer used.


I assume at this point that at least a plurality of Go developers are using goimports and having this stuff handled automatically. That's probably another reason my opinion on this shifted.


If IDEs did the same for unused variables, I'd probably shift as well, but right now, commenting out a line quickly degenerates into a game of whack a mole, because it causes a previous calculation to become "unused", and commenting out that line causes two more things to become unused, etc. The last thing I want during debugging is to have to fight the compiler at every step.

Debugging is the main reason I wrote this mod.
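
For comparison, the usual workaround in unmodified Go is to assign the value to the blank identifier while debugging. A small sketch (function names are invented for the example):

```go
package main

import "fmt"

func compute() int { return 21 }

// process shows a value that became unused after a line was commented out.
func process() int {
	a := compute()
	b := a * 2
	// fmt.Println("b =", b) // temporarily disabled while debugging
	_ = b // without this line the compiler rejects b as unused
	return a
}

func main() {
	fmt.Println(process())
}
```

It works, but you still have to add one such line per newly unused variable and remember to remove them afterwards, which is the whack-a-mole described above.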



That was a fun read. Very interesting.

I have noticed that the AST nodes don't have a parent reference. I wonder how it knows, for example when encountering a "continue LABEL", that the code is nested (maybe deeply) in a loop with that label. The only way I can think of is traversing up the tree, but without parent references I don't see how that would work. How do they do it?


Based on [1] (where continue is handled) and [2] (the branch statement type), it looks like the target is simply passed to the function. This is a recursive descent parser, so passing the target down the statement handling stack is trivial (and indeed, this is what happens for the for statement[3]).

[1] https://github.com/golang/go/blob/master/src/cmd/compile/int... ("ctx" is a "targets" struct containing two entries, one for which statement 'break' targets and another for which statement 'continue' targets).

[2] https://github.com/golang/go/blob/master/src/cmd/compile/int...

[3] https://github.com/golang/go/blob/master/src/cmd/compile/int... (the targets{s, s} passed to innerBlock is then passed to blockBranches - s is the for statement node itself).

As a sidenote, it is interesting how easy Go reads even though i never wrote a single line of Go myself (though i do write a lot of C and Object Pascal and the code patterns look similar).
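
To make the "pass the target down" idea concrete, here is a toy sketch. It is not the Go compiler's code: `Node`, `ctx` and `check` are invented for illustration, and the real `targets` struct is simpler than this linked context.

```go
package main

import "fmt"

// Node is a toy statement node with no parent pointer.
type Node struct {
	Kind  string  // "for", "block", "continue" or "break"
	Label string  // loop label on a "for", or the jump target on a branch
	Body  []*Node // child statements
}

// ctx records one enclosing loop and links to the next one out.
type ctx struct {
	loop  *Node
	outer *ctx
}

// find returns the enclosing loop with the given label,
// or the innermost loop when label is empty.
func (c *ctx) find(label string) *Node {
	for ; c != nil; c = c.outer {
		if label == "" || c.loop.Label == label {
			return c.loop
		}
	}
	return nil
}

// check walks top-down, handing the current context to the children,
// and records in out which loop each branch statement targets.
func check(n *Node, c *ctx, out map[*Node]*Node) error {
	switch n.Kind {
	case "for":
		c = &ctx{loop: n, outer: c}
	case "continue", "break":
		t := c.find(n.Label)
		if t == nil {
			return fmt.Errorf("%s %q is not inside a matching loop", n.Kind, n.Label)
		}
		out[n] = t
	}
	for _, child := range n.Body {
		if err := check(child, c, out); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	cont := &Node{Kind: "continue", Label: "Outer"}
	inner := &Node{Kind: "for", Body: []*Node{{Kind: "block", Body: []*Node{cont}}}}
	outer := &Node{Kind: "for", Label: "Outer", Body: []*Node{inner}}

	targets := map[*Node]*Node{}
	if err := check(outer, nil, targets); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("continue resolves to outer loop:", targets[cont] == outer)
}
```

The nodes never point upward; everything a child needs to know about its ancestors arrives as an argument of the recursive call.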


Now that you have pointed where it is, it really looks easy to understand how it works. Thank you!


Context from higher up the tree can easily be passed down when recursively processing, as others mention.

As to why do that, there are several reasons.

It's fiddly to construct trees when you need to patch the children with parent references. The most natural thing to do is create the children (usually via recursive parse) then construct the parent. If you then need to patch the children with a reference to their parent, it's more work. It also means your nodes can't be immutable (but they're often not wholly immutable anyway for other reasons, like annotating during passes that add semantic information).

It's easier to reason about tree manipulations when you only have downward pointers. For example, maybe you want to rewrite a common subexpression with a reference to a temporary, and reuse one of the common trees as the RHS on the assignment to the temporary. It's more effort if you need to patch both ways on the link, rather than just grab the tree and slot it into the assignment.

(It's possible that you have DAGs rather than trees, and have children with shared parents, but I think this isn't worth any extra representational power or compression that it gives you, because passes will want to mutate those nodes, and meeting the same nodes more than once makes invariants more complex.)

Finally, more interesting traversals, following control or data flow, can cut across and jump between tree branches, so parent links don't necessarily help you there either.

It helps that most languages don't have parse trees which would stress the runtime stack when processing recursively, outside of machine-generated code (and correspondingly, it's not that hard to get a stack overflow error or equivalent "too much nesting" error if you generate code targeting that failure mode).


I don’t know how the Go compiler works but one solution is for each scope level to have a symbol table with a reference to the parent level’s table.

So you can traverse up for symbols.
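
A minimal sketch of that idea, with no claim that the Go compiler does it this way (`Scope`, `Declare` and `Lookup` are names made up for the example):

```go
package main

import "fmt"

// Scope is the symbol table for one nesting level, linked to its parent level.
type Scope struct {
	parent  *Scope
	symbols map[string]string // name -> type, simplified to strings
}

// NewScope opens a scope nested inside parent (nil for the outermost scope).
func NewScope(parent *Scope) *Scope {
	return &Scope{parent: parent, symbols: map[string]string{}}
}

// Declare adds or overwrites a symbol in this scope only.
func (s *Scope) Declare(name, typ string) { s.symbols[name] = typ }

// Lookup searches this scope first, then each enclosing scope in turn.
func (s *Scope) Lookup(name string) (string, bool) {
	for cur := s; cur != nil; cur = cur.parent {
		if typ, ok := cur.symbols[name]; ok {
			return typ, true
		}
	}
	return "", false
}

func main() {
	global := NewScope(nil)
	global.Declare("x", "int")

	fn := NewScope(global)
	fn.Declare("x", "string") // shadows the outer x
	fn.Declare("y", "bool")

	fmt.Println(fn.Lookup("x"))
	fmt.Println(global.Lookup("x"))
	fmt.Println(global.Lookup("y"))
}
```

Here the upward link lives in the scope tables rather than in the AST nodes, so the tree itself can stay free of parent pointers.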


i recently hacked together my first simple compiler and the way i handled stuff like break/continue is that every loop pushed a "context" onto a stack, so when the compiler encounters a break/continue it just looks into the context to find the label it should jump to. could be similar
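
Roughly what that could look like, as a toy sketch (the `Compiler` type, the label naming and the `jmp` pseudo-instructions are all invented here):

```go
package main

import "fmt"

// loopCtx holds the jump labels for one loop that is currently being compiled.
type loopCtx struct {
	continueLabel string
	breakLabel    string
}

// Compiler is a toy code generator that keeps a stack of open loops.
type Compiler struct {
	loops []loopCtx // innermost loop is last
	out   []string  // emitted pseudo-instructions
	n     int       // counter for unique labels
}

func (c *Compiler) emit(s string) { c.out = append(c.out, s) }

// EnterLoop pushes a context and emits the loop head label.
func (c *Compiler) EnterLoop() {
	c.n++
	ctx := loopCtx{
		continueLabel: fmt.Sprintf("L%d_head", c.n),
		breakLabel:    fmt.Sprintf("L%d_end", c.n),
	}
	c.loops = append(c.loops, ctx)
	c.emit(ctx.continueLabel + ":")
}

// ExitLoop pops the context, then emits the back edge and the end label.
func (c *Compiler) ExitLoop() {
	ctx := c.loops[len(c.loops)-1]
	c.loops = c.loops[:len(c.loops)-1]
	c.emit("jmp " + ctx.continueLabel)
	c.emit(ctx.breakLabel + ":")
}

// Break jumps to the end label of the innermost open loop.
func (c *Compiler) Break() error {
	if len(c.loops) == 0 {
		return fmt.Errorf("break outside loop")
	}
	c.emit("jmp " + c.loops[len(c.loops)-1].breakLabel)
	return nil
}

// Continue jumps to the head label of the innermost open loop.
func (c *Compiler) Continue() error {
	if len(c.loops) == 0 {
		return fmt.Errorf("continue outside loop")
	}
	c.emit("jmp " + c.loops[len(c.loops)-1].continueLabel)
	return nil
}

func main() {
	c := &Compiler{}
	c.EnterLoop()
	c.EnterLoop()
	_ = c.Break() // targets the inner loop
	c.ExitLoop()
	_ = c.Continue() // targets the outer loop
	c.ExitLoop()
	for _, line := range c.out {
		fmt.Println(line)
	}
}
```

An explicit stack and a context passed down through recursive calls carry the same information; the stack just makes the push and pop visible.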


Question: does the Go compiler run meaningful optimization passes? I didn't really see it mentioned here, and given that I hear that the compiler is super fast I'm not sure if it does…


Check out part 2 (http://eli.thegreenplace.net/2019/go-compiler-internals-addi...) which talks about the compiler backend - there are many optimizations done on SSA


>many optimizations

Not many, mostly simple peephole optimizations.

https://github.com/golang/go/blob/master/src/cmd/compile/int...


There is a complete list of passes here[0]. It does some optimizations but much less than a mainstream C or C++ compiler.

[0] https://github.com/golang/go/blob/master/src/cmd/compile/int...


From a quick look at the repository it looks like there are optimizations performed: this code[1] seems to perform function inlining at the AST level, this code[2] converts the AST to SSA form which is generally meant to be used for optimizations, the SSA directory[3] contains a bunch of optimizations like common subexpression elimination[4], deadcode elimination[5], useless branch elimination[6] and other stuff (just browse around the files, they have comments about what they do). The SSA seems to contain both generic and machine-specific opcodes (basically instructions for the latter) with the former being converted to the latter using a bunch of rule tables[7] which seem to contain both generic-to-native transforms as well as peephole optimization rules (these are converted to go source code, most likely the huge files in the parent directory). Finally there seem to be a few optimizations in the machine-specific code generator (e.g. [8]).

Of course it really depends on your definition of "meaningful", but i think that as long as it isn't equal to "as many optimizations known to humankind as possible, everything else be damned", the compiler looks to perform a decent amount of them. At least for me it passes the subjectively vague "meaningful" check :-P.

[1] https://github.com/golang/go/blob/master/src/cmd/compile/int...

[2] https://github.com/golang/go/blob/master/src/cmd/compile/int...

[3] https://github.com/golang/go/tree/master/src/cmd/compile/int...

[4] https://github.com/golang/go/blob/master/src/cmd/compile/int... https://github.com/golang/go/blob/master/src/cmd/compile/int...

[5] https://github.com/golang/go/blob/master/src/cmd/compile/int...

[6] https://github.com/golang/go/blob/master/src/cmd/compile/int...

[7] https://github.com/golang/go/tree/master/src/cmd/compile/int...

[8] https://github.com/golang/go/blob/master/src/cmd/compile/int...


What about generics?


Why didn't they use LLVM?


This has been answered twice[0] by Russ Cox himself[1]. Some of the reasons were familiarity and the need for segmented stacks, but he goes into more detail in his comments.

[0] https://news.ycombinator.com/item?id=8817990

[1] https://news.ycombinator.com/item?id=1509700


Does https://golang.org/doc/faq#What_compiler_technology_is_used_... answer your question (I interpreted it as "why doesn't the Go compiler use LLVM?")?


LLVM is very slow.


it is?


It seems to be fast enough for SQL JIT-compiler in PostgreSQL 11+ and shader compilation in the *nix OpenGL stack (mesa), both of which are sort-of-realtime systems.



SQL queries and GL shaders tend to be much shorter than a typical program.

There are exceptions of course, but it's rare to find shaders with thousands of lines of code or more.


There is ongoing work to move Mesa away from LLVM because LLVM is slow.


Another HN thread on projects that found LLVM to be slow.

https://news.ycombinator.com/item?id=16956589


That's the best way to learn — post a stupid thing on the internet and let the internet prove you wrong. Thanks to everyone who replied!


Can't say anything about shaders but I do know a thing or two about database queries:

1. Queries are often repeated, i.e. most backend DBs get the same requests over and over again. Even a slow JIT compiler is fine here, as things just get cached.

2. Queries usually take some time to complete, and this offsets the JIT-related latency.

Also, Postgres has a very limited kind of jit compilation, i.e. for expressions only.

Notice that javascript jit compilers usually have a multi-tiered compilation. That's because proper compilation takes time, and sometimes it is more efficient to just do some basic template jiting (or no additional compilation) instead of firing the heavy guns.

None of them use LLVM, btw.


I'm not sure I follow your reasoning, Go isn't JITed, is it?


Language != Implementation.

And yes, there are Go interpreters as well.

https://github.com/go-interpreter


Rust compilation is slow, and a large part of that is due to the use of LLVM.


LLVM is slow but swapping for CraneLift only improves compile times 33%. Other work is needed for a dramatic reduction in compile times: https://github.com/bjorn3/rustc_codegen_cranelift/issues/133...


Because it is good to have options, and we don't need compiler monocultures.


That's not a huge problem now that we have GCC and LLVM. The real issue is that their generality and their optimisations make them pretty damn slow for non-performance-critical code.

Writing a basic backend isn't a huge project for a language backed as thoroughly as Go. D, for example, has a non-GCC/LLVM backend which isn't as fast but still does some advanced optimisations (and compiles at warp speed).



