Hacker News

We sure expected that generating idiomatic C code was important, and this informed a lot of early design choices in our toolchain. We were surprised, however, by how closely Mozilla reviewed and manually inspected the generated C code.

Yes, yes, yes! Just because one is automatically generating/translating code, that doesn't mean it can't be pretty! When automatically translating code, the matching needs to work with the full syntactic expressiveness of the source language, and what is matched and translated needs to be idioms in each language (as opposed to fine-grained syntactic elements). When the translation is done below the level of idioms, the result is non-idiomatic. It sounds pretty obvious when put like that.
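
As a rough illustration of the difference (a hypothetical sketch, not output from any real translator; both function names are invented), here is the same array sum emitted two ways in C: once lowered construct by construct, and once as the loop idiom a C programmer would write.

```c
/* Hypothetical output of a translator that works below the level of
   idioms: numbered temporaries, labels and gotos. */
int sum_lowered(const int *xs, int n) {
    int t0 = 0;
    int t1 = 0;
L_head:
    if (!(t1 < n)) goto L_exit;
    t0 = t0 + xs[t1];
    t1 = t1 + 1;
    goto L_head;
L_exit:
    return t0;
}

/* Hypothetical output of a translator that matches the whole
   "fold over an array" idiom and emits the corresponding C idiom. */
int sum_idiomatic(const int *xs, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += xs[i];
    }
    return total;
}
```

Both functions compute the same result; only the second is something a reviewer would want to read.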




When compiling one language to another, or to assembly, or just straight to object code, the most important things are that a) you produce interfaces (APIs / ABIs) that are easy to use, b) you generate good code.

No one is going to demand that GHC generate readable assembly. Why should they demand that GHC generate readable C if it were generating C instead of assembly?


> Why should they demand that GHC generate readable C if it were generating C instead of assembly?

Debugging in programs that are mixed C and other code. If you're in the middle of debugging Firefox (or any composed system) and start stepping through unstructured gobbledegook, you'll end up cursing the people who did this to you.

This becomes less of an issue if/when the non-C language becomes so ubiquitous that there is proper debugging support for it (viz. Python or C++ support in gdb), but as long as a bug may be triggered, propagated or otherwise interacted with from the non-C side, you want that code easily readable.


I remember a friend's cracked-out rant after a 48-hour all-hands debugging binge to find a bug deep inside auto-generated JavaScript and HTML. They were using Unicode text strings to tie the gobbledegook back to the source inputs and then trying to reason about what the code was supposed to do.

By comparison, debugging mixed C source and assembly in a debugger is trivial.


Because the generated C is expected to be integrated into the project as source code: "Along the way, we learned what it takes to deliver quality C code that can be taken seriously and integrated into an actual source tree."

The code they generate is supposed to live on as source code, along with other code in the project. We should expect people to read and try to reason about code that exists in source control.


For code generators whose output you must or are expected to tinker with, I completely agree.

For compilers, however, I do not. For example, one codebase I help maintain is Heimdal, and it has its own ASN.1 compiler that generates C (and also something of an interpreted bytecode, as an option), and its output is not even properly indented. Nobody minds because it works, and on the rare occasion I have to inspect that code, I use tools like Vim or clang to format it.


It isn't just tinkering that means the C must be readable. People need to be able to read it and understand what it does without learning F*.


It doesn't matter if the C code you are generating only gets seen by the C compiler. It does matter if humans need to see the generated C code, understand it and debug it.

This happened to me at work too. A large majority of the codebase is manually written C, but some of it is too tedious or error-prone to write by hand, so it is generated. We invested a great deal of effort to generate extremely readable code, even with comments and all. The reason is that other people need to read this C code (both interface and implementation) as they work.

Speaking of GHC, the Haskell ecosystem actually has great tools to generate readable code. The various Wadler/Leijen libraries like ansi-wl-pprint make it a breeze to generate readable code. Indeed not many other languages have so many good, if a bit idiosyncratic, libraries to choose from just for pretty printing!


Debugging and profiling tools, which understand C. You don’t want a mess of auto-generated function names showing up at the top of your profile. It’s better if the code generator produces something roughly human readable.

Also, crash reports produce stack traces. I’m positive that Microsoft and Mozilla are both heavy users of automated crash reports.

Abstractions are leaky. There are many tools for debugging assembly as well.


I've been playing around with nimlang...it compiles to C. It does create long identifiers, and consecutive ones too. Haven't had to, but running it in a C debugger would be madness.

Identifiers like:

TM_iLzrQjTMtHjOSlNDU8lfsw_2

TM_iLzrQjTMtHjOSlNDU8lfsw_3


You should check out Nim's gdb support. It will demangle these identifiers for you.


I would prefer generated C be readable because it’s quite likely I will need to step through it with a debugger.


At least in the web development world we have sourcemaps for that: HTML/CSS/JS can all be inspected and debugged in their original form.

Is there anything like this for C?


Meanwhile I find it troublesome to rely on source maps even in web development. The JavaScript ecosystem doesn't produce nice enough abstractions that you can totally forget about the generated code and work in the original code. Just read the generated JavaScript actually run by browsers. Debugging sometimes really requires cutting through abstractions.


Same here. I did a bit of CoffeeScript development and source maps rarely worked properly. Arguably we just didn't have them set up properly, but it's a condemnation in its own right if our smarter-than-average engineers can't configure the technology properly.


I'm not sure if it's standard C/C++, but there are the #line directives (https://gcc.gnu.org/onlinedocs/cpp/Line-Control.html). Bison and flex use them to good effect, and I've used them when piping code into gcc. They give compiler errors in the right places and make gdb behave when stepping through code; I'm not sure whether that runs into limitations or not.

According to the oldest docs I could quickly find (https://gcc.gnu.org/onlinedocs/gcc-4.0.4/cpp/Line-Control.ht...) it goes back to at least gcc 2.95 in 1999, but I wouldn't be surprised if it predated JavaScript and CSS themselves.


> Is there anything like this for C?

I've been fixing up some ~15-year-old code to a compilable state, and the flex/bison bugs were showing up with the flex/bison line numbers where the errors originated. That somewhat helps, but it turned out none of the errors were actually from flex/bison; they came from how their API changed over the years.

I'd get some WTF compiler errors and track them down to running the output of flex through sed in a Perl script to do something that flex now does out of the box and that didn't like being messed with. Fun times... well, it actually is fun times, since I'm just doing this on my own so I can play with the software.


The C preprocessor supports the `#line` directive, which can specify a custom source file/line combination. The compiler (or the runtime code, via the `__FILE__` and `__LINE__` macros) can then use it to report the location of an error.


I haven't used sourcemaps enough to know for sure whether they account for the following problem: it's sometimes extremely opaque when you finally hit something arbitrary like an int32/uint32 mismatch, or when you are dealing with a second-class language getting an interface and run into some weird quirk of how the interpreter passes data into the FFI.

There is also the other potential issue that the final code is doing something strange because of some #define magic, which is also very hard to trace without being able to walk through code.
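
A small sketch of the sort of `#define` surprise meant here (the macro and function names are hypothetical). The call site looks like an ordinary squaring operation, and the problem only becomes visible in the expanded code, for example in the output of `cc -E`.

```c
/* Hypothetical macro without parentheses around its argument. */
#define SQUARE_UNSAFE(x) x * x

/* The same macro with the argument and the result parenthesized. */
#define SQUARE_SAFE(x) ((x) * (x))

/* Expands to 1 + 2 * 1 + 2, which evaluates to 5, not 9. */
int unsafe_result(void) {
    return SQUARE_UNSAFE(1 + 2);
}

/* Expands to ((1 + 2) * (1 + 2)), which evaluates to 9. */
int safe_result(void) {
    return SQUARE_SAFE(1 + 2);
}
```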


Probably the majority of generated C code is where the input language lives in an entirely different domain (e.g. parsers), so sourcemaps wouldn't help.


I don't do web dev, but you can debug mixed source and assembly. You can switch back and forth between stepping through source or assembly and inspect registers and variables at the same time. It starts to break down with heavily optimized code.

Part of the reason C still exists is that enormous amounts of work went into the tooling.


At that point just look at the disassembly. In many cases you'll have to anyways, so you might as well.

Also, the code generator should include tracing and debugging support. That seems much more important to me than generating idiomatic and readable C.


It probably depends how likely the consumer is to look at the code. In the article, they’re expecting code review of copy-pasted C code produced by their system, so the bar is higher than generated assembly that almost no one will ever look at.

TypeScript is an interesting example: the generated JS is pretty close to the original TS, largely just with types removed, so for someone with lots of JS experience it is easy to gain confidence in the compiler, since the output is pretty close to what they would have written in the first place.


Not exactly the same thing. TS doesn’t really generate anything; CoffeeScript or, better, ReasonML compiled to JS would be a better comparison. The TS/Flow design goal was specifically that if you replace type annotations with whitespace, what remains is plain JS. There are no static or dynamic/runtime transforms.


Off-topic but somewhat related: BuckleScript transpiles to very readable and understandable JavaScript.


Agreed, if you're going to be editing the output, then it must be readable.



