Hacker News new | past | comments | ask | show | jobs | submit login
Compiling dynamic programming languages (eatonphil.com)
91 points by eatonphil on Sept 3, 2018 | hide | past | favorite | 36 comments



Hmm, I think the performance results for the last example aren't actually valid. The example/tco.js program is a linear-time algorithm since it's only making one recursive call, not two, so at n=50 you should expect it to be pretty much instant. Much, much less than a millisecond to evaluate.

I think the "0.080 total" and "0.087 total" are just from node startup time (and the variability within that), not time actually executing the function. I just ran node on an empty file and the running time was "0.087 total".

I think when doing these sorts of perf measurements it's best to make the code run for multiple seconds to better account for startup time, JIT warmup, caches, and the various other subtle factors that come into play.


I cannot agree more here. I strongly recommend benchmark.js (https://github.com/bestiejs/benchmark.js/)


Totally fair, and thank you for the link! To focus too heavily on performance right now would be a mistake. Even longer-term, jsc's most likely application is in simplifying deployments and packaging of Node applications. I included the programs in this article because I wanted to show _some_ example of where jsc stood a few weeks in.


In V8, crossing the boundary between the public C++ API and "js land" is actually quite expensive. In most cases you will get more perf from writing your code in JS. This is why we write a lot of Node.js's core in JS instead of C++.

Operations you see in the compiled output like `Local<Function>::Cast(global_3->Get(String::NewFromUtf8(isolate, "Boolean")));` are extraordinarily expensive, and should pretty much be avoided at all costs.


Thanks for the info. jsc right now is little more than PoC and I don't have a ton of hope for competing with V8 for performance long term (except perhaps when type inference or hinting enter the story).

The most immediate advantage I can think of for having a project like jsc long-term is for packaging/deployment. we use zeit's pkg tool at work today but a robust/mature Javascript-to-native compiler is much more compelling.


> competing with V8

The early python community went for C monoliths, but there was an alternative of python objects carrying C pointers. So python provides a form of api/plugin runtime dynamic linkage for smaller-grainsize C libraries. I don't know if anyone has explored that for javascript. If you've a problem domain of interest in need of almost-native speed (pointer call rather than static linkage) with 'compute-intensive runtime assemblages of C chunks' (eg, graphics or scientific), it might be a possibility.

> [...limited subset of javascript...?]

I don't quickly find the comment I intended to reply to, but fwiw, note that the ecmascript spec is written in sort-of pseudo-code English. Many years back I framed spec implementation as a semi-manual textural database cleanup and transformation exercise, and in a couple of days massaged spec into code. It helped that the target language had label gotos and permitted arbitrary identifiers, so the transformation was simple, and the resulting code looked like just like the spec. (Kind of ironic to see a comment elsewhere on this page dis'ing regexps.)


I'm doing this for Javascript - finding it a lot of fun! https://github.com/timruffles/js-to-c

I'd recommend it as a project to any wanting to learn more about a particular language (implementing a language teaches you how it works in painstaking detail), and get a better 'feel' for how languages and compilation works in general.


Ironic that your first initial and last name is truffles considering bridging between C and JS (and other languages) is possible if they're implemented via Graal's Truffle library: https://github.com/oracle/graal/tree/master/truffle


It doesn't have to be negative to get downvotes. Sometimes irrelevant discussion items are downvoted to keep them in the periphery.


Me too, except with Visual Basic 6. (It's a work thing.) Besides error handling, I think the gnarliest thing I've come across is this: VB has built in file management commands it inherited from GW-BASIC. Built in, in the sense that they're first class statements built into the language. One of these, for renaming files, goes "NAME <old> AS <new>", where old and new can be arbitrarily complex expressions. Name isn't a reserved word, so you can have a variable or subroutine called "name". So consider these three statements:

name (...) as (...)

name (...) = (...)

name (...)

...where (...) represents an arbitrarily complex expression. The first one is a "name as" command, the second is an assignment to an array element, and the third is a procedure call, and you can't tell which until you've fully and accurately parsed the first arbitrarily huge expression.

Also, the following:

name:

In BASIC a colon separates two statements on the same line, and an empty statement is valid, so is this a label or a subroutine call followed by an empty statement?

Edit - just remembered this gem: You can use a "With" block to save having to type the name of an object when referring to its members...

With SomeObject

    MsgBox .Name
End With

...will pop up a dialog showing SomeObject.Name. Now go ahead and tokenize that second line. If you're like me you got three tokens: [MsgBox][.][Name] The problem is that's indistinguishable from...

MsgBox.Name

You could say that ".Name" should be one token. Hmm. So what if it's something like...

MsgBox .Name.Substring(1, 3).ToUpper()

Still one token? I think my hair's going to be white by the time I finish this project.


Sounds like quite the challenge!

> Built in, in the sense that they're first class statements built into the language.

:O

> 94 year old hacker in the great state of Texas

Wow, I hope I'm coding at 94!


> :O

I get the same icky feeling when I look at some of the things they've shoehorned into JavaScript over the years, though, like regex.


I wonder if it's possible to just get the AST out of V8 and not have maintain your own lexer/parser? I also wonder if abstract interpretation might be the way to go to get type information out of the js code which should simplify the generated C++.

That said, reading TFA and the linked BSDScheme one gave me some ideas for the next time I get around to playing with minischeme. Too many toys and not enough time...


I'm not using a parser I wrote. Though of course someone must maintain it.

I considered writing this in C++ to get more tooling (JS parser, C++ AST libraries, etc.) but in the short-term I do not see myself switching. I'd be a little more likely to switch back to D though because I find data structures in Rust annoying.


It’s striking how unreadable the V8 output is. Is that due to V8 itself or just this particular use of it?


It's the product of whipping together some basic examples for proof of concept in a few weeks. :)

To some degree it will always be "messier". 30-60% of generated code will just be converting between C++ and V8 types.

But really, I just need to prioritize using more tmp values so I'm not shoving crazy amounts of logic in one line.

It can definitely be difficult to debug. Things would be easier if I were generating C++ ASTs and pretty-printing it (I should probably do this in the future) but for the PoC I'm just emitting strings of C++.

Chicken Scheme's generated C is more along the lines of what I'm aspiring to.


I don't think the output really needs to be readable since they're going for speed over anything else.

Who is normally reading this stuff?


Depends on the context, but FWIW, CoffeeScript has an explicit design goal of producing human-readable output (even though it's just JS to run in the browser) so that people learning the language can try out examples and build intuition and trust in what JS is going to be produced. It also helps when using a debugger without source maps.


The devs who are working on the compiler. Not the end user.


I'm disappointed. From the title it sounded like someone had created a language specifically for dynamic programming (https://en.wikipedia.org/wiki/Dynamic_programming).


Plenty already exist which support dynamic programming "natively." For example, the 'memoize' function in clojure.


I bet there are ways a programming language could support dynamic programming better than just having a memoize function in the standard library.

Examples:

* For a given DP algorithm, you might evaluate it top-down (recursive memoized call) or bottom-up (filling in the table values in topological order). Top-down is easiest to implement, but bottom-up can avoid stack depth limits and I think can be more efficient if implemented well. A language specifically crafted for it might let you write the simpler recursive implementation and efficiently evaluate it bottom-up.

* Sometimes you might want to run the same DP algorithm in different contexts, e.g. operating on different data sets (where the data set is constant for a recursion tree). Keeping the data set (or even a pointer to it) in each cache entry is wasteful if your cache has 100 million things in it, and passing it down as a parameter is also not as efficient as it could be. I guess closures may solve the problem, but maybe there are smarter ways that a language could help here.


Slap an @memoize decorator on a python function and you're good to go.


One might argue that this guy actually wrote a transpiler and not a true compiler as the target language is D and not assembly or machine language.

I don't want to diminish the work this guy has done, but one of the biggest challenges, unless you want to take credit for someone else's work, is to do all the various kind of optimization that a mature compiler would do, at various levels.

So yeah, that's cool, but I was expecting more given the use of the word "compiling".


I think the use of the word is fine. Whilst "compile" is often used to refer to programs whose target language is assembly/machine code, it generally refers to any translation from one computer language to another [0].

A "transpiler" is therefore a specific type of compiler.

[0] http://www.compilers.net/paedia/compiler/index.htm


Just going off of Wikipedia[1], "A source-to-source compiler translates between programming languages that operate at approximately the same level of abstraction", so at least by that definition, you probably shouldn't call it a "transpiler" if you're compiling from dynamic-typed code to static-typed code.

I view "transpiler" as a recent informal term to be a little more precise/honest when describing compilers targeting JS, but not one that people should feel obligated to use in place of the more generic term "compiler".

[1] https://en.wikipedia.org/wiki/Source-to-source_compiler


Yeah, that is what I was taught as well. From one abstract machine to another.

If we think about it javac would not be a compiler otherwise :-)


The reason I'm comfortable calling BSDScheme in particular a compiler is because ultimately you get a small binary you can bring to another machine without a runtime. Javac may get you bytecode, but you're still dependent on a JVM. This is the most useful distinction for most users, not if it compiles straight to assembly or not.

In the case of JSC, a binary is produced currently but it must be launched by a Node app. So it doesn't exactly meet my criteria. But being able to produce embedded V8 code will not be significantly more difficult for these examples. (Recreating Node's stdlib would of course be difficult but that's separate.) I'm hoping to have a Node-independent target soon.


I depend on the JVM in my mind traslates to "my target is the JVM"

You are "dependent" if you want on the RunTime (rt.jar). The way it was explained to me, and the model which I sticking to in my mind is:

Abstract Machine/language + RunTime = Level of the onions of a computer.

Assembly/CPU + libc Bytecode/JVM + rt.jar

The language manipulates the resources provided by the machine (registers, stack-machine etc etc).

My recollections are fading though - but I found this model to be good enough to explain me well how a computer works.


You always need something on which to run your program. If you compile your program on openbsd, then you need another openbsd machine to run it, you can't run it on windows. How is something like the jvm different from an os?


The majority of comercial JDKs also compile to native code.


I still think it would have been much clearer had the title stated that transpilation was its mechanism.

I don’t know D enough to know whether GC is enabled in this project, which is a major weakness in many dynamic languages that one might imagine aot compilation could eliminate.


I consider the term "compiler" to be very generic, and to cover a wide spectrum of tools ranging from those which compile from one high-level programming language to a slightly lower-level language (e.g. TypeScript), right down those which produce machine code (e.g. GCC/Clang+LLVM). In fact several major compilers (such as GHC and the first C++ compiler, cfront) started out by using C as their backend.

"Transpiler", which is arguably somewhat of a buzzword, typically refers to compilers where the source and target language are either the same, or at a very similar level of abstraction; I agree with n4r9 in considering them a particular type of compiler. Even so, the implementation described in the article compiles from a high-level language, JavaScript, to a considerably lower-level language, C++; so even if we exclude transpilers from our definition of what constitutes a true compiler I still think it's a valid term to apply in this case.


And the first C compilers actually generated Assembly, they weren't capable of generating .o files directly.


In gcc, the compiler proper still generates assembly instead of machine code. It then calls the system assembler (which is usually from gnu binutils) to generate machine code.


I don't remember much argument about the term "compiler" when compilation to C started appearing. That goes back at least as far as Scheme->C [1] and f2c [2]. (f2c was "just" a re-targeting of the first Fortran 77 implementation to a C backend.)

1. http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf

2. http://www.netlib.org/f2c/




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: