Hacker News new | past | comments | ask | show | jobs | submit login
Lfortran: Modern interactive LLVM-based Fortran compiler (lfortran.org)
165 points by zaikunzhang on Aug 28, 2023 | hide | past | favorite | 59 comments



This is probably fantastic from a maintainability perspective, but I'm curious if some performance is left on the table by using LLVM IR instead of compiling directly to machine code. I know there are a number of optimizations that can be made for Fortran that can't be made for C-like languages and I wonder if some of those C-like assumptions are implicitly encoded in the IR.


The original author of LFortran. Great question.

We designed LFortran to first "raise" the AST (Abstract Syntax Tree) to ASR (Abstract Semantic Representation). The ASR keeps all the semantics of the original code, but it is otherwise as abstract/simple as possible. Thus by definition it allows us to do any optimization possible, as ASR->ASR optimization pass. We do some already, we will do many more in the future. This optimizes all the things where you need to know the high level information about Fortran. Then once we can't do any more optimizations, we lower to LLVM. If in the future it turns out we need some representation between ASR and LLVM, such as MLIR, we can add it.

We also have a direct ASR->WASM and WASM->x64 machine code, and even direct ASR->machine code, but the ASR->LLVM backend is the most advanced, after that probably our ASR->C backend and after that our ASR->WASM backend.


How does LLVM cope with the array semantics? I was under the impression that the noalias attribute in the IR was not activated in such a way as to enable the optimizations that make Fortran so fast.


My experience with LLVM so far has been that is possible to get maximum speed as long as we generate the correct and clean LLVM IR, and do many of the high level optimizations ourselves.

If LLVM has any downsides, it is that it is hard to run in the browser, so we don't use it for https://dev.lfortran.org/, and that it is slow to compile (both LLVM itself, as well as it makes LFortran slow to compile, compared to our direct WASM/x64 backends). But when it comes to runtime performance of the generated code, LLVM seems very good.


LLVM is awesome if you emit what it expects (basically IR as close to what clang emits).

The further you stray from that, the worse is it.


Rust drove the fixes needed in llvm to support noalias. They went through a couple reverts before seemingly fixing everything. If lfortran emits noalias, llvm can probably handle it now.


> We designed LFortran to first "raise" the AST (Abstract Syntax Tree) to ASR (Abstract Semantic Representation) …

I’ve actually never heard of an ASR before… sounds very interesting! Do you happen to have any resources I can read about this?


WASM -> x64 as in a full WASM AOT compiler? Don't those already exist? What's the benefit to making one specifically for LFortran? Unless I'm misunderstanding

Super cool stuff though


Yes, we could make the WASM->x64 standalone. The main motivation is speed of compilation. We do not do any optimizations, but we want to generate the x64 binary as quickly as possible, with the idea that it would be used in Debug mode, for development. Then for Release mode you would use LLVM, which is slow to compile, but good runtime performance. And since we already have ASR->WASM backend (used for example at https://dev.lfortran.org/), maintaining WASM->x64 is much simpler than ASR->x64 directly.


I think the question was why maintain your own WASM->x64 at all though? Couldn't you just run an existing WASM->x64 tool (using -O0 or equivalent if the goal is quick compilation for debug mode)?


As someone who's only played with Fortran, and never done anything too serious with it, can you explain an optimization that can be done in Fortran that can't be done in a C-like language?

I'm not being argumentative, I'm actually really curious.


A simple example is returning an allocatable array from a function, where the Fortran compiler can decide to allocate on a stack instead, or even inline the function and eliminate completely. While in C the compiler would need to understand the semantics of an allocatable array. If you use raw C pointer and malloc, and use Clang, my understanding is that Clang translates quite directly to LLVM and LLVM is too low level to optimize this out, depending on the details of how you call malloc.

Of course, you can rewrite your C code by hand to generate the same LLVM code from Clang, as LFortran generates for the Fortran code. So in principle I think anything can be done in C, as anything can be done in assembly or machine code. But the advantage of Fortran is that it is higher level, and thus allows you to write code using arrays in a high level way and do not have to do many special things as a programmer, and the compiler can then highly optimize your code. While in C very often you might need to do some of these optimizations by hand as a user.


Fortran has true multidimensional arrays in a way that C doesn't have--if you know an array is 5x3, you know that A[6, 1] doesn't map to a valid element whereas in C, it does map to a valid element. This turns out to make a lot of loop optimizations easier. (Also, being Fortran, you tend to pass around arrays with size information anyways, which C doesn't do, since you typically just get pointers with C).


Is the size info compile-time or runtime in Fortran?


It can be both. If you know the dimension at compile time, it is compile time, if you don't it will be runtime.


I don’t think such an optimization exists.

The nice think about Fortran it that is does the sensible thing by default for the type of scientific computing codes that are inside it’s wheelhouse (the trivial example, it assumes arguments don’t alias by default).

C can beat anything, assuming unlimited effort. Fortran is nice for scientists who want to write pretty good code. Or grad students who are working on dissertations in something other than hand-tuning kernels.


C can be as good as Fortran if make sure to declare pointers "restrict". That is a C feature added in C99 though. For Fortran it has always been the default.

For a long time Fortran was actually unbeatable and C did not suffice to specify all possible uses of assembly...


This is the correct answer. They almost entirely compile to the same machine code for the computationally intensive parts. (Even Julia does that these days.) But the limitations of Fortran prevent a lot of difficult-to-debug C bugs, while not affecting typical scientific and numerical capability.


Here’s a link to a StackOverflow answer that gives a good example: “Is Fortran easier to optimize than C for heavy calculations?” [0]

[0]: https://stackoverflow.com/questions/146159/is-fortran-easier...


The most significant distinction is that dummy arguments in Fortran can generally be assumed by an optimizer to be free of aliasing, when it matters. Modifications to one dummy argument can't change values read from another, or from global data. So a loop like

  subroutine foo(a, b, n)
    integer n
    real a(n), b(n)
    do j = 1, n
      a(j) = 2 * b(j)
    end do
  end
can be vectorized with no concern about what might happen if the `b` array shares any memory with the `a` array. The burden is on the programmer to not associate these dummy arguments on a call with data that violate this requirement.

(This freedom from aliasing doesn't extend to Fortran's POINTER feature, nor does it apply to the ASSOCIATE construct, some compilers notwithstanding.)


> The burden is on the programmer to not associate these dummy arguments on a call with data that violate this requirement.

What happens when the programmer pass aliasing a and b? Will it cause UB, like in C if you violate the restrict keyword?


Fortran's standard doesn't use the term Undefined Behavior; instead, it states a requirement that an object modified via one dummy argument must not be modified or referenced via any other name. When a program violates that requirement, it's no longer conforming to the standard, and the language doesn't define what happens afterwards. In practice, you'll get bad results and a bad day of debugging.


> When a program violates that requirement, it's no longer conforming to the standard, and the language doesn't define what happens afterwards. In practice, you'll get bad results and a bad day of debugging.

This is just UB by another name

That's.. disappointing


Rewind your disappointed expectations back to Fortran II in the 1950's, the first Fortran with subprograms. The value proposition was this: if a programmer was willing to help the Fortran optimizer out by avoiding aliasing, the compiler would generate code that was usually fast enough to avoid the need to write in assembly. It was a great deal for the programmer at the time. Fortran succeeded not just because it allowed one to write more productively in a "high-level" language but because its compilers could optimize well enough that there was no significant performance cost to doing so. (And then its popularity allowed one to port Fortran code from one vendor's hardware to another.)


This can be done in C, but not C++, though in practice all C++ compilers support it. It's the `restrict` keyword


Suppose A and B are fixed length arrays. C = A + B - A is easily optimized in fortran, to a no op if C is not modified afterwards, and to a copy if it is.

In C this is pretty much impossible.

To be fair, there are C like languages (ispc, glsl) which makes this work with heroic compiler efforts.


LFortran is not necessarily using LLVM IR to compile. It's building up an ASR [1] structure that's already being used in the LFortran-specific backends. Potentially it can make full use of Fortran semantics!

[1] https://docs.lfortran.org/en/design/


> there are a number of optimizations that can be made for Fortran that can't be made for C-like languages

That used to be true a long time ago but since the restrict keyword was introduced in C99 it's not really true any more.


As an indirect answer, consider Julia, which is based on LLVM and seems to be competitive with Fortran on large scale, numerically intensive calculations.


Can Julia pre-compile to a binary executable now? If not, they can't replace Fortran.



That is not a true binary. Making julia truly compile into binaries is now the number 1 goal of the language according to Tim Holy and the julia team.


LFortran can translate your Fortran code to Julia via our Julia backend. Once Julia can compile to a binary, it will be exciting to do some comparisons, like speed of compilation and performance of the generated binary. As well as the quality of the Julia code that we generate, we'll be happy to improve it to create canonical Julia code, if at all possible.


Related ongoing thread:

Fortran - https://news.ycombinator.com/item?id=37291504 - Aug 2023 (193 comments)

(current thread is better because less generic)


I wanted to see a comparison with flang, which I thought is the main LLVM Fortran front end.


I am the original author of LFortran. What kind of comparison would you like to see? If you have any specific questions, I am happy to answer.


What are the relative strengths and weaknesses? If LFortran is newer, what problems with flang does it aim to address? How should someone choose between them, and how does the decision process change in different circumstances?


There is old Flang, which motivated me to start LFortran. The new Flang, which presumably you are referring to, started possibly in the same month as LFortran, but we didn't know about each other.

It's best if you ask Flang developers what they see as the advantages of Flang over LFortran. From my biased perspective, LFortran can run interactively, it is fast to compile the compiler (30s on my laptop) and LFortran compiles your code very quickly (especially with our direct x64 or C backends). It runs in a browser: https://dev.lfortran.org/. We have many backends (LLVM, C, C++, Julia, WASM, x64). We plan to add Python and Fortran (the latter could be used to modernize your old Fortran code). It is easy to add new backends, and it is also easy to add new frontends, so we have LPython and LFortran as two thin frontends, to our intermediate representation that we call ASR (Abstract Semantic Representation). The internal design is simple, so a small team can develop LCompilers at a fast pace. New contributors without any prior compiler experience get up to speed very quickly (typically a few weeks or even less).

We are still in alpha, which means it is expected to break for your code (and when it does, please report all bugs!). To choose between them, I recommend to test them out and pick the one that you like the most, based on your criteria. Note that the most mature and widespread open source Fortran compiler is GFortran.


> LFortran is structured around two independent modules, AST (Abstract Syntax Tree) and ASR (Abstract Semantic Representation), both of which are standalone (completely independent of the rest of LFortran) and users are encouraged to use them independently for other applications and build tools on top.

Modern frontend architecture comes to Fortran! Awesome.


It's a little bit embarrassing that a Fortran compiler provides a shell for interactive use, yet the Rust project, with at least two orders of magnitude more development effort, does not.


LFortran error messages style got inspired by Rust, and the Fortran Package Manager got inspired by Cargo. And in return Rust can get inspired by LFortran to do the interactive prompt. :)

In all seriousness though, Rust got many things right and showed how a modern language, compiler and a package manager should behave. I think it genuinely moved the state-of-the-art. And we are trying hard to improve on the state-of-the-art as well. I think a modern compiler should compile fast in Debug mode, and generate high performance code in Release mode. And it should compile to a binary as well as work interactively. Etc.


Rust used to have an interactive prompt of sorts with MIRI, I'm not sure if that's still possible. I don't think I've heard anyone talk about it for 5 years.


How does this compiler compare with Flang? I saw it shouted out on the main page but didn't really see any comparisons for why you'd pick one or the other.


It's hard to have a fair comparison, and both compilers are also moving targets. I tried to do some comparison in a sibling comment: https://news.ycombinator.com/item?id=37300279

The best is to mention both (as well as GFortran), and users can decide. For LPython (https://lpython.org/) we list all of the about 30 Python compilers at the webpage, but beyond that it's very hard to have a meaningful comparison.


Was I the only one hoping that Lfortran was something for music synthesis (LFOs...) using Fortran? I can't not see LFO - like that keming joke.


Is this the same outfit that did LPython? It looks like the same/similar web design.


Yes, both LPython and LFortran are our two thin frontends to ASR (Abstract Semantic Representation). Not just the website is reused, but the internals are reused, so LPython runs your code at exactly the same speed as LFortran would, since ASR and all optimizations and backends are shared.


Cool cool cool, I just wanted to make sure I wasn't hallucinating. I'll try out a Hello World!


Awesome! Report all bugs. :)


10-4 boss.


When installing clang I am amazed at the size of some of the LLVM shared libraries. Gigantic.


Is/will OpenMP be supported?


I misread LLVM as LLM and wondered what on earth language models have to do with Fortran compilation. (They don't. LLVM is a compiler framework.)

Anyways, great work to the team! It's fun to see such a flurry of articles from the Fortran community today


Thanks! I know, LLM came much later after LLVM. But if you are interested in LLM that LFortran can compile, check out: https://github.com/certik/fastGPT/.


Would be cool for there to be a `llama2.f`, like https://github.com/karpathy/llama2.c, to demo its capabilities


Yes indeed, we'll do it next (unless somebody beats me to it). First I am focusing on compiling fastGPT with LFortran, we can do it, but have a few workarounds that I want to fix. Then we'll do llama2.


I was talking to someone on Grinder yesterday who made the same mistake. I wonder how common it is to misread "LLVM" as "LLM".


> I misread LLVM as LLM and wondered what on earth language models have to do with Fortran compilation.

You gave me a good laugh this evening!


Turns out Fortran is a great fit for LLM, I am not joking: https://github.com/certik/fastGPT/.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: