Scientific computing’s future: Can Haskell, Clojure, or Julia top Fortran?

beloch · on May 10, 2014

A lot of people coming from a CPSC background view Fortran as a dinosaur. However, it's probably better viewed as a crocodile in the sense that, even if it is ancient, it has evolved to fill a specific niche very, very well. Fortran allows complex math (especially linear algebra) to be expressed more compactly than most low-level languages (e.g. C) are capable of while still offering excellent control over how hardware is utilized. Just as it's relatively easy to dance around a crocodile out of the water, it shouldn't be too difficult for languages like Python to challenge Fortran when ease of use matters but performance is of lesser concern. However, going into the muddy water to wrestle with crocodiles is a different matter entirely! I would not be surprised if people rehash this conversation with an entirely new set of prospective croc-slayers another twenty years from now.

m_mueller · on May 10, 2014

I think the single biggest reason Fortran is so successful in HPC is its implementation of multidimensional arrays. They are at least as fast as pointer-arithmetic-style C, they have powerful slicing syntax, and they have a simple but good (because length-checkable) file output format. Teach a scientist all the imperative control flow constructs along with this and he knows how to program a cluster for data-parallel tasks, the most common kind in HPC. Concerning Haskell and Co: Get back to me when it's simple to do a performance analysis, e.g. a roofline model. Julia on the other hand is promising, but it's still going to be an uphill battle w.r.t. the work gone into Fortran compilers.
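For readers who have not seen what "pointer arithmetic style C" looks like next to a Fortran `a(i,j)`, here is a minimal sketch. It assumes a dense matrix stored column-major (the layout Fortran uses) in one flat buffer; the helper names `idx` and `matvec_colmajor` are made up for illustration and come from no library.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j), zero-based, in a column-major matrix with
 * nrows rows. This is the arithmetic Fortran does for you in a(i+1, j+1). */
static size_t idx(size_t i, size_t j, size_t nrows) {
    return j * nrows + i;
}

/* y = A * x for a column-major nrows-by-ncols matrix A.
 * The inner loop runs over i, the contiguous index, so memory is
 * walked sequentially. */
void matvec_colmajor(size_t nrows, size_t ncols,
                     const double *a, const double *x, double *y) {
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < nrows; i++) {
        y[i] = 0.0;
    }
    for (size_t j = 0; j < ncols; j++) {
        for (size_t i = 0; i < nrows; i++) {
            y[i] += a[idx(i, j, nrows)] * x[j];
        }
    }
}
```

In Fortran the index bookkeeping, the bounds, and the array shape all travel with the array itself, which is a large part of the convenience described above.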

cabinpark · on May 10, 2014

I think people forget that Fortran was designed to only do mathematics. I don't know of any other languages that are designed just to do number crunching and nothing else. Even though C is very low level, you can write almost anything in it. Even MATLAB/Julia are able to do more general purpose programming which Fortran doesn't really allow.

The more domain specific you are, the better you are at that one task. This is where Fortran excels and why it won't be replaced any time soon.

walshemj · on May 11, 2014

Well of course real programmers can do anything in Fortran :-)

I wrote billing systems for a major telco that were mostly Fortran.

cabinpark · on May 11, 2014

Absolutely! Has anyone written a server in Fortran?

adamnemecek · on May 11, 2014

Also AFAIK, a Fortran compiler can, due to the way the language works, assume that procedure arguments do not alias, which can result in major speedups as non-aliased code can be optimized to allow for fewer memory accesses.
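To make the aliasing point concrete: in C the compiler generally has to assume that two pointer arguments might overlap, whereas Fortran's rules for dummy arguments let it assume they do not. C99's `restrict` qualifier lets the programmer make the same promise by hand. The sketch below is illustrative only; `axpy` is a hand-written function here, not a call into BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n).
 * `restrict` is a promise by the caller that x and y do not overlap.
 * With that promise the compiler may keep values in registers and
 * vectorize without re-reading memory after each store. Breaking the
 * promise is undefined behavior; nothing checks it for you. */
void axpy(size_t n, double a,
          const double *restrict x, double *restrict y) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Whether this produces a measurable speedup depends on the compiler, the flags, and the loop in question.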

dekhn · on May 10, 2014

What's CPSC?

gshubert17 · on May 10, 2014

ComPuter SCience?

kunstmord · on May 10, 2014

I've dealt with a lot of legacy Fortran (and Pascal) code, and while I agree with the article in that Fortran has to go, Haskell and Clojure seem VERY weird and pointless choices in the area of computing (see the comments on the Ars website, a lot of valid points there). But the biggest problems that I've encountered in the Fortran code I dealt with were not exactly Fortran-related: 1) terrible variable naming (aaa = eee / ccc + 1. and so on); 2) goto's, huge chunks of code with no structure (or, even worse, structured with goto's); 3) disregard for numeric accuracy and overflow/underflow.

And this, imo, has more to do with the way CS is taught to scientists - I had a two-year course in C/C++, and we spent those two years writing all kinds of trees, lists and stuff like that. Needless to say, that is good and all, but it didn't exactly help us with writing scientific code later on - a lot of people wrote terrible code to get their AVL trees working, for example, just to get a passing grade. No one taught coding style, working with CVS, computer arithmetic and such. The same goes for the MATLAB course I took.

In my opinion, it would've been a lot wiser to teach people scientific computing using Python. It has tons of scientific libraries (a lot of people I know who are involved in scientific computations neglect to reuse code or use publicly available libraries; teaching people how to use third-party packages/libraries is important), forces programmers to indent (the amount of unindented C code I've dealt with makes me shudder), and makes them realize what makes a program fast or slow. Besides, using Numba/Cython/Theano/multiprocessing, it is possible to give a more or less painless introduction to the world of parallel/optimized computing. And only then start teaching C/C++/OpenMP/MPI/Fortran.

Now, I'm judging from my personal experience and from what I've seen at my university (which is the second-biggest research university in the country): there's a huge difference between how CS is taught to CS students and to science students (physics, mechanics). The knowledge that science students receive is subpar and, unfortunately, just enough to start writing computational code.
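On point 3 of the list above (disregard for numeric accuracy), a small example of the kind of thing a "computer arithmetic" lecture might cover is compensated summation. The sketch below contrasts a plain running sum with Kahan summation; the function names are made up for illustration, and it assumes IEEE 754 doubles and a compiler that is not reordering floating-point operations (flags such as `-ffast-math` can optimize the correction away).

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right sum: rounding error can grow with n. */
double sum_naive(const double *x, size_t n) {
    assert(n == 0 || x != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}

/* Kahan (compensated) summation: `comp` tracks the low-order bits
 * lost when a small term is added to a large running sum, and feeds
 * them back in on the next iteration. */
double sum_kahan(const double *x, size_t n) {
    assert(n == 0 || x != NULL);
    double sum = 0.0;
    double comp = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - comp;   /* apply the pending correction */
        double t = sum + y;       /* low bits of y may be lost here */
        comp = (t - sum) - y;     /* recover what was lost */
        sum = t;
    }
    return sum;
}
```

Summing many copies of 0.1 is a simple way to see the two functions drift apart.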

cabinpark · on May 10, 2014

I was going to write a long post but you really summarised my thoughts exactly. Haskell and Clojure don't even make my list of scientific programming languages.

My old university switched from C++ to Python for teaching the physicists, which I think is a good move. For many scientists, Python has most of the tools they need to do their research effectively and there is no need to go into the more work-horse languages of C/C++/Fortran. If they need the more work-horse languages of C/C++/Fortran, there are plenty of resources available.

The supercomputing consortium at my old university were also making a big push towards Python (over MATLAB) and, I think this is good, teaching scientists how to write and maintain code. Software Carpentry regularly came through and gave weekend sessions on tools like version control (which I have seen more and more scientists use) and how to write readable code.

I think people are recognising the need to teach the basics of software engineering to scientists and it is catching on in Canada based on what I've seen from Compute Canada (the group in charge of all the academic supercomputers in Canada). I think as the current generation, who are now being taught to use these tools early on, becomes professors, we will see even more of this. Unfortunately it will take time but it is changing.

jnbiche · on May 10, 2014

"Scientific computing" is such a broad term that it's not terribly useful.

For numeric computing, Fortran, C, and C++ will likely remain at the top for years to come.

For statistical and exploratory data analysis, R has long been king (and closed-source tools before R), but Python is rapidly coming out on top here. Clojure could challenge here, but it's far from having the popularity of R or even Python right now.

For machine learning, MATLAB, along with its open-source analog Octave, has long been de rigueur, but Python is rapidly gaining ground here, too. I think here is where Julia is hoping to gain ground, at least initially.

So it's a bit odd to lump different areas of scientific computing together, but even odder to neglect the one language that has a chance of topping more than one of these areas. And I say that as someone who is moving from Python to Go and Rust for a lot of my software (but I still stay with Python for data exploration).

Nonetheless, not a bad introduction to the languages in question.

gammarator · on May 10, 2014

As the Ars comments make clear, with the exception of Julia none of these languages has any chance of wide adoption under the broad umbrella of "scientific computing."

For a defense of the numerical/scientific computing tradition of which FORTRAN is the ne plus ultra, see this article: http://www.evanmiller.org/mathematical-hacker.html

It's telling of the Ars author's lispy blinders that he gives recursive examples for computing Fibonacci numbers. As the linked article makes clear, this is ridiculous because there's a closed form solution:

"""
    long int fib(unsigned long int n) {
        return lround((pow(0.5 + 0.5 * sqrt(5.0), n) - pow(0.5 - 0.5 * sqrt(5.0), n)) / sqrt(5.0));
    }

No recursion (or looping) is required because an analytic solution has been available since the 17th century.
"""

alecdbrooks · on May 10, 2014

The author of the article is aware of the article you linked to and wrote a response: http://lee-phillips.org/lispmath. The problem with the closed form is that once it exceeds the built-in data types, it no longer takes constant time to compute, so it's not actually always faster than doing it iteratively.

Apparently, the ideal method is "none of the above": use matrix exponentiation (or a formula derived from it) instead: http://nayuki.eigenstate.org/page/fast-fibonacci-algorithms.
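For reference, the "formula derived from it" is usually the fast-doubling pair of identities F(2k) = F(k) * (2*F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2, which need only O(log n) arithmetic steps and stay in exact integer arithmetic. Below is a minimal sketch, assuming 64-bit unsigned integers and therefore limited to n <= 92 so that nothing overflows; the function names are made up for illustration. A plain iterative version is included for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Holds the pair (F(n), F(n+1)). */
typedef struct {
    uint64_t a;
    uint64_t b;
} fib_pair;

/* Fast doubling: compute (F(n), F(n+1)) from (F(k), F(k+1)), k = n / 2. */
static fib_pair fib_pair_of(unsigned n) {
    fib_pair r = {0, 1};                          /* (F(0), F(1)) */
    if (n == 0) {
        return r;
    }
    fib_pair h = fib_pair_of(n / 2);
    uint64_t c = h.a * (2 * h.b - h.a);           /* F(2k)   */
    uint64_t d = h.a * h.a + h.b * h.b;           /* F(2k+1) */
    if (n % 2 == 0) {
        r.a = c;
        r.b = d;
    } else {
        r.a = d;
        r.b = c + d;                              /* F(2k+2) */
    }
    return r;
}

/* F(n) in O(log n) multiplications; exact for n <= 92 in 64 bits. */
uint64_t fib_fast_doubling(unsigned n) {
    assert(n <= 92);
    return fib_pair_of(n).a;
}

/* F(n) in O(n) additions, for comparison. */
uint64_t fib_iterative(unsigned n) {
    assert(n <= 92);
    uint64_t a = 0, b = 1;
    for (unsigned i = 0; i < n; i++) {
        uint64_t t = a + b;
        a = b;
        b = t;
    }
    return a;
}
```

Beyond 64 bits both versions need arbitrary-precision integers, at which point the cost of each multiplication grows with n. The same applies to the closed form, which in double precision also stops being exact well before the 64-bit integer limit.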

walshemj · on May 10, 2014

Julia appears to be too slow for a lot of large real-world uses - this is where the power cost of running the cluster becomes important.

There has been a job posting outstanding near me for over a year looking for someone to port CFD Fortran to C++ - I have never had to stifle giggles when talking to a recruiter before.

They would be better off training their new staff in Fortran; after all, that's what I did at BHRA.

I suspect it's ARA out at Twinwoods - one would hope my old employer isn't so silly.

KenoFischer · on May 10, 2014

> Julia appears to be too slow for a lot of large real-world uses

I'm curious what kind of application you are referring to and where you get that impression. We are always looking for examples where we don't get good performance, so we can optimize, so I'd love to hear your experiences.

walshemj · on May 11, 2014

Well, I was going off Stack Overflow:

http://stackoverflow.com/questions/20613817/julia-julia-lang... where even with hand tuning the Julia code it's a lot slower than Fortran.

I was thinking of large-scale CFD work, i.e. simulating two-phase flow in a nuclear reactor or a simulation of airflow over aero systems - which is where I suspect that job I mentioned is based.

Must have a look at Julia after I have finished teaching myself Java (Spit!) for Hadoop.

KenoFischer · on May 11, 2014

Ok, looking at the code it seems like you could probably get another 2x by switching to views rather than slices, which will be the default sometime in the Julia 0.4 timeframe. That would probably put Julia performance at 1.2-1.5x Fortran, which, while there are of course always more optimizations to be done, is at least pretty good.
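For anyone unfamiliar with the slice/view distinction: a slice that copies allocates a new array and fills it, while a view just records where the data already lives. The sketch below is a rough C analogy, not Julia's implementation; the `vec_view` type and all function names are hypothetical. It takes one column out of a row-major matrix both ways.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A strided, non-owning window onto existing data. */
typedef struct {
    const double *data;  /* first element of the view            */
    size_t len;          /* number of elements                    */
    size_t stride;       /* distance between consecutive elements */
} vec_view;

/* "Copying slice": allocate and fill a new array with column j of a
 * row-major nrows-by-ncols matrix. Caller must free() the result. */
double *column_copy(const double *a, size_t nrows, size_t ncols, size_t j) {
    assert(a != NULL && j < ncols);
    double *out = malloc(nrows * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < nrows; i++) {
        out[i] = a[i * ncols + j];
    }
    return out;
}

/* "View": no allocation and no copying, just pointer plus stride. */
vec_view column_view(const double *a, size_t nrows, size_t ncols, size_t j) {
    assert(a != NULL && j < ncols);
    vec_view v = { a + j, nrows, ncols };
    return v;
}

/* Sum the elements seen through a view. */
double sum_view(vec_view v) {
    double s = 0.0;
    for (size_t i = 0; i < v.len; i++) {
        s += v.data[i * v.stride];
    }
    return s;
}
```

In an inner loop that takes a slice on every iteration, the copying version pays for an allocation and a copy each time, which is the kind of overhead the comment above is referring to.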

haddr · on May 10, 2014

I was missing R and Python in the article. And who says we need a king? Maybe the rich ecosystem of many coexisting languages is better?

pekk · on May 11, 2014

R and Python aren't "hip" like Haskell, Clojure, and Julia. There is no buzz, they are old news.

bayesianhorse · on May 11, 2014

In my experience there is a substantial subset of programmers/computer scientists who don't even consider any dynamically typed language as worth their time. In their view, these are at most for beginners or small projects. This bias is like a huge blind spot...

reitzensteinm · on May 11, 2014

I think there's also an equal but opposite bias, where programmers disregard static typing by falsely equating it to what's in Java and C#.

In both cases, I think the blub paradox is solidly at work.

bayesianhorse · on May 11, 2014

No, actually, proponents of Python or R don't generally claim that all statically typed languages are useless in practice. In fact I don't remember any evidence of that.

Also there doesn't seem to be any evidence that dynamic typing is detrimental to programming.

And I don't see all non-computer-science-scientist-programmers learning Haskell any time soon.

jey · on May 10, 2014

Julia is definitely going to be the winner. I've been using it for a few weeks and it is just so natural for scientific/technical computing and effectively covers a wide range of use cases. The type system and syntax allow for code to be expressed in terms of the domain's natural objects and notation, without having to do awkward translations between the math and the code. It has the mathematical features of MATLAB without compromising on speed (Python) or expressivity (C and FORTRAN).

quanticle · on May 10, 2014

Julia will really take off when some of the libraries and toolkits available on MATLAB get ported to it. From what I've heard, very few people genuinely like MATLAB. It's just that MATLAB has toolkits with optimized functions for almost everything under the sun, so if you need to get results quickly, you're better off sucking it up and using a MATLAB toolkit that does half the work for you than reimplementing everything from scratch in a less insane language.

AFAIK, this is why Python has caught on so quickly in numerical computing circles. NumPy isn't up to the level of MATLAB's toolkits in terms of having functions for specialized applications, but it is a comprehensive numerical computation library with fast C implementations of a wide range of common functions.

That's the core lesson that I think the article misses. What matters isn't the language itself. What matters is the collection of libraries available for that language. Research and scientific computing is not like typical software development. In "normal" software development, the maintenance costs of a particular piece of code will easily swamp the cost of writing the code, so it makes sense to write the code in a more maintainable language. But for a research project, once the paper is written, it's fair to say that the code will never be looked at again. The situation is changing, slowly, as things like Software Carpentry and more data-driven research projects spread modern software engineering principles into the research computation community. But, by and large, research computing is still defined by one-off projects where the speed of initial implementation (which directly affects the time to publication) matters a lot more than the long-term maintainability of the codebase. It's this tradeoff, which is radically different from commercial software, that explains the persistence of FORTRAN and MATLAB in scientific computing.

jey · on May 10, 2014

Julia has a sophisticated and performant mechanism for calling C libraries. It ships with common packages like SuiteSparse already built into the standard library's sparse matrix types.

I do agree that Julia isn't ready for prime time "cookbook" style uses, and is more useful to those writing computational routines.

bayesianhorse · on May 10, 2014

I don't know if you are confusing NumPy with the entire "scientific Python stack", or you just don't know anything beyond NumPy, but in terms of implementing all the specialized toolkits available for MATLAB, Python is a lot closer to that than Julia...

Xcelerate · on May 10, 2014

I do molecular dynamics simulations using LAMMPS on HPC systems. LAMMPS is written in C++. I'm not normally a fan of object-oriented languages, but this seems to work well for a system where you have an abstract base class (like an atomic pairwise potential) that allows users to easily derive their own potential class from it.

I wouldn't say LAMMPS is super-optimized for a particular application compared to some other MD codes, but it is very good for a wide variety of situations, kind of like C++. Just guessing, I'd say LAMMPS is easily within a factor of 2-3x of most hand-tuned assembly codes, but the generality of it really outweighs the performance penalty.

Personally, in terms of programming languages, Julia is really growing on me. I've been using it for performance-intensive, single-threaded programs and it works great. I'm actually considering experimenting with it for some of my web application projects (currently using Node.js for those) just because of how much I like the language design.

bayesianhorse · on May 10, 2014

For quite a few cases, the "symbolic" route might be the future. In Python for example there is SymPy, which is mostly a computer algebra toolkit, but it can translate formulas to Fortran, Theano and JavaScript.

Theano on the other hand is also a symbolic toolkit designed to make linear algebra super-fast. It takes a symbolic representation of the computations, and then compiles it into C code or CUDA for GPUs.

Theano has been used extensively in deep learning, but it has other applications as well.

PyMC is a library implementing Bayesian inference through Monte Carlo methods. Version 3 implements samplers based on Theano. The advantage here is that Theano can automatically deduce derivatives, which allows for more sophisticated algorithms and better performance.

tom_jones · on May 11, 2014

I personally prefer Scala.

It supports functional programming (combining it nicely with object oriented programming), immutability, tail recursion, lazy evaluation (you have to specify what will be evaluated lazily), collections with parallel processing support, actor based processing, pattern matching and of course a REPL.

But none of that is mandatory, for example when needed, you can also use mutable variables and collections.

It runs on the JVM, and you can mix Java code and libraries with Scala code and libraries. And the ecosystem of Java libraries is huge.

warmfuzzykitten · on May 11, 2014

Yes, Scala certainly has all the latest programming language bells and whistles, but - leaving aside the many highly tuned libraries and 60 years of compiler experience on every imaginable hardware configuration - it doesn't have the single attribute that keeps Fortran on top: it runs numeric codes fast.

frowaway001 · on May 11, 2014

A trait shared with the other languages mentioned here. It will be hard to beat Fortran, although Scala might be closer than the alternatives.

tormeh · on May 11, 2014

It would be fun to see what Scala's performance is like with long-running simulations. Should be ideal for hotspot optimization. Isn't Java just as often used for HPC as C?