Scientific Computing on the Erlang VM

jerf · on Jan 2, 2015

I ask this with the presumption that there is an answer, but the linked web page does not provide it: Why create this library?

I won't deny I'm also asking this because I do not personally see a reason... after all, I wouldn't ask if it were obvious to me. But I really don't understand what this is good for; it seems like binding Erlang to a Python library that is itself bound to underlying C and assembler code does nothing but make the Python harder to access, with no corresponding advantage that I can see; for instance, it isn't obvious to me that you can get any utility from Erlang's multiprocessor capabilities. It seems like, even on Erlang's terms, you'd be better off writing a Python program that does your task with numpy or scipy and having Erlang pipe whatever data it may be in possession of to a Python process in some more conventional fashion.

But I'm interested in whatever answer there may be. (Seriously.)
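
One "more conventional" way to pipe data to a Python process is a length-prefixed byte stream over stdin/stdout, which is the framing an Erlang port uses when opened with `{packet, 4}` (a 4-byte big-endian length before each message). The following is only a minimal sketch of the Python side under that assumption; the helper names and the `sum_floats` handler are hypothetical, and the payload format (packed big-endian doubles) is chosen purely for illustration.

```python
import io
import struct


def read_packet(stream):
    """Read one frame: 4-byte big-endian length, then payload. None at EOF."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    if len(payload) < length:
        return None  # truncated frame, treat as end of stream
    return payload


def write_packet(stream, payload):
    """Write one frame with the same 4-byte big-endian length prefix."""
    stream.write(struct.pack(">I", len(payload)) + payload)
    stream.flush()


def serve(inp, out, handler):
    """Answer each request frame with one reply frame until EOF."""
    while True:
        request = read_packet(inp)
        if request is None:
            break
        write_packet(out, handler(request))


def sum_floats(payload):
    """Hypothetical handler: payload is N packed doubles, reply is their sum."""
    count = len(payload) // 8
    values = struct.unpack(">%dd" % count, payload)
    return struct.pack(">d", sum(values))
```

In a real worker the loop would presumably be started as `serve(sys.stdin.buffer, sys.stdout.buffer, sum_floats)`, with the Erlang side owning the port and deciding what to do when the process exits.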

signa11 · on Jan 2, 2015

from: http://blog.lfe.io/tutorials/2014/11/21/1508-erlport-using-p...

But then, you may be thinking: I like supervision trees. I have long-running processes that I want to be managed per the rules I establish. I want to run lots of jobs in parallel on my 64-core box. I want to run jobs in parallel over the network on 64 of my 64-core boxes. Python’s the right tool for the jobs, but I wish I could manage them with Erlang.

oubiwann · on Jan 2, 2015

Bingo.

oubiwann · on Jan 2, 2015

If you're going to be handling large amounts of data, then Disco is what you'd want to use, not this.

However, if you're spending your day in an LFE REPL and want to be able to parallelize computation out to multiple Python instances, then this tool will be for you ("will be" because parallelization is in the queue: https://github.com/lfex/py/issues/38).

As things stand right now (without parallelization), this library means that Erlang, Elixir, LFE, Joxa, etc., hackers don't have to context switch out of their preferred mode into another language, but can use Python's libraries from the comfort of their regular daily routine. (I know Erlang hackers who refuse to fire up a Python interpreter...)

In other words, this is very much like an IPython for the Erlang world (where the ZeroMQ messaging architecture of IPython isn't needed, since that all comes for free in Erlang).

okeuday · on Jan 2, 2015

Also, CloudI (http://cloudi.org) provides supervisor functionality for Python services to help keep source code in Python fault-tolerant. There have been no problems handling large quantities of data there.

rubyrescue · on Jan 2, 2015

We have a team of data scientists who pretty much exclusively use Python and a team of server devs who nearly exclusively use Erlang.

I could see a use for this, though I will say we've pretty much made separate services and these services communicate (in order of urgency) - via HTTP, Rabbit, or by rolling HBase tables on some schedule.

Because of that decoupling, this seems less necessary, but I could certainly see a place for it.

m_mueller · on Jan 2, 2015

This is pretty much the idea I had about replacing MPI for my HPC projects - I'll certainly check this out once I have Python bindings for my Fortran code (yes I'm serious ;-) ). Are there any benchmarks available that compare this to pure Fortran+MPI or C/C++ + MPI (with the lsci version using wrapped versions of the same code)? I think this could have great benefits for stability, thanks to Erlang's respawning of processes, if you implement proper checkpointing facilities that can be passed through to the Python code as well as to C/Fortran code wrapped with Python. Stability is the big thing to solve for Exascale applications - but performance kind of needs to be proven first when it comes to HPC.
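
To make the checkpoint-plus-respawn idea concrete, here is a toy sketch in plain Python. It is not how lsci, Erlang supervisors, or MPI do it; the names `run_steps`, `supervise`, and `WorkerCrash` are hypothetical, and the "crash" is simulated in-process rather than being a real process exit. The point is only that a restarted worker resumes from the last saved state instead of from step 0.

```python
class WorkerCrash(Exception):
    """Stands in for a worker process dying mid-computation."""


def run_steps(state, total_steps, crash_at, checkpoints):
    """Advance a toy computation, saving a checkpoint after every step."""
    while state["step"] < total_steps:
        if state["step"] in crash_at:
            crash_at.discard(state["step"])  # fail only once per listed step
            raise WorkerCrash("simulated failure at step %d" % state["step"])
        state["acc"] += state["step"]
        state["step"] += 1
        checkpoints.append(dict(state))  # store a copy, not a live reference
    return state["acc"]


def supervise(total_steps, crash_at, max_restarts=5):
    """Restart the worker from its latest checkpoint whenever it crashes."""
    checkpoints = [{"step": 0, "acc": 0}]
    restarts = 0
    while True:
        state = dict(checkpoints[-1])  # resume from the last saved state
        try:
            return run_steps(state, total_steps, crash_at, checkpoints), restarts
        except WorkerCrash:
            restarts += 1
            if restarts > max_restarts:
                raise
```

Calling `supervise(5, {2, 4})` sums the steps 0 through 4 despite two simulated crashes and reports how many restarts were needed.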

rvirding · on Jan 2, 2015

You can use similar methods for interfacing C as well, if this is a better way for you to interface Fortran.

m_mueller · on Jan 3, 2015

C bindings are compatible with Fortran, you mainly have to think about two issues:

* depending on the compiler and its settings, Fortran function names are mangled with one or two underscores, and module procedures also get their module name prepended.

* the index order for multidimensional arrays is reversed in Fortran (column-major rather than C's row-major).

Other than that the datatypes basically just work. So as long as you have C bindings this is not what I'm worried about, it's rather the overhead of the Erlang VM when compared to MPI (although for usual HPC loads this probably wouldn't even be an issue as long as it's not an order of magnitude slower, since most of them are not network bound if done right).
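
The index-order point can be checked with a few lines of arithmetic. The sketch below uses two hypothetical helpers and zero-based indices (Fortran defaults to one-based) to compute the flat memory offset of an element in row-major (C) and column-major (Fortran) layout, showing that a Fortran array of shape (2, 3) occupies memory exactly like a C array of shape (3, 2) with the indices swapped.

```python
def offset_row_major(index, shape):
    """Flat offset in C layout: the last index varies fastest."""
    off = 0
    for i, n in zip(index, shape):
        off = off * n + i
    return off


def offset_column_major(index, shape):
    """Flat offset in Fortran layout: the first index varies fastest."""
    off = 0
    for i, n in zip(reversed(index), reversed(shape)):
        off = off * n + i
    return off


# Every element of a column-major (2, 3) array sits at the same offset as
# the index-reversed element of a row-major (3, 2) array.
same_layout = all(
    offset_column_major((i, j), (2, 3)) == offset_row_major((j, i), (3, 2))
    for i in range(2)
    for j in range(3)
)
```

So a buffer can be handed across the C/Fortran boundary without copying, as long as the caller reverses the order of dimensions and indices.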

IndianAstronaut · on Jan 2, 2015

Will be interesting to see this project down the line. Erlang is a great platform to scale up scientific data analysis. This need is only growing because of the LSST and other tools that generate big data.

polskibus · on Jan 2, 2015

I wonder if anyone has thought about reusing the pandas implementation as BIFs in Erlang. That would eliminate one abstraction layer.

copx · on Jan 2, 2015

A LISP-wrapper around Erlang which calls Python code which wraps C/FORTRAN code..

.. now someone needs to compile this to JavaScript!

I am just glad that I do not have to work with it. I mean, what if there is a bug? Debugging this would be a nightmare.

Sorry, but so many wrappers / so much indirection is not good design.

bitwalker · on Jan 2, 2015

LFE is not a "LISP-wrapper around Erlang", it's a LISP that compiles to BEAM bytecode, just like Erlang compiles to BEAM bytecode.