
Erlang is awesome. The only problem preventing me from using it where I want is terrible file I/O performance, especially on writes. I tried to Google a solution, but it seems there isn't anything generally accepted at the moment.



something specific? did you try raw file handles? http://erlang.org/doc/man/file.html#open-2


Yes, and delayed_write as well.
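
For reference, the shape of what I tried looks roughly like this (a sketch; the file name, buffer size and flush interval are placeholders, not tuned values):

    %% Append-only writes through a raw handle with delayed_write buffering
    %% (64 KB buffer, 2 s flush delay; illustrative values only).
    {ok, Fd} = file:open("events.log",
                         [raw, binary, append, {delayed_write, 64 * 1024, 2000}]),
    ok = file:write(Fd, <<"one serialized event\n">>),
    ok = file:close(Fd).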

I tried to port a (very simplistic, but fast) market data append-only database from Scala to Erlang. In Scala, I have no performance issues, but the code is unnecessarily complex for my taste.

In Scala, I am getting around 100,000 events per second, with plenty of headroom for further optimization (memory-mapped files are great). In Erlang, it barely manages a few hundred events per second.

From googling, it seems that disk_log in Erlang is fast enough, but it uses Erlang's own internal binary format (there is an option to plug in custom codecs, but it is wonderfully under-documented, to say the least).
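
If I read the docs right, the relevant option seems to be {format, external}, which makes disk_log store plain bytes and leaves the encoding up to you. A rough sketch (names and values are placeholders):

    %% Open a byte-oriented (external format) log and append raw binaries.
    {ok, Log} = disk_log:open([{name, md_log},
                               {file, "md.log"},
                               {format, external},
                               {type, halt}]),
    ok = disk_log:blog(Log, <<"one serialized event\n">>),
    ok = disk_log:close(Log).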

This looks strange to me, because Erlang is well-optimized for network IO. What's the difference for file IO?


100K events per second is easy to do; you can base your implementation on fast_disk_log (https://github.com/lpgauth/fast_disk_log)


> Erlang is well-optimized for network IO. What's the difference for file IO?

IIRC, async network IO is handled by the VM's scheduler threads, each of which periodically calls a non-blocking select() on the fdset of the sockets it holds port()s for. This works because TCP/IP implementations are pretty much guaranteed to expose a non-blocking select() for sockets, and they do on every OS Erlang targets.

Disk IO, on the other hand, is done by throwing the calls over to special async IO threads, which have to send responses back to the scheduler that wants them using (I think zero-copy) IPC. This is done because not all OSes have the equivalent of non-blocking select() for file handles (i.e. what you get from using fcntl(2) + read(2) in Linux.) So Erlang's disk IO BIFs fundamentally require context-switches and/or NUMA messaging.

You get to avoid this if you use your own NIFs—which have the lovely property of running in the calling scheduler by default, with the ability to decide on each call whether the workload for this particular call is small enough to perform synchronously, or whether it should be scheduled over to a dirty scheduler, blocking the Erlang process and yielding the scheduler. In other words, this is like the disk IO BIFs in the worst case, and can be a lot faster in the best case. It's a lot like NT's "overlapped IO" primitives, actually.
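
The Erlang side of such a NIF is just a stub module; the real write path, and the decision whether to run it inline or hand it to a dirty scheduler, lives in the native code. A hypothetical skeleton (module name, .so name and function are all placeholders):

    %% Hypothetical NIF-backed writer; write/2 is replaced by the native
    %% implementation when the NIF library loads.
    -module(fast_write).
    -export([write/2]).
    -on_load(init/0).

    init() ->
        erlang:load_nif("./fast_write_nif", 0).

    write(_Fd, _Bytes) ->
        erlang:nif_error(nif_library_not_loaded).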

You could also use higher-level BIFs. There is a reason Mnesia is implemented in terms of special BIFs (DETS) rather than DETS being a functional layer implemented using the disk IO BIFs. Mind you, DETS probably doesn't have the API you're looking for, but there are a number of NIF libraries that plug low-level "storage engines" into Erlang and provide equivalent convenience:

https://github.com/cloudant/nifile

https://github.com/basho/eleveldb

https://github.com/gburd/lmdb
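
For example, eleveldb exposes an ordered key-value store behind a small NIF API; an append-style sketch (the key scheme here is just an illustration):

    %% Append events to LevelDB under monotonically increasing keys.
    {ok, Db} = eleveldb:open("md.ldb", [{create_if_missing, true}]),
    Seq = erlang:unique_integer([monotonic, positive]),
    ok = eleveldb:put(Db, <<Seq:64>>, <<"one serialized event">>, []),
    eleveldb:close(Db).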

And there's also always the option to do what Erlang-on-Xen does: forgo anything that "devolves into" disk IO entirely, implementing the file module in terms of network IO, the 9p protocol in their case. This is actually likely the lowest-overhead move you can make if you're going to be running your Erlang node on a VM that would just be talking to a virtual disk mounted from a SAN anyway. Instead of the OS mounting the disk from the SAN and Erlang talking to the OS, you can just have Erlang talk to the SAN directly using e.g. iSCSI.


If you are writing sequentially to a local disk, perhaps it would make sense to create a tiny shim process that listens on a socket and writes to a file. You could do that in C, or perhaps even as a socat invocation.
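
Something along these lines, perhaps (the port and file name are placeholders, and this is untested):

    %% Shim, in a shell:
    %%   socat TCP-LISTEN:4004,reuseaddr,fork OPEN:events.log,creat,append
    %% Erlang side: connect once, then stream events over the socket.
    {ok, Sock} = gen_tcp:connect("127.0.0.1", 4004,
                                 [binary, {packet, raw}, {nodelay, true}]),
    ok = gen_tcp:send(Sock, <<"one serialized event\n">>),
    gen_tcp:close(Sock).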


Do note that this should have about equivalent performance to Erlang's disk IO BIFs, as this is essentially what they're already doing—the async IO threads are the "tiny shim process." (The difference mostly comes down to the VM having an efficient internal IPC protocol to them, and the runtime being able to control their CPU core affinity.)

Note also that if you turn ERTS's async IO threads off (set the number of them to 0), you should get improved throughput on disk IO tasks, as you're then forcing the scheduler threads to do the disk IO themselves. Of course, this trades off against latency, because the fallback here is blocking calls.
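
Concretely, the pool size is the +A emulator flag, and you can check what a running node ended up with:

    %% Start the node with the async IO thread pool disabled:
    %%   erl +A 0
    %% Then, inside the node:
    erlang:system_info(thread_pool_size).
    %% => 0, meaning file BIF calls now block a scheduler thread directly.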

Sadly, the async-thread-pool architecture has meant Erlang has had no reason to implement anything like Linux's kernel AIO support. (I wonder if they'd accept a patch that specialized the efile driver for Linux, the way it uses overlapped IO on win32...)


Interesting idea! Thank you for the suggestion.


I have noticed that as well. I tried to optimize the scheduler and so on, but I could not get the performance up to a reasonable level. The worst part was that I could not figure out why.




