Does anyone know of a way to distinguish random from sequential IO?
In my experience, sequential IO is never the problem. Instead, it seems to me that random seeks are really the only IO performance problem nowadays. In the most extreme cases throughput drops to ~100 bits/s instead of ~100 MB/s. Unfortunately, random seeks are hidden behind abstraction layers and are thus quite invisible to programmers (until the system freezes). Maybe we just need to wait for SSDs to become cheap.
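The gap is easy to demonstrate. Here's a minimal Python sketch, not a rigorous benchmark: the test file path and sizes are placeholders, and with a warm page cache you'll need a file much larger than RAM (or to drop the cache first) to measure the disk rather than memory:

    # Minimal sketch: time sequential vs. random 4 KiB reads on one file.
    # PATH is a placeholder; create the file beforehand, e.g. with dd.
    # Caveat: without O_DIRECT or a cold cache this measures the page
    # cache, so use a file much larger than RAM for honest numbers.
    import os
    import random
    import time

    PATH = "/tmp/testfile"   # hypothetical test file
    BLOCK = 4096             # bytes per read
    COUNT = 2000             # reads per pass

    nblocks = os.path.getsize(PATH) // BLOCK

    def mb_per_s(offsets):
        fd = os.open(PATH, os.O_RDONLY)
        start = time.time()
        for off in offsets:
            os.lseek(fd, off, os.SEEK_SET)
            os.read(fd, BLOCK)
        elapsed = time.time() - start
        os.close(fd)
        return len(offsets) * BLOCK / elapsed / 1e6

    sequential = [i * BLOCK for i in range(COUNT)]
    scattered = [random.randrange(nblocks) * BLOCK for _ in range(COUNT)]
    print("sequential: %6.1f MB/s" % mb_per_s(sequential))
    print("random:     %6.1f MB/s" % mb_per_s(scattered))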
Take a look at blktrace on Linux. It logs all block I/O layer activity, from when an I/O request is sent to the disk to when it's serviced, and it records which sectors on the disk were accessed to service each request. People have written a few tools to read these logs and turn them into meaningful numbers and pretty graphs -- btt is the official one, and here's another one that I like: http://oss.oracle.com/~mason/seekwatcher/.
Among other things, blktrace and its accompanying tools let you distinguish random from sequential I/O by looking at the distance between successive I/O requests sent to the disk.
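To make that concrete, here's a rough Python sketch (my own toy, not one of the official btt tools) that reads blkparse's default text output on stdin and counts how many issued ('D') requests start exactly where the previous one ended, treating everything else as a seek. It assumes blkparse's default one-line format of "maj,min cpu seq time pid action rwbs sector + nblocks [comm]":

    # usage: blktrace -d /dev/sda -o - | blkparse -i - | python seekdist.py
    import sys

    prev_end = None      # sector just past the end of the previous request
    seq_count = rnd_count = 0

    for line in sys.stdin:
        fields = line.split()
        # keep only device-issue events carrying a "sector + nblocks" payload
        if len(fields) < 10 or fields[5] != "D" or fields[8] != "+":
            continue
        sector, nblocks = int(fields[7]), int(fields[9])
        if prev_end is not None:
            if sector == prev_end:
                seq_count += 1
            else:
                rnd_count += 1
        prev_end = sector + nblocks

    print("sequential: %d  random: %d" % (seq_count, rnd_count))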
...and cheap SSDs are pathologically bad at random writes -- on average at least an order of magnitude worse than a mediocre hard drive, with regular latency spikes of several seconds!
I call your bluff. iotop may be included in many Linux distros, but don't say every one. My Arch box doesn't have it, at least not by default.
edit: What I'm trying to say is that your statement is overly broad. Not all Linux distros have iotop installed; it is not a guarantee, particularly on specialized distros.
iotop is a very thin ncurses frontend to taskstats in the Linux kernel, and it is totally dependent on TASK_DELAY_ACCT and TASK_IO_ACCOUNTING being enabled when the kernel is built.
Those options were introduced in 2.6.20, so the stupidly conservative distros that have been frozen for years on 2.6.18 or 2.6.19 don't get to play.
On nice distros that don't force-feed you their kernel and litter their repos with broken-out kernel modules, installing something like iotop is not necessarily just a call to the package manager.
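For what it's worth, you can check whether your running kernel has what iotop needs. A quick Python sketch, assuming the config is exposed via /proc/config.gz (needs IKCONFIG) or shipped under /boot, neither of which is guaranteed:

    import gzip
    import os

    WANTED = ("CONFIG_TASK_DELAY_ACCT", "CONFIG_TASK_IO_ACCOUNTING")

    def kernel_config():
        # /proc/config.gz exists only if the kernel was built with IKCONFIG_PROC
        if os.path.exists("/proc/config.gz"):
            with gzip.open("/proc/config.gz", "rt") as f:
                return f.read()
        # fall back to the config file many distros install alongside the kernel
        with open("/boot/config-" + os.uname().release) as f:
            return f.read()

    config = kernel_config()
    for opt in WANTED:
        state = "enabled" if ("%s=y" % opt) in config else "missing"
        print("%s: %s" % (opt, state))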
'litter their repos with broken-out kernel modules'
You don't seem to understand why modules exist. It doesn't matter how many modules are available; they won't be loaded unless they're needed -- e.g., for PCI hardware, only if you have a device that modules.pcimap matches to a driver. There is no overhead from having modules available to load, just extra convenience the next time you, say, add a NIC or some such.
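To illustrate the matching, here's a Python sketch -- a simplification of what the module tools actually do with modules.pcimap on kernels of this era (newer setups use modules.alias instead). Given a PCI vendor/device pair, it lists the modules whose pcimap entries match, treating 0xffffffff as a wildcard. Only vendor and device are checked here; the real lookup also considers subvendor, subdevice and class:

    import os

    PCIMAP = "/lib/modules/%s/modules.pcimap" % os.uname().release
    ANY = 0xFFFFFFFF  # PCI_ANY_ID: matches any vendor or device

    def matching_modules(vendor, device):
        matches = []
        with open(PCIMAP) as f:
            for line in f:
                if line.startswith("#") or not line.strip():
                    continue
                # fields: module vendor device subvendor subdevice class mask data
                fields = line.split()
                mod, v, d = fields[0], int(fields[1], 16), int(fields[2], 16)
                if v in (vendor, ANY) and d in (device, ANY):
                    matches.append(mod)
        return matches

    # e.g. an Intel e1000 NIC (PCI ID 8086:100e)
    print(matching_modules(0x8086, 0x100E))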
Also, no distro force-feeds you a kernel. You can always build your own in the rare event that you need to, or in the more likely event that you just feel like doing so.
For a business, most people can understand the benefit of using the same software that a few million others do.
When I said 'broken-out kernel modules', I was referring to modules that are distributed as independent packages in the repository instead of just sitting in /lib/modules/$(uname -r).
I know exactly why modules exist, having written my own several times. For the stuff distributed with the mainline kernel, the real benefit is not runtime loading (you could just build it all in statically), but unloading and reloading.