I would recommend his book [1] to anyone interested in systems performance. What really caught my attention is the focus he puts on having a goal and applying a method when solving performance issues. Many times I have found myself "lost" while isolating a performance issue. Not anymore.
I'm just a lowly student assistant who deals with a lot of Linux machines running Hadoop, but I'm totally in love with these presentations from Joyent and the SmartOS guys.
I seriously considered moving to SmartOS, since ZFS, Zones, and probably DTrace are exactly the features that would make my job, which largely comes down to organizing running software on machines and debugging problems, far easier, and would let me use the machines to better effect. But realistically it's not going to happen.
Nobody is familiar with the Solaris userland. I'm not a sophisticated, educated systems engineer at Joyent; I'm just a stressed guy trying to fix problems. Unfortunately, Linux is pretty good at just making things work, because a lot of people are in a similar situation and someone will fix it for me.
I just don't have the time, knowledge, or energy to, for example, get the native Hadoop libraries in the ecosystem building against another libc, or to make my own or other applications run without some Linux-specific crap.
That being said, I really did think about pushing SmartOS/Solaris, but as a lone fighter it would be suicide in a world where everyone knows "apt-get install <whatever>" and gets their work done in a reasonable way.
Maybe it's something for specialised applications, but not for academia.
I've come pretty far with just strace and perf top, and most problems in my own applications were better analyzed with Valgrind and KCachegrind, or Massif and its visualizer.
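To be concrete, these are roughly the invocations I mean (the binary name and the pids are just placeholders):

    # attach to a running process, follow forks, print timestamps
    strace -f -tt -p <pid>

    # live view of the hottest functions, system-wide or for one process
    perf top
    perf top -p <pid>

    # CPU profile of my own app, then browse it in KCachegrind
    valgrind --tool=callgrind ./myapp
    kcachegrind callgrind.out.<pid>

    # heap profile with Massif, then ms_print or massif-visualizer
    valgrind --tool=massif ./myapp
    ms_print massif.out.<pid>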
You might be interested in our ongoing work to execute Linux binaries in a SmartOS zone.[1] It's still an area of very active development (unfortunately one still needs to follow the source[2] to track its progress), but we have a ton working -- and it's all being done in the open. (And it's certainly very nice to be able to "apt-get install" something and then be able to DTrace the resulting application!) We're close to having enough working to be able to document where we are and get others kicking the tires, so stay tuned to the SmartOS discussion list if this is something you're interested in!
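To give a flavor of what that looks like: once a Linux binary is running in the zone, the usual DTrace one-liners apply to it. A minimal sketch, using "node" purely as a stand-in process name (the exact provider names you'd use may differ depending on how far along the LX work is):

    # count system calls by name for a Linux process running in the zone
    dtrace -n 'syscall:::entry /execname == "node"/ { @[probefunc] = count(); }'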
You might be aware of this but FreeBSD has all of those features that you mention. The objections that you have against Solaris might still apply though.
Yes. I'm running ZFS on Linux, and while ZFS itself is really great, it's not that well integrated into the kernel and is sometimes pretty unstable, at least in my rather esoteric scenario... Despite claims to the contrary, FreeBSD suffers from similar problems (https://clusterhq.com/blog/complexity-freebsd-vfs-using-zfs-...). Another problem: jails are fine, but there is no way to limit disk I/O...
I've also thought about FreeBSD, and while pkgng is really great, it's a similar problem: I keep stumbling upon bugs or untested things, and I'm unable to contribute the time to fix them.
Do you mind pointing to your PRs for the bugs you've found, or at least mentioning what they were? It sounds like you've found a hell of a lot of bugs/problems in a system (and I'm thinking of FreeBSD now) that I and a huge number of other people are running without any issues, so it would be beneficial for everyone if you shared your problems as PRs -- there is an active community around it that can fix issues if you are unable to do it yourself.
Sorry if I was unclear. I stumbled upon a few issues running ZFS on Linux that are known and on the development roadmap. Things like ARC integration and better failure handling in case of disk problems.
I don't run anything big on FreeBSD and ZFS. I have not experienced problems on a raidz2 ZFS fileserver running FreeBSD, except that disks drop out quite randomly; I've been unable to pinpoint that so far, and it's likely a hardware issue, as the system runs on budget hardware.
The biggest problem I've had with ZFS on Linux is that the ARC won't back off from memory quickly enough. There's still an edge case in there somewhere where it hits swap and effectively locks up (it's still running, just amazingly slowly). The fix is to set the ARC limit low (like 1 GB) -- it's not a hard limit, but it stops it eating all the RAM on my 8 GB box.
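For anyone wanting to do the same, this is roughly how it's set (the module parameter is zfs_arc_max on the ZFS on Linux builds I've used; 1 GB is shown as an example value):

    # runtime -- takes effect immediately, lost on reboot
    echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max

    # persistent -- put this in /etc/modprobe.d/zfs.conf
    options zfs zfs_arc_max=1073741824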
Sorry for the off-topic question. Do you have any opinion on HPCC? It seems to be aiming at the same goals as Hadoop, but claims to be able to do more and to have better performance. I tried to find more first-hand experience reports, but it's rather obscure and not that widely used.
It's no DTrace, but check out SystemTap ("stap"). It provides a lot of similar functionality on modern Linux. It can even use existing DTrace probe points if you have a binary like mysqld that was built with them. The ability to follow call graphs, and to insert arbitrary logic, is amazing for chasing down system issues.
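A couple of minimal sketches to give a taste (you need the kernel debuginfo installed for the syscall probes, and a mysqld built with the SDT/DTrace markers for the second one; the probe name and argument layout there are from memory, so treat them as illustrative):

    # count read() syscalls per process; prints a sorted summary on Ctrl-C
    stap -e 'global c; probe syscall.read { c[execname()]++ }
             probe end { foreach (p in c-) printf("%-20s %d\n", p, c[p]) }'

    # hook a DTrace/SDT probe point compiled into mysqld
    stap -e 'probe process("/usr/sbin/mysqld").mark("query__start") {
                 printf("query: %s\n", user_string($arg1)) }'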
Sorry, like other talks at LinuxCon, it was not videoed. I think it would be useful to have a video of it, so, much as I hate to give the same talk twice, I'll probably do it again at some point for the video.
[1] http://www.brendangregg.com/sysperfbook.html