I've been using supervisord for almost everything (it doesn't hurt that I'm primarily a Python guy), but I'm slowly testing out Mozilla's circus (https://github.com/mozilla-services/circus) and it's been going great so far.
We've actually had the opposite experience - circus has been nothing but a pain for us, unfortunately. Timeouts with lots of processes (50+), random CPU usage through the roof, and lots and lots of bugs that shouldn't be there.
There was a nasty race condition for a while that locked circus up, and it wouldn't restart crashed procs.
For a while, you couldn't specify a timeout on the command line, and that was on a major version, too.
It was 1.0.0 in master for a while, and then went backwards to 0.7.0. We were sitting on master for the fix to the aforementioned timeout, and so no updates happened until we realized what happened and then manually "downgraded."
All in all, it really feels like we're either using it wrong (probably, we're adding & removing processes on the fly), or we're the only ones really loading it up with a ton of processes which may or may not flap a lot.
If you don't mind, why are you moving away from supervisord?
Circus is still pretty new and rough around the edges, so I haven't totally moved away from supervisord yet. My foray there is mostly exploratory, trying to get used to it and its quirks before really diving into a comparison between the two.
That said, the ability to manage sockets sounds very interesting, and I'm hoping it simplifies my stack even more (my current use case is getting the most performance out of a small VPS, so removing things from the stack should free up RAM for actual web workers). I've been running a small site using nginx->circus->chaussette->django and it's faster (and was simpler to configure) than my standard nginx->supervisord->uwsgi->django deployment.
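For anyone unfamiliar with the socket-management idea: as I understand it, the process manager binds the listening socket once and each worker attaches to it by file descriptor, so no separate proxy layer sits between the manager and the workers. Here's a minimal Python sketch of that general technique. It is not circus's or chaussette's actual code, and `bind_listener` and `worker_socket_from_fd` are hypothetical names I made up for illustration.

```python
import os
import socket


def bind_listener(host="127.0.0.1", port=0):
    """Manager side: bind and listen once. Port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


def worker_socket_from_fd(fd):
    """Worker side: wrap an already-listening descriptor in a socket object.

    os.dup() gives the worker its own descriptor, so closing the worker's
    socket does not close the manager's copy.
    """
    return socket.socket(fileno=os.dup(fd))
```

In a real deployment the descriptor would be inherited by a child process rather than reused within one process, but the worker-side step is the same: it accepts connections on a socket it never bound itself.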
> There was a nasty race condition for a while that locked circus up, and it wouldn't restart crashed procs.
This was fixed in 0.7.1.
> For a while, you couldn't specify a timeout on the commandline - on a major version, too.
To my knowledge this was never released.
> It was 1.0.0 in master for a while, and then went backwards to 0.7.0.
Yes, for a while we had decided the next version would be 1.0, then we changed our minds. All of that happened in master and was never released, so I don't see the problem here.
> All in all, it really feels like we're either using it wrong (probably, we're adding & removing processes on the fly), or we're the only ones really loading it up with a ton of processes which may or may not flap a lot.
I am still available to help. Circus is young, but it works for our needs. If you are happily using Supervisord, that's fine, but continuing to post your negative experience from 3 months ago on HN without having tried the tool recently (while, to my knowledge, we have addressed all the bugs you mention) is a bit inappropriate, imho.
We're still using Circus under production loads, and we're still seeing it go unresponsive and chew through a ton of CPU. Unfortunately we haven't been able to reproduce it reliably, so until we can, it's not something we can fix.