Pretty curious why you are using beanstalkd instead of Redis (BRPOP & BLPOP)...

pimeys · on Sept 23, 2012

For our purposes, Beanstalkd has tubes and built-in delays for jobs. We can also set a deadline for work items so they aren't lost if a worker fails.
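To make the delay and deadline semantics concrete, here is a minimal in-memory model of a single tube. In beanstalkd the per-job deadline is the TTR (time-to-run) passed in the `put` command: if the worker that reserved a job doesn't delete, release or bury it within TTR seconds, the server puts it back in the ready queue. `MiniTube` and its methods are hypothetical names for illustration. This is not a beanstalkd client, and it ignores priorities, `release` and `bury`; time is passed in explicitly so the behaviour is deterministic.

```python
import heapq
import itertools


class MiniTube:
    """Hypothetical in-memory model of one beanstalkd tube (not a client)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._delayed = []   # heap of (ready_at, job_id)
        self._ready = []     # FIFO of job ids that can be reserved
        self._reserved = {}  # job_id -> deadline (reserve time + TTR)
        self._ttr = {}       # job_id -> time-to-run in seconds
        self.bodies = {}     # job_id -> payload

    def put(self, body, now, delay=0, ttr=60):
        job_id = next(self._ids)
        self.bodies[job_id] = body
        self._ttr[job_id] = ttr
        heapq.heappush(self._delayed, (now + delay, job_id))
        return job_id

    def _tick(self, now):
        # Delayed jobs whose delay has elapsed become ready.
        while self._delayed and self._delayed[0][0] <= now:
            _, job_id = heapq.heappop(self._delayed)
            self._ready.append(job_id)
        # Reserved jobs past their deadline go back to ready (worker presumed dead).
        for job_id, deadline in list(self._reserved.items()):
            if deadline <= now:
                del self._reserved[job_id]
                self._ready.append(job_id)

    def reserve(self, now):
        self._tick(now)
        if not self._ready:
            return None
        job_id = self._ready.pop(0)
        self._reserved[job_id] = now + self._ttr[job_id]
        return job_id

    def delete(self, job_id):
        # Worker finished: the job is removed for good.
        self._reserved.pop(job_id, None)
        self.bodies.pop(job_id, None)
        self._ttr.pop(job_id, None)


# Usage: a job delayed by 10s with a 30s TTR, whose first worker crashes.
tube = MiniTube()
job = tube.put("user:42", now=0, delay=10, ttr=30)
assert tube.reserve(now=5) is None   # still delayed
assert tube.reserve(now=10) == job   # delay elapsed, worker reserves it
assert tube.reserve(now=20) is None  # reserved, still within TTR
assert tube.reserve(now=40) == job   # TTR expired: released to another worker
```

The crashed worker never calls `delete()`, yet the job comes back on its own once the TTR runs out, which is the "prevent loss if the worker fails" property.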

pjscott · on Sept 23, 2012

Really, there are a lot of things you might want from a queueing system that Redis doesn't easily give you. What do you do when a worker crashes, or takes too long? What do you do to a queue item which reliably crashes a worker, or causes it to hang? Can you push back when queue lengths get too high at some part of a multi-stage pipeline? And then there's all the front-end goodness for debugging: you could easily want things like graphs for flow and queue lengths. And how about master-slave replication, failover, sharding, and so on?
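As an illustration of the first two questions (crashed or hung workers, and items that keep crashing workers), here is a minimal sketch of the bookkeeping you end up writing on top of Redis lists. It is an in-memory stand-in with hypothetical names (`ReliableQueue`, `claim`, `reap`, `visibility_timeout`, `max_attempts`), not the implementation used at Cue. On real Redis the pending-to-processing move must be atomic (`LMOVE`/`BLMOVE`, or `RPOPLPUSH`/`BRPOPLPUSH` before Redis 6.2), and the reaper would run as its own process.

```python
import collections


class ReliableQueue:
    """Hypothetical sketch of the 'processing list' pattern built on Redis.

    Plain Python containers stand in for Redis keys. Assumes item ids are unique.
    """

    def __init__(self, visibility_timeout, max_attempts):
        self.visibility_timeout = visibility_timeout
        self.max_attempts = max_attempts
        self.pending = collections.deque()  # stands in for a Redis list
        self.processing = {}                # item -> time it was claimed
        self.attempts = collections.Counter()
        self.dead = []                      # poison items parked for a human

    def push(self, item):
        self.pending.appendleft(item)  # like LPUSH

    def claim(self, now):
        if not self.pending:
            return None
        item = self.pending.pop()  # like RPOP, recorded as in-flight
        self.processing[item] = now
        self.attempts[item] += 1
        return item

    def ack(self, item):
        # Worker finished successfully (like LREM on the processing list).
        self.processing.pop(item, None)
        self.attempts.pop(item, None)

    def reap(self, now):
        """Requeue items whose worker crashed or hung; dead-letter repeat offenders."""
        for item, claimed_at in list(self.processing.items()):
            if now - claimed_at < self.visibility_timeout:
                continue
            del self.processing[item]
            if self.attempts[item] >= self.max_attempts:
                self.dead.append(item)     # keeps killing workers: stop retrying
            else:
                self.pending.append(item)  # goes to the end claimed next


# Usage: an item that crashes every worker that touches it.
q = ReliableQueue(visibility_timeout=30, max_attempts=2)
q.push("doc:1")
assert q.claim(now=0) == "doc:1"   # worker A takes it, then crashes
q.reap(now=30)                     # timeout passed: back to pending
assert q.claim(now=31) == "doc:1"  # worker B takes it, and also crashes
q.reap(now=61)                     # second failure: dead-lettered
assert q.dead == ["doc:1"] and q.claim(now=62) is None
```

Even this sketch leaves out backpressure, monitoring and replication, which is the point: none of it comes for free with BRPOP/BLPOP.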

We're using Redis for our big document processing-and-indexing pipeline at Cue, and it's great software, but it's not a ready-made queueing system. All of the features I mentioned above are things that we've had to build ourselves. Redis is more like a general-purpose building block for all kinds of data systems.

pimeys · on Sept 24, 2012

> Really, there are a lot of things you might want from a queueing system that Redis doesn't easily give you. What do you do when a worker crashes, or takes too long? What do you do to a queue item which reliably crashes a worker, or causes it to hang?

Our queue items are plain ID values, which trigger a set of actions from the database to the internet. If there's a database failure or the process itself crashes, it is very nice to know that our reserved work items won't be lost but will be released back to other workers to process.

> And then there's all the front-end goodness for debugging: you could easily want things like graphs for flow and queue lengths.

I have lots of graphs from the system in Graphite. Works perfectly.
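How the graphs are fed isn't described here. One plausible way, sketched as an assumption, is to poll beanstalkd's `stats-tube` command (it answers with a small YAML document of counters such as `current-jobs-ready`) and forward the numbers using Graphite's plaintext protocol: one `metric.path value timestamp` line per metric, by default on port 2003. The function names and the `queues.default` prefix are hypothetical, and the sample input below is illustrative rather than captured server output.

```python
import socket


def stats_to_graphite(stats_yaml, prefix, timestamp):
    """Turn a flat 'key: value' stats dump into Graphite plaintext lines.

    Each line is '<prefix>.<key> <value> <unix timestamp>\\n'. Non-numeric
    fields (such as the tube name) and the '---' marker are skipped.
    """
    lines = []
    for raw in stats_yaml.splitlines():
        if ":" not in raw:
            continue  # e.g. the '---' YAML document marker
        key, _, value = raw.partition(":")
        value = value.strip()
        try:
            float(value)  # only forward numeric counters
        except ValueError:
            continue
        lines.append(f"{prefix}.{key.strip()} {value} {int(timestamp)}\n")
    return lines


def send_to_graphite(lines, host, port=2003):
    # Plaintext protocol: just write the newline-terminated lines to carbon.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Usage (no network needed for the formatting step):
sample = "---\nname: default\ncurrent-jobs-ready: 12\ncurrent-jobs-reserved: 3\n"
print(stats_to_graphite(sample, "queues.default", 1348400000))
# ['queues.default.current-jobs-ready 12 1348400000\n',
#  'queues.default.current-jobs-reserved 3 1348400000\n']
```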

> And how about master-slave replication, failover, sharding, and so on?

Sharding is easy: just specify an array of servers in the clients, and the clients will shard.
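The sharding strategy of the particular client library isn't named, so here is only a sketch of one simple way a client could do it: hash the job ID and take it modulo the number of servers. The server list and `shard_for` are hypothetical; 11300 is beanstalkd's default port. `zlib.crc32` is used instead of Python's `hash()` because string hashing is randomized per process, so separate clients would otherwise disagree about where a job lives.

```python
import zlib

# Hypothetical server list; the real addresses aren't given in the thread.
SERVERS = ["beanstalk-a:11300", "beanstalk-b:11300", "beanstalk-c:11300"]


def shard_for(job_id, servers=SERVERS):
    """Pick a server for a job deterministically from its id."""
    key = str(job_id).encode("utf-8")
    return servers[zlib.crc32(key) % len(servers)]
```

Because the choice depends on `len(servers)`, adding or removing a server remaps most IDs; consistent hashing is the usual alternative if that matters. Workers still have to reserve from every server in the list, since any of them may hold jobs.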

Replication is a bit of a different problem. The server writes a binlog to disk, which is then backed up. We have standby beanstalkd servers on another machine pointing to the same binlogs. In an error situation we just switch them on and route traffic differently.

Yeah, the biggest problem with Beanstalkd is the missing replication, but we can live without it.

> We're using Redis for our big document processing-and-indexing pipeline at Cue, and it's great software, but it's not a ready-made queueing system.

We're not doing such heavy processing, but we rely on many third-party services, which can fail randomly. A retry system is a must, and of course we can't afford to lose work items, so the deadlines help.
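How retries are scheduled isn't described. One common approach, sketched here as an assumption, is capped exponential backoff with jitter, giving up after a fixed number of attempts. `next_action` and its parameters are hypothetical. The attempt count is assumed to be known to the worker; beanstalkd's `stats-job` reports per-job counters such as `releases` and `timeouts` that could serve this purpose.

```python
import random


def next_action(attempts, base_delay=5, max_delay=3600, max_attempts=8, rng=random.random):
    """Decide what a worker does after a third-party call fails.

    Returns ("release", delay_seconds) to retry later with capped exponential
    backoff, or ("bury", None) once the job has failed max_attempts times.
    """
    if attempts >= max_attempts:
        return ("bury", None)
    delay = min(max_delay, base_delay * 2 ** attempts)
    # Roughly +/-20% jitter so jobs that failed together don't retry together.
    delay = int(delay * (0.8 + 0.4 * rng()))
    return ("release", max(1, delay))
```

In a worker, the returned delay would go into beanstalkd's `release <id> <pri> <delay>` command, and `("bury", None)` maps to `bury <id> <pri>`, which parks the job until someone inspects and kicks it.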