How Grooveshark Uses Gearman (wanderr.com)
42 points by wanderr on March 28, 2011 | 8 comments



I implemented Gearman at a previous startup and then recently have done a "clean room" re-implementation of it for a personal project.

First: Gearman is first and foremost a job queue, as others have mentioned. Obviously most jobs have SOME data submitted along with them.

Second: After we implemented it as a job queue, another team down the road needed a message queue, so we ran benchmarks. It turned out Gearman was a faster message queue than the dedicated *MQ platforms. We were getting killer performance out of the Gearman Java client. Of course, we had to spend two weeks fixing bugs in the Java client to get it to work.

Which brings me to my last comment... Gearman is NOT a mature system. It's brilliantly fast and very good at what it does, when it all works. But you may find yourself, like I did, getting personal with the Gearman code itself to explain wonky behavior. Like, hey, why the heck isn't this job timeout being enforced? Fail.

But as for being awesome, let me give you a taste:

It has a MySQL client. So on our reporting database servers, we installed the Gearman client, which meant we could create UDFs in MySQL that were actually backed by Gearman. In other words... we could do something like this:

SELECT user_id, UserRiskScore(user_id) FROM users WHERE user_id = 123;

And UserRiskScore kicks off a Gearman job... which in this case could be just an entry point for a Gearman map-reduce: it dispatches many other jobs, aggregates the result, and then returns it to MySQL and your waiting query.
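
A rough sketch of that entry-point pattern, in Python rather than real Gearman client code. Everything here is an assumption for illustration: a thread pool stands in for the Gearman workers, and score_chunk, user_risk_score and the event fields are hypothetical names, not anything from our actual system.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(events):
    """Hypothetical worker job: partial score for one chunk of events."""
    return sum(e["amount"] for e in events if e["flagged"])


def user_risk_score(events, chunk_size=2, workers=4):
    """Entry-point job: split the input, fan out, then aggregate."""
    # Split the work into independent chunks (the "map" inputs).
    chunks = [events[i:i + chunk_size] for i in range(0, len(events), chunk_size)]
    # Fan out: each chunk would be its own Gearman job; here it is a thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(score_chunk, chunks))
    # Aggregate the partial results (the "reduce" step) and return one value,
    # which is what the waiting SQL query would receive.
    return sum(partials)


if __name__ == "__main__":
    sample = [
        {"amount": 10, "flagged": True},
        {"amount": 5, "flagged": False},
        {"amount": 7, "flagged": True},
    ]
    print(user_risk_score(sample))
```

The point is only the shape: the caller sees one function call and one return value, and the splitting and aggregation stay hidden behind it.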

And, of course, the actual worker code can be written in languages far more useful than SQL.

Now, I'd never install a system like that on the live database. You don't need 200 queries waiting on a Gearman response when you've got buckets of data InnoDB is trying to flush to disk.

But on our reporting db cluster it was brilliant. It made our code wonderfully centralized. A FRONT-END coder can get complex results, having kicked off a map-reduce job without ever having to know what, say, HDFS, or MongoDB, really is.


We have a similar setup for our application, and I have to say that I agree with the article. Some of our workers work in tandem on small data chunks; others handle async fire-and-forget jobs. All in all, Gearman was a pleasure to work with.

As for the job queue memory not being reclaimed, that might be mitigated by using secondary storage for the queue, à la MySQL via libdrizzle.


I'm curious as to why Gearman vs something else, say RabbitMQ?


I currently use Gearman in my app (and supervisord), but I also looked into RabbitMQ. It was this comment on SO that made me go with Gearman.

I would say that Gearman is better for queuing "jobs" and RabbitMQ is better for queuing "data". Of course, they are both really the same thing, but the way it works out for me is that if you are trying to "fan out" work to be done, and the workers can work independently, Gearman is the better way to do it. But if you are trying to feed data from a lot of sources down into fewer data consumers, RabbitMQ is the better solution.

Source http://stackoverflow.com/questions/2283955/rabbitmq-or-gearm...

I tried both technologies in my app, and Gearman was eventually easier to work with. YMMV.


Unfortunately, I don't remember why we originally went with Gearman. We did evaluate RabbitMQ (or was that ActiveMQ?) and at the time there was something that we wanted to do that Gearman was better suited towards. That was a long time ago, and my memory sucks!


Or zeromq?


I don't think 0MQ existed when we made the decision to go with Gearman, but a 30-second glance tells me it would probably be well suited to build a job server on top of, but isn't exactly an out-of-the-box job server like gearman is. For one thing it's not immediately clear to me how you would submit jobs via zeromq for which workers don't exist yet or aren't currently connected. That's just a first impression though since I haven't dived into it very deeply, obviously.


I didn't mean to knock the decision about choosing Gearman. I was just interested in your thoughts on comparing the two.

You're correct that ZeroMQ isn't a server. I tend to describe it as a sockets layer that has messaging semantics. If a host isn't up, you can still send messages to the socket. It queues them up, like you'd expect a messaging system to do.

So this also means you can open a socket to a host that doesn't exist. Imagine you turn a server on with no clients up yet. A PUSH socket will block. When you connect a client, by opening a PULL socket, the message exchange starts.
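
Here is a toy model of that "PUSH blocks until a PULL shows up" behavior, using only the Python standard library. It is not ZeroMQ and makes no claim about how ZeroMQ implements this; ToyPush and attach_puller are made-up names, and the timeout parameter exists only so the sketch can be exercised without hanging.

```python
import queue
import threading


class ToyPush:
    """Toy stand-in for a PUSH socket: send() blocks until a puller attaches."""

    def __init__(self):
        self._peer_ready = threading.Event()
        self._pipe = queue.Queue()

    def attach_puller(self):
        """Simulate a PULL peer connecting; returns the queue it reads from."""
        self._peer_ready.set()
        return self._pipe

    def send(self, msg, timeout=None):
        # Block until some puller exists (or give up after `timeout` seconds).
        if not self._peer_ready.wait(timeout):
            raise TimeoutError("no PULL peer attached yet")
        self._pipe.put(msg)


def demo():
    push = ToyPush()
    # The sender starts first and parks inside send() with nobody listening.
    sender = threading.Thread(
        target=lambda: [push.send(f"job-{i}") for i in range(3)]
    )
    sender.start()
    # Once the puller attaches, the message exchange starts.
    pipe = push.attach_puller()
    received = [pipe.get(timeout=1) for _ in range(3)]
    sender.join()
    return received


if __name__ == "__main__":
    print(demo())
```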

These sockets can run over several transports: inproc, ipc, tcp or pgm. inproc is an in-process queue. ipc is basically Unix domain sockets. tcp is... and pgm is Pragmatic General Multicast.

To build a broker, you'd probably use something other than PUSH/PULL sockets. You would use REQ/REP (request/reply) sockets. These sockets give you synchronous communication: the requester sends a request and then waits for the reply.

I think their doc explains the rest better than I could, so I recommend reading this: http://zguide.zeromq.org/page:all

I think this description offers the gist of how REQ/REP, and their nonblocking counterparts XREQ/XREP, can be used to basically script together your ideal broker for many conditions.

When you use REQ to talk to REP you get a strictly synchronous request-reply dialog. The client sends a request, the service reads the request and sends a reply. The client then reads the reply. If either the client or the service try to do anything else (e.g. sending two requests in a row without waiting for a response) they will get an error.

But our broker has to be non-blocking. Obviously we can use zmq_poll(3) to wait for activity on either socket, but we can't use REP and REQ.

Luckily there are non-blocking versions of these two sockets, called XREQ and XREP. These "extended request/reply" sockets let you extend request-reply across intermediate nodes, such as our message queuing broker.
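
To make the REQ/REP lockstep rule quoted above concrete, here is a toy model in plain Python. It does not use ZeroMQ at all: ToyReqSocket and LockstepError are made-up names, and the "service" is just a callable. As far as I know, real ZeroMQ reports an out-of-turn call as an EFSM error rather than a Python exception like this one.

```python
class LockstepError(RuntimeError):
    """Raised when send() or recv() is called out of turn."""


class ToyReqSocket:
    """Toy REQ socket: enforces strict send, recv, send, recv ordering."""

    def __init__(self, service):
        self._service = service        # callable: request -> reply
        self._pending_reply = None
        self._awaiting_reply = False

    def send(self, request):
        if self._awaiting_reply:
            raise LockstepError("must recv() the reply before sending again")
        # The "service" reads the request and produces its reply.
        self._pending_reply = self._service(request)
        self._awaiting_reply = True

    def recv(self):
        if not self._awaiting_reply:
            raise LockstepError("must send() a request before recv()")
        self._awaiting_reply = False
        reply, self._pending_reply = self._pending_reply, None
        return reply


if __name__ == "__main__":
    sock = ToyReqSocket(str.upper)
    sock.send("ping")
    print(sock.recv())
    sock.send("first")
    try:
        sock.send("second")  # two requests in a row: not allowed
    except LockstepError as err:
        print("error:", err)
```

That rigidity is exactly why a broker sitting between many clients and many services can't be built from plain REQ and REP.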



