Celery - Distributed Task Queue (github.com/celery)
70 points by albertzeyer on June 23, 2013 | 32 comments



Here is a question: I often find myself wanting to run background tasks within my application (say, a Flask app) without running celeryd. While celery is quite useful, I just want to spin up an application and have it Just Work without external dependencies. With celery, I now have to run celeryd, run some sort of queue (redis, rabbitmq, etc.), and worry about all of these pieces just to execute long-running background tasks.

Are there options for this? Something akin to SQLite, but for tasks, in Python?


Running your Flask application with uWSGI could allow you to do just that by way of the spool and other related decorators: http://uwsgi-docs.readthedocs.org/en/latest/PythonDecorators...
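For reference, a spooled task looks roughly like this (a sketch from memory, assuming uWSGI is started with a spooler directory, e.g. --spooler=/tmp/spool; the helper inside the task is made up):

    # Only works when the app runs under uWSGI with a spooler configured,
    # e.g.: uwsgi --spooler=/tmp/spool --module myapp:app --master
    from uwsgidecorators import spool

    @spool
    def send_report(arguments):
        # arguments is the dict passed to .spool(); values are plain strings
        build_and_email_report(arguments['user_id'])  # hypothetical helper

    # In a Flask view: queue the work and return immediately
    send_report.spool(user_id="42")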


+1 for uWSGI. The Spooler is awesome and also, if you need lower-level stuff, have a look at its SharedArea and Queue[1] frameworks, as well as Mules[2].

[1] http://uwsgi-docs.readthedocs.org/en/latest/Queue.html [2] http://uwsgi-docs.readthedocs.org/en/latest/Mules.html


I've read about https://github.com/tthieman/dagobah recently, but I haven't used it. Also, if you think celery is too heavy, you'll probably think this is too. I've also heard good things about https://github.com/binarydud/pyres.

Despite all that, I usually just find myself running celery. I'll usually have a redis instance going anyways that I can use as a sub-optimal queue, and if it's a small enough site, I'll run a single worker instance on the same machine as the host.


Advanced Python Scheduler (APScheduler) is a light but powerful in-process task scheduler that lets you schedule functions (or any other python callables) to be executed at times of your choosing.

This can be a far better alternative to externally run cron scripts for long-running applications (e.g. web applications), as it is platform neutral and can directly access your application’s variables and functions.
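A minimal sketch, assuming the APScheduler 2.x API that was current at the time (check the docs for the exact calls in your version):

    from apscheduler.scheduler import Scheduler  # APScheduler 2.x

    sched = Scheduler()
    sched.start()  # runs in its own thread inside your process, no daemon needed

    def refresh_cache():
        # a plain callable: it can reach your app's objects directly
        print("refreshing cache")

    # run every 10 minutes; no cron, no broker, no extra process
    sched.add_interval_job(refresh_cache, minutes=10)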


I've been using Python RQ (http://python-rq.org/) as a lightweight alternative to celery. It uses Redis as its backend, so I guess that might not meet the "SQLite" qualification, but it has been lightweight enough for my application (e.g. no requirement for a message broker like RabbitMQ). Really liking it so far.
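The whole API is basically one call (a sketch; count_words stands in for any importable function of yours):

    from redis import Redis
    from rq import Queue

    from myapp.tasks import count_words   # hypothetical module holding your function

    q = Queue(connection=Redis())                    # default queue on a local Redis
    job = q.enqueue(count_words, "http://nvie.com")  # returns immediately
    print(job.id)

    # workers run in a separate process, started with:  rqworker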


We're using RQ in production and are very happy with it. We chose it over Celery because the "protocol" is much more straightforward: when something fails, I can peek at the state of redis or the RQ source code and know what's going on.

The only downside is that, like Celery, RQ forks for every task. This can feel a bit heavy for some of our simpler tasks, although it hasn't proven to be an issue yet.


>While celery is quite useful, I just want to spin up an application and have it Just Work without external dependencies. With celery, I now have to run celeryd, run some sort of queue (redis, rabbitmq, etc.), and worry about all of these pieces just to execute long-running background tasks.

You can use redis as a backend for celery (which you should be using already for caching, probably), and AFAIK you can use a database as a backend too (although I haven't tried it).

That means that celery is exactly one dependency.

You need some kind of persistent store so that it can recover the task queue from hardware failure. But it can probably use the backend you already have.


APScheduler is a nice in-process alternative that doesn't require running another process or using a db for persistence. Quick to start using and no external dependencies. https://pypi.python.org/pypi/APScheduler/


Celery can use a database backend, such as Postgres. Maybe it can work with sqlite?


You definitely can use celery with SQLite via the SQLAlchemy backend, but since RDBMSs aren't very good task queues there are drawbacks: see http://docs.celeryproject.org/en/latest/getting-started/brok...
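For what it's worth, pointing the broker at SQLite is a one-line config change, assuming the SQLAlchemy transport is installed (scheme and setting names vary a bit between Celery versions):

    # celeryconfig.py -- sketch for Celery 3.x with the SQLAlchemy broker transport
    BROKER_URL = 'sqla+sqlite:///celerydb.sqlite'
    # fine for development; under concurrent workers SQLite's locking becomes
    # exactly the drawback the linked docs warn about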


That's true. However, as an answer to the above it may well be acceptable without needing a different library.


Anyone want to chime in on why Python threads and subprocesses can't be used?


Subprocesses aren't used because the cost of forking is high. It's more efficient to fork a worker once, then have that worker keep doing work.

The problem now becomes transferring data to that worker, which is where celery comes into play.

Python threads cannot be used for CPU-heavy loads because they effectively all run on a single core.[0][1]

[0]: CPython threads are real kernel threads, but the Global Interpreter Lock means only one of them executes Python bytecode at a time, so CPU-bound threads are effectively restricted to a single core.

[1]: Jython does not have this limitation.


IIRC, unless you explicitly state otherwise, Celery forks for every task. A lot of task queues do this because it makes it a lot easier to recover from errors.

Multiprocessing makes it pretty straightforward to manage inter-process communication with queues, but you'd have to write all the other stuff task queues handle for you, such as error management.
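A bare-bones version of that pattern, just to show how much of the "other stuff" is missing (a minimal sketch):

    from multiprocessing import Process, Queue

    def worker(q):
        # long-lived worker: forked once, then keeps pulling work items
        while True:
            item = q.get()
            if item is None:             # sentinel to shut down
                break
            print("processing", item)    # no retries, no error tracking, no results

    if __name__ == "__main__":
        q = Queue()
        p = Process(target=worker, args=(q,))
        p.start()
        for i in range(5):
            q.put(i)
        q.put(None)
        p.join()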


What is there to see? Celery is great, but has been around for ages. Anything new I'm missing?


I've been looking to replace our in-house Postgres-based task queue with something more suited to the task. What has been turning me off about celery/rq/qless is that they seem tied to their host languages (Python/Ruby); our backend is Go, and Postgres mostly acts as a datastore. I'd love a somewhat drop-in replacement where instead of reading/writing to pg, I'm reading/writing to redis or some other in-memory store.

What I want is the ability to add a task with a priority and metadata, and be able to pop those tasks in O(1) time.


I think RQ is simpler and better. http://python-rq.org/

Celery has a lot more control and features that you probably don't need.


I have tried to use RQ a couple of times as a result of this thinking. Each time I end up going back to Celery because there always seems to be something I need that RQ doesn't have. For instance, in the most recent project, I wanted to execute tasks X minutes in the future; celery's eta kwarg is perfect for that, and I'm not sure how you would do it with RQ.
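For the curious, that looks something like this (a sketch against the Celery 3.x API; the task and broker URL are made up):

    from datetime import datetime, timedelta
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')  # assumed broker

    @app.task
    def send_reminder(user_id):
        print("reminding", user_id)

    # run roughly 30 minutes from now
    send_reminder.apply_async(args=[42], countdown=30 * 60)
    # or with an explicit time
    send_reminder.apply_async(args=[42], eta=datetime.utcnow() + timedelta(minutes=30))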


No knock to any contributors on here, but I'm not seeing why RQ is better. Documentation looks incomplete, it seems to depend on Redis, and the boilerplate doesn't appear to be any less complicated. Is there something I'm missing that this does much better?


We use celery pretty extensively. It's nice because you get the flexibility of multiprocessing, but also the ability to distribute tasks across many machines.

There is definitely an overhead to get it going. Using redis as your messaging backend reduces this cost somewhat; rabbitmq is overkill for most of the cases we've used celery for.

Have a look at rq if you're looking for something more lightweight: https://github.com/nvie/rq


A simple solution (and I see a couple of requests for this here) is to use eventlet as the underlying concurrency library and then just spawn a green thread (or a green pool, to limit concurrency) to handle long-running/background tasks.
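Something along these lines (a sketch; the task itself is made up and must cooperate, i.e. block on I/O or eventlet.sleep, for other greenlets to run):

    import eventlet

    def long_running(job_id):
        eventlet.sleep(5)            # stand-in for I/O-bound work
        print("done", job_id)

    pool = eventlet.GreenPool(size=50)   # cap concurrency at 50 green threads
    for i in range(10):
        pool.spawn(long_running, i)
    pool.waitall()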


I have yet to read anywhere why using Postgres (or Postgres with Django) as a task queue is a bad idea. Everything says a vague "suboptimal" without really backing up that statement.

Anyone know the real reasons behind that statement?


The workers have to poll Postgres repeatedly at intervals, whereas a worker using a message broker (like RabbitMQ) connects and just waits for something to get tossed its way. Having your task broker separate from your database can save a lot of DB IO if you've got a larger number of workers or a busy task queue. RabbitMQ (and the other more full-featured message brokers) can also do a lot of really slick things with routing, prioritization, and all sorts of other goodies.

Of course, this doesn't matter if you're running a small site with low traffic. I wouldn't get too caught up in worrying about broker selection unless you're cranking out a lot of jobs, or have special needs.
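To make the difference in consumption style concrete (a sketch; the database side is only pseudocode in the comment):

    import redis

    # The database-backed pattern boils down to:
    #     while True:
    #         row = SELECT ... FROM tasks ... LIMIT 1
    #         if row: process(row) else: sleep(poll_interval)
    # so every idle worker still costs a query per interval.

    # Broker-style consumption: the worker blocks until a job arrives,
    # and an idle worker generates no traffic at all.
    r = redis.StrictRedis()
    while True:
        _queue, payload = r.blpop("jobs")
        print("got", payload)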


So if I am reading this right, what you are saying is that RabbitMQ is more 'push' and a DB is more 'pull' or rather 'poll'.

Perhaps they should break the brokers down into those two groups.

I seem to recall Redis having a pub/sub-style interface, which makes it much more push, but it's still a bit cloudy in my mind.

Thanks for the reply!


If you are asking about postgres as a celeryd backend, it's not really what that DBMS is meant to do... Redis is easy to install, run, and manage (as long as you don't lean out of the window too much), so for celeryd it's preferable...


Right, but "it's not what this was meant to do" is the kind of vague non-answer that the parent was talking about.


I have been using Celery in a recent Python project with rabbitmq and it was a very good experience; celery works really nicely and reliably. Would use it again!


Outside of using AMQP, what does this get me over using Gearman?


If you're using Python and/or Django already and don't need to mix in other languages, Celery is pretty easy to work with, has excellent documentation, and is well maintained. That said, while there is no barrier to playing nice with other languages, it seems like only recently have people started to get serious about doing so with Celery (and sharing/supporting their work publicly).

I haven't used Gearman personally, but if I had a polyglot project, I'd give it a serious look.


How is Celery different from RabbitMQ?


Celery is a task framework; RabbitMQ is a queue server. Celery can use RabbitMQ as a message broker, i.e. a way to transfer messages from your application to the workers.
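The split in responsibilities, in code form (a sketch; the broker URL assumes a local RabbitMQ with default credentials):

    from celery import Celery

    # RabbitMQ is only the transport; Celery defines, dispatches and runs the tasks
    app = Celery('tasks', broker='amqp://guest@localhost//')

    @app.task
    def add(x, y):
        return x + y

    # .delay() serializes a message, RabbitMQ queues it, a celery worker executes it
    add.delay(2, 3)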



