Celery 4.0 (celeryproject.org)
181 points by mlissner on Nov 12, 2016 | 50 comments



Celery is one of those things in Python that you can't (sometimes unfortunately) live without. Earlier versions of Celery had some difficult bugs and inconsistencies that made it a very tough tool to work with, requiring a lot of developer diligence and operational experience to keep your pipeline from breaking. Things like message memory explosion (multi-pass deserialization), poor defaults, difficulty debugging and tracking exceptions, weak monitoring tools like 'flower', etc. all led to this. Problems were exacerbated by the fact that simple async operations in Python (easily handled in more concurrent languages with a simple go func(){}()) end up requiring a heavy distributed solution like Celery (or the lighter RQ), which creates a whole host of issues.

I imagine that as native async tooling improves in Python 3.x (async/await, aiohttp, and other tools), the use of Celery for trivially concurrent things will decrease, and Celery's usage will focus on more complex workflows (chords, fanouts, map/reduce).

Looks like many concerns were tackled here (thanks, Celery team) and I'm looking forward to playing around with this release.


I've managed to live without it. For low- and medium-traffic sites it's hugely over-engineered. For all the sites I manage, I run a single cron task that triggers a range of background jobs. It's an approach that has worked very well for nearly 10 years.
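Roughly, the pattern is this (a minimal sketch; the jobs table and handler names are just illustrative, not from any real project):

    # pending_jobs.py -- run from cron, e.g. "* * * * * python pending_jobs.py"
    # A minimal sketch of the cron-driven approach: jobs are rows in a table,
    # the cron run picks up anything not yet done.
    import sqlite3

    HANDLERS = {
        'send_welcome_email': lambda payload: print('emailing', payload),
        'rebuild_search_index': lambda payload: print('indexing', payload),
    }

    def run_pending_jobs():
        db = sqlite3.connect('jobs.db')
        rows = db.execute(
            "SELECT id, name, payload FROM jobs WHERE done = 0"
        ).fetchall()
        for job_id, name, payload in rows:
            HANDLERS[name](payload)
            db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
            db.commit()

    if __name__ == '__main__':
        run_pending_jobs()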

I understand there's a genuine use-case for Celery, but like many technologies, people are told "if you need a task queue, use this" when there are much simpler solutions that are more than good enough for most projects.


Celery was never meant as a replacement for cron; scheduling was simply a nice bonus that fits the messaging pattern well. Writing a task queue is actually very simple using, for example, Redis, but that doesn't necessarily mean Celery is over-engineered IMHO. It's very easy to forget the support required once your system is in production.
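For example, a bare-bones queue over redis-py is only a few lines (a sketch only - no retries, acks, or error handling, which is exactly the production support that ends up mattering):

    # A minimal sketch of a Redis-backed task queue using redis-py.
    # Producers push JSON jobs; a worker loop pops them with a blocking read.
    import json
    import redis

    r = redis.Redis()

    def enqueue(task_name, **kwargs):
        r.lpush('task_queue', json.dumps({'task': task_name, 'kwargs': kwargs}))

    def worker():
        while True:
            # BRPOP blocks until a job is available; returns (key, value)
            _key, raw = r.brpop('task_queue')
            job = json.loads(raw.decode())
            print('running', job['task'], job['kwargs'])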

Disclaimer: I'm a contributor


By "contributor" you mean you are the main author, right? XD

Joking aside, we use Celery in production for asynchronous number crunching. We built a web UI in Django that kicks off Celery tasks running long number-crunching jobs. When the crunching is done, we look at the report through the same Django UI.

It works well enough, though we wish there were better built-in task management. E.g., a built-in API to reshuffle tasks (say, for I/O resource balancing) would be nice. 'celery purge' also seems too drastic in purging every task in every queue. Or maybe we're just doing it wrong.


I'm not saying Celery is over-engineered in general. It's just over-engineered in the contexts I've often seen it recommended in, i.e. for people learning or people with fairly modest requirements.


Yep, I first asked on Stack Overflow for the best way to run a background task in a Django app, and the answers all said Celery. Considering the task ran about once a week, the solution I ended up with was very over-engineered.

It's useful to know Celery, and it gets used in a proper context in my current work, so I guess learning it wasn't a waste.


I think the advantage of tools like Celery is that they deal with the many different failure scenarios pretty well.

It's kind of like jQuery's ajax. Of course you can figure out how to make a simple replacement. But then you have to manage the 100 different edge cases for when your code doesn't go down the happy path.

Much easier to write a simple task queue that's good enough than a replacement for $.ajax, though...


Well said


For low and medium traffic sites, the simplest usage patterns are really easy: http://docs.celeryproject.org/en/latest/getting-started/intr...

The best thing about celery is that you don't have to use any of the advanced features. If you need the basics, stick to the basics.

On the flipside, if and when you do need more than the basics, your system will grow with you. No need to hack up cron monstrosities as you grow beyond a single server. Though, cron vs celery is kind of an apples to... celery comparison.
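The entire first-steps version is just (per the Celery tutorial):

    # tasks.py -- the canonical first-steps example from the Celery docs
    from celery import Celery

    app = Celery('tasks', broker='amqp://guest@localhost//')

    @app.task
    def add(x, y):
        return x + y

    # Caller side: runs asynchronously on a worker started with
    #   celery -A tasks worker
    add.delay(4, 4)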


Even if getting started is simple, there's still a ton of code and dependencies you're bringing into your project. Most importantly, you've suddenly got dependencies on persistent processes such as Redis or RabbitMQ which need to be installed system-wide. That's something that needs to be factored into deploys, configs, restarts, etc. You now need new tutorials for every deploy method (Heroku, Webfaction, PythonAnywhere and other outliers).

Yeah - everyone should ideally be comfortable with all this stuff, but I try to keep anything that's more complex than a pip install in a virtualenv to an absolute minimum.


When building a small web app a few weeks ago, I tried to avoid additional dependencies too. I was a bit stuck when I needed to send an email asynchronously. Do you have a _simple_ recommendation for doing this without pulling in something heavy like Celery+Redis?


Depends on what you need. The bare minimum is to just spawn a process or thread so the request can return while your email sends. You tend to lack control over failure conditions that way, so you'd want some code in the normal request/response cycle that checks for success or failure and informs the user.

You could add your emails to a db table and have a cron job consume them.

But for sending an occasional email? I've never really had a problem with just connecting to the SMTP server in the request/response cycle. If it takes more than a second to send, something is seriously wrong. You could also use Mailgun or similar services, which have their own queues for handling bulk sends and further reduce the likelihood of blocking the web server.
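For the "just spawn something" option above, a minimal sketch with a thread and Django's send_mail (failures only get logged, hence the caveats above):

    # A minimal sketch of fire-and-forget email from a Django view.
    # Failure handling is deliberately crude; see the caveats above.
    import logging
    import threading
    from django.core.mail import send_mail

    def send_async(subject, body, sender, recipients):
        def _send():
            try:
                send_mail(subject, body, sender, recipients)
            except Exception:
                logging.exception('background email failed')
        t = threading.Thread(target=_send)
        t.daemon = True
        t.start()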


I'm in a corporate network where sending the email over our internal SMTP takes about 5 seconds. I don't want the user to wait for this delay, so I'm using Celery currently. It just felt like a bit too much for such a little task.

Maybe I will try something simpler like http://stackoverflow.com/a/4447147


RQ + Redis is "easier": http://python-rq.org/

If you are on AWS, you could just use something like their email service (SES) to fire off the request; then you don't need a queue at all.
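With SES and boto3 that's a single API call (a sketch; the region and addresses are placeholders):

    # A minimal sketch of sending via Amazon SES with boto3,
    # so the web process never talks to a slow SMTP server itself.
    import boto3

    ses = boto3.client('ses', region_name='us-east-1')
    ses.send_email(
        Source='noreply@example.com',
        Destination={'ToAddresses': ['user@example.com']},
        Message={
            'Subject': {'Data': 'Hello'},
            'Body': {'Text': {'Data': 'Sent without a task queue.'}},
        },
    )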


For little things, it doesn't matter much what you do, and for bigger things I've moved to Luigi for background tasks.


"Starting from Celery 5.0 only Python 3.5+ will be supported."

This is a small detail, but I'm glad that one more big Python project is dropping 2.x support. The 2/3 split is not good for the community.


Agreed!


IMO it's the secret ingredient to scaling almost any Python web application, or even implementing one in the first place. It's a shame they had to curtail some features due to lack of funding.

Edit: Apparently they curtailed some features for simplicity as well.


Thank you for the kind words, it's so very appreciated :)

I have merged many features, like broker transports, result backends, etc., and while the initial contributions were great, they end up unmaintained, with issues that nobody fixes.

If there's any feature that you really want back, chances are the problems with that feature are not super difficult to fix, so please reach out!


Asksol, we use Celery at AppEnlight extensively and love it. Thank you for your great work.


Do you have a low-hanging fruit tag for new contributors to start with?


Some are of the opinion that Celery has (or had) too many features. I know I had to twiddle a lot of settings for my unusual use case, but it worked well in the end. There was a long-standing "bug" that turned out to be my misunderstanding of a certain setting. All that said, I would use Celery again.


Better to have 10 good features than 20 bleh ones, right?


On my last project I had to use Celery, and in the end I had to look inside the source. Having done the same with Python + Django before, I was apprehensive.

In the end I shouldn't have been; it was a pleasant surprise. It's great that 4.0 has come out.

Almost every Django developer will touch Celery at some point. If your organisation can support development, please do - it's not just an important piece of software, but a well-put-together one too.


> Nowadays it’s easy to use the requests module to write webhook tasks manually. We would love to use requests but we are simply unable to as there’s a very vocal ‘anti-dependency’ mob in the Python community

I'm not a heavy Python user, and I've never heard this before. It sounds... less than good.


The author may be referring to libraries themselves having dependencies. These attitudes are changing. For example, last week the Django community published a draft proposal that declares that "Django can have dependencies". The "Background" section is a good read and helps explain the origins of these attitudes: https://github.com/django/deps/blob/master/draft/0007-depend....


It's not universal. I'd rather bring in a dependency than reinvent the wheel.

requests is used so widely that I'm surprised to see this raised as an issue.


Congratulations to @asksol and the rest of the team for sealing the deal on 4.0! I've been waiting a long time for a number of the features now in 4.0. I know you guys have been busting your asses and juggling some complicated dependencies along the way.

Some time ago I built a little Celery add-on library as an experimental way to get dynamic celery beat scheduled tasks. I never ended up deploying it anywhere, for a few different reasons: https://github.com/fuhrysteve/CeleryStore

I really like the concept behind the old djcelery project, but I don't use Django much these days, and I'd like something more compatible with the tools I'm familiar with (SQLAlchemy, etc.).

Do you have any advice for how to approach this? I know 4.0 introduced some new abilities to add beat entries via API.


In case you also run into the removal of error emails [1], I've had to switch back to using a decorator [2] to catch unhandled exceptions and generate error emails.

[1] CELERY_SEND_TASK_ERROR_EMAILS config removed http://docs.celeryproject.org/en/latest/whatsnew-4.0.html#fe...

[2] https://gist.github.com/alanhamlett/dc8cdd4721ea63053f14#fil...
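The idea is roughly the following (a sketch only, assuming Django's mail_admins is available; the actual gist may differ in detail):

    # A rough sketch of the decorator approach: catch unhandled exceptions
    # in a task and email the traceback, like the removed
    # CELERY_SEND_TASK_ERROR_EMAILS behaviour. Assumes a Django project
    # with ADMINS configured; the linked gist may differ.
    import functools
    import traceback
    from django.core.mail import mail_admins

    def email_on_failure(task_func):
        @functools.wraps(task_func)
        def wrapper(*args, **kwargs):
            try:
                return task_func(*args, **kwargs)
            except Exception:
                mail_admins(
                    'task failed: %s' % task_func.__name__,
                    traceback.format_exc(),
                )
                raise
        return wrapper

    # usage: stack @app.task above @email_on_failure on the task function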


Just started using this for the first time. It's pretty nice. I'm still trying to figure out how to use it cleanly with Flask/Connexion and not in the top-level module, but it's made my life much simpler for handling long-running tasks (e.g. when I need a 202 Accepted response). Great project. Sad to see SQLAlchemy removed from the brokers, though I understand.
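The usual way around the top-level-module problem is the make_celery factory pattern from the Flask docs (a sketch):

    # The make_celery factory pattern (roughly as in the Flask docs):
    # build the Celery app from the Flask app so tasks run inside an
    # application context instead of living in a top-level module.
    from celery import Celery

    def make_celery(flask_app):
        celery = Celery(flask_app.import_name,
                        broker=flask_app.config['CELERY_BROKER_URL'])
        celery.conf.update(flask_app.config)

        class ContextTask(celery.Task):
            def __call__(self, *args, **kwargs):
                with flask_app.app_context():
                    return self.run(*args, **kwargs)

        celery.Task = ContextTask
        return celery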


You don't want to use a DB as a broker.

For AWS there's SQS support, which is awesome. Now if someone would write a Google Cloud Pub/Sub transport, all would be well in the world.


Hmm, Google Pub/Sub seems roughly equivalent to SQS to me?

https://cloud.google.com/pubsub/docs/overview


Yes, but to my knowledge no one has written the kombu transport for it yet. To use a new broker with Celery, you have to write the code for how to send and receive messages.
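Very roughly, a transport is a kombu virtual Channel that knows how to put and get messages (a sketch of the extension points only; all the actual Pub/Sub calls are hand-waved):

    # A very rough sketch of what a new kombu transport involves:
    # subclass the virtual Channel and implement message storage/fetching.
    # Method names follow kombu.transport.virtual; everything else
    # (the actual Pub/Sub client calls) is omitted.
    from kombu.transport import virtual

    class Channel(virtual.Channel):
        def _put(self, queue, message, **kwargs):
            # publish `message` to the backing service for `queue`
            raise NotImplementedError

        def _get(self, queue, timeout=None):
            # pull one message for `queue`, or raise Empty
            raise NotImplementedError

    class Transport(virtual.Transport):
        Channel = Channel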


Has anybody compared Celery 4 to RQ (which started off as a much simpler and more performant alternative to Celery)?

Would love to see how they stack up after this release.


We recently moved one of our services from RQ to Celery. We are using an older version of Celery, but this comment should still apply. While RQ is a great way to go at the start, if you handle a large number of messages you'll hit memory bottlenecks. This isn't a problem with RQ itself but with using Redis as a broker, so anyone weighing RQ vs Celery should keep it in mind.

The reason we switched to Celery was the volume of messages we were handling. Since RQ relies on Redis, all your messages need to fit in memory. While RQ was great and simple to set up at the start, as we grew we were constantly dealing with RQ breaking because Redis was full and stopped accepting any write operations.

We moved to Celery because it could use RabbitMQ as a broker. RabbitMQ offloads most messages to disk, which has nicely taken care of the memory limitation.

With RQ we would get stuck after 10K messages (our messages included images, so individual message size was large). With RabbitMQ I've seen the queue grow to about 120K without so much as a hiccup.


That's very insightful. Quick question, since you are doing this in production - how are you serializing images into a message? Base64 or something else?


Not the person you're asking, but from my experience with Celery and other messaging queues: don't pass around large binary blobs if you can avoid it. Whether that's an image or something else is irrelevant.

Instead, pass a database key, GUID, or file path to the raw data on disk. Obviously you'll need to engineer around that in a distributed system. The tangential benefit is that you're not using a "messaging" queue for persisting or semi-persisting your image data. That's a big no-no: such systems are transient in nature, which often doesn't align with binary or image processing.

Base64 is for when you need to pass binary around in a text-based format, e.g. XML or JSON. But keep in mind that base64 encoding increases the payload size by about 33% (4 output bytes for every 3 input bytes).
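Concretely, the pass-a-reference version might look like this (a sketch using boto3/S3; the bucket, key scheme, and Celery app object are assumptions):

    # A sketch of passing a reference instead of the bytes:
    # upload the image first, enqueue only the S3 key.
    # Assumes an existing Celery `app` and a bucket you control.
    import uuid
    import boto3

    s3 = boto3.client('s3')

    def enqueue_image(image_bytes):
        key = 'incoming/%s.jpg' % uuid.uuid4()
        s3.put_object(Bucket='my-image-bucket', Key=key, Body=image_bytes)
        process_image.delay(key)  # only the key travels through the broker

    @app.task
    def process_image(key):
        body = s3.get_object(Bucket='my-image-bucket', Key=key)['Body'].read()
        # ... actual processing ...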


We just do a str(file.read()). Not the most elegant solution. :)


You really should not cram images into your task queue. Send them off to S3 or Cloud Storage under a GUID, and pass along just the GUID however you want (JSON is fine).


I have been using Celery for years now and I love it. I was excited to see 4.0 with SQS support, but after giving it a spin I ran back to 3.x.

Unfortunately, version 4.0.0 has too many bugs for now; I'll check back in a few months. Better tests and better mocks would have prevented these issues.


Many thanks Asksol, beautiful software. Have been a happy user for many years now. Hope to try out 4.0 soon!


Can you suggest a monitoring tool for Celery? I want to see what tasks are currently running... is Flower the only solution? Thanks.


Have you tried running 'celery events' on the command line? I find this sufficient in most cases.
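Note that workers don't emit events by default; start them with -E, or enable it in config (Celery 4 setting names):

    # Events are off by default; enable them on the worker side,
    # e.g. `celery -A proj worker -E`, or via config (Celery 4 names):
    app.conf.worker_send_task_events = True
    app.conf.task_send_sent_event = True  # optional: also emit task-sent events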


Do I need to enable something to activate celery events?


A RabbitMQ client? What's so awesome about it that everyone here says it's essential for Python web projects? Honest question - never used Python for that.


I haven't used Django in a while, but if I'm not mistaken, it's basically the easiest way to add any kind of asynchronous behavior to a Python web server. There are few other options, and I don't think any of them are as mature as Celery.


The Python Tornado web server can run async tasks, but they had better be non-blocking, otherwise your nice async web server will stop serving requests while the task is processing.


Worst experience with a python toolkit ever. I hope new versions fix all bugs and issues.


Of course, all other software is bug free...

Honestly this is a complex problem area and I think the Celery developers have done an excellent job of making it pretty trivial to get up and running while providing lots of flexibility for more advanced users! Not an easy feat.

Are there bugs? Of course - but I've never come up against one that I can't work around. Is that annoying? Sometimes, but that's software development.


Being a user of Celery, I find this comment unfair. Adding some points about why you consider Celery bad would help.



