Redis 2.2.0 RC1 is out (antirez.com)
109 points by antirez on Dec 15, 2010 | 39 comments



This is great news! I'm particularly interested in these changes:

* Sorted sets are now less memory hungry.

* Now write operations work against keys with an EXPIRE set! Imagine the possibilities.

I use Redis for many, many things. In fact, I realized the other day that without it, I probably wouldn't still be bootstrapping. Not because I couldn't use something else, but because I wouldn't enjoy the work nearly as much. Starting a company is a long, arduous road and finding joy in the work is really important.


Out of curiosity, what do you use it for? I use it for caching and session storage, but I still keep postgres for the heavy lifting. I love that combination, especially the fact that I can put session (and other non-essential) data into redis and have it persisted and available in milliseconds, yet if redis crashes I lose nothing.


I use sorted sets for a lot of stuff (timelines mostly). A few other examples are job queuing, error logs (I write all error messages to a fixed length list), and rate limiting. Actually that last one works pretty well. The key is something like:

  v1:limiter:UNIQUEID:TYPE:HHMM
where UNIQUEID is a username or IP address, etc., TYPE is the specific action to limit (login, signup, etc.), and HHMM is a quantized timestamp (e.g. 1205, 1210). You increment this key every time the action occurs and use an expire to keep it clean. It's not perfect, but it does the trick for most use cases.
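
Something like this, assuming the redis-py client (the bucket size, cap and example IP below are made-up illustrative values):

  import time
  import redis  # assuming the redis-py client; any client works the same way

  r = redis.Redis()

  def over_limit(unique_id, action, max_hits=10, window_secs=300):
      """Count actions in a quantized time bucket and compare against a cap."""
      # Quantize the timestamp into 5-minute buckets, e.g. "1205", "1210".
      bucket = time.strftime("%H%M", time.gmtime(time.time() // window_secs * window_secs))
      key = "v1:limiter:%s:%s:%s" % (unique_id, action, bucket)
      hits = r.incr(key)              # INCR creates the key at 1 if it's missing
      r.expire(key, window_secs * 2)  # EXPIRE keeps old buckets from piling up
      return hits > max_hits

  if over_limit("203.0.113.7", "login"):
      print("slow down")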

I also make pretty judicious use of databases to group different classes of data together. Primarily into 3 groups actually: transient data, persistent/customer data, and internal stuff like the error logs and job queues. That way I can dump data at different intervals for each group (I use a tool I wrote called redis-dump: https://github.com/delano/redis-dump).


Thanks, I'm just gathering use cases to see if people are using it as a replacement for a relational database. I don't think I would; it's not the best tool for that job.

Thanks again for the data point!


Redis is one of the best pieces of software I've used in the past few years. I use it so many ways in production it's loco. Having flexible data structures besides (string)k, (string)v is a huge boost.


Thank you! What is interesting about 2.2 is that we discovered that even operations against plain old strings can greatly enhance the power of Redis.

For instance, GETBIT/SETBIT turn a string into a large bitmap, while GETRANGE/SETRANGE let users treat strings as arrays of fixed-length data.

I think there will be big use cases for these new commands, as it is possible to store a lot of data in little space, with O(1) random access.
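
Something along these lines, using the redis-py client (the key names and record size are just examples):

  import redis  # assuming the redis-py client

  r = redis.Redis()

  # Bitmap: one bit per user id, e.g. "did user N log in today?"
  r.setbit("logins:2010-12-15", 4242, 1)      # SETBIT grows the string as needed
  print(r.getbit("logins:2010-12-15", 4242))  # -> 1, O(1) random access

  # Fixed-length records packed into one string: 8 bytes per record.
  r.setrange("records", 3 * 8, b"record#3")       # write record 3 in place
  print(r.getrange("records", 3 * 8, 3 * 8 + 7))  # read back just those 8 bytes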


These additions make compact data storage and access much simpler from our perspective. Instead of loading and parsing an entire entry, we have access to just the data we want.

Nice work, Salvatore. Seriously.


Re SETBIT, I was thinking "cool, redis is now also a bloom filter server!" Then I realized I've been out of the loop for a while and you've probably already implemented commands for that :)


I love Redis and the new features rock! I use Postgres and Redis in tandem and it is a great combo. Postgres for all the stuff SQL databases are good at and Redis for all the stuff they are not.


Really? What stuff is PostgreSQL not good at, exactly?


That tone is asking for some passionate replies ;-)

This isn't about PostgreSQL at all. There are things where relational databases are a good fit, and things that are a bad fit. As such, this isn't about what PostgreSQL does poorly, but about the cases where a key-value store with some pretty good data structures comes in handy. Would you really store temporary data in your relational database? Would you use it as a caching mechanism? Perhaps as a queue? Not unless you're crazy. So I guess this isn't about where PostgreSQL performs poorly, but about where Redis is a better solution.


I'm not sure how to efficiently implement redis-style sorted sets using SQL, but the fact that I have to think about it at all means it's easier to use redis.

A lot of the other things could probably be done with the right combination of stored procedures and clever SQL but again, redis makes it so much easier as to make things qualitatively different. It's just a bunch of C files with no real external dependencies (I don't even have to run ./configure before building it!).

Operationally, for a generic non-db-expert kind of person like me, redis is much simpler to manage than PostgreSQL. With redis I don't need to worry about vacuuming, write ahead logs, archiving, query tuning & statistics, lock management, etc (to name a few things from the config file).


trivial example: lists.

You have items that you want to retrieve (LRANGE) in the same order as, or in reverse order from, the one in which you push them (LPUSH / RPUSH). Usually you need the latest items (think of timelines), and all this should be FAST and efficient.

In Redis this is trivial to model. SQL is so far from modeling this the right way that you need an "ORDER BY" clause for every LRANGE-like query even when there is nothing to order; you just want to retrieve things in their natural insertion order.

Sorted sets can model a zillion use cases in the same way. They are ordered by score on insertion. You can ask for ranges, for the RANK of an element, and things like this, in a matter of microseconds per query. Completely impossible to model with SQL in a natural and fast way.
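
A quick sketch of both, with a recent redis-py client (the key names and the mapping-style ZADD call are just my example):

  import time
  import redis  # assuming a recent redis-py client

  r = redis.Redis()

  # List as a timeline: latest items first, no ORDER BY anywhere.
  for post_id in ("post:1", "post:2", "post:3"):
      r.lpush("timeline:alice", post_id)
  latest = r.lrange("timeline:alice", 0, 9)        # ten most recent items

  # Sorted set: members kept ordered by score (here, a timestamp) on insertion.
  r.zadd("activity", {"alice": time.time(), "bob": time.time() - 60})
  recent = r.zrangebyscore("activity", time.time() - 3600, "+inf")
  rank = r.zrevrank("activity", "alice")           # position when sorted newest-first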

Just to cite another one (but there are tens of cases like this): bloom filters, anyone? With Redis 2.2 you can manipulate a single bitfield at the bit level, even accessing individual bits within a byte if you wish, with very efficient operations.
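
A toy bloom filter on top of GETBIT/SETBIT might look like this; the hashing scheme, sizes and key name are my own simplification, not something Redis provides:

  import hashlib
  import redis  # assuming the redis-py client

  r = redis.Redis()
  NUM_BITS = 2 ** 20   # 1 Mbit bitmap -> roughly a 128 KB Redis string
  NUM_HASHES = 4

  def _offsets(item):
      # Derive a few bit offsets from independent hashes of the item (toy scheme).
      for i in range(NUM_HASHES):
          digest = hashlib.sha1(("%d:%s" % (i, item)).encode()).hexdigest()
          yield int(digest, 16) % NUM_BITS

  def bloom_add(key, item):
      for offset in _offsets(item):
          r.setbit(key, offset, 1)

  def bloom_maybe_contains(key, item):
      # False positives are possible, false negatives are not.
      return all(r.getbit(key, offset) for offset in _offsets(item))

  bloom_add("seen:urls", "http://example.com/")
  print(bloom_maybe_contains("seen:urls", "http://example.com/"))  # True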


I use sorted sets quite heavily for timelines. And it's awesome.

One use case I can't figure out is storing/querying IP address ranges. Is there a natural way with Redis to check whether a number is within a given range? (Without storing every value and without multiple calls.)


There is a very easy way to model this: just convert the IP address into a 32-bit integer! :) Then use ZRANGEBYSCORE to query.
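
Something like this with a recent redis-py client (the key name and the mapping-style ZADD are just my example):

  import socket
  import struct
  import redis  # assuming a recent redis-py client

  r = redis.Redis()

  def ip_to_int(ip):
      # Pack the dotted quad into a 32-bit unsigned integer.
      return struct.unpack("!I", socket.inet_aton(ip))[0]

  # Score each stored address by its integer value...
  r.zadd("seen:ips", {"159.18.3.7": ip_to_int("159.18.3.7")})

  # ...then a single ZRANGEBYSCORE returns everything inside a numeric range.
  in_range = r.zrangebyscore("seen:ips",
                             ip_to_int("159.18.0.0"),
                             ip_to_int("159.18.255.255"))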


What about the other way around though? Storing the range, say, 159.18.0.0 - 159.18.255.255, and then querying to check if an address is in that range.

The only way I can think to do this is to store the range as two integers, as you suggest, and query twice: first to find the nearest lower bound for an IP, then a second time to check the upper bound.
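
Or, if the stored ranges never overlap, maybe a single query would work: keep the lower bound as the score, encode the upper bound in the member, and fetch the nearest lower bound with ZREVRANGEBYSCORE ... LIMIT 0 1. A rough sketch with redis-py (the member encoding and key name are just my own convention):

  import socket
  import struct
  import redis  # assuming a recent redis-py client

  r = redis.Redis()

  def ip_to_int(ip):
      return struct.unpack("!I", socket.inet_aton(ip))[0]

  def add_range(key, start_ip, end_ip):
      # Score = lower bound; the member carries the upper bound (assumes non-overlapping ranges).
      r.zadd(key, {"%s-%s" % (start_ip, end_ip): ip_to_int(start_ip)})

  def range_containing(key, ip):
      # One ZREVRANGEBYSCORE: the range with the greatest lower bound <= ip...
      hits = r.zrevrangebyscore(key, ip_to_int(ip), 0, start=0, num=1)
      if not hits:
          return None
      member = hits[0].decode()
      start_ip, end_ip = member.split("-")
      # ...is a match only if its upper bound also covers the address.
      return member if ip_to_int(ip) <= ip_to_int(end_ip) else None

  add_range("ip:ranges", "159.18.0.0", "159.18.255.255")
  print(range_containing("ip:ranges", "159.18.42.1"))  # -> 159.18.0.0-159.18.255.255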


Writing things without doing a sync for every write.


Simple storage. Just compare the speed of storing key/value pairs. Redis is many times faster, particularly for inserts and updates.


I've been using Redis for 2 months now. It's a real pleasure to use.

I especially like the simple protocol. It's possible to write a simple client without any external libraries within days -- maybe hours if you're really good ;). Try to do that with SQL or MongoDB (Javascript parser anyone?).
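
To give an idea, here's roughly what a toy client looks like with nothing but Python's socket module; the unified request protocol is just length-prefixed arguments, and the reply type is keyed off the first byte (error handling kept to a bare minimum):

  import socket

  class TinyRedis:
      """A toy Redis client: just sockets and the unified request protocol."""

      def __init__(self, host="localhost", port=6379):
          self.sock = socket.create_connection((host, port))
          self.buf = self.sock.makefile("rb")

      def execute(self, *args):
          # Requests are multi-bulk: *<argc>, then $<len> + payload per argument.
          out = [b"*%d\r\n" % len(args)]
          for arg in args:
              arg = arg if isinstance(arg, bytes) else str(arg).encode()
              out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
          self.sock.sendall(b"".join(out))
          return self._read_reply()

      def _read_reply(self):
          line = self.buf.readline().rstrip(b"\r\n")
          prefix, rest = line[:1], line[1:]
          if prefix == b"+":            # status reply, e.g. +OK
              return rest
          if prefix == b"-":            # error reply
              raise Exception(rest.decode())
          if prefix == b":":            # integer reply
              return int(rest)
          if prefix == b"$":            # bulk reply ($-1 means nil)
              length = int(rest)
              if length == -1:
                  return None
              data = self.buf.read(length + 2)  # payload plus trailing \r\n
              return data[:-2]
          if prefix == b"*":            # multi-bulk: n nested replies
              return [self._read_reply() for _ in range(int(rest))]
          raise Exception("unexpected reply: %r" % line)

  r = TinyRedis()
  r.execute("SET", "greeting", "hello")
  print(r.execute("GET", "greeting"))   # -> b'hello'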


i think mongodb fills a need that isn't covered by redis or sql -- people still need semi-relational data that can scale beyond 100M rows.

redis is a nice memcached or scalable data structures replacement. we use it as a simple rabbitmq replacement.


Yes. I didn't want to criticize MongoDB or SQL based databases. Just point out that Redis' protocol is small and easy to implement.


Why did you need to replace rabbitmq at all? (I love redis, but I'd expect that using something designed specifically for X is better than using something else that also kind of does X.)


RabbitMQ is a complex beast and was, last time I used it at scale, riddled with problems under load. We had extremely serious issues like spontaneous lockups.

We eventually abandoned it when we realized our queueing needs could be modeled in Redis. In a way that we fully control, understand and can debug. And it wasn't even much work!

I'd argue this is highly preferable unless your project really needs complex routing of the kind that only AMQP can provide. Most projects don't.


I use redis to replace RabbitMQ as well, but it's all abstracted away in Celery. A question: Don't you need polling when you use redis in this way? Does it have some sort of notification functionality, apart from the recently added pubsub?


Our queues are implemented using BLPOP which blocks until items arrive, so no polling is needed.
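
The basic shape of such a worker, sketched with redis-py (the queue name and handler are made up):

  import json
  import redis  # assuming the redis-py client

  r = redis.Redis()

  def produce(task):
      # Producers append jobs to the tail of the list.
      r.rpush("jobs", json.dumps(task))

  def consume():
      # Consumers block on BLPOP until an item arrives -- no polling loop.
      while True:
          _key, payload = r.blpop("jobs")
          handle(json.loads(payload))

  def handle(task):          # hypothetical job handler
      print("processing", task)

  produce({"action": "resize", "image_id": 42})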

The newer pub/sub stuff promises to be very useful, too, but we didn't have a use-case for that, yet. Our app generally needs messages to persist until they are consumed.

What I can say however is that none of the concerns we initially had about performance/scalability held any water.

We are still running on a single redis instance (plus a slave) on a moderately sized server and it happily processes 100 messages/sec on average between 4 producers and a varying number of consumers (20-100).

Better still, our monitoring metrics show that this isn't even a worthwhile load for redis. The server is nowhere near breaking a sweat: the CPUs barely drop below 90% idle, there's no disk i/o to speak of, and memory usage is more than reasonable (plenty of headroom for our purposes).

Thus "just throw it at redis" has long become a common stopgap meme in this particular project. And so far we didn't have to replace any of these supposed stopgaps with something else.


Ah, very nice. It seems that I shouldn't have any qualms about throwing redis at most of my problems then, thanks!


in case you're wondering, we use a simple queueing thing called redpack:

https://github.com/luxdelux/redpack



Thanks for the reply. I can see how, if you make no use of functionality other than push/pop, redis would be just fine.

Could you be so kind as to quantify what load we're talking about, in terms of producers/consumers and message rate/size/persistence level?

I have used ActiveMQ with systems of about ten nodes and a message rate of <2k/sec and it worked fine, and I always believed rabbitmq was faster/more stable (erlang bias!). So if your volume was 500k/s I'm OK; otherwise I'd file this as another slightly worrying piece of information about rabbitmq (after the problems I heard about from reddit).


After doing some stability testing, I'm now running 2.2.0 RC1 in production on http://wasitup.com

Unwise, maybe. But thanks to the ease of doing this upgrade on several nodes with Puppet I couldn't help myself.


Jeremy Zawodny is also running 2.2.0 at Craigslist, so it's really a case of: you are in good company :)


Nice work! I'm also a big fan of Redis, and use it extensively as a smart caching layer in front of an RDBMS. Compared to DB calls, each operation is essentially a NoOp. Just like turning on nitro for your web app - instant speed-up!


And he (antirez) just said:

"oops, just found a bug on RC1 (setbit/getbit) I guess it's going to be RC2 soon ;)"


sorry false alarm, no bug :)


and then he said:

"Nevermind, no bug about RC1, just my mistake".

Twitter to HN gateway. :-)


Does anyone actually use the replication in Redis to improve durability?


What does LPUSHX/RPUSHX do differently?


Push only if the list exists. These are the operations at the base of the Twitter Redis caching implementation.

These operations are useful whenever you want to use Redis for caching timelines. The idea is: if this user is already in cache (at least a single item exists), then push against the cache; otherwise don't.
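
A sketch of that idea with redis-py (the key names, cache length and rebuild helper are my own illustration):

  import redis  # assuming the redis-py client

  r = redis.Redis()

  def push_to_timeline_cache(user_id, item):
      # LPUSHX only pushes when the key already exists, so users whose
      # timelines aren't cached are simply skipped (no stray one-item lists).
      key = "timeline:%s" % user_id
      if r.lpushx(key, item):
          r.ltrim(key, 0, 799)   # keep the cached timeline bounded

  def load_timeline(user_id, items):
      # On a cache miss, rebuild the whole list from primary storage first.
      key = "timeline:%s" % user_id
      r.delete(key)
      for item in items:
          r.rpush(key, item)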


Ah, thanks for the explanation. I'm actually exploring Redis for storing timelines!

Twitter's Redis-backed timeline storage is here, if anyone is interested: https://github.com/twitter/haplocheirus





