By saving bandwidth, do you mean saving network bandwidth between a Redis server and a client on another machine? In that case, another solution would be for the user to write her own daemon (in a language of her choosing) that sits on the same machine as Redis, listens for "custom" commands from the remote client, and carries them out by communicating with the Redis server over a local socket.
I will be interested to see how scripting Redis with Lua measures up to that solution. The separate daemon would have the overhead of protocol handling for each communication with Redis, but it could also be written in a compiled language, a language with better concurrency support, etc. It would also be trivially sandboxed from the Redis server.
Teaspoon, this was definitely one of the ideas we had, but the performance hit of doing the I/O with another process is almost as bad as having the socket in between. The problem is not just the amount of bandwidth (though that is one of the problems when there are a massive number of clients; we have a use case about this that I can't describe in detail). The bigger problem is that the I/O is where most of the time is spent, and that is a shame. With Lua scripting we fix part of this problem, allowing us to deliver better performance.
The proof is what became possible with the variadic push: you can push something like 350k items per second per core just by using an LPUSH with a few more arguments per call.
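For example, a single variadic call like

    LPUSH mylist item1 item2 item3 item4

pushes four items in one round trip, where before you needed four separate LPUSH commands (the key and item names here are just placeholders).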
Makes sense to me. Do the Lua scripts get lower-level access to the data structures, too? E.g., could I write a variable-argument-list LINSERT that inserts M items in O(M+N) time rather than O(M*N)?
I don't think such low level access will be allowed. It would be cool, but it is a lot of work, and it would mean changing the scripting layer every time we change the internals. I'll start with something much higher level than that.
In the Redis community we have also discussed having a daemon sitting next to the server and doing localhost Redis communication.
The downside to doing this is that it introduces an extra piece that can break.
With LuaJIT 2, which is almost as fast as post-JIT Java, the scripts are amazingly fast (Salvatore's current method of calling the scripts is dead on: it requires no interpretation, it maps directly to lua_p* calls).
Disclaimer: I wrote AlchemyDB, which embedded Lua in Redis a while back, and I have played with it constantly (it is robust and mind-bogglingly flexible).
As for the speed difference between embedded Lua and a daemon sitting next to the server: Alchemy has its test suite in Lua, and I run some tests via an external client and some internally via embedded Lua.
The speed difference between client/daemon and embedded Lua becomes VERY evident (10x faster) on large loops, where I/O and TCP kernel time are saved... but since Redis is a single-threaded server, scripts block the server during their execution, so it is dangerous for novice programmers.
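For example, a loop like the following runs entirely inside the server, with no per-iteration round trip, but it also blocks every other client until it completes (a sketch assuming a redis.call-style Lua binding; the key and count are placeholders):

    -- server-side: zero network round trips per iteration, but the
    -- single-threaded server is blocked until the loop finishes
    for i = 1, 100000 do
      redis.call('RPUSH', 'biglist', 'item:' .. i)
    end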
All in all, if Redis embeds Lua correctly, it will really open up the project, and it adds very minimal bloat: Lua is tiny, and it is just one command :)
It does sound like an amazing idea, but I can't help but worry about the focus of Redis. Cluster has been in the pipeline for a long time now, and it never seems to get focus for more than a couple of weeks (I may be wrong, but that's how it looks from the outside).
I just hope clustered Redis will eventually show up, as well as disk-backed Redis, etc.
Hello Julien, the cluster and diskstore are completely different projects from the point of view of priority: Redis Cluster is more or less all I'm doing every day, but I stop from time to time to also focus on 2.4, since the cluster release date is too far away (later this summer) to block everything else in the meantime: we need to provide something to users while we develop the cluster.
Diskstore is just an exercise for now. Likely we'll ship 2.4, which is an improved version of 2.2, and then Redis 3.0, which is 2.4 + cluster and other things. Later, if diskstore turns out to be good enough, we'll ship it too, but it is possible that we'll mark diskstore as "off topic".
About Redis Cluster, it is no longer in a private branch. It is in unstable, and you can even play with it; check the latest antirez.com blog posts for instructions. Currently I'm designing the second layer, that is, resharding and how master and slave nodes interact. It does not need much coding, but it requires getting the details right now that we have a base.
You'll see something about Redis Cluster soon.
About scripting, don't think that every blog post I write means I'll spend a lot of time on the topic. What will happen is that at some point in the following weeks I'll put in something like one or two mornings of work to get an alpha version with scripting into a topic branch, then post a message on the mailing list and a blog post. Everything else will wait until later.
I have the habit of talking a lot, to the point that the development of Redis is almost completely a public process.
I also change my mind often, which I think is a good thing, as going forward without reconsidering what you are doing is not good. But this does not mean that the development of Redis is a tortuous path that goes forward and backward: since Redis 2.2 we have actually been trying to provide continuous updates, not about features but about the quality of the implementation.
A more precise view of the actual development path can be had by looking at the commit log messages of the 2.4 and unstable branches.
I'm quite surprised that Diskstore isn't more concrete than that: were there unforeseen problems with it? Or is the focus going to stay on in-memory databases?
As a happy Redis user, I find the VM the only sore spot: it seems like it should be possible to get Redis-like performance on cached keys and still be able to store a long tail of data on the same server.
The problem is exactly the one we had with VM: I believe that disk will likely suck except for one specific workload, that is, an extremely biased working set plus mostly reads. But this is exactly the use case of on-disk DBs anyway, which are already doing a lot of work to handle it well, so why should we also enter this business? There should be space for everybody; I'll be very happy if we do our own work well, that is, the in-memory data structure server :)
BUT I did not stop experimenting. We'll return to this, both in the form of diskstore and as on-disk allocators using mmap(), which is another thing I'm playing with.
Oooh, this does sound like a good idea, but beware feature bloat. I don't want Redis to end up being a full RDBMS, and this is basically the equivalent of stored procedures, especially if the scripts are stored as Redis objects...
He specifically said they won't be stored. You send the script to the Redis server every time you need it. That solves his visibility worry, and it also neatly solves any problems with machines in a cluster being out of sync.
I'd say in a way this could avoid feature bloat, in the sense that new operations can be added as "frequently used snippets" instead of core commands. I wonder how this plays with replication/clustering, though.
Avoiding feature bloat is indeed one of the goals of this move, as explained in the blog post. Several users are requesting features in the Google Group; others, like me, maintain forks. With user-defined procedures, hopefully we will no longer need all this.
As for clustering, Antirez also explained in the blog post that it will work as long as you use a single key per script.
I am glad that Lua was chosen as the implementation language. It really is a perfect fit for the problem.
"Redis will try to be smart enough to reuse an interpreter with the command defined."
Caching the scripts should be fairly simple. Maintain a table of functions, then when a new script comes in, take its CRC32 value. If said value is in the table, just lua_pcall the script with the arguments. Otherwise, lua_load the function, store it in the table, then call it. Also, since Redis is single-threaded, you should only need one lua_State per server.
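A minimal sketch of that flow, written in Lua itself rather than the C API it would really use (crc32 here is a hypothetical checksum function; loadstring is the Lua 5.1 compiler that lua_load exposes, and pcall mirrors lua_pcall):

    local cache = {}                       -- checksum -> compiled chunk

    local function eval_script(script, ...)
      local sum = crc32(script)            -- hypothetical checksum function
      local chunk = cache[sum]
      if not chunk then
        chunk = assert(loadstring(script)) -- compile once, as lua_load would
        cache[sum] = chunk
      end
      return pcall(chunk, ...)             -- run it, as lua_pcall would
    end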
How are you planning to represent Redis values in the scripts? Would you just represent them as Lua strings and tables, or would you wrap the Redis values in userdata and allow operations to be called on them directly using metatables?
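To illustrate the second option, wrapping a value so operations can be called on it directly would look something like this sketch, where every name is hypothetical:

    -- wrap a Redis list so methods can be called on it via a metatable
    local list = setmetatable({ key = 'mylist' }, { __index = {
      push = function(self, v) return redis.call('RPUSH', self.key, v) end,
      len  = function(self) return redis.call('LLEN', self.key) end,
    }})
    list:push('hello')
    print(list:len())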
Johm allows you to use data types on top of Redis. I implemented the ability to query multiple fields at https://github.com/gersh/johm, and I've been discussing ways to implement more complex queries on the Johm mailing list.
Currently, I believe Redis has enough functionality to do a lot. I think it should be possible to build more query-like features on top of Redis while preserving more flexibility in terms of how the querying works. Then you could decide how far you want to go with SQL-like functionality.
Very excited about this, especially since nginx also has robust embedded Lua support through lua-nginx-module. Nginx+redis+lua is fast becoming my favorite stack for frontend stuff.
Simple scripting will solve some very inefficient patterns, such as retrieving a long list of values over the network, performing some operation on them in the client, and storing them back to Redis.
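For instance, the whole read-modify-write pass could run server-side in one script (a sketch assuming a redis.call-style binding, with uppercasing as a stand-in for the real per-item operation and 'mylist' as a placeholder key):

    -- fetch, transform, and store back without the values ever
    -- leaving the server
    local values = redis.call('LRANGE', 'mylist', 0, -1)
    for i, v in ipairs(values) do
      values[i] = string.upper(v)
    end
    redis.call('DEL', 'mylist')
    redis.call('RPUSH', 'mylist', unpack(values))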
To completely close the gap, I'd want two more things:
1. Store multi-line scripts as Redis objects. Trying to cram a loop and several expressions into one line will not be easy to understand or maintain. I'd rather deal with the cluster consistency issues in the application than be limited to one-liners.
2. Ability to execute a script in a separate thread. If the script doesn't require isolation, execution shouldn't necessarily block other operations. Some scripts might take 250ms to run, which is too long to block the main thread.
Where does it say scripts need to be single lines? The Redis protocol prepends all arguments with a description of their byte-length, so the scripts should be able to have as many newlines as you like.
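For example, an EVAL-style request (the command name is an assumption) carrying a two-line script would be framed like this, where $20 is the byte length of the script argument, embedded newline included, and every protocol line is really terminated by \r\n:

    *3
    $4
    EVAL
    $20
    local x = 1
    return x
    $1
    0

The newline inside the script is just another byte of a length-prefixed argument, so the parser never mistakes it for the end of the command.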
Definitely agree about being able to use separate threads, though.
Indeed the script can be multiple lines without problems.
About threads, I think we'll stay single threaded for scripting as well, since otherwise we'd have troubles: Lua scripts would not be atomic from the point of view of the caller. You have to take care not to write commands that do complex stuff, or at least to be aware that those commands do complex stuff :) In Redis there is a tendency to implement the simple raw idea and make the user aware that it is easy to shoot yourself in the foot, instead of making things more complex in order to avoid problems, and it is probably a good idea to follow this path for scripting as well. After all, there is always time to make it more complex.
If you stay single threaded, I would strongly recommend having a yield command, which would just exit the script, do other things, and then come back and execute the script again with state intact.
That could be implemented fairly easily using coroutines. Redis could start the script in a coroutine; then, if it yields, return whatever values it yielded back to the user and schedule the coroutine to be finished later. When Redis has nothing to do, it could go back and resume the coroutine, then discard it once it finally returns. (Though of course the user would have no way of getting whatever values it yielded after the command returned; it would have to communicate by storing a key somewhere, or possibly with publish/subscribe.)
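A minimal sketch of that scheme (script_fn stands for the already-compiled script; schedule_for_resume is a hypothetical scheduler hook):

    local co = coroutine.create(script_fn)
    local ok, value = coroutine.resume(co)
    if ok and coroutine.status(co) == "suspended" then
      -- the script yielded: reply to the caller with `value` now,
      -- and resume the coroutine later, when the server is idle
      schedule_for_resume(co)
    end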
On second thought, doing this automatically might be a waste of time for most one-shot commands. Perhaps a separate COEVAL command would work better for this.
That sounds right to me. Only a few commands would need this facility. But when you need it, you need it. A long-running script should NOT lock up Redis indefinitely.
Atomic is a good default, but it's not hard to imagine a case where a non-atomic script would be preferable to sending multiple scripts to avoid overly long execution time.
But as you point out, that can always be added later.
I've had an alternative Redis-like design for a while... What if you have a Python process that keeps the data in dicts and lists and Python objects as usual. These data structures get persisted to disk by forking and pickling the data as a snapshot, while the main process continues to serve requests. For a small server, the Python process can be the single-threaded web server itself (e.g. using Tornado web server).
Your performance will be absolutely horrible compared to Redis. I can't see doing this for anything production-worthy. (Miniredis was made for a very specific use-case where the performance hit was fine--and even there, we've replaced it with the real Redis for the next version.)
I didn't exactly mean a Redis server implemented in Python instead of C... It could be that, but one speaking an application-specific DSL instead of a generic data structure DSL.
Yes, you can do this with most scripting languages. This is why I consider Redis more of a DSL; the proof is that it is trivial to write a Redis-complete script using a real scripting language.
However, the problem with this approach is that you get much worse performance and memory efficiency.
If you don't have a network interface, it is not the same thing as Redis. Even Redis could be written as a C library, but it would not be the same. Redis is all about shared persistent state.
I have no idea about __slots__, just a feeling that there are too many underscores for it to work well ;) But in general, getting as memory efficient as Redis in a scripting language is very hard, for various (good) reasons.