Membase: a new NoSQL database

wvenable · on June 23, 2010

"For those familiar with memcached, membase provides on-the-wire protocol compatibility, but adds disk persistence; hierarchical storage management; data replication; live cluster reconfiguration and rebalancing; and secure multi-tenancy with data partitioning. Like memcached, membase is simple, fast and elastic."

I bet a lot of large sites using multiple memcached servers would find this interesting.
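For readers running several memcached boxes, here is one way to make the "data partitioning" and "rebalancing" bullets concrete. This is a minimal sketch of a common approach, assuming a fixed number of partitions (16 here, an arbitrary choice): keys hash to a partition, partitions are assigned to servers, and adding a server moves only some partitions. The names, the partition count and the CRC32 hash are illustrative assumptions, not membase's documented scheme.

```python
import zlib
from collections import Counter

# Illustrative only: not membase's actual partitioning code.
NUM_PARTITIONS = 16  # arbitrary fixed count chosen up front (assumption)


def partition_for(key: str) -> int:
    """Map a key to a partition; CRC32 is just one stable hash that works."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def initial_map(servers):
    """Assign partitions to servers round-robin."""
    return {p: servers[p % len(servers)] for p in range(NUM_PARTITIONS)}


def add_server(partition_map, new_server):
    """Move just enough partitions to new_server to even out the load.

    Partitions that are not moved keep their current owner.
    """
    new_map = dict(partition_map)
    counts = Counter(new_map.values())
    target = NUM_PARTITIONS // (len(counts) + 1)
    for p in sorted(new_map):
        if counts[new_server] >= target:
            break
        owner = new_map[p]
        if counts[owner] > target:
            new_map[p] = new_server
            counts[owner] -= 1
            counts[new_server] += 1
    return new_map


def server_for(key, partition_map):
    """Route a key: hash to its partition, then look up the partition's owner."""
    return partition_map[partition_for(key)]


before = initial_map(["cache-a", "cache-b"])
after = add_server(before, "cache-c")
print(sum(before[p] != after[p] for p in range(NUM_PARTITIONS)))  # 5 of 16 moved
```

The design point: a key's partition never changes, so rebalancing means moving whole partitions and updating the map, rather than remapping most keys the way naive `hash(key) % num_servers` client-side sharding does when a server is added.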

sqrt17 · on June 23, 2010

... as in, those people who use memcachedb but would like to have something bigger?

pcarmichael · on June 23, 2010

The timing is interesting for me, as I was just about to start up a Cassandra setup here. This might be a better fit for my particular needs if I could get it running. Unfortunately, I'm having a difficult time figuring out where to start. A general getting-started guide would be very helpful.

- The FAQ on membase.org is empty and the wiki does not appear to be populated with much content (e.g. 'Directions for working with the membase source and descriptions of the components in the source will be available soon').

- The Google group does not appear to have significant content besides a 'Where's the ruby driver?' post.

- My work blocks IRC, so jumping on freenode isn't an option.

- I tried downloading from the northscale.com site, but for whatever reason I cannot enter the required fields on the download form (in Firefox 3.6.4).

- I downloaded the source, which is broken out into a number of directories; figuring out what the various subsystems are and what needs to be done is going to take some digging.

dlsspy · on June 23, 2010

Thanks for reporting the FF issue. The site's been updated and tested again.

We've obviously got a bit to go from our internal build farms to things that are easier for other contributors to play with. Documentation for developers is still being written.

Forgetting everything we know about the things we've been doing for a while so we can explain it to people who haven't been doing it has proven to be a bit harder than we initially thought. :)

mileszs · on June 23, 2010

I offer no counter or dissent to your post. However, apparently, the binaries have not yet been posted. Perhaps once they get that crucial detail finished, things will be clear. It appears they haven't even reached an actual beta stage. It might have been better for them to wait until everything was ready to go before launching the site publicly.

Since the front end of membase is memcached code, you should be able to use any memcached client library to talk to membase.

Out of curiosity, what drew you towards Cassandra over, say, Riak or MongoDB?
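Since the compatibility claim is about the wire protocol, here is a minimal sketch of what a memcached client actually sends and parses in the classic text protocol. The helper names are hypothetical; the sketch covers only `set`/`get` and ignores the binary protocol. If membase is compatible as claimed, the same bytes sent to a memcached-compatible endpoint should behave the same way.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a text-protocol storage command: set <key> <flags> <exptime> <bytes>."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(key: str) -> bytes:
    """Build a text-protocol retrieval command."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(resp: bytes) -> dict:
    """Parse 'VALUE <key> <flags> <bytes>' blocks up to the terminating END line."""
    values = {}
    pos = 0
    while True:
        end = resp.index(b"\r\n", pos)
        line = resp[pos:end].decode("ascii")
        pos = end + 2
        if line == "END":
            return values
        parts = line.split()
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected response line: {line!r}")
        # parts[3] is the byte count; an optional CAS value may follow it.
        key, nbytes = parts[1], int(parts[3])
        values[key] = resp[pos:pos + nbytes]
        pos += nbytes + 2  # skip the data block and its trailing \r\n


print(encode_set("greeting", b"hello"))  # b'set greeting 0 0 5\r\nhello\r\n'
```

The data block is sliced by its declared length rather than searched for `\r\n`, which is what lets values contain arbitrary bytes.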

pcarmichael · on June 23, 2010

MongoDB was ruled out because auto-sharding is still alpha (it should be out of alpha soon, but I'd prefer to wait and see what the community's experience with it is first).

Between Cassandra and Riak, I chose Cassandra because it can run on Windows while Riak cannot. I needed it as a data store for some cluster computing work I'm doing, but for various reasons my test environment needed to be Windows-only. I think I can get rid of that restriction; perhaps benchmarking Riak vs. Cassandra for my particular use case is in order.

DennisP · on June 25, 2010

Linux 64-bit binary is available without filling out any forms at http://labs.northscale.com/membase/

mahmud · on June 24, 2010

My work blocks IRC, so jumping on freenode isn't an option

Get yourself a shell account and run irssi under screen.

weixiyen · on June 23, 2010

Sounds a lot like what Redis is doing with Redis Cluster, except this one is already in production.

p3ll0n · on June 23, 2010

The key/value model embraced by NoSQL databases (Scalaris, Voldemort, Tokyo Cabinet, etc.) is the simplest and easiest to implement, but it is inefficient when you are only interested in querying or updating part of a value.

This article (http://seattleweb.intel-research.net/people/lamarca/pubs/pap...) coming out of Intel argues that it is also difficult to implement more sophisticated structures on top of a distributed key/value store. The author's main point is that a few specialized applications can be, and have been, built on a plain distributed key/value store, but most applications have ended up having to customize the store's internals to achieve their functional or performance goals.

From the little I have read about Membase, it looks well positioned to take simple distributed key/value stores to the next level and back into the limelight.
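A minimal sketch of the inefficiency described above, assuming a toy store that only supports whole-value `get`/`put` of opaque bytes (the class and function names are hypothetical). Updating one field of a JSON value means fetching and rewriting the entire blob.

```python
import json


class BlobStore:
    """Toy key/value store: values are opaque bytes, only whole-value get/put."""

    def __init__(self):
        self._data = {}
        self.bytes_transferred = 0  # rough proxy for network traffic

    def get(self, key):
        value = self._data[key]
        self.bytes_transferred += len(value)
        return value

    def put(self, key, value):
        self.bytes_transferred += len(value)
        self._data[key] = value


def update_field(store, key, field, new_value):
    """Change one field by fetching, decoding and rewriting the whole value."""
    doc = json.loads(store.get(key))
    doc[field] = new_value
    store.put(key, json.dumps(doc).encode("utf-8"))


# Hypothetical profile with a large history list and one small field to change.
store = BlobStore()
profile = {"name": "alice", "history": ["event-" + str(i) for i in range(1000)]}
store.put("user:1", json.dumps(profile).encode("utf-8"))
store.bytes_transferred = 0

update_field(store, "user:1", "last_login", 1277251200)
print(store.bytes_transferred)  # a bit more than twice the size of the whole profile
```

Besides the bandwidth cost, this read-modify-write isn't atomic: two clients updating different fields at the same time can overwrite each other unless the store offers something like memcached's `cas` (check-and-set) command.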

sqrt17 · on June 23, 2010

I, for one, like the concept of storing JSON-like data structures and using JavaScript-based indices on top of them, as you can do with CouchDB or MongoDB. For the most part, a store where you can only put big amorphous blobs is much less useful than one where you can (a) work with a standard format (usable at least from Java and Python) and (b) use indices to speed up access via selected attributes.

For a project of mine, I've set up something where the frontend (i.e. JavaScript) munches on some data structures, which are then passed to the backend (some Python code for business logic and authentication), which sticks them into the database more or less directly. It works really well, although MongoDB still has this "embrace of the exotic stranger" feeling (i.e., you wonder when exactly your database will stop liking you and just crash), whereas CouchDB is just too slow for my purposes.
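The "indices on selected attributes" idea can be sketched without any particular database: keep documents by ID and maintain a map from attribute value to document IDs, updated on every write. This is a toy, in-memory illustration with hypothetical names; it is not how CouchDB views or MongoDB indexes are actually implemented.

```python
from collections import defaultdict


class IndexedDocStore:
    """Toy document store with secondary indexes on chosen attributes."""

    def __init__(self, indexed_fields):
        self.docs = {}
        # field -> attribute value -> set of document IDs
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, doc_id, doc):
        old = self.docs.get(doc_id)
        for field, index in self.indexes.items():
            # Drop the stale index entry before recording the new one.
            if old is not None and field in old:
                index[old[field]].discard(doc_id)
            if field in doc:
                index[doc[field]].add(doc_id)
        self.docs[doc_id] = doc

    def find(self, field, value):
        """Return matching IDs via the index instead of scanning every document."""
        return sorted(self.indexes[field].get(value, ()))


users = IndexedDocStore(["city"])
users.put("u1", {"name": "ann", "city": "Berlin"})
users.put("u2", {"name": "bob", "city": "Paris"})
users.put("u3", {"name": "cy", "city": "Berlin"})
print(users.find("city", "Berlin"))  # ['u1', 'u3']
```

Indexed values must be hashable (strings, numbers); the trade-off is extra work on every write in exchange for lookups that don't touch unrelated documents.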

zppx · on June 23, 2010

I haven't had time to search through the codebase, but the source includes 10 folders, a sysadmin's hell: some with software in C, some with software in C++ and some with Python code, and one of them is a folder named 'mencached'.

mahmud · on June 24, 2010

I worked on a project that had a directory named "manuel" containing 8 C files and a file named "make.sh". I always thought it was some guy's private playground and shouldn't be touched. But once I dug into it, I discovered a NIH implementation of DocBook + Doxygen + Markdown + grep + strings, and make.sh did what a Makefile would have done (just use make!). And there was no Manuel; the French comments let me guess what it really meant: Manual.

silentbicycle · on June 25, 2010

And of course, "manual" means, "by hand". :)

agotterer · on June 23, 2010

This looks similar to http://www.gear6.com/memcached-product/memcached.

agotterer · on June 28, 2010

Wonder why I got downvoted for an informative link to a similar project that I have no involvement with?

alexpopescu · on June 23, 2010

Interesting! I've concluded the same thing based on the very scarce info they're offering.

leej · on June 23, 2010

* There are alternatives, but the backers give this project a leg up.
* The web site needs a significant overhaul.
* This was a commercial product, if I am not mistaken, and was only recently opened up.
* Whether this is forked off memcached or is just another NoSQL store with memcached compatibility is still a mystery.

dlsspy · on June 23, 2010

It's not a fork. We've done a lot of work on memcached to allow it to support multiple backends on the same network interface. Part of membase is a new memcached backend.
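A rough way to picture "multiple backends on the same network interface": one protocol front end that parses commands and hands them to whichever storage engine it was configured with. The sketch below is a hypothetical Python illustration of that separation (memcached's real engine interface is written in C and looks nothing like this); `MemoryEngine` and `PersistentEngine` are made-up names standing in for a plain cache and a disk-backed store.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class Engine(Protocol):
    """What the front end needs from any storage backend (hypothetical interface)."""

    def get(self, key: str) -> Optional[bytes]: ...

    def set(self, key: str, value: bytes) -> None: ...


class MemoryEngine:
    """Plain in-memory cache, memcached-style."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._items.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._items[key] = value


class PersistentEngine(MemoryEngine):
    """Same cache, but every write is also appended to a simulated disk log."""

    def __init__(self) -> None:
        super().__init__()
        self.disk_log: List[Tuple[str, bytes]] = []

    def set(self, key: str, value: bytes) -> None:
        super().set(key, value)
        self.disk_log.append((key, value))


class FrontEnd:
    """Protocol layer: clients see the same commands whichever engine is plugged in."""

    def __init__(self, engine: Engine) -> None:
        self.engine = engine

    def handle(self, command: str, key: str, value: bytes = b"") -> Optional[bytes]:
        if command == "set":
            self.engine.set(key, value)
            return b"STORED"
        if command == "get":
            return self.engine.get(key)  # None on a miss
        return b"ERROR"
```

Swapping `MemoryEngine()` for `PersistentEngine()` changes what happens to the data, not what clients send, which is the sense in which a new backend can sit behind unchanged memcached clients.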

tobyhede · on June 26, 2010

Does anyone know what kind of durability Membase offers? One of my ongoing concerns with MongoDB is the lack of single-server durability.

Lorin · on June 24, 2010

That is one ugly mascot.

ergo98 · on June 23, 2010

It's unfortunate there are no Windows binaries (well, apparently there are no binaries yet, period, but I mean they don't plan on Windows binaries for some time). We could host Linux VMs, but ideally we'd just run Windows binaries and get rid of the abstraction.

That is the case for a large number of these products; at best they offer a terrible Cygwin port. In the Windows world the premier product right now is the AppFabric beta, but almost no one uses it, and it has an absurd list of dependencies that preclude many uses.

siculars · on June 23, 2010

Your post is just another nail in the MS coffin. The simple fact is that the latest, greatest hotness is just not playing in the MS ecosystem. Virtually all NoSQL systems support Windows as an afterthought, if at all. Not to mention hotness like Node.js.

The bottom line is that the people who make the new shinies just don't play on MS boxes and couldn't care less if their toys ever play on MS boxes. No, I'm not talking 'enterprise'.

wmf · on June 23, 2010

Porting software to a completely different culture isn't cheap and there's little incentive for volunteer open source projects to do it. If enough software that you want isn't available on Windows, it's probably cheaper for you to learn Unix.

henrikschroder · on June 23, 2010

There are plenty of people who want to volunteer to port projects to Windows, but there are other hurdles, such as Microsoft refusing to ship a C99 compiler and getting the Windows port accepted into the project's main repo.

mahmud · on June 24, 2010

Huh? Everyone builds with MinGW. We shipped pure Win32 desktop and service apps that were compiled with MinGW.

The difficulties in porting to Windows are slightly more technical than just the availability of a free C compiler with C99 support, and project management politics.

henrikschroder · on June 24, 2010

The problem is that MinGW is a completely alien environment for the vast majority of C developers on Windows, and Visual Studio is a completely alien environment for POSIX projects.

That means you can't draw on the majority of Windows C developers, many of whom would love to contribute to open source projects and who have the knowledge of the technical difficulties of porting to Windows.

ergo98 · on June 23, 2010

Where did I say I didn't know Unix? The problem is that for some deployed solutions adding in Linux boxes simply because they host one piece of software isn't optimal, so "native" solutions are chosen instead. We aren't talking about a product that is tightly coupled with the OS (or at least it shouldn't be).

jbooth · on June 23, 2010

It should be, actually, if it's going to be fast. Memcached is fast because it optimizes interaction with the I/O layers -- you can't stay portable if you do that.

EDIT: Actually, you can use libevent for some subset of these cases and get speed+portability, but I'm pretty sure memcached predates libevent by quite a bit, so it doesn't use it. So there's your answer.
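On the speed-versus-portability point: libevent's trick is to wrap whichever readiness API the OS provides behind one interface. Python's standard `selectors` module does the same thing, so it makes a convenient stand-in for a minimal sketch of a single-threaded, event-driven server loop. This illustrates the technique, not memcached's actual code; the handler and loop names are hypothetical.

```python
import selectors
import socket


def make_upper_echo_handler(sel):
    """Return a callback that answers each read with the same bytes upper-cased."""
    def on_readable(conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())
        else:
            # Peer closed the connection: stop watching it and release it.
            sel.unregister(conn)
            conn.close()
    return on_readable


def run_once(sel, timeout=1.0):
    """One pass of the event loop: call the callback of every ready socket."""
    ready = sel.select(timeout)
    for key, _events in ready:
        key.data(key.fileobj)  # data holds the callback passed to register()
    return len(ready)


# DefaultSelector picks the most efficient mechanism the platform offers
# (e.g. epoll on Linux, kqueue on BSD/macOS), much as libevent does in C.
sel = selectors.DefaultSelector()
server_side, client_side = socket.socketpair()
server_side.setblocking(False)
sel.register(server_side, selectors.EVENT_READ, make_upper_echo_handler(sel))

client_side.sendall(b"get k")
run_once(sel)
print(client_side.recv(4096))
```

One thread waits on all sockets at once and only does work for the ones that are ready, which is where the speed comes from; the abstraction layer is what keeps it portable.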

JoachimSchipper · on June 23, 2010

The first commit to memcached on github is http://github.com/memcached/memcached/tree/32f382b605b4565bd..., which uses libevent. libevent has been around for a while...

silentbicycle · on June 23, 2010

Memcached predates github by quite a bit, actually. It was originally made as a caching layer for LiveJournal, and IIRC it was made in the first half of the 2000s. (I'm having a hard time finding an exact date, though.)

dlsspy · on June 24, 2010

If you want to know the exact date, look at the link you're suggesting is wrong.

silentbicycle · on June 24, 2010

You mean the one where an existing project with several years of history has its development tree imported into github as its first commit?

I remember the memcached project being announced, and I'm pretty sure it predated github by five years or more.

dlsspy · on June 24, 2010

I have no idea what link you clicked on. The one above says this:

    committing memcached 

    bradfitz (author)
    May 27, 2003

Specifically, that was Tue May 27 07:19:11 2003 +0000.

I spent a long night back in 2008 recreating that history as accurately as possible. It involved a couple of git repositories, Subversion repositories, and a lot of Google and mailing list searches to ensure that everyone who ever contributed to the project was properly recognized.

You can read more about that process here: http://www.mail-archive.com/memcached@googlegroups.com/msg00...

silentbicycle · on June 24, 2010

My apologies, and thanks for the exact date. I missed that the github commit date was in 2003. (If you miss that, your reply comes off as really snarky. Mine was a bit blunt, too.)

reynolds · on June 23, 2010

memcached actually uses libevent.

Edit: oops, JoachimSchipper beat me to it

mkramlich · on June 23, 2010

If Zynga is already using it behind, say, Farmville, that will give it some cred out of the gate.

ryanwaggoner · on June 23, 2010

The About page lists "Major Deployments" and seems to indicate that both Farmville and Cafe World are using it, which is impressive.

peter123 · on June 23, 2010

The site membase.org seems to be really slow now. Not a good showcase of the technology it's trying to preach.

cmelbye · on June 23, 2010

Why would a static, informational web site be using a database?

Xurinos · on June 23, 2010

There is an argument in favor of marketing to the perception, though. I am sure he/she is not the only viewer that might assume they are eating their own dog food and using it behind their website. We have countless tales on HN about increasing customer interest by speeding up your site. If you want to talk about how cool your tech is, make sure your website reflects a similar attention to the important details. I think the comment is legitimate for the perspective even if the technological reality is that the slowness is completely unrelated to the database.

Also, given the popularity of CRMs and so forth, it might not be good to assume that the pages are really "static". For all we know, their site is using their database to provide content -- edited by documentation/marketing folks -- to page templates.

reitzensteinm · on June 23, 2010

I agree completely. Rational users would realise that the database has little to no connection to the website's hosting, but are any of us entirely rational? The details always matter.

kingkilr · on June 23, 2010

True, but then the rational user might conclude that if you can't manage to have a performant static website you might not be able to handle the complicated stuff.

cmelbye · on June 23, 2010

If by CRM you mean CMS, I don't think they're using a CMS because all of the pages seem to be plain HTML based on the extension and the content type. Of course, they could be spoofing that for a number of reasons, but I think it's safe to say that their site's perceived slowness has nothing to do with the database. In any event, it's loading very quickly now.