Membase a new NoSQL (membase.org)
58 points by leej on June 23, 2010 | 44 comments



"For those familiar with memcached, membase provides on-the-wire protocol compatibility, but adds disk persistence; hierarchical storage management; data replication; live cluster reconfiguration and rebalancing; and secure multi-tenancy with data partitioning. Like memcached, membase is simple, fast and elastic."

I bet a lot of large sites using multiple memcached servers would find this interesting.


... as in, those people who use memcachedb but would like to have something bigger?


The timing for me is interesting as I was just about to start up a Cassandra setup here. This might be a better fit for my particular needs if I could get it running. Unfortunately, I'm having a difficult time figuring out where to start. A general getting started guide would be very helpful.

- The FAQ on membase.org is empty and the wiki does not appear to be populated with much content (i.e. 'Directions for working with the membase source and descriptions of the components in the source will be available soon').

- The Google group does not appear to have significant content besides a 'Where's the Ruby driver?' post.

- My work blocks IRC, so jumping on freenode isn't an option.

- I tried downloading from the northscale.com site, but for whatever reason I cannot enter the required fields on the download form (in Firefox 3.6.4).

- I downloaded the source, which is broken out into a number of directories - figuring out what the various subsystems are and what needs to be done is going to take some digging.


Thanks for reporting the FF issue. The site's been updated and tested again.

We've obviously got a bit to go from our internal build farms to things that are easier for other contributors to play with. Documentation for developers is still being written.

Forgetting everything we know about the things we've been doing for a while so we can explain it to people who haven't been doing it has proven to be a bit harder than we initially thought. :)


I offer no counter or dissent to your post. However, apparently, the binaries have not yet been posted. Perhaps once they get that crucial detail finished, things will be clear. It appears they haven't even reached an actual beta stage. It might have been better for them to wait until everything was ready to go before launching the site publicly.

Considering the 'front-end' of membase is memcached code, you should be able to use any library that interfaces with memcached to interface with membase.
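To make the compatibility point concrete, here is a minimal sketch of what a client sends and receives over the memcached text protocol. It only builds and parses the bytes and never opens a socket. The helper names (`build_set`, `build_get`, `parse_get_response`) are made up for illustration, and whether membase accepts exactly these commands is an assumption based on the wire compatibility claimed above.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode a memcached text-protocol 'set' command (hypothetical helper)."""
    header = b"set %s %d %d %d\r\n" % (key.encode(), flags, exptime, len(value))
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Encode a memcached text-protocol 'get' command for a single key."""
    return b"get %s\r\n" % key.encode()


def parse_get_response(data: bytes) -> dict:
    """Decode a complete 'get' response into {key: value}.

    Assumes the whole response is already in memory and is well formed:
    zero or more 'VALUE <key> <flags> <bytes>' items followed by 'END'.
    """
    values = {}
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        header = data[pos:eol]
        if header == b"END":
            return values
        parts = header.split(b" ")
        key, length = parts[1], int(parts[3])
        start = eol + 2                 # skip the header's CRLF
        end = start + length            # the data block is exactly <bytes> long
        values[key.decode()] = data[start:end]
        pos = end + 2                   # skip the data block's CRLF


request = build_set("greeting", b"hello")
reply = parse_get_response(b"VALUE greeting 0 5\r\nhello\r\nEND\r\n")
```

An existing memcached client library does the same thing under the hood, which is why pointing one at a membase node should need no code changes if the compatibility claim holds.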

Out of curiosity, what drew you towards Cassandra over, say, Riak or MongoDB?


MongoDB was ruled out because auto-sharding is still alpha (though it should be out of alpha soon, I'd prefer to wait and see what the community experience is with it first.)

Between Cassandra and Riak, I chose Cassandra because it can run on Windows while Riak cannot. I needed it for a data store for some cluster computing work I'm doing, but for various reasons needed my test environment to be Windows only. I think I can get rid of that restriction; perhaps benchmarking Riak vs. Cassandra for my particular use case is in order.


Linux 64-bit binary is available without filling out any forms at http://labs.northscale.com/membase/


"My work blocks IRC, so jumping on freenode isn't an option"

Get yourself a shell account and run irssi under screen.


Sounds a lot like what Redis is doing with Redis Cluster, except this one is already in production.


The key/value model embraced by NoSQL databases (Scalaris, Voldemort, Tokyo Cabinet, etc) is the simplest and easiest to implement but inefficient when you are only interested in querying or updating part of a value.
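A rough sketch of that inefficiency, using a hypothetical in-memory `KVStore` as a stand-in for a remote store (the class, the `bytes_moved` counter, and `update_field` are all invented for illustration and do not correspond to any particular product's API). Changing one small field means fetching and rewriting the entire value:

```python
import json


class KVStore:
    """Hypothetical in-memory stand-in for a remote key/value store."""

    def __init__(self):
        self._data = {}
        self.bytes_moved = 0  # counts bytes that would cross the network

    def get(self, key: str) -> bytes:
        blob = self._data[key]
        self.bytes_moved += len(blob)
        return blob

    def set(self, key: str, blob: bytes) -> None:
        self.bytes_moved += len(blob)
        self._data[key] = blob


def update_field(store: KVStore, key: str, field: str, value) -> None:
    """Read-modify-write: the store only understands whole values."""
    record = json.loads(store.get(key))          # fetch the full blob
    record[field] = value                        # change one field
    store.set(key, json.dumps(record).encode())  # write the full blob back


store = KVStore()
store.set("user:1", json.dumps({"name": "a", "bio": "x" * 1000}).encode())
store.bytes_moved = 0                            # ignore the initial load
update_field(store, "user:1", "name", "b")
cost_of_one_field_update = store.bytes_moved
```

Here a one-character change to `name` moves the 1000-character `bio` twice, once in each direction.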

This article (http://seattleweb.intel-research.net/people/lamarca/pubs/pap...) coming out of Intel argues that it is also difficult to implement more sophisticated structures on top of a distributed key/value store. The author's main point is that a few specialized applications can and have been built on a plain distributed key/value store, but most applications have ended up having to customize the key/value store's internals to achieve their functional or performance goals.

From the little bit I have read about Membase, it looks well positioned to bring simple distributed key/value stores to the next level and back into the limelight.


I for one like the concept of storing JSON-like data structures and using JavaScript-based indices on top of that, like you can do with CouchDB or MongoDB. For the greater part, having something where you can store big amorphous blobs is much less useful than something where you can (a) work on a standard format (usable at least from Java and Python) and (b) use some indices to speed up access via selected attributes.

For a project of mine, I've set up something where the frontend (i.e. JavaScript) munches on some data structures, which are then passed on to the backend (some Python code for business logic and authentication), which sticks them into the database more or less directly. Works really well, although MongoDB still has this "embrace of the exotic stranger" feeling (i.e., you wonder when exactly your database will stop liking you and just crash), whereas CouchDB is just too slow for my purposes.


I haven't had the time to search through the codebase, but the source includes 10 folders (a sysadmin's hell): some with software in C, some with software in C++, and some with Python code. One of them is a folder named 'mencached'.


I worked on a project that had a directory named "manuel", which contained 8 C files and a file named "make.sh". I always thought it was some guy's private playground and shouldn't be touched. But once I dug into it, I discovered an NIH implementation of Docbook + Doxygen + Markdown + grep + strings, and make.sh did what a Makefile would have done had they used make. And 'manuel' never existed; the French comments tipped me off to what it really meant: Manual.


And of course, "manual" means, "by hand". :)



Wonder why I got downvoted for an informative link to a similar project that I have no involvement with?


Interesting! I've concluded the same thing based on the very scarce info they're offering.


* There are alternatives, but the backers give this project a leg up.

* The web site needs a significant overhaul.

* This was a commercial product, if I am not mistaken, and was just recently opened up.

* Whether this is forked off memcached or is just another NoSQL with memcached compatibility is still a mystery.


It's not a fork. We've done a lot of work on memcached to allow it to support multiple backends on the same network interface. Part of membase is a new memcached backend.


Does anyone know what kind of durability Membase offers? One of my ongoing concerns with MongoDB is the lack of single-server durability.


That is one ugly mascot.


It's unfortunate there are no Windows binaries (well, apparently there are no binaries yet, period, but I mean they don't plan on binaries until some time out). We could host Linux VMs, but ideally we just run Windows binaries and get rid of the abstraction.

That is the case for a large number of these products, which at best offer a terrible Cygwin port. In the Windows world the premier product right now is the beta of AppFabric, but almost no one uses it, and it has an absurd list of dependencies that precludes many uses.


Your post is just another nail in the MS coffin. The simple fact is that the latest, greatest hotness is just not playing in the MS ecosystem. Virtually all NoSQL systems support Windows as an afterthought, if at all. Not to mention hotness like nodejs.

The bottom line is that the people who make the new shinies just don't play on MS boxes and couldn't care less if their toys ever play on MS boxes. No, I'm not talking 'enterprise'.


Porting software to a completely different culture isn't cheap and there's little incentive for volunteer open source projects to do it. If enough software that you want isn't available on Windows, it's probably cheaper for you to learn Unix.


There are plenty of people who want to volunteer to port projects to Windows, but there are other hurdles, such as Microsoft refusing to ship a C99 compiler and getting the Windows port accepted into the main repo for the project.


Huh? Everyone builds with mingw. We shipped pure win32 desktop and service apps that were compiled with mingw.

The difficulties in porting to Windows are slightly more technical than just the availability of a free C compiler with C99 support, and project management politics.


The problem is that MinGW is a completely alien environment for the vast majority of C developers on Windows, and Visual Studio is a completely alien environment for POSIX projects.

That way you can't use the majority of Windows C developers, many of whom would love to contribute to open source projects and who have the knowledge of the technical difficulties of porting to Windows.


Where did I say I didn't know Unix? The problem is that for some deployed solutions adding in Linux boxes simply because they host one piece of software isn't optimal, so "native" solutions are chosen instead. We aren't talking about a product that is tightly coupled with the OS (or at least it shouldn't be).


It should be, actually, if it's going to be fast. Memcached is fast because it optimizes interaction with the I/O layers -- you can't stay portable if you do that.

EDIT: Actually, you can use libevent for some subset of these cases and get speed+portability, but I'm pretty sure memcached predates libevent by quite a bit so they're not. So there's your answer.


The first commit to memcached on github is http://github.com/memcached/memcached/tree/32f382b605b4565bd..., which uses libevent. libevent has been around for a while...


Memcached predates github by quite a bit, actually. It was originally made as a caching layer for livejournal, and IIRC was made in the early half of the 00s. (Having a hard time finding an exact date, though.)


If you want to know the exact date, look at the link you're suggesting is wrong.


You mean the one where an existing project with several years of history has its development tree imported into github as its first commit?

I remember the memcached project being announced, and I'm pretty sure it predated github by five years or more.


I have no idea what link you clicked on. The one above says this:

    committing memcached

    bradfitz (author)
    May 27, 2003

Specifically, that was Tue May 27 07:19:11 2003 +0000.

I spent a long night back in 2008 recreating that history as accurately as possible. It involved a couple of git repositories, subversion repositories, and a lot of Google and mailing list searches to ensure that everyone who ever contributed to the project was properly recognized.

You can read more about that process here: http://www.mail-archive.com/memcached@googlegroups.com/msg00...


My apologies, and thanks for the exact date. I missed that the github commit date was in 2003. (If you miss that, your reply comes off as really snarky. Mine was a bit blunt, too.)


memcached actually uses libevent.

Edit: oops, JoachimSchipper beat me to it


If Zynga is already using it behind, say, Farmville, that will give it some cred out of the gate.


The About page lists "Major Deployments" and seems to indicate that both Farmville and Cafe World are using it, which is impressive.


The site membase.org seems to be really slow now. Not a good showcase of the technology it's trying to preach.


Why would a static, informational web site be using a database?


There is an argument in favor of marketing to the perception, though. I am sure he/she is not the only viewer that might assume they are eating their own dog food and using it behind their website. We have countless tales on HN about increasing customer interest by speeding up your site. If you want to talk about how cool your tech is, make sure your website reflects a similar attention to the important details. I think the comment is legitimate for the perspective even if the technological reality is that the slowness is completely unrelated to the database.

Also, given the popularity of CRMs and so forth, it might not be good to assume that the pages are really "static". For all we know, their site is using their database to provide content -- edited by documentation/marketing folks -- to page templates.


I agree completely. Rational users would realise that the database has little to no connection to the website's hosting, but are any of us entirely rational? The details always matter.


True, but then the rational user might conclude that if you can't manage to have a performant static website you might not be able to handle the complicated stuff.


If by CRM you mean CMS, I don't think they're using a CMS because all of the pages seem to be plain HTML based on the extension and the content type. Of course, they could be spoofing that for a number of reasons, but I think it's safe to say that their site's perceived slowness has nothing to do with the database. In any event, it's loading very quickly now.



