Reddit releases fully functional VM with their source code

KirinDave · on May 19, 2010

Am I the only person who thought that instead of “an image for VMware” they meant “the source code to a functional virtual machine that Reddit uses”?

Don't get me wrong, this is cool. But the other way would have been way cooler.

branden · on May 20, 2010

Hah, I once had a boss wig out when he heard someone mention a "Java VM." He thought we'd done something crazy with VMWare behind his back.

I like that HN gets confused in the other direction.

jedberg · on May 19, 2010

We're not quite that talented. ;)

eru · on May 20, 2010

We are. (I am working on XenServer.)

I did not get confused by the headline.

apgwoz · on May 20, 2010

I read it as something like a new Python VM, not a Xen VM.

dpritchett · on May 20, 2010

I got exactly what I hoped for when clicking through. Turnkey-style VMs are great!

coderdude · on May 19, 2010

That is why I clicked in as well. The title is very misleading.

jedberg · on May 20, 2010

Sorry if you felt the title was misleading; it certainly wasn't my intention.

It's just hard to fit all the info into 80 characters.

ynniv · on May 20, 2010

  Reddit releases fully functional VM with their source code
  Reddit releases VMWare image and their source code

jedberg · on May 20, 2010

The most important piece of information was that the VM is fully functional, which your title omits.

dan00 · on May 20, 2010

Well, if something is released, then I assume that it works.

jedberg · on May 20, 2010

Did you have trouble running the VM?

dan00 · on May 22, 2010

Oh, I was only commenting on the title of the post. But yes, I could have spared myself that comment. Sorry.

alnayyir · on May 20, 2010

Reddit releases VMware image with their infrastructure stack baked in

69 characters.

PS I'm still pissed about getting ninja-banned.

jedberg · on May 20, 2010

If your username is the same on reddit as here, you don't appear to have ever been banned. If that's not the case, email me and we'll try and figure it out.

alnayyir · on May 23, 2010

The account was knothing.

keysersosa · on May 19, 2010

The choice to use VMware was nothing personal against VirtualBox; we just happen to already be running VMware on our dev boxes. (I'm actually downloading VirtualBox now to see if cross-compatibility is possible.)

jacquesm · on May 19, 2010

I'm absolutely amazed at your dedication to this, and how confident you guys are that you're not helping some competitor.

That goes to show that closed source is really a dead end: once you have a sufficient head start, you can even afford to give away a turnkey copy of your software and still sleep at night.

Impressive!

keysersosa · on May 19, 2010

Thanks.

We really hope this will help us more than hurt us in the end. Part of the problem with releasing something like reddit as open source is that it isn't designed around installation. For the most part, the pieces have been built in place, organically, as needed. This means that even though the source is out there, it's been really hard to get developer contributions, because many developers get stuck before they get reddit up and running locally.

This should effectively lower the barrier to entry and let devs actually think about the code and adding features, rather than about whether RabbitMQ or Cassandra is properly configured.

jacquesm · on May 19, 2010

I would have exactly that problem if I released code to any of my sites (not that they're worth looking at :) ), and for much the same reason.

You build stuff because you have to when it hits, especially if you experience 'unexpected growth', and things like installation documentation and so on will suffer, if they exist at all.

So releasing a working VM is a great way to do this, it's about as user-friendly as you can get.

I think you'll be setting an example here that will be followed many times.

jedberg · on May 19, 2010

We know that our code is not our key advantage -- it is our community.

blasdel · on May 19, 2010

And that's why the AGPL provides no value: Code < Data < Community

eru · on May 20, 2010

The GPL and AGPL do not try to take away your competitive advantages. They are just intended to make code free as in speech.

blasdel · on May 20, 2010

Except that people who use web apps don't give a shit about the code running the site. They're using a transient app as a service over the network, through a stateless, mostly idempotent protocol; they care only about the responses coming out of it, and the code that generates those responses is completely irrelevant.

Stallman is incapable of understanding this, perhaps because he simply has never used any such services. Users want to have and control access to their data. That's it. The AGPL's only effect is to underscore how inadequate it is at doing anything for end-users -- by taking Freedom Zero away from operators, it leaves the exploitation of data and network effects as their only means to non-menial profit.

notauser · on May 20, 2010

I'm curious why you suddenly started bashing the AGPL in the middle of an unrelated discussion. I believe Reddit is covered under the CPAL, which is more like the GPL than the AGPL.

No one forces website operators to use AGPL code. It's a trade they make in order to cut development time in exchange for meeting the openness conditions. It's up to them to decide if that is economically worthwhile.

Banning people from deciding that trade is worthwhile would be fairly counterproductive. And having the courts invalidate the license because you have a philosophical problem with it seems slightly unfair to the people who wrote the code and chose to give it away for free (with some conditions attached). You might as well say it's OK to steal library books because, hey, you don't believe the contract that says you have to return them is fair.

mitchellh · on May 19, 2010

Very forward thinking, I'm glad dev-in-VM is moving forward.

I actually planned on creating a Vagrant box (vagrantup.com) to make reddit development easier. I was waiting until after I got another release of Vagrant out to do it (for no other reason than personal time management), but this motivates me to do it sooner.

I'll take a look tonight and see if I can get it done soon, which will bring this to VirtualBox.

Disclaimer: I am the developer behind Vagrant.

eru · on May 20, 2010

> Very forward thinking, I'm glad dev-in-VM is moving forward.

Perhaps Smalltalk-like images will make a comeback?

blasdel · on May 19, 2010

Reddit itself is hosted on EC2, right?

I use scripts (http://github.com/wr0ngway/rubber/) to boot up, configure, and deploy to my instances from a stock Ubuntu AMI -- but from the looks of this maybe you set up and maintain AMIs for each role instead?

I much prefer a scripted installation to a frozen hand-rolled installation, even if the latter is more reliable.

jedberg · on May 19, 2010

Yeah, we use EC2. We are trying to move towards your model, which is, I think, better.

But when we started no one really knew what the "right way" was, so we went with custom AMIs.

natrius · on May 20, 2010

This is a little off topic, but can you elaborate on your reasons for preferring scripted installation? It's what I use too, but hearing someone else's justification would be helpful.

jedberg · on May 20, 2010

I'll tell you why I'd like to get there. It makes updating the master image easier. If you need a new package, you just update on the server side, and then all the new images get that new package. No rebundling required.

It also helps when you want to upgrade your OS. As long as the packages are mostly the same, you can just run your update script from the new OS and make a few changes.

The way I have to do it now, I have to build a whole new master image from scratch, because I can't upgrade Ubuntu in place on EC2 due to the way they handle the kernel and kernel modules.
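A minimal sketch of the scripted approach described above, in Python. The manifest, role names, and package lists are hypothetical placeholders, not reddit's actual setup; the point is only that a boot-time script derives each instance's setup from data kept in source control, so adding a package means editing the manifest rather than rebundling an image.

```python
"""Sketch of scripted provisioning from a stock image.

Assumption: roles and package names are hypothetical placeholders,
not reddit's real configuration.
"""

from typing import Dict, List

# Hypothetical manifest kept in source control. Adding a package here
# affects every instance provisioned afterwards, with no image rebundling.
BASE_PACKAGES: List[str] = ["python", "memcached"]
ROLE_PACKAGES: Dict[str, List[str]] = {
    "app": ["nginx"],
    "queue": ["rabbitmq-server"],
    "db": ["postgresql"],
}


def provisioning_plan(role: str) -> List[str]:
    """Return the shell commands a boot-time script would run for `role`."""
    if role not in ROLE_PACKAGES:
        raise ValueError(f"unknown role: {role}")
    # De-duplicate while preserving order, so the plan is stable across runs.
    packages = list(dict.fromkeys(BASE_PACKAGES + ROLE_PACKAGES[role]))
    return [
        "apt-get update",
        # Packages already installed at the newest version are left alone,
        # so rerunning the plan on a provisioned machine is harmless.
        "apt-get install -y " + " ".join(packages),
    ]


if __name__ == "__main__":
    for cmd in provisioning_plan("queue"):
        print(cmd)
```

The trade-off is boot time: every new instance pays for the installs instead of starting from a pre-baked image.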

apu · on May 20, 2010

Doesn't this increase the time it takes to boot up a new instance, though?

Boot, update and apply patches, copy over source and data, start running services... every time you start a new instance.

jedberg · on May 20, 2010

Yes, it does. But you probably aren't starting instances that often, and it only adds a few minutes. If you really need a fast boot, then you can still use an image.

However, keep in mind that, on EC2 at least, images are actually downloaded from S3 when you boot, so a huge image will take almost as much time.

arohner · on May 20, 2010

We do this too. Another nice advantage is that changes to the production machine state are all in source control. We deploy on EC2 + our own .deb, and the entire .deb recipe is checked into source control right next to the rest of our code. There's no wondering about when a specific change got added to the AMI.

blasdel · on May 20, 2010

I'm sure that it's just your friend that sits around manually configuring servers, taking EC2 at face value as just virtual hardware -- what some are calling the "Meat Cloud".

Source Code > Documentation > Artifacts

jwegan · on May 19, 2010

Another question: if I wanted to improve reddit's search, it would help to know what solutions you've already investigated (and what you are currently doing).

jedberg · on May 20, 2010

Here is where we publish most of the info about search:

http://www.reddit.com/help/search

In short, it used to be a full-text search through the database, and now it is Solr, built on Lucene.

However, I think PG's essay about why he doesn't have a better search on HN applies equally to reddit -- because there are much better things to spend time on.

soult · on May 20, 2010

PG is wrong, search is a very useful feature. Even more so for a site like reddit where submissions can be in different subreddits.

If search weren't useful, why is there searchyc? And why does a thread asking why there is no search function come up every few weeks?

samd · on May 20, 2010

Is there some reason that using Google isn't good enough? Most of my trouble searching reddit is because it's hard to translate "that picture of a guy with a bacon helmet" into something Google understands.

jedberg · on May 20, 2010

I didn't say it wasn't useful. I just said there are better things to work on with our limited resources.

emehrkay · on May 20, 2010

I've been bullshitting around instead of getting this idea out of my mind and into production: overthinking how I should build it, getting stuck in design mode, wondering if it will scale, a bunch of self-imposed barriers. Now Reddit has provided a solid blueprint. I am amazed and very thankful. I hope to help with some of their issues one day.

Thanks, reddit.

ilovecomputers · on May 19, 2010

Ah, thanks for this.

I tried to run reddit on my machine. However, I ran into many dead ends, the redditdev IRC channel was quiet, and because I had no idea what I was doing with MacPorts or the Terminal (or my filesystem, or the many different paradigms I had to deal with), I lost all faith in my ability to understand how sysadmins (or Python developers) get things done.

Essentially, I grew contemptuous of myself for being a noob of a CS student, but at least I know how to run a virtual machine and prove things by induction. That's a job skill right?

jedberg · on May 20, 2010

> That's a job skill right?

Depends on what you do with the VM.

jwegan · on May 19, 2010

Out of curiosity, how much external contribution is there to reddit's source code?

jedberg · on May 19, 2010

Not a lot right now, but hopefully more with this release.

apgwoz · on May 20, 2010

Any stats as to how many people have reddit clones based on the codebase?

jedberg · on May 20, 2010

These are the only ones we know about (and I think 1/2 of those are dead): http://code.reddit.com/wiki/PoweredByReddit

The goal of open-sourcing was never to have clones; it was to get people to develop the features that they felt were important but we did not, or that we did not have the time for. Being able to make an easy clone was just a side effect.

Nwallins · on May 20, 2010

I really like http://lesswrong.com -- a spinoff of http://overcomingbias.com

Their 'stacked' view of threaded comments makes a lot of sense to me. It might be really nice for a code editor too -- the minimal indent width can be smaller, and context is clearer.

jedberg · on May 20, 2010

Lesswrong is awesome. It is my favorite reddit derivative.