If your username is the same on reddit as here, you don't appear to have ever been banned. If that's not the case, email me and we'll try and figure it out.
The choice to use VMware was nothing personal against VirtualBox; we just happen to already be running VMware on our dev boxes. (I'm actually downloading VirtualBox now to see if cross-compatibility is possible.)
I'm absolutely amazed at your dedication to this, and how confident you guys are that you're not helping some competitor.
That goes to prove that closed source is really a dead end: once you have a sufficient head start, you can even afford to give away a turn-key copy of your software and still sleep at night.
We really hope this will help us more than hurt us in the end. Part of the problem with releasing something like reddit as open source is that it isn't designed around installation. For the most part, the pieces have been built in place organically as needed. This means that even though the source is out there, it's been really hard to get developer contributions as many get stuck before they get reddit up and running locally.
This should effectively lower the barrier to entry there and let devs actually think about the code and adding features rather than about whether or not RabbitMQ or Cassandra are properly configured.
I would have exactly that problem if I released code to any of my sites (not that they're worth looking at :) ), and for much the same reason.
You build stuff because you have to when it hits, especially if you experience 'unexpected growth', and things like installation documentation and so on will suffer, if they exist at all.
So releasing a working VM is a great way to do this; it's about as user-friendly as you can get.
I think you'll be setting an example here that will be followed many times.
Except that people who use web apps don't give a shit about the code running the site. They're using a transient app over the network as the service over a stateless mostly-idempotent protocol -- they care only about responses coming out of it, the code that generates the responses is completely irrelevant.
Stallman is incapable of understanding this, perhaps because he simply has never used any such services. Users want to have and control access to their data. That's it. The AGPL's only effect is to underscore how inadequate it is at doing anything for end-users -- by taking Freedom Zero away from operators, it leaves the exploitation of data and network effects as their only means to non-menial profit.
I'm curious why you suddenly started bashing the AGPL in the middle of an unrelated discussion? I believe Reddit is covered under the CPAL, which is more like the GPL than the AGPL.
No one forces website operators to use AGPL code. It's a trade they make in order to cut development time in exchange for meeting the openness conditions. It's up to them to decide if that is economically worthwhile.
Banning people from deciding that trade is worthwhile would be fairly counterproductive. And having the courts invalidate the license because you have a philosophical problem with it seems to be slightly unfair on the people who wrote the code and chose to give it away for free (with some conditions attached). You might as well say it's OK to steal library books because, hey, you don't believe the contract that says you have to return the books is fair.
Very forward thinking, I'm glad dev-in-VM is moving forward.
I actually planned on creating a Vagrant box (vagrantup.com) to make reddit development easier. I was waiting until after I got another release of Vagrant out to do it (for no other reason than personal time management), but I'm motivated by this to do it sooner.
I'll take a look tonight and see if I can do this sooner rather than later, which will bring this to VirtualBox.
I use scripts (http://github.com/wr0ngway/rubber/) to boot up, configure, and deploy to my instances from a stock Ubuntu AMI -- but from the looks of this maybe you set up and maintain AMIs for each role instead?
I much prefer a scripted installation to a frozen hand-rolled installation, even if the latter is more reliable.
This is a little off topic, but can you elaborate on your reasons for preferring scripted installation? It's what I use too, but hearing someone else's justification would be helpful.
I'll tell you why I'd like to get there. It makes updating the master image easier. If you need a new package, you just update on the server side, and then all the new images get that new package. No rebundling required.
It also helps when you want to upgrade your OS. As long as the packages are mostly the same, you can just run your update script from the new OS and make a few changes.
The way I have to do it now, I have to build a whole new master image from scratch, because I can't upgrade Ubuntu in place on EC2 due to the way they handle the kernel and kernel modules.
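A minimal sketch of the scripted approach described above, assuming a Debian-style box with apt-get. The manifest contents and the function name are hypothetical, chosen only for illustration; this is not how reddit or anyone in this thread actually provisions machines. The point is that the list of packages lives in one place, and a fresh instance works out what it is missing at boot:

```python
from typing import Iterable, List

# Hypothetical manifest: in practice this would live in source control
# or on the server side, so adding a package means editing one list.
MANIFEST = ["postgresql", "rabbitmq-server", "memcached"]


def build_install_plan(wanted: Iterable[str], installed: Iterable[str]) -> List[List[str]]:
    """Return the commands needed to bring a stock box up to the manifest.

    Returns an empty list when nothing is missing, so running the script
    a second time is a no-op.
    """
    missing = sorted(set(wanted) - set(installed))
    if not missing:
        return []
    return [
        ["apt-get", "update"],
        ["apt-get", "install", "-y", *missing],
    ]


if __name__ == "__main__":
    # Only print the plan here; a real script would hand each command
    # to subprocess.run(cmd, check=True).
    for cmd in build_install_plan(MANIFEST, installed=["memcached"]):
        print(" ".join(cmd))
```

Adding a package then means adding one string to the manifest rather than rebundling an image, and the same script can be pointed at a newer OS release.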
Yes, it does. But you probably aren't starting instances that often, and it only adds a few minutes. If you really need a fast boot, then you can still use an image.
However keep in mind, on EC2 at least, that the images are actually downloaded from S3 when you boot, so if you have a huge image, it will take almost as much time.
We do this too. Another nice advantage is changes to the production machine state are all in source control. We deploy on EC2 + our own .deb. The entire .deb recipe is checked into source, right next to the rest of our source code. There's no wondering about when a specific change got added to the AMI.
I'm sure that it's just your friend that sits around manually configuring servers, taking EC2 at face value as just virtual hardware -- what some are calling the "Meat Cloud".
Another question: if I wanted to improve reddit's search, it would help to know what solutions you've already investigated (and what you are currently doing).
In short, it used to be a fulltext search through the database, and now it is Solr, built on Lucene.
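To illustrate the general difference between those two approaches, here is a toy sketch. It is not reddit's code and is far simpler than what Lucene does (no ranking, stemming, or on-disk storage); the sample documents and function names are made up. A scan looks at every row for every query, while an inverted index maps each term to the documents containing it, so a query only touches the matching entries:

```python
from collections import defaultdict

# Made-up sample data standing in for link titles.
DOCS = {
    1: "bacon helmet picture",
    2: "reddit open source",
    3: "picture of a cat",
}


def scan_search(docs, term):
    """Check every document for the term (the slow, scan-everything way)."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in docs.items()
                  if term in text.lower().split())


def build_index(docs):
    """Build an inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def index_search(index, *terms):
    """Return ids of documents containing all of the given terms."""
    if not terms:
        return []
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    index = build_index(DOCS)
    print(scan_search(DOCS, "picture"))
    print(index_search(index, "picture", "bacon"))
```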
However, I think PG's essay about why he doesn't have a better search on HN applies equally to reddit -- because there are much better things to spend time on.
Is there some reason that using Google isn't good enough? Most of my trouble searching reddit is because it's hard to translate "that picture of a guy with a bacon helmet" into something Google understands.
I've been bullshitting with getting this idea out of my mind and into production -- overthinking how I should build it, getting stuck in design mode, wondering if it will scale -- a bunch of self-imposed barriers. Now Reddit has provided a solid blueprint. I am amazed and very thankful. I hope to help with some of their issues one day.
I tried to run reddit on my machine. However, I ran into many dead ends, the redditdev IRC was quiet, and because I had no idea what I was doing with MacPorts or the Terminal (or my filesystem or the many different paradigms I had to deal with) I lost all faith in my ability to understand how sysadmins (or python developers) get things done.
Essentially, I came to feel contempt for myself for being a noob of a CS student, but at least I know how to run a virtual machine and prove by induction. That's a job skill, right?
The goal with open sourcing was never to have clones -- it was to get people to dev the features that they felt were important but we did not, or did not have the time for. Being able to make an easy clone was just a side effect.
Their 'stacked' view of threaded comments makes a lot of sense to me. It might be really nice for a code editor too -- the minimal indent width can be smaller, and context is clearer.
Don't get me wrong, this is cool. But the other way would have been way cooler.