I think rewriting from scratch is the core of their problem and not really Cassandra. Gradually going over to Cassandra would have been a much better idea.
I've heard this time and time again, but what if your application is a genuine ball of mud? Would you really not advocate a rewrite for an unmaintainable spaghetti classic asp app still in production today?
Microsoft could be one example of a successful one. Their flagship Windows codebase that ran the 3.1->95->98->ME line was basically ditched with the rewrite-from-scratch NT (famously done by an ex-VMS team), which later had some APIs ported to it so that Windows 2000 and especially XP were close to drop-in replacements for the old line, while not really sharing much code. I think in retrospect that was probably a good idea: the NT rewrite put the codebase on much better footing than the aging, incrementally updated classic Windows codebase had been.
Solaris is another example of a rewrite that seems to have worked, though the rewrite did derive from a different set of existing code, not a total from-scratch job. But the classic SunOS 1.x-4.x codebase was ditched, and SunOS 5.x / "Solaris 2" replaced it.
You covered Windows and Linux - don't forget Apple's switch to OS X. They would have done a wholesale switch at some point even if they hadn't gone with something Unix-based, because the other alternative was Copland, the internal project to do a complete rewrite.
It's a good point that NT was a successful rewrite. However, it's worth noting how this was done. NT was originally aimed at a different market, there was an overlap of several years where the old system was still available, AND NT ran Windows 3.1 apps in their own subsystem, which contained ... the codebase of Windows 3.1.
I think the NT and OS X updates were absolutely required for architectural reasons that impact security; 98/ME and Mac OS 9 simply weren't suited to the types of threats on the internet (no support for dropping process privileges, file permissions, cutting off direct access to hardware, etc.). If you think XP is bad in 2010, think what ME would be like if it were still widely installed: one program crashing the whole machine, boot sector viruses, etc.
I would advocate an incremental rewrite of parts of the unmaintainable spaghetti classic asp until there's none of the spaghetti left. It's easier to rewrite part of a system than an entire system.
Release to production dozens, if not hundreds of times. Releases are non-events, rollbacks are non-events.
A system-wide ground-up rewrite with a big-bang switchover at the end is a classic clusterfuck recipe. It's a shame that so many people think it's a good idea, even in 2010.
> incremental rewrite of parts of the unmaintainable spaghetti classic
Sounds good in theory. In practice? Part of the problem with many big ball of mud systems is that all the parts depend on and talk to all the other parts. Want to fix that horrid DB schema? You'll have to rewrite all the code that talks to it, or rewrite it to talk to an intermediary. Want to rewrite that horrid bit of code called foobar_20040623 ("foobar" has been changed, but yes, I saw this in some PHP code...)? You'll have to find out everything it interacts with, and likely redo that too.
In practice? It's not easy, but nothing is. I see three options to the big ball of mud problem.
1. Wallow in it (work with existing structure). Sadly, this is what a lot of people do. I left a job once because that was the only way out of the ball of mud. I was afraid I would turn into a mud-person.
2. Slowly crawl out of it (incremental rewrite). This is hard, but doable. It involves setting up barriers to mitigate ripple effects (there's a small sketch of one such barrier below), automated testing so you can be comfortable with frequent releases, and tolerance for temporary imperfection (basically, you need to be willing to frequently release things that are only a tiny bit better than the status quo, even if that's not ideal). Not everyone is willing to accept this persistent imperfection and lack of conceptual consistency, especially when option #3 is more exciting and fun.
3. Try to leap out of it and land wherever (total rewrite). This is very, very hard, and prone to total failure or spending valuable money and time to effectively stand still in the market.
I've seen development teams leap directly out of one ball of mud into a different one. One where nobody even knew their way around anymore. How is that anything but a huge waste of time and resources?
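To make the "barriers" in option 2 a bit more concrete: the usual trick is a thin facade that new code goes through, so the legacy internals can be rewritten later without touching every caller. Here's a minimal Python sketch with made-up names (LegacySpaghetti, UserStore) - just an illustration of the shape, not anything from a real codebase:

```python
# Hypothetical sketch: a thin facade ("barrier") in front of legacy code.
# New code talks only to UserStore; the mess behind it can be rewritten
# piece by piece without rippling through every caller.

class LegacySpaghetti:
    """Stand-in for the existing ball of mud (tuples, odd naming, etc.)."""
    def __init__(self):
        self._rows = {1: ("1", "alice", "alice@example.com")}

    def fetch_user_row_v2_final(self, user_id):
        return self._rows[user_id]


class UserStore:
    """Single choke point for user reads - the only place that knows
    about the legacy representation."""
    def __init__(self, legacy):
        self._legacy = legacy

    def get_user(self, user_id):
        row = self._legacy.fetch_user_row_v2_final(user_id)
        return {"id": row[0], "name": row[1], "email": row[2]}


store = UserStore(LegacySpaghetti())
print(store.get_user(1))  # {'id': '1', 'name': 'alice', 'email': 'alice@example.com'}
```

Once every caller goes through the facade, the legacy guts become one module you can replace on its own schedule.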
Fascinating topic - lots of startup talk is fun because you get to do things from scratch, but this is very pertinent to the Real World.
In terms of "standing still in the market", I think the incremental rewrite contains a lot of that too, no? It's just spread over more time. You're still rewriting it, and during that time, you're not adding new features. An example might be retrofitting some testing code to a system that's never had test code. That could potentially be a fair amount of work, and given a constant pool of resources, it will take time away from "new stuff". Just that it's not so much of a quantum leap - you can still drop your new testing code and go implement some must-have feature if you need to, without saying you have to wait for the whole thing to be ready.
Sadly though, my experience in this is that the reason there's a ball of mud in the first place is a political/social one, so that any "dead time" is frowned upon.
In reality yes, it works, I've done it. We took a horrible, accreted web application, and rewrote it in stages over a period of about 12 months. At the same time we were making regular releases, and needless to say the site stayed up the whole time.
You just have to plan things carefully, work hard, and keep your head screwed on. (Just like with many things in life ...)
I agree, although it's a somewhat sensitive thing to write about if you're doing it in practice. I sure as hell don't want to be known as the guy who comes in and calls all of the existing code a pile of shit.
I guess I could "change the names to protect the innocent" and tell some stories about digging out of tight places incrementally. If it would convince even one development team that they didn't absolutely have to do a total rewrite, it would be worth it.
Another issue is whether you are just rewriting the code, or fundamentally changing your data store (as Digg have done).
Rewriting the code is fine - if it goes wrong, just back up to the old code, users won't notice the difference. You can do it on a page-by-page basis, just route URLs selectively to the new install.
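For what it's worth, the page-by-page routing really can be that dumb: a prefix table in front of the two installs, where rollback just means shrinking the list. A rough Python sketch with hypothetical paths and hostnames (nothing to do with Digg's actual setup):

```python
# Hypothetical sketch of page-by-page cutover: send a short list of URL
# prefixes to the rewritten app and everything else to the old install.
# Rolling back a page just means removing its prefix from the list.

NEW_APP = "http://new-app.internal"   # assumed internal hostnames
OLD_APP = "http://old-app.internal"

MIGRATED_PREFIXES = [
    "/login",      # pages already rewritten
    "/profile",
]

def pick_backend(path):
    """Return the backend that should serve this request path."""
    if any(path.startswith(prefix) for prefix in MIGRATED_PREFIXES):
        return NEW_APP
    return OLD_APP

assert pick_backend("/profile/42") == NEW_APP
assert pick_backend("/story/123") == OLD_APP   # untouched pages stay on the old code
```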
Migrating data is something that should be done only under the most extreme circumstances - and something will inevitably go horribly wrong, so be prepared to roll back.
Because migration is a scenario that may go horribly wrong, you should be prepared to tackle it in small pieces. So often (especially with relational databases; I don't have much experience with the NoSQL realm) everything is in one logical and physical data store, even for unrelated features. That makes any migration harder than it needs to be.
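One way to keep the pieces small is to migrate per feature or per table in bounded, verified batches instead of one giant copy. A toy Python sketch - in-memory dicts stand in for the real stores, so the verification step is obviously simplistic:

```python
# Hypothetical sketch: migrate one feature's data in small, verifiable
# batches rather than one big-bang copy. The dicts stand in for the real
# source and destination databases.

old_store = {f"comment:{i}": {"id": i, "text": f"hello {i}"} for i in range(1000)}
new_store = {}

def migrate_in_batches(src, dst, batch_size=100):
    keys = sorted(src)
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        for key in batch:
            dst[key] = src[key]          # copy this batch
        # Verify before moving on; a failure here keeps the blast radius
        # to one small batch instead of the whole data set.
        for key in batch:
            assert dst[key] == src[key], f"mismatch on {key}, stop and investigate"

migrate_in_batches(old_store, new_store)
print(len(new_store), "records migrated")  # 1000
```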
I'd say it depends on whether your engineering talent is vastly superior today to what it was when the code was written. Even then it's so risky. It's probably always better to do it incrementally, even if it takes twice as long, because you can maintain working software and fix bugs as you go.
The approach I would take is to get the minimum set of engineers who know the most about each major aspect of the code, and put their heads together on what the ideal architecture would be. But rather than building it from scratch, figure out how to implement just one of those pieces now. That way you can decrease entropy in the codebase piecemeal without chucking out all the code at once, which is no doubt full of forgotten assumptions that no one will remember until it's too late.
The other problem is that they have been hyping V4 for a long time by making beta testing invite-only and hard to get into. It would have been better if they could have had a public V4 beta running in parallel with V3, since it is a rewrite.
you can't do a parallel beta test of two functionally different sites. You would end up with two separate data stores, workflows, and ultimately a confused user base.
Never mind the fact that the user base is confused by the rollout.
And never mind the fact that Google has forever damaged the term "Beta" in the minds of general Web consumers.
> And never mind the fact that Google has forever damaged the term "Beta" in the minds of general Web consumers.
The general web consumer still thinks betas are a type of fish. Not really relevant to the median Digg user, though, since they are not the general Web user.
Considering the completely different backends of v3 and v4, I think this would have been incredibly hard to implement - at least if changes in one system were supposed to end up in the other.
In this case you would not just have to write scripts for a one-time migration of all the needed data from v3's MySQL to v4's Cassandra. No - you would have to build a mechanism that not only moves data in both directions but also works at near-realtime speed.
If you then need consistency of the transferred data, this quickly gets impossible. Try finding a way to ensure consistency between these two completely different architectures.
In the end, most of v3 would have needed to be rewritten for a parallel use to be possible, at which point you don't really gain much.
"not hard to implement" - sigh
Disclaimer: I don't work at digg and I don't know more about their backend than the rest of the public. I did, however, just get around to doing something like what you describe, and there it was "just" a different schema on the same database backend - and even that would have been hell.
In a way, the one-time migration might be harder than near-realtime bidirectional synchronization. With synchronization in place, you could move portions of users to the new system and back as needed. A sudden leap from one backend to another is like jumping over the Grand Canyon on a motorcycle. Personally, I would rather build a bridge.
For the concurrent configuration to work consistently, digg would probably have to port v3 to the new backend.
Now, in the current case, most of the issues apparently come from the non-working backend as opposed to the changed feature set.
So while they could have run the two versions in parallel, they would not have gained anything. Likely, this was their rationale behind not doing so in the first place.
No, for the current configuration to work, you would have to create a way to move data in both directions from the "old" backend to the "new" backend. Ideally, you would slice up the beast so you didn't have to do this with everything all at once.
It's not a system that would be ideal for something transactional like a bank, but it may have been possible for an organization like Digg.
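A crude version of that bridge is a dual-write wrapper: every write goes to both backends, each user's reads come from whichever backend they're currently assigned to, and a mismatch report tells you whether the new store can be trusted yet. A hedged Python sketch with in-memory stand-ins for the two backends - it deliberately ignores the hard parts the comments above mention (conflicts, lag, partial failures):

```python
# Hypothetical sketch of a dual-write "bridge" between an old and a new
# backend. Dicts stand in for MySQL and Cassandra; conflicts, replication
# lag and partial failures - the genuinely hard parts - are not handled.

old_backend = {}            # stand-in for the v3 store
new_backend = {}            # stand-in for the v4 store
users_on_new = {"alice"}    # cohort currently served from the new backend

def write(user, key, value):
    """Write to both stores so either side can take over."""
    old_backend[(user, key)] = value
    new_backend[(user, key)] = value

def read(user, key):
    """Read from whichever backend this user is currently assigned to."""
    store = new_backend if user in users_on_new else old_backend
    return store[(user, key)]

def mismatches():
    """Keys where the two stores disagree - a cheap trust signal."""
    return [k for k in old_backend if old_backend[k] != new_backend.get(k)]

write("alice", "bio", "hello")
write("bob", "bio", "hi")
print(read("alice", "bio"), read("bob", "bio"), mismatches())  # hello hi []
```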
No, what they did wrong was not making a proper two-way conversion tool for their database backup.
Starting from scratch is no problem if you have your core data covered; then you can always revert.
Don't ever develop yourself into a one-way street!!
This is my all time favorite Spolsky blog entry. This lesson has saved me a few times. The broader lesson is that new and shiny in software isn't better and is often worse.
Yep, big bang rewrites are always scary. Doing something more incremental would have been safer.
One way to "safely" do a big-bang change would have been to use both architectures in parallel and sync back and forth. That way, if the new architecture fails, you still have the old one.
Of course, that is lots of effort, and restricts new features (which may be a good thing anyway).
Is there any reason that Cassandra is the focus of this article? It is really silly and irresponsible to peg a nascent project like that without any reasoning or sources. I'm sure something changed besides just a Cassandra rollout, and wasn't Digg using it on v3 too?
I think Cassandra is pretty well tested. There have been lots of super-large-scale deployments. It just seems lame to blame it on that, but I guess maybe their anonymous sources inside Digg revealed it? But then we'd hope they'd know if the problem was with the datastore or the implementation.
I get that some VP suggested a new buzzwordy technology; they gave him enough rope to hang himself, he did, and he left the company with a broken pile of crap. It could happen - if you have a healthy company, you give trust to people. That it got this far doesn't speak well of the rest of management. It doesn't speak really well of the rest of the team either. Shouldn't there have been some circuit breakers or something?
Digg isn't a poor, bring-your-own-laptop startup. They've got resources, and they've had substantial investment. They can afford to build and test software, and I know of no real marketing reason they had to push something untested out. Rose could have gone out and said it wasn't done and was going to take more time.
How does Rose keep his job? Wasn't he this VP's boss?
And it's a single technology? That couldn't have been vetted and tested independently of all of digg? Really? And MongoDB, or Hadoop, or one of the dozens of other NoSQL stores wouldn't work, either? You do what you have to do, and there is never a 'truth' when VPs and CxOs are canned, but it doesn't all float with me. It just looks like another overvalued and under-talented company that got lucky - the blind squirrel found the nut - and there isn't going to be a second act.
Maybe it just seems lowbrow to me: name and finger the guy, blame the open-source tool you use, and never explain or elaborate why you launched anyway when you were fixing bugs in the tool at the 11th hour.
Perhaps, though I think my sour grapes specifically about digg are somewhat dead. What really makes me angry is the joke that is the VC industry, which is really the root cause of this. In April I talked to an associate at Andreessen Horowitz (where Kevin Rose is the mayor on foursquare), who said that digg, after many difficulties, was on the right track. From their perspective, engineers are cogs in the machine and the only thing that matters is the executives. I worked closely with the executives at digg, and they spent the vast majority of their time feathering their nests; the digg v4 rollout is just the logical conclusion of them draining value for five years.
Also I said this was a "rough guess." Comments on TC from people I know who worked there (there's been a mass exodus for the last two years) suggest I'm mostly correct.
Finally, I think "sour grapes" is a poor rejoinder. Especially since it's the standard PR line at digg.
No, I apologize. I didn't mean to sound like I was attacking you or suggest that you supported it. It just seemed like a good place to interject. It's such a transparent move on Digg's part.
Part of the reason that Cassandra may be the focus is that Kevin Rose places much of the blame for the stability problems on Cassandra's shoulders. Two or three minutes into the most recent Diggnation he talks about Cassandra, describes it as "very beta-stage software", and says that days before the launch at least some of their focus was on fixing "Cassandra problems" rather than issues with Digg v4.
If true, it's a lame excuse: blame should be placed on the people who decided to use beta software to power their very-important-to-their-paycheck website.
So if you watch the Revision3 video where they talk about the stability issues, Kevin throws Cassandra out there - nothing really specific, but when discussing downtime it's the only technical factor he mentions.
He calls it "still beta software" and states they were fixing bugs in Cassandra during the days leading up to the release of v4.
In reply to you and epoxy, sounds like a gross failure of engineering management. If you're having problems of this sort of foundational nature a few days before the planned release, it strongly suggests you should delay and figure out what's generally gone wrong with the project.
I actually wondered if they were running into issues with Cassandra. I'm not a NoSQL hater - but it's still pretty bleeding edge, and it always seemed like making it your core DB was super risky.
arg - really want more insight, maybe Quinn will elaborate now that he's gone.
Back then they seemed to have a rather sensible migration strategy (ie, basically running the new Cassandra back-end in parallel with the MySQL backend).
It seems to me that it was the v4 upgrade that broke, not Cassandra alone. It's possible their frustrations with Cassandra were more long term, and the fact the v4 upgrade didn't go well was the last straw.
For example, Digg has done a lot of work on Cassandra internals and tools. If you are using a new, open source product you kind of expect that, but it's possible the expense of that didn't seem like good value once v4 started to get into trouble.
For some reason this reminds me of the "no one has ever been fired for choosing IBM" slogan. I guess maybe there is some truth to it. It seems like new tech needs to start slowly in big places; only startups really have the freedom to risk it all.
That depends on how you define your terms. I'm not sure a company that has been around 6 years and has ~100 staff can be called a startup any more. On the other hand, I'm not sure a company that still survives more on the wishful thinking of investors than on the money it brings in after 6 years can be called a business, either, and certainly I wouldn't call it either big or mature. I'm honestly not sure what I would call that sort of organisation, though if I were an investor, I suspect the word "liability" would feature somewhere.
I have questions about what they're doing with all that staff, especially now that it's become apparent that Reddit is doing similar (if not larger) numbers with a handful of guys (literally, countable on one hand).
Evidently Digg is better at the monetization game than Reddit has been, so more sales staff I can understand... but devs?
I can understand that Digg needs more people than Reddit (even though their traffic numbers are about the same) but TEN TIMES more people? That's just not sensible. I'm sure Digg can make a nice profit without further growth if they just got rid of about half their staff.
I wonder if it's easier to get funding (and later position yourself to get bought out) if you show that you have quite a few people working for you.
Personally I have no idea, nor do I see any reason for that many people working for digg, but my understanding is that Digg (or rather Kevin Rose) has always been about having the perception of being big without actually being that big.
You know what's even better? Having the perception that you are big but without actually hiring 100 fucking people to do it. I mean if that's what you need to impress investors at a cocktail party, you're a pretty sad entrepreneur.
Tell that to investors. I see their position as precarious and they haven't IPO'd or been acquired. They have a lot of staff and have longevity, but I don't think that is enough for them to stop being a startup.
Companies like Etsy insist they are still startups, too - five years, $55 million in funding, tens of millions in revenue, and 130 employees later. I don't buy it, either. They are an established business, but routinely use the 'startup' label as an excuse.
Being a startup isn't about the company, it's about the product. There are startup teams at fortune 500 companies. A startup company is a company with one new product in development.
That is it! So the problem seems to be a complex one. Part of it, I think, is ordinary Java issues: what happens when there is not enough memory and the system starts using swap? What happens when the connection rate is too high while the system is doing heavy IO? What happens during recovery from a network outage or a replication failure, and so on?
The second part is the complexity of the software itself - not the complexity of the algorithms or tasks (it isn't rocket science), but the artificial complexity added by all those CrappyFactoryManager().GetSpecialShitFactory().instantiateANewCrap() chains and so on. It seems like no one can comprehend the whole mess.
On the other hand, this failure will probably bring some improvements, or at least more attention, to the Cassandra project, and everyone who uses it will benefit.
This man's reputation is on the line. I hope they release more details on what exactly it is that is causing the problem. As it stands he appears to be an unfortunate scapegoat.
Yeah, originally it was Slashdot but better - more current and user-powered (which was a new concept in the mid-2000s). What it turned into was a traffic generator for a small group of power users who fed the community the lowest common denominator of content.
They are planning to transition to Cassandra as the primary database. PostgreSQL is not used in a relational way on reddit -- it is used as a makeshift k-v store. You can take a look at this yourself as reddit is fully open-source: http://github.com/reddit .
I could swear an admin said it with some pretty fair definitiveness; I thought I remembered that it was ketralnis, but this was the best I could find from him, and it's not nearly as definite as my memory says.
According to reddit admins, some of the downtime in the last few months was due to Cassandra, and they were having some bad performance and stability issues. I can look it up if anyone is interested, but it will be a bit of work, since they stated it (several times) in the comment sections and perhaps once in a blog post too.
I screwed up our Cassandra deployment, and wrote about how I screwed it up. We were under-provisioned, and the version we were using didn't deal with the case of being overloaded in a graceful way. We're no longer under-provisioned, so I don't know if more recent versions deal with it better.
We've never claimed to have performance issues, I don't know where you're getting that one.
The cassandra issues were primarily ops failures, and secondarily an older version of Cassandra making it difficult to recover once it was overwhelmed. (Some of the resulting improvements in Cassandra are documented here: http://www.riptano.com/blog/whats-new-cassandra-065)
This could very well be; I remember one of the problems they were having was not having enough resources for cache, or something similar. But I could be wrong - I will ask them on Twitter and see if they can comment here on it.
This article is a fine example of the poor logic that pisses me off nearly every day. Just because Digg v4, which heavily uses Cassandra, can't be used as a Cassandra success story does not automatically mean you can assume the opposite - that it is a Cassandra failure story. There is no indication why Digg has been down so often, and there are really no conclusions to draw yet about the technology they use.
The article basically comes right out and says it. You just need to read between the lines a little bit.
From the article:
> Quinn was the main champion of moving over to Cassandra, say our sources. Now the site is taking a huge hit, at least in the short term, because of that decision and/or how it was implemented, and Quinn is paying for it with his job.
It's always a toss-up whether it was implemented correctly or not. The correct course of action, of course, would have been to slowly move the site over to the new technology piece by piece rather than doing a wholesale switchover. The risk is in the migration strategy, not the technology picked. They could have been equally stupid switching over to a new architecture with MySQL.
Reddit went through similar issues a few months back (downtime, slowness, etc.), but they overcame these issues without turfing people. My guess is Digg pushed the engineering VP out to make the investors happy rather than to actually move forward.
by doing this, they are laying the blame on one person so the media can stop hating Digg and start hating the ex-VP of Engineering who "killed Digg".
of course, whether he was actually responsible in some way is something we may never know. for all we know he may have been completely against releasing v4 but was vetoed by Rose et al. or on the other hand he may have overpromised and under-delivered, putting the company in jeopardy, in which case he deserves to be let go.
it's all speculation until we hear an official comment from either side.
The typical investor doesn't take the time to really understand the companies or technical issues, unfortunately. They see 'there was a problem, management fired the guy who was at fault' and are happy.
Unless Digg cannot roll out new features due to Cassandra (e.g., (a) Cassandra woes taking all the dev team's time, or (b) tech limitations preventing features), it seems highly unlikely that Cassandra is the reason Digg v4 is grating on users initially.
It's a change of features/product rather than technology that is the problem.
I think the folks at Digg were prepared for the heartburn associated with the content changes. They weren't ready for some very serious issues with the implementation. Digg has definitely seen a lot of downtime over the last week or two.
Digg has had some significant downtime during this transition (can't blame Cassandra - but can't clear it either). Issues with product features are one thing - but probably nothing got more people jumping to a different ship (reddit) than not being able to find a good lolcat when you're bored and your favorite news aggregator is down. Hmm - I wonder if fark has seen the same uptick as reddit has.
I'm really interested to see what comes of this and what went wrong. It sounds like (from reading his blog) they were making a lot of customizations to Cassandra?
They have made many feature additions to Cassandra.
They haven't said much about the details of why they are having trouble.
It could be a core Cassandra problem, something they added, or completely unrelated to Cassandra; But the internet doesn't care. It's drama at its finest.
"It could be a core Cassandra problem, something they added, or completely unrelated to Cassandra; But the internet doesn't care. It's drama at its finest"
Digg made a big deal about their move to Cassandra (just as Digg's move to Cassandra was used to legitimize Cassandra, and by association NoSQL, among a wide range of zealots), going back over a year.
The thing about talking big like that is that it often comes back to bite you in the ass if things don't go well.
If Digg quietly released a new version that worked more reliably and provided a better experience, they would have been in a perfect position to pontificate on technology.
OK, Digg is having a hard time. I feel like I'm witnessing a strangely bloodthirsty tone in some of the comments, though. I wish Digg the best and I hope they can prove all the skeptics wrong. It's going to be tough, though!
I wonder what issues they are running into that a dark launch would not have found. As I have discovered painfully in my own projects, making big changes without a rollback plan is usually a bad idea, and it sounds like this is no exception.
Digg v4 has added back the features I liked after taking them away - mostly the Upcoming section, which tends to have interesting things that never make it to the front page, and the setting that lets you default to Top News instead of things I have already read ('My News').
For most users, the change in features is what drove them away, not necessarily the spotty QoS (though that was pretty bad for a while).
I have to say I like the new Digg. It's sad this guy is getting his career stomped on over it. A lot of people complain that the new site lets big sites submit their own content automatically, but how was it any different with a middleman user submitting it himself?
Let me guess - the problem is about the difference between theory and practice.
In theory, Java is great and Cassandra is great. In practice, Java under heavy load is a disaster, because it was never designed for it, and Cassandra is just hype and propaganda.
Face reality - it doesn't work in production as it's supposed to, as a primary storage engine.
People at Digg aren't amateur idiots, so I think they did everything as described in the docs, but the damn thing just doesn't work.
What would you prefer to write web applications in? C++?
Most languages used in web programming are divorced from the hardware, and either use a virtual machine (e.g. Java, .NET) or an interpreter (PHP, Python, Perl, Ruby).
Virtually no one uses a low-level language for web programming, and the benchmarks say that the virtual machines are faster than the interpreters. Java is very fast and efficient once it's running; its initial start-up time is often slower than interpreted languages' due to JIT compilation, but servers rarely "start up". If you're really hitting a barrier with Java's performance, just as with other non-native languages, you can write the performance-critical section in C.