Hacker News
Scaling GitHub (zachholman.com)
348 points by pascal07 on Jan 26, 2012 | 67 comments



I'm usually wary of Holman's presentations and writing. GitHub is such a unique, special place — an engineering company with a product engineers use and are devoted to.

This presentation is pretty good and has some good tenets. However, I have trouble believing that all of these ideals will work when you have a PM screaming at your engineers to finish the internal CRM system, or something else that engineers must work on but don't necessarily care about.


  > However, I have trouble believing that all of these 
  > ideals will work when you have a PM screaming at your 
  > engineers to finish the internal CRM system [...]
That's the point. The point of these things is that instead of having a screaming PM demanding that deadlines be met, you hire self-starting employees who have fun with their job. It's an internal CRM system, so why not shove it full of in-jokes and vi-keybindings for navigation? Why not give it a REST API and a CLI app to consume it?

It may not be possible to get an existing company to this state, but perhaps it's possible to start a fresh engineering department that's run on these principles?


There's also the issue of exposure and industry cred. I mean, we're talking about GitHub here, one of the hottest websites amongst developers. Working on it comes with a certain amount of prestige, and every feature they add results in a massive amount of industry press on sites like HN. Of course engineers working there are going to be extremely driven and motivated; it's a very rewarding product to work on.

Now imagine you took the engineering team of GitHub (or one like it) and transplanted them to work on Widget Inc.'s CRM system that's primarily used by their sales, marketing and HR people. Instead of your end users being similarly savvy and motivated developers they're now your typical corporate office staff. They're not using your product by choice but only because it's what the corporate higher-ups handed to them. They couldn't care less about new whiz-bang features you add to the product, they're mostly concerned about getting their TPS report out on time. Interface changes only confuse them and slow them down. They're never going to tweet or post a blog post about how awesome the new thing you just rolled out is.

Do you think the engineers would be similarly motivated to work on such a product? I doubt it.


I agree with what you're saying, but the solution is not to make the situation worse by having a PM screaming at them about TPS reports. That will only drive away what talent is in the pool of Widget Inc; a net negative for the company.

Widget Inc does not need to have the best devs, so this is largely moot. Widget Inc does not compete on IT, does not need their apps to be good, does not care if the best devs work there, no matter what they say.

And so, github has the prestige.


> perhaps it's possible to start a fresh engineering department that's run on these principles?

The key word there is department. GitHub is an engineering company, with engineers from top to bottom. It's extremely unlikely that an engineering department could live in isolation from its company as a whole. So there would indeed be a PM screaming at someone to finish the internal CRM system and move onto something that the (non-engineer) customers really want.


"It's an internal CRM system, so why not shove it full of in-jokes and vi-keybindings for navigation? Why not give it a REST API and a CLI app to consume it?"

Because the users of the system don't care about any of that crap, so it's a waste of time and money, neither of which most companies have to spare.


GH is a special case. They're darlings of the tech community, and they throw a ton of events.

Even if they were crappy to work for they would probably have an easy time hiring.

THAT SAID, "finishing the internal CRM system" is a problem that happens in organizations an order of magnitude larger, methinks.

My finishing platitude is, how did you hire - and why do you retain - engineers that don't give a shit about the project they're working on?

Also, why is your PM screaming? A project meeting deadlines is his or her responsibility. Sounds like you have a crap PM to boot.


I work on several projects that I have never used. However, my employer doesn't completely shield me from people who do use it. At least for me, the satisfaction comes from interacting with people who the project helps and seeing how it helps them.

I think part of why (some)? engineers may not give a shit about the project they're working on isn't necessarily because they have no interest in it, but because they haven't met anyone with an interest in it. In github's case, it just happens to be themselves.

I like to know what I'm doing actually matters. Meeting people my stuff impacts helps with that.


My best experiences in corporate IT were when I escaped the department for short times - a flight to an office whose processes were on fire, a deployment of a new warehouse management system where all the workers were counting on me, a 3-month 5-person project sprint to deliver a late, overbudget project without adult supervision.

Whatever the purpose of IT middle management, it does not appear to accomplish much.


> I like to know what I'm doing actually matters

I wish I could give you ten upvotes. I get a lot out of interacting with the people that actually use my software. It makes things more personal and that makes me more motivated to work on a project. And the feedback helps to make what they actually want instead of having to second-guess, or build to some abstract specs.

That the project is technically interesting/challenging also goes a long way, of course, but that's not everything.

Worst would be a project that is boring and you don't know whether anyone, including yourself, is ever going to use it.


Cause and effect? I mean, yes, they're darlings of the tech community and throw a ton of events, but that is explicitly mentioned as part of their strategy for keeping their employees happy, which is the underlying principle for their “scaling”. So perhaps it's a self-reinforcing cycle: they were originally darlings because of their service, and as they've grown, their attempts to keep their employees happy result in more goodwill, which further makes them darlings.

In essence, I think if they were crappy to work for, they wouldn't have an easy time hiring, because a lot of the cool stuff they do just wouldn't happen.


Oh, I'm really, really open to that interpretation.

I'm currently unemployed by choice and what they describe is basically a dream job, imho.


I'm not a fan of loudmouth hipster programmers either.

But it's easy to see how a programmer was hired and came to work on a project he didn't like. He was hired to work on a project he was interested in, and was moved over to a project he dreads. They won't stay long though.

Which is the difference GH has from many other places: The work is interesting (to many). Therefore they have an easy time hiring and retaining good employees.


>I'm not a fan of loudmouth hipster programmers either.

I think throwing "hipster" around is a waste of time, especially considering these are extremely successful hipster programmers.

Create product everyone loves, create environment everyone loves to work in, live in San Francisco: fucking hipsters.

>He was hired to work on a project he was interested in, and was moved over to a project he dreads. They won't stay long though.

There is always shit work to do. It's a constant of the universe - there will always be work you're not super interested in. The difference is, is the rest of your job good enough to suck it up and take one for the team?

(Do you even think of it in terms of taking one for the team, or do you think you are just being shat upon?)


You're right, they wouldn't. GitHub would never hire the PM you're referring to, because that's the type of PM that makes engineers hate their jobs and become demotivated.


Never say never.


"you have a PM screaming at your engineers" - and there lies your problem.

I'm the sole engineer for a distinctly non-engineering company (we deliver food, and I'm the only person with any sort of engineering background).

However, we work in a very similar way to GitHub. We have a motivated team who love our product - and that rubs off on our working practices. No one is ever told they have to work late, or they have to hit a deadline, but things get done at an amazing pace. At some point soon I'll be working on an internal CRM system, but because I care about the company, and the impact it will have on everyone else's ability to do good things I'm excited about that.


One thing the slides mention in passing that I cannot stress the usefulness of enough is an internal wiki. There are so many common tips, issues, ideas, etc. that can be more effectively communicated and expanded upon when they're in a centralized location for all to see. So many conversations of the form "I've never dealt with that, but I know $coworker ran into that once. Go ask him." can be reduced or eliminated if the ideas are already visibly detailed. In short, setting up an internal wiki helps improve/scale your team's productivity.


+1 But the issue I always face is that many coworkers aren't diligent enough to document in the wiki. How would you address that?


It's gotta come from the top.

You can't dictate awesome culture like this. If the highest folks in the company update the wiki with useful information and point to it constantly, the newer hires will do the same.

If someone dictates that the wiki should be the source of information and then never puts anything in it, it won't go anywhere.


Move the wiki somewhere where they don't even have to leave their normal flow of work. We moved ours to github. Just make a private & empty repo, add a wiki to it, and now you have a wiki you can edit in your favorite editor, search with grep, and save / revert with git. Back of the napkin estimate, I would say we have 4 to 5 times the activity on the wiki that we did before. We found that developers really don't want to leave what they are doing, open a browser, possibly log in somewhere to edit in some embedded editor that doesn't have emacs/vim/whatever bindings. If your team still doesn't use it, well then it might be time to polish off that CV.


I like this - I certainly lose steam when having to log in to another service to do it. Certainly worth trying. A follow-up question: which wiki do you use?


I had the hardest time convincing anyone to use the wiki, possibly because several of the people who needed to be using it weren't technical and didn't really understand the benefit.

But then we moved away from our ISP's awful email service to Google Apps for Business and people started using Google Docs. Docs is now our de-facto wiki.

I think the reason people use it is a combination of familiarity (everyone understands a word processor), constant reminders (everyone gets an email when a new document is added), and peer pressure.


I'm not a huge wiki proponent for mostly that reason. However, the other big factor is that a lot of the time the Wiki becomes an excuse for not automating something. A surprisingly good place to start is to add a README and a Makefile. Both of those things show up in recursive greps ;-)


Two of the replies to you illustrate the classic joke about the traveler who gets lost and asks the village peasant for directions to his destination - 'Well, if I wanted to get there, I shouldn't start from here.' (And are as useful.)


Figure out what kind of traits the workers you want (diligent, documenting) exhibit in a job interview and hire that type of people.


Build that time into project budgets.


I once worked at a "startup" that was the anti-culture. It hired a programmer after the current team had turned them down. That person was hired b/c she was a recommendation from one of the VPs' friends. At one point, we had more "Managers" in the company than people to manage... needless to say, no one got anything done.

I think most companies don't realize that businesses are always about people; they are composed of people, run by people, and sell their products/services to people. When it comes to the composition of a company, having camaraderie and talent goes hand in hand. The moment you hire someone who is average or doesn't fit, the entire work force degrades gradually.

Thomas C. Schelling developed a statistical model that actually describes this behavior and shows how a class of people clump together (http://www2.econ.iastate.edu/tesfatsi/demos/schelling/schell...). This is why talent attracts talent and mediocrity attracts mediocrity.
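
For anyone curious what that clumping looks like, here is a minimal sketch of a Schelling-style segregation model (an agent-based simulation, not a statistical fit). The grid size, the 20% empty cells, the 0.5 happiness threshold and all function names are assumptions picked for illustration; they are not taken from the linked demo.

```python
import random


def make_grid(size, empty_fraction, rng):
    """Build a size x size grid of "A"/"B" agents and None for empty cells."""
    cells = size * size
    n_empty = int(cells * empty_fraction)
    n_agents = cells - n_empty
    pool = ["A"] * (n_agents // 2) + ["B"] * (n_agents - n_agents // 2)
    pool += [None] * n_empty
    rng.shuffle(pool)
    return [pool[i * size:(i + 1) * size] for i in range(size)]


def similarity(grid, r, c):
    """Fraction of occupied neighbours (8-neighbourhood) of the same type."""
    size = len(grid)
    near = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < size and 0 <= cc < size:
                if grid[rr][cc] is not None:
                    near.append(grid[rr][cc])
    if not near:
        return 1.0  # an isolated agent counts as content
    return sum(1 for n in near if n == grid[r][c]) / len(near)


def mean_similarity(grid):
    """Average same-type neighbour fraction over all agents."""
    size = len(grid)
    vals = [similarity(grid, r, c)
            for r in range(size) for c in range(size)
            if grid[r][c] is not None]
    return sum(vals) / len(vals)


def step(grid, threshold, rng):
    """Move each agent that was unhappy at the start of the step to a random
    empty cell. Returns the number of agents moved."""
    size = len(grid)
    coords = [(r, c) for r in range(size) for c in range(size)]
    unhappy = [(r, c) for r, c in coords
               if grid[r][c] is not None and similarity(grid, r, c) < threshold]
    empties = [(r, c) for r, c in coords if grid[r][c] is None]
    rng.shuffle(unhappy)
    moved = 0
    for r, c in unhappy:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empties[i] = (r, c)  # the vacated cell is now available
        moved += 1
    return moved


rng = random.Random(0)
grid = make_grid(20, 0.2, rng)
before = mean_similarity(grid)
for _ in range(30):
    if step(grid, 0.5, rng) == 0:
        break
after = mean_similarity(grid)
print(f"mean same-type neighbours: {before:.2f} -> {after:.2f}")
```

The interesting knob is the threshold: in this kind of model even a mild preference for similar neighbours tends to produce visible clusters, though how well that maps onto hiring is the commenter's argument, not something the sketch proves.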


> The moment you hire someone who is average or doesn't fit, the entire work force degrades gradually.

I agree, as long as you're talking about a socially conservative work force. There are plenty of companies composed of, and tolerant of, unique people. This doesn't make anybody "average," either, and I'd venture that the less tolerant of outsiders a workgroup is, the more average they are. They're certainly insulating themselves, regardless, which can't be good in the long run.


I think he was talking about average in terms of talent while you seem to be thinking in terms of social norms.

The way I've seen it put is: A's hire A's and B's hire C's (couldn't find the original source of that).

Highly talented people like to hire other highly talented people. They recognize that they will learn from them and the team will be better. Average people (the B's) will often hire other average people or less talented people. I've heard it explained as their being less secure and more protective, or as their criteria being more generalized and less stringent.


> A's hire A's and B's hire C's

OK, now define "talented."


I'd be interested to hear more about the Kinect/Arduino video recording platform. We are always looking at better ways of capturing and storing video archives for later use internally.


Github is a bit special but I guess this can work every time you have a few basic ingredients: a product that has final customers (no business-to-business), and a company that is completely focused on developing this single product.

It is not very common but also not rare either for web startups to have this kind of setup, so I think this may work for many... of course a fundamental thing here is that they hire smart guys, you can do something like that only if every piece of the company is skilled and independently able to handle his own work.


I think saying that GitHub is a special case is a copout. The real meaning of this presentation is that if you treat your employees well and let them do their job, they'll build great stuff.

Yes, not all companies will have the same success as GitHub with this approach, but I bet for most it would be a huge improvement over what they have now.


I don't necessarily think it is a copout. They are bootstrapped, are making a ton of money, and hit the market with a good product right off the bat. Not many companies can afford to do the things they do and still pull a profit. How many startups can afford and justify having an artist (not graphic artist mind you) permanently on staff? Don't get me wrong, I love github and they have a great team, but not every company can do the things they do and still be in business.


Promoting these practices does little good for companies that aren't GitHub, and believing in them might be woefully idealistic.

Let's look at GitHub:

* Most of GitHub's 50 or so employees are engineers[1], not sales or marketing or product management or art directors or logistics or accounting.

* GitHub's primary product is github.com[2], a product which is primarily used by engineers.

* The company hasn't taken funding outside of friends & family[3].

* They claim to be "very profitable"[4].

* Their team is distributed[1], which can work well for engineers who enjoy time alone.

Compare this to my startup: Engineers are < 33% of the company, we're all in one location, we have a range of web and mobile products, and we've taken funding from VCs.

Our customers are normal humans, not engineers (no offense, I'm one too), and our products need to be sold and marketed so that people understand why they're good. We don't have the benefit of a customer base who intrinsically know why the product is great once they use it like engineers do. Our products don't make our customers want to come work for us.

We're all in one place because we need to collaborate with non-engineers to build our product. Good in-person relationships with the sales and marketing people let us work together very well and are important for non-engineers, who especially can't use a DVCS to collaborate.

We iterate quickly, so systems like Campfire or wikis are useless. There's no point in writing documentation and employee on-boarding instructions if they're going to be out of date in two weeks or if it's easier to walk over to The Guy Who Knows That Stuff and ask. Yes, our bus factor is horribly low, but it's not worth documenting a system that will change wildly in two months.

We also have a board of directors who are demanding. We have monthly and quarterly goals we need to meet. We don't have time to work on cool things like bots and music players because we're busy experimenting and building products which we're still defining. Sure, we add a few easter eggs here and there, but I need to finish the damn CRM improvements, otherwise the sales team can't handle our customers in the way we want, the customers won't have the best experience, we won't hit our January numbers, and the board will be angry.

I love GitHub. It's one of the greatest things to happen to open source and software project collaboration. But whenever Zach Holman shows off a well-manicured presentation or blog post about how awesome GitHub's ideals and work environment are, I have to roll my eyes a little. His intentions are good and he's a swell writer and presenter, but I can't help but wonder what kind of magical, mystical fairyland he thinks he's in.

(No offense, Zach. I think you're great. Let's get a beer.)

  [1] https://github.com/humans.txt
  [2] Probably. What else do they do?
  [3] http://news.ycombinator.com/item?id=1454597
  [4] http://techcrunch.com/2010/07/24/github-one-million/


> We iterate quickly, so systems like Campfire or wikis are useless. There's no point in writing documentation and employee on-boarding instructions if they're going to be out of date in two weeks

That's a terrible thing to say.

Remember, documentation is like sex. When it's good then it's really, really good. And when it's not good then it's still better than nothing.


What graphing utility/service do you guys use at GitHub?



What about for the dashboards, like on slide 63? Is that generated by Graphite as well?


Just basic HTML/CSS/Javascript built around the Graphite JSON APIs. We have Hubot integration to build graphs on demand through commands in Campfire, a Graph Store to save graph configurations, and static HTML pages that arrange these graphs nicely.
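
For readers who haven't used it: Graphite's render endpoint can return series as JSON when asked with `format=json`, which is what makes this kind of custom dashboard feasible. A rough sketch of the data-fetching side follows; the host name, the metric name and both helper functions are hypothetical, and none of this is GitHub's actual code.

```python
from urllib.parse import urlencode


def render_url(base_url, targets, since="-1h", fmt="json"):
    """Build a Graphite /render URL asking for the given metric targets."""
    params = [("target", t) for t in targets]
    params += [("from", since), ("format", fmt)]
    return base_url.rstrip("/") + "/render?" + urlencode(params)


def latest_values(series_list):
    """Map each target to its most recent non-null value.

    series_list is the decoded JSON body: a list of
    {"target": name, "datapoints": [[value, timestamp], ...]} objects.
    """
    latest = {}
    for series in series_list:
        values = [v for v, _ts in series["datapoints"] if v is not None]
        latest[series["target"]] = values[-1] if values else None
    return latest


# Hypothetical host and metric name, for illustration only.
url = render_url("http://graphite.example.com", ["stats.web.requests"])

# A hand-written sample in the shape the JSON format returns.
sample = [{"target": "stats.web.requests",
           "datapoints": [[12.0, 1327536000], [None, 1327536060]]}]

print(url)
print(latest_values(sample))
```

A static HTML page would do the equivalent fetch from JavaScript and hand the datapoints to whatever charting code it likes; the Python here only shows the request shape and the response layout.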


They are beautiful, especially compared to the built-in dashboard UIs. If you're looking for yet another piece of infrastructure code to give away for free, I'd love to see it open-sourced.


You might be interested in https://github.com/paperlesspost/graphiti

It's somewhat based on some of the stuff Github uses internally for graph generation (so the README says).


This is of particular interest to me because I'll be starting my full-time job hunt early next year (or even earlier because I get neurotic about these things), and I really only want to work somewhere that understands the appeal/necessity of "work where you want/when you want/on what you want".

How do I filter potential job opportunities on those criteria without being rude to anyone? I don't want anyone to think I turned them down because I thought I was too good for them or something, but the reality is that in a world where workplaces like GH exist, there's no reason to work somewhere that doesn't "get it" yet (and maybe never will).


Github is a very, very special company, and there aren't many companies out there that operate this way. Unless the organization as a whole adopts the philosophy you're looking for from the top down, this kind of freedom usually doesn't exist.

I think a good start is to begin early and search for companies that fit your criteria, then find ways to demonstrate why you would be a good employee to those companies directly. Don't go through the normal channels; find a way to stick out and be seen before you even apply.

Best bet: contribute a ton to a high-profile, well-known open source project that will stick out like a bright, shining beacon of light to your potential employers.


Filter based on if they're touting the work ethics you want. If they're not touting it from the tops of the hills, they probably don't do it. Places like GitHub and 37signals yell about how awesome their work environment is, if you want the places you're considering working at to be like GitHub or 37signals and these places aren't doing the same, I'd be wary.

If you're looking at bigger companies, find a list like Fortune's 100 best employers (for the US, sorry don't know other regions) and read it. There's a good chance what's written about the best employers is mostly true (not completely true, but it's a good guide). Also, ask other people who work for a place you're interviewing if you can buy them a beer / coffee the day after your interview. Pick their brain about the culture. If you're paying for the drinks and considering joining them, many people will be honest (I would be).

Don't feel you're being rude. I'd recommend telling the places that don't fit your desired culture exactly why you're not going to work for them. Worst it can do is nothing, at best, maybe the people who do work there see some tiny improvement.


I'm in a similar company at the moment - it's not quite GitHub, but the atmosphere of smart people working on things they love is there. Since I started they've also started adapting to the "Work when you want, where you want" approach as well ;)

When looking for work a big requirement was that I could work from home, and have Wednesdays off - filtering down the opportunities was just a case of mentioning that in the initial contact with a company I was interested in. No one you want to work for is going to be offended at you pointing out something that could be a problem early on in the process so long as you go about it politely.


Pretty nice, but the slides are only interesting from page 45 to 74; the rest is totally useless.

I must be allergic to PR aimed at hipsgrammers (hipster programmers)...


Snide comments aside, I think you're missing a lot of value in slides 26 through 34, the section on communication. Reducing the 'spin-up' time for new employees and documenting even the one-off conversations seems like a great way to increase productivity.


>Every internal GitHub talk is automatically recorded, uploaded, and viewable to every future employee. ... on a Kinect-powered Arduino-based motion-detecting portable video recording platform.

Um, more please! Wtb a video of that in action or meetup presentation or something.


Related presentation: Hacking Your Organization[1]

[1] http://www.infoq.com/presentations/Hacking-Your-Organization


Can someone explain what exactly he meant by "throttling the google bot"?


It's telling googlebot to slow down and not be so aggressive.

You can use google webmaster tools to dial down the crawl rate.

Some think you can use Crawl-delay: in robots.txt but I have not tested this (and it may very well not work).
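
For reference, Crawl-delay is a nonstandard robots.txt directive: some crawlers honor it and others ignore it, and my understanding is that Googlebot is among those that ignore it, which is why the webmaster tools setting exists. Here is a small sketch of what the directive looks like and how a crawler that does respect it might read it, using Python's standard-library parser. The bot name and the 10-second value are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the values are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler would wait this many seconds between requests.
delay = parser.crawl_delay("ExampleBot")
print("crawl delay:", delay)
print("may fetch /search:", parser.can_fetch("ExampleBot", "/search"))
print("may fetch /about:", parser.can_fetch("ExampleBot", "/about"))
```

Whether a given crawler actually obeys the value is entirely up to that crawler, so this is a request rather than a throttle.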


Liked it. Compare with 'Netflix Culture' too.


Totally devoid of information. I would be more interested in reading actual technical reports of highly scaled infrastructures. These flimsy half-truths gleaned from such scaling, I can do without.


I saw Zach Holman present this in Cape Town today. These slides are merely a back-drop to what was an excellent talk.


"Totally devoid of information."

That seems a little harsh for something that is clearly intended to be an overview of general principles, not specific technical details.


Agreed, but sadly this is the state of passing information now in "new age" startups/companies.

It's like Twitter: some nice pastel colors, very little text, no information, but it's easy access and people can say "Oh yeah, I know about scaling, I have read this on a blog post". Sorry to break it to everyone, but scaling is hard, totally unfunny and uncool, stressful, and full of problems.

But to answer your question: scaling is kept secret by most companies because it is such a key process that they don't really want to share it with their competition. If you find some, I would be interested too; I am currently in the process of doing so in my company and it's a pain.

Anyway that put me in a bad mood :).

edit: you really shouldn't be downvoted for stating a cold truth


1) It's a slide deck for a talk. What were you expecting, a detailed tutorial? I also don't think you got the most important bit: focus on employees.

2) Scaling is also… really custom. I'm extremely unlikely to have to find a way to shard git repositories across a network of users. And they do cover their architecture and release back a fair number of their tools.


Re.: 1) OK, why post it then? Post the talk.

Re.: 2) Doesn't matter if it might not be useful, it sure would be interesting. :)


We've gone into excruciating detail about how GitHub's architected: https://github.com/blog/530-how-we-made-github-fast


Sorry to break it to you, but scaling isn't hard anymore. The tools we have at our disposal today make it trivial compared to companies attempting to do the same thing 5-6 years ago. As soon as you figure out the scaling from 1 -> 2 servers, you're basically done until you hit top-1000 traffic. Because as soon as you spend a weekend doing that, scaling to 2 -> N is child's play.

Oh look, I'll pick from any number of battle-tested, high performance, and free load balancers. I'll spin up a couple more VPS nodes. I'll add a few more slaves to my DB setup. I'll install some memcached and a quick write-through wrapper in my code. Oh yeah, duh, don't do long-running operations on the request/response cycle, here's an open source queue system for you. And since I like going overboard, let's throw some Varnish all over the place. Congrats, we've just covered the scaling concerns of 95% of web apps.

Try doing that several years ago with the terrible PHP and Java services we were writing, and without most of the hardened tools we now get from thirty seconds of Googling and learning from everyone else's mistakes.

Disclaimer: yes, when you start to break through the top-1000 some hard problems start appearing again. You begin to run into the limits of what others have created, and patching / rolling your own solutions becomes commonplace.
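
To make the "quick write-through wrapper" mentioned above concrete, here is a minimal sketch. Plain dicts stand in for both the database and memcached, and the class and method names are assumptions for illustration; a real memcached client has its own API, expiry settings and failure modes that this ignores.

```python
class WriteThroughCache:
    """Keep a cache in step with a backing store by writing to both."""

    def __init__(self, store, cache):
        self.store = store  # stand-in for the database
        self.cache = cache  # stand-in for a memcached client

    def set(self, key, value):
        # Write-through: the store is updated first, then the cache,
        # so reads served from the cache never see a value the store lacks.
        self.store[key] = value
        self.cache[key] = value

    def get(self, key):
        # Serve from the cache when possible; fall back to the store
        # and populate the cache on a miss.
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value


db = {"user:1": "alice"}  # pre-existing row, not yet cached
cache = {}
users = WriteThroughCache(db, cache)

users.set("user:2", "bob")   # lands in both db and cache
first = users.get("user:1")  # cache miss, read from db, then cached
print(first, sorted(cache))
```

The trade-off is that every write pays for two operations, in exchange for reads that rarely touch the database.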


Well first I said I was in a bad mood.

More seriously, scaling is not simple. Yes, products have been introduced or have evolved and greatly simplified the task. But saying that scaling is simple is simply not true. Not everybody does simple web apps; some require very heavy processing behind the scenes, Hadoop/MapReduce operations that have to return complex results in memory-constrained environments, all while maintaining low cost. As an example, when relational databases are not sufficient anymore and ACID is not casting its reassuring shadow, you can be in trouble if you do things like payroll, real-time processing, etc.

And again, you don't have to be in the top 1000: many companies do not even have a lot of traffic from the general public but have to deal with very hard scaling problems.

Just look at Reddit; they are still having trouble with the Cassandra database, which is said to be the easiest to scale.

My 5 cents on the question!


Reddit is a top 200 site. They do over two billion pageviews/month. A lot of their pages are loading 1,000+ nested replies. They use Postgres as their primary datastore.

Scaling is a solved problem until you're huge if you're not doing something really avant-garde.


> A lot of their pages are loading 1,000+ nested replies.

Reddit only shows 200 comments by default. It also used to be you needed Reddit gold to even show 1,000 on the page when it first loads, not sure if you still do (my gold expired)


Okay, you've made the point that scaling technology is easy ... what is less easy (for example) is scaling the business processes to support an increased number of users or handling the communication overhead of an up-sized development team. People-centric processes cause significant scaling trouble.

Yes, aspects of scaling may be easier, but scaling (in general) is still a complex beast that kills many companies.



