Pinterest Architecture Update - 18M Visitors, 10x Growth, 12 Employees, 410 TB (highscalability.com)
118 points by Anon84 on May 21, 2012 | hide | past | favorite | 86 comments



They have a very interesting stack, as seen on Quora [1]:

-Python + heavily-modified Django at the application layer

-Tornado and (very selectively) node.js as web-servers.

-Memcached and membase/redis for object and logical-caching

-RabbitMQ as a message queue.

-Nginx, HAproxy and Varnish for static-delivery and load-balancing.

-Persistent data storage using MySQL.

-MrJob on EMR for map-reduce.

[1] http://www.quora.com/Pinterest/What-technologies-were-used-t...


Pardon my ignorance: what is interesting about Pinterest's stack?

I think by now most startups, YC or not, in Silicon Valley will pretty much have a similar setup:

1) Choose the main web-stack (Rails or Django)

2) Choose the API framework (node.js, or something else)

3) Memcached for caching (or some NoSQL)

4) A message queue (ZeroMQ, RabbitMQ, or something ...)

5) nginx, HAProxy, Varnish, (or similar technology)

6) Hadoop for map-reduce (only if you need to go that path...)


Not trying to nitpick, but I wanted to mention that ZeroMQ is not a message queue in the sense that RabbitMQ (and Kestrel, Beanstalkd, SQS, etc.) is. It's a socket library that you'd use to implement those technologies.

And while this is truly unimportant, my experience with bay area startup stacks is that Redis has very thoroughly supplanted Memcached for most use cases. Though most still have a memcached cluster or two.

That being said: 90% of software development in this world isn't done at tech companies and startups. So from that perspective, that stack is interesting.


Can you elaborate on why Redis has supplanted Memcached? Is it because Redis occupies a space in between Memcached and MemcacheDB (in the sense that Redis can be in-memory and/or persisted on disk)?

(Not that I care about starting the holy war of X vs Y, just out of curiosity why)

You have a point regarding your last statement. Perhaps I've been here far too long, noticing more of the 10% and less of the 90% :)

... which is true ... last I read, someone is making almost $1M in revenue developing a Windows desktop app in 2012, and another company is making $60k in monthly revenue developing BB apps using in-app ads. I think I'm going to sign off from HN and go to the other side... :D :D :D


Redis is far more feature-ful, and can serve the same purpose as memcached. Why have two parts when you only need one?

Redis offers persistence, set operations, namespacing, you can delete multiple entries with wildcards...


Oh, I wish it was that simple.

You can't just mix and match (well, theoretically you can).

Let's say you began with Django, right? Or there was something already ready in Django.

And then you begin to see the warts. But ok, you keep churning along.

And the more you churn the more of a specialist in the deficiencies of each technology you become. Like the fact that Django's ORM runs like a dog.

Or see all the "we moved to MongoDB and we regret it" discussions

From that list, there are two technologies I would recommend strongly (if you know what you are doing): nginx and Redis

The rest needs to be handled with care. And Redis is very powerful if you know how to use it, but it's easy to go the "lazy way" with it, and it will be fast, but not as fast as it can be.


Yes, but the Django ORM helped them launch faster. There is nothing wrong with using it to start with and then writing your own SQL queries when you need to scale, or even changing your data model to accommodate scaling.


I hate the Django ORM and I don't know why Django doesn't just deprecate it and switch to SQLAlchemy as the new default, which is a much, much better ORM and used primarily by non-Django Python apps.


Why don't YOU try?


Why don't I try what? Deprecating the Django ORM? I've already done so on a personal level (in fact, I avoid Django wherever possible), but since I don't control the Django project, I can't deprecate it for everyone, as I suggested in the parent comment. :)


Excuse me, but couldn't they just optimize the way they use the Django ORM?

... as in: check the SQL queries constructed by the Django ORM and tweak your Django model layer so that the resulting SQL is tuned for performance?

Or, at the very least, shouldn't the Django ORM provide a way to retrieve data with "filtering" as optional parameters (get only a few columns, but not all)?


Let me tell you something

If you're reading "just tables" without, or with simple relations and conditions, it's ok (because, as you pointed out, that's basically SELECT with)

The problem is when for example you have inheritance, or moderately complex relations.

Then you'll see Django spamming your DB with requests and the DB may be fast but the operation will be slow because of the sheer number of requests.

So yeah, in a sense you can optimize for it, or use SQLAlchemy (which is good, but it has a different philosophy from the Django ORM in the way it's used).
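To make the "spamming your DB with requests" point concrete, here is a minimal sketch of the pattern using only the standard-library `sqlite3` module rather than Django itself. The `board`/`pin` schema, the row counts, and the `run` helper are assumptions made up for illustration; nothing here comes from Pinterest's code. In Django, `select_related()` and `prefetch_related()` are the usual ORM-level ways to get the single-query behaviour shown in `pins_joined`.

```python
import sqlite3

# Hypothetical schema: 200 pins spread over 50 boards.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE board (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pin (
        id INTEGER PRIMARY KEY,
        board_id INTEGER REFERENCES board(id),
        title TEXT
    );
""")
conn.executemany(
    "INSERT INTO board VALUES (?, ?)",
    [(i, f"board-{i}") for i in range(1, 51)],
)
conn.executemany(
    "INSERT INTO pin VALUES (?, ?, ?)",
    [(i, (i % 50) + 1, f"pin-{i}") for i in range(1, 201)],
)

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one statement and count it as one round trip."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def pins_lazy():
    """One query for the pins, then one more per pin for its board."""
    rows = []
    for _pin_id, board_id, title in run(
        "SELECT id, board_id, title FROM pin ORDER BY id"
    ):
        board_name = run("SELECT name FROM board WHERE id = ?", (board_id,))[0][0]
        rows.append((title, board_name))
    return rows


def pins_joined():
    """The same data fetched with a single JOIN."""
    return run(
        "SELECT pin.title, board.name FROM pin "
        "JOIN board ON board.id = pin.board_id ORDER BY pin.id"
    )


stats["queries"] = 0
lazy = pins_lazy()
lazy_queries = stats["queries"]

stats["queries"] = 0
joined = pins_joined()
joined_queries = stats["queries"]

print(lazy_queries, joined_queries)
```

Both functions return the same rows, but the lazy version issues one query per pin on top of the initial one, which is where the latency goes even when each individual query is fast.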


Inheritance is a different beast that ... well, let's just say that it may create an endless debate about whether one should or should not have inheritance in the model...

My experience is heavily in Java (JPA2 and Hibernate) and less in Rails (simple/moderate apps). Typically an ORM provides a way to handle custom queries but still helps map the result to the Java object model (which helps a lot vs writing getters/setters :)).

I don't know how people down south in Silicon Valley design their models (especially young scrappy startups), but I rather doubt that they go that far (advanced/complex). It's not like startups start with "analyzing requirements" and move on to a "modelling session" :) => they just go ahead and write code.

So in theory, the data model shouldn't be complex enough to hit these edge cases.


I agree 100% (even though I don't know what or if they changed the Django ORM - most likely yes)

Use the tools to launch faster, then optimize.


My company is going through exactly this scenario right now. We launched on Rails and use it for most stuff, but there are a couple of important cases where we have to do direct SQL.

Though I've had to work around some of Rails' defaults for my use cases, there is no doubt that without it we'd not have been able to launch and iterate so quickly.


I didn't say that it is simple to come up with such a setup. But whenever I see such a list, it reminds me of some sort of blueprint architecture (or template) that I've seen somewhere else.


I agree. I think the interesting aspect is that they built a service that serves an impressive number of users on fairly common technologies.

There isn't anything very revolutionary about their stack by itself. FB didn't buy them for their technology... they could have replicated that on their own.

If they want to be talking about infrastructure, the story of using common tools at high volume is more interesting than just which tools...


Seems like a sensible stack.

However, isn't it reasonable to consider node.js as the web stack as well?

I will agree the focus has traditionally been more on the backend and asynchronous APIs, but the number of front-end/UX-oriented modules is growing every day.


Sure why not. Some people don't even mind the lack of polished modules and instead spend time to write their own.

Not a big fan of node.js (or JavaScript for that matter) and probably never will be, so I can't comment much.


I would think that Nginx + Django + SQL is enough, and the rest is premature optimization...


Even simpler: Django + SQL on Heroku.


It's nice to see that two companies valued at >= $1 billion are running a Python/Django stack. #teampony


Yeah, I'm actually in the process of switching to Python/Django from Rails, so it's great to hear how well it scales.


At 18m visitors, I don't think your choice of Django or Rails matters. The DB tends to be the limit even for normal applications, let alone something with that kind of volume.


There's a kabillion factors to consider though. With that many users, yes the exact implementation matters less and the ancillary systems and how they interact are more critical. But, building the platform, being able to iterate quickly and deliver robust functionality along the way is important as well. Not that Ruby hasn't, but Django and Python have really proved their worth with some of the heavy hitters recently. Instagram, Pinterest, FriendFeed, Disqus... It's an impressive roster.


... and when we look at the um... "old" fart Java... their roster is even more impressive: LinkedIn, Google, Netflix, Amazon.

Just commenting ... :D


Regarding Google, my understanding is that Java is only used in a few areas, and is far from the majority of their portfolio. Most of their production systems are C++, while a lot of offline processing is Python. There is even a smattering of .NET (in Orkut). I think Java was used in Wave, but that is now defunct. Likely a few of these systems are being migrated to Go.

Likewise with Amazon. My understanding, from chatting with some Amazon engineers at a conference, is that they have a pretty heterogeneous portfolio, even including some Perl and Oracle PL/SQL.

Others who are better informed, please correct.


Transcript from the Coders at Work interview with Joshua Bloch:

Seibel: As a Java guy at Google, do you think it could be used more? Leaving aside the force of history and historical choices, if somehow you could wave a magic wand and replace all of C++ with Java, could that work?

Bloch: Up to a point. Large parts of the system could be written that way, and over time, things are moving in that direction. But for the absolute core of the system - the inner loops of the index servers, for instance - very small gains in performance are worth an awful lot. When you have that many machines running the same piece of code, if you can make it even a few percent faster, then you've done something that has real benefits, financially and environmentally. So there is some code that you want to write in assembly language, and what is C but glorified assembly language?


GMail, Google Calendar, AdSense, AdWords, Android to name a few of the _actual_ heavy hitters from Google. Sure, not the majority, but they are all the bigger ones.

http://google-opensource.blogspot.ca/2009/01/opengse-release...

(I think Google Sites is also written in Java).

Amazon is pretty much a big Java shop (don't forget the companies they acquired as well). Check out their jobs site.


Java is pervasive at Google. Check out the Guava library and Google Guice for examples of how they adapted it for high-scalability web services.

On top of edwinnathaniel's list, add Google Sites, Google Docs, and MapReduce frameworks such as FlumeJava.


> ... and when we look at the um... "old" fart Java... their roster is even more impressive: LinkedIn, Google, Netflix, Amazon.

Sure it is. But I won't use Java for web stuff. I might use it for some background services, but for regular CRUD functionality, Java buys me nothing. I generally use Python (Flask)/Jinja2/Flask-SQLAlchemy. Flask is intuitive; Jinja2 is pleasant and fast; Flask-SQLAlchemy is concise with the option of exploring the raw power of SQLAlchemy. For regular use cases, nothing in Java beats this combination. I have looked at Play; it comes closer but it's still not there.

Check out this toy benchmark that was doing the rounds 2 months ago: https://github.com/grahamking/Key-Value-Polyglot The naive Python solution https://github.com/grahamking/Key-Value-Polyglot/blob/master... is dog slow. Java has real threads and a JIT, but guess what, the naive Java solution is still dog slow: https://github.com/grahamking/Key-Value-Polyglot/blob/master...

My specialized Python solutions are orders of magnitude faster than both the naive Python and Java solutions:

https://github.com/rahulkmr/Key-Value-Polyglot/blob/master/m... https://github.com/rahulkmr/Key-Value-Polyglot/blob/master/m...

The naive solutions were taking around 20 seconds to do 500 writes followed by 500 reads. My changed solution does 5000 writes followed by 5000 reads in under 2 seconds. The 500 and 5000 aren't typos - the naive solutions were taking about 20 seconds for 1000 operations, whereas the tailored solution was doing 10000 operations in under 2 seconds.

This is the discussion thread http://news.ycombinator.com/item?id=3733090

I get it that this is an IO bound problem, and naturally an epoll based solution will totally smoke a thread-per-request solution, but that's the point I am trying to make. If at the end of the day, intuitive solutions don't work and I have to do custom implementations, I am not getting much out of using Java. I might use it for something which is CPU bound about 70% of the time, but anything other than that, the pains far outweigh benefits.


And the last time you used Java was....5 years ago?

Better keep up with the latest frameworks and libraries.

I know Rails and Django and I used cherrypy, cheetah, and sqlalchemy as well, and Java is on par with both modern frameworks if you know how to choose the right frameworks.

In fact, I do not have to deploy multiple processes one for Rails and one for sinatra or node.js. Just some app server or web container and I am ready to go.

... And you are asking me to check out a "toy benchmark"?

I care more for the overall solutions from deployment, tools, etc. End to end baby...


> And the last time you used Java was....5 years ago?

> I know Rails and Django and I used cherrypy, cheetah, and sqlalchemy as well, and Java is on par with both modern frameworks if you know how to choose the right frameworks.

What are these frameworks in Java you talk about which are as expressive as Django and Rails? Less verbose than it was before isn't the same as being as expressive as Rails/Django.

> In fact, I do not have to deploy multiple processes one for Rails and one for sinatra or node.js. Just some app server or web container and I am ready to go.

Rails/Django deployment isn't that hard. If you are talking about a large app, deployment is a very small consideration compared to other aspects.

As far as multiple processes go, more often than not, large apps are run as a collection of co-operating services. That is by choice, irrespective of whether it's implemented in Java or Python. Which one of the examples you listed do you think runs as some app server and container?

> ... And you are asking me to check out a "toy benchmark"?

> I care more for the overall solutions from deployment, tools, etc. End to end baby...

All benchmarks are toys. If you want "end to end", the only way to do it is to implement the same app in Python and Java, and then compare them. Doing that would be batshit insane (sure, we are going to do two versions of Pinterest to compare whether we write it in Java or Python); nobody does that, nobody should do that.

As far as the benchmark goes, I only claimed that out-of-the-box code in Java performs like dogshit, and if I am to jump through hoops, I would rather jump through hoops in Python, Ruby, Clojure et al.


Expressiveness lies in the language, not in the framework. I think you have a different mindset on that one, but I'm not sure what you're trying to achieve.

Spring MVC is on par with Rails VC (let's leave ActiveRecord for another time) or Django TV minus Django ORM.

Testing? Spring has a really good testing library that plays well with straight-up Java EE components (Servlet, Portlet, etc). Functional, Integration, Unit-Test, name your game.

Migration? Flyway works nicely and I don't have to spend days to set it up, just less than an hour for multi-year project. Minuscule in terms of effort. It just works.

I'll give you that ActiveRecord is nicer than JPA 2 (but not by a lot). The rest are... indifferent, same stuff, same type, same ol' same ol'.

JAX-RS can spit both XML and JSON easily with no code changes. I don't have to use Sinatra (or node.js) for web-api and deploy it separately from Rails: just deploy JAX-RS project as a separate WAR to the AppServer and I'm done. Done.

I don't need to spin up a different process, write a script to do X,Y,Z, maintain another infrastructure, etc etc.

Regarding your remark about the benchmark: then don't start flashing numbers and stuff if they are toys. If OOB Java is like dog stuff, then Python and Ruby are like snails or turtles.

Wait, what am I doing trolling like redditors or slashdotters over mindless debate.

sigh technology has never been the problem, the people are sigh always holds true.


> Expressiveness lies in the language, not in the framework,

For me, expressiveness is the sum of language expressiveness, framework, culture and api design.

For example, see this crime against humanity

    <%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
    <%@ taglib uri="http://java.sun.com/jsp/jstl/functions" prefix="fn" %>
    
    <c:choose>
        <c:when test="${emails.unread != null && fn:length(emails.unread) > 0}">
            You have ${fn:length(emails.unread)} unread email(s)!
        </c:when>
        <c:otherwise>
            You have no unread emails!
        </c:otherwise>
    </c:choose>

Now, the designers could have very well implemented:

    You have ${emails.unread ?: 'no'} ${emails.unread?.pluralize('email')} !

> I think you have a different mindset on that one but I'm not sure what you're trying to achieve.

I am not trying to achieve anything, and it's not just a matter of mindset. See my example above. Expressiveness isn't just something which is confined to the language. Horrendous apis and frameworks are a bane for any language.

> Migration? Flyway works nicely and I don't have to spend days to set it up, just less than an hour for multi-year project.

I don't follow your remark that you don't have to spend days to set it up. Which migration lib takes days to set up? I am not contesting that there is something bad out there (see my JSP example above), but since we are talking Rails/Django, I don't see how it is relevant here.

> JAX-RS can spit both XML and JSON easily with no code changes. I don't have to use Sinatra (or node.js) for web-api and deploy it separately from Rails:

Umm what? Rails needs Sinatra to produce JSON?

    def foo
      respond_to do |format|
        format.html {  }
        format.json {  }
        format.xml { }
      end
    end

> Wait, what am I doing trolling like redditors or slashdotters over mindless debate.

> sigh technology has never been the problem, the people are sigh always holds true.

I won't bother responding to this.


http://freemarker.sourceforge.net/ and ditch JSP or use pure JS+HTML.

A typical setup is to use Rails for front-facing website and Sinatra for web-services (RESTful).

What I'm saying is that that can be done with Spring MVC + JAX-RS:

    @GET
    @Path("{id}")
    @Produces({"application/xml", "application/json"})
    // Want more? Add your own media types to the list, it's very flexible:
    // "application/atom+xml", "text/plain",
    // "application/something+xml" (custom)
    public Employee get(@PathParam("id") final int id) {
        // look up and return the Employee here;
        // no need to pepper the code with respond_to
    }


> http://freemarker.sourceforge.net/  and ditch JSP or use pure JS+HTML.

You said frameworks don't decide expressiveness. I was quoting an example that they do. Whether you use jsp or not is tangential. Java isn't anywhere near python or ruby in terms of expressiveness, neither are the frameworks.

EDIT: Like another commenter pointed out, if you can get a 55k LoC Java project down to an 8k LoC Python project, that says something about expressiveness.


"I do not have to deploy multiple processes one for Rails and one for sinatra"

FYI, you can mount your sinatra app in your rails app and run them in the same process.


But you're not going to be able to build a Pinterest or an Instagram with 5 people in 2 years with Spring.


You can. I feel sorry for those who can't :)


Of course you can, anything is possible, including convincing yourself Spring is as productive as Rails or Django :)

Nothing against Java, it's the best tool for the job on many occasions, but launching quickly and scaling fast are not its forte.


When was the last time you used Spring? How about Spring MVC for the front-end/controller and/or JAX-RS for the web API?

I find Java to be much more modular and adaptable to different web paradigms: old school MVC, one page JS style (GWT or pure JS), asynchronous (Netty or Servlet 3.0), scalable web API that can spit both XML and JSON without extra coding (JAX-RS), component oriented (JSF).

The integration between business logic and, let's say, a message queue (if you somehow need one) is quite good via EJB 3.1 (very minimal coding) and JMS.

Java took a beating a few years ago; fast forward to 2012, and it has improved quite significantly. In the next two years, multi-tenant architecture will be available in Java EE 7 out of the box.

Many people who keep complaining about Java are typically those whose experience is way out of date.


The company I work for (which would qualify as an enterprise by any standards) decided to switch over to Django almost 3 years ago, rewriting our existing Java infrastructure on a project-by-project basis. The reasoning was primarily two-fold: we'd begun to have trouble recruiting good new talent interested in working with Spring/Java, and we felt that for what we were doing, Java was not the strongest technology choice anymore (not like it was 10 years ago).

Last year, we switched a 55k LoC Spring MVC project (with JAX-RS for the JSON API) over to Django (using tastypie for the API side and celery for offline tasks). The result ended up being a little over 8k LoC of Python. Closing a ticket has gone from taking on average around 5 days to a little over 4 hours. That was the third project we ported over, the others having had similar success. YMMV.


I find your story very interesting. It used to be easier to find talent in Java/.NET, but that doesn't seem to be the case these days.

Why Django and not Rails, since Rails seems to be more popular?

I find it hard to believe that you can port 55k LoC of Spring MVC + JAX-RS to Django and end up with 8k. From what I've learned over the years, the amount of code required to write Spring MVC + JAX-RS is minimal and very close to Rails (my experience is more with Rails), at the very least from the Controller point of view. What I personally like about the paradigm is reusing code between Spring MVC and JAX-RS. Love it so much.

That is of course if you discount everything else: imports, comments, parentheses, configurations. What about the actual logic?

We typically use JS heavy on the front end and less on JSF/JSP.

But you're right, YMMV. Projects have different requirements and skillset/experience. We typically don't have technical bugs but more of requirement bugs: someone forgot to handle the case of X,Y,Z and as such, it requires about a day or less to plug the issues.


Most of the gain was from actual logic; if there had been only models & controllers we wouldn't have seen any major benefits. But there was a lot of IoC with the customary mountains of rather pointless interfaces, an "admin" of sorts & a "messaging" queue running tasks. Most of the POJO data classes ended up being replaced by Python's built-in dictionary. Basically the mixture of Python's stdlib, the language ecosystem & Django got rid of most of our code. In the cases where we couldn't reuse anything from anywhere, something like a business logic class would normally shrink by 2/3. And the frontend was not JS heavy, so that factors in.

I hadn't started when the decision to go Django instead of Rails was made (Rails & .NET MVC were also considered) but I've been told it came down to a number of things:

* This is in Europe, where Ruby is generally not used very much. Python has been used in some shape or form in every company I have experience with, but I've only seen Ruby recently, and then mostly for Chef or Puppet.

* Lack of explicit imports did not work in Ruby's favour.

* If needed, numpy & scipy don't have an analogue in the Ruby ecosystem.

* General negativity towards the religious zeal permeating through the Rails community.


From what I know, Java EE 6 IoC doesn't require interfaces, so that may cut the code a little bit further. The thing about the Java ecosystem is that it relies heavily on IoC/interfaces for testing, and I can't really blame it, because that setup guarantees super fast unit tests, unlike Rails ActiveRecord (in fact there's a bit of a movement recently in the Rails community that picked this up from Java).

ASP.NET MVC won't have much gain over Spring MVC/JAX-RS. In fact, there's nothing similar to JAX-RS in ASP.NET MVC (you'd have to write your own stuff here and there and wire them together, not as straightforward as JAX-RS).

This is an excellent story to share, both from a technical perspective (Java+Frameworks => Django+Python+Pythonic mindset) and from a recruiting perspective. I hope one day you can share (perhaps in slides/presentation style) the actual code and techniques in detail.

Mind if I ask where in Europe? (Country/City?) For a while I thought Europe was Rails heavy.


> Last year, we switched a 55k LoC Spring MVC project (with JAX-RS for the JSON API) over to Django (using tastypie for the API side and celery for offline tasks). The result ended up being a little over 8k LoC of Python.

When comparing LoC, take into account the fact that there were no unknowns: you are simply porting functionality. You generally don't have to solve a problem when you are rewriting a Java codebase in Python, just pythonizing the existing Java solution. I would wager that if the same system were rewritten in Java, it would shave off some bloat (provided you are re-implementing and not adding features).

With that said, 55k to 8k is still very impressive.


Touché ;)

I mean really we're all just moving 1's and 0's around.


> DB tends to be the limit

Yes, but Rails is another huge factor too (the ORM, Ruby itself, etc.). You could get Rails to scale to 18m visitors somehow, but it's damn hard.


I'm curious as to why you're switching. Care to share? I'm a long-time Django guy who's just been thrown into some Rails projects so I am going to be learning it shortly.


From a Django, Rails, Flask, Sinatra guy: Rails is going to be fun if you leave your preconceived notions at the door. Rails is not Django, and Ruby is not Python, and any attempt to make it behave the way it's not supposed to behave will cause pain. However, if you can play by Rails' rules (and you should), it will be a very pleasant experience. How are you learning Rails? Since you are an experienced programmer, the official Rails guide will be the fastest way to get a feel for Rails - http://guides.rubyonrails.org/getting_started.html

The link above is the modified, famous "blog in 10 minutes". When I was starting, I found it very pleasant, as it covers the 20% of Rails which you use 80% of the time.


This is a great reply. Thanks for the tips. I think I might do a little messing around this evening.


Check out this list of sites running Python/Django [1][3][4]:

-Instagram[0]

-Pinterest

-Disqus

Disqus.com - Disqus serves over 3 billion page views and more than 500 million unique visitors a month on its Django stack. As far as we know we are the largest installation out there.

-Mozilla[2]

addons.mozilla.com and support.mozilla.com

250k+ add-ons, 150 million views per month, 500+ million API hits per day (Firefox checking for updates!)

-Justin.tv (they moved to Django from Rails)[3]

-NASA

-National Geographic

-Canonical

-Bitbucket.org

-Discovery Networks

-Intel, AMD, HP, IBM

-Lexis-Nexis

-The Library of Congress

-The New York Times

-Orbitz

-PBS

-Rdio: Huge traffic Radio site.

-VMWare

-Walt Disney

-The Washington Post.

-lanyrd.com

-OSQA sites. OSQA is an open-source clone of Stack Overflow, a Q&A community. AFAIK some 5K sites are powered by the OSQA stack. OSQA is built on Django.

-YouTube, LinkedIn, Google, Netflix, Amazon: Python stacks.

-GMail, Google Calendar, AdSense, AdWords, Android MarketPlace to name a few of the heavy hitters from Google are on Python - Google App Engine.

References:

[0] http://techcrunch.com/2012/04/12/how-to-scale-a-1-billion-st...

[1] http://jacobian.org/writing/django-community/django-communit...

[2] http://reinout.vanrees.org/weblog/2011/06/06/large-mozilla-s...

[3] http://stackoverflow.com/questions/886221/does-django-scale

[4] http://www.quora.com/Django/What-is-the-highest-traffic-webs...


EDIT: I upvoted the post, and now it's no longer in gray. So, it was just one downvote.

Why the downvotes? The parent post is just pointing out sites which he/she knows to be using Python and/or Django. Some of the entries are incorrect (Gmail?), or give the impression that a site runs on Python when only small sub-projects are using it (LinkedIn? Amazon?). There are some mistakes, but on the whole, I don't see anything wrong with someone pointing out something relevant to the discussion, even if it involves bragging about something he is associated with.


You seem to exaggerate Python's use: Gmail is mostly Java, like most of the other Google apps, and LinkedIn isn't using Python to my knowledge.


> Amazon

I would double check your sources on that one.


I've also been thinking about switching for a while. What was the trigger for your switch from Rails to Python/Django?


Does anyone else feel that 410TB of user data seems like quite a lot? If I have my maths right, even if all the 80 million objects are user data (as opposed to, say, logs), that's 5.3 MB per object. Considering that most Pinterest photos are from the web, that seems quite big.
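For anyone who wants to check the maths, here is the division spelled out as a quick sketch. The only inputs are the two figures quoted in this thread (410 TB and 80 million objects); which unit convention the article used is an assumption, so both are shown.

```python
TOTAL_TB = 410          # storage figure quoted from the article
OBJECTS = 80_000_000    # object count quoted from the article

# Decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
per_object_mb = TOTAL_TB * 10**12 / OBJECTS / 10**6

# Binary units: 1 TiB = 2**40 bytes, 1 MiB = 2**20 bytes.
per_object_mib = TOTAL_TB * 2**40 / OBJECTS / 2**20

print(f"{per_object_mb:.3f} MB or {per_object_mib:.3f} MiB per object")
```

The result lands a little above 5 MB per object either way, so the "5.3" figure corresponds to counting in binary units.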


Yes, that number is ridiculous.

For reference, that's about four times bigger than the iTunes music catalog (20 million MP3 files * 5MB average file size = 100TB).


5MB seems a bit small, they'd also have lossless copies right?


Yes, their total storage consumption is bigger, but the MP3-part of the library is in that ballpark.


Original image plus two (or more) scaled thumbnails. Also, I don't think they're doing any reuse (pinning the same image twice creates 2 different S3 objects for it). 5.3 MB / object is still really high though.


Wow, if they are not doing any reuse, that's really, really dumb. I can't back this up, but I would have thought at least 50% of all pins are repins.... If that 410TB figure is correct, this oversight is costing them $18k a month. It would be simple to have a separate asset id and pin id...
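A minimal sketch of what "separate asset id and pin id" could look like, assuming the asset id is a content hash of the image bytes. The `PinStore` class and its in-memory dictionaries are hypothetical stand-ins for an object store plus a database table; nothing here reflects how Pinterest actually stores pins.

```python
import hashlib


class PinStore:
    """Toy store where many pins can point at one stored image."""

    def __init__(self):
        self.assets = {}        # asset_id -> image bytes, stored once
        self.pins = {}          # pin_id -> asset_id
        self._next_pin_id = 1

    def add_pin(self, image_bytes: bytes) -> int:
        # Identical bytes always hash to the same asset id.
        asset_id = hashlib.sha256(image_bytes).hexdigest()
        self.assets.setdefault(asset_id, image_bytes)
        pin_id = self._next_pin_id
        self._next_pin_id += 1
        self.pins[pin_id] = asset_id
        return pin_id

    def stored_bytes(self) -> int:
        return sum(len(data) for data in self.assets.values())


store = PinStore()
first = store.add_pin(b"x" * 1000)    # original pin
repin = store.add_pin(b"x" * 1000)    # repin of the same image
other = store.add_pin(b"y" * 500)     # a different image
print(len(store.pins), len(store.assets), store.stored_bytes())
```

Three pins end up referencing two stored assets, so a repin costs one small mapping entry instead of another copy of the image.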


These kinds of articles make me feel so stupid. Even though those technologies are ready to go, making them work smoothly without any interruption always seems hard to me. I believe there have to be lots of tips and tricks (in other words, you have to be experienced with all of them, or I am very lazy). I was wondering if a common recipe exists, or if there is someone who can answer a couple of my high scalability questions.


Yes, see "heavily-modified Django"

(even though this is probably around scalability and connecting with other technologies)

Scalability and uptime is still hard

Still, Pinterest is sitting on a great infrastructure (EC2, Elastic MapReduce) and using it to the fullest.


Pinterest has performed quite awfully and has been very, very slow up until now. They may have fixed the problem, but it's not necessarily a model to follow. It's nice to know that even a site like Pinterest can't get it quite right all the time.


> Sharding is used, a database is split when it reaches 50% of capacity, allows easy growth and gives sufficient IO capacity

Nice, compared to the usual, "The database was at 100% capacity, then we tried to shard/partition, and it did not go too well."
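As a rough illustration of the "split at 50% of capacity" rule, here is a toy in-memory sketch. The article gives no implementation details, so everything below is an assumption: the range-based routing, the median-key split, and the `Shard` and `ShardedStore` names are made up for this example.

```python
from dataclasses import dataclass, field

SPLIT_THRESHOLD = 0.5  # split once a shard is half full, as in the quote


@dataclass(eq=False)
class Shard:
    lo: int                 # inclusive lower bound of the key range
    hi: int                 # exclusive upper bound of the key range
    capacity: int           # hypothetical maximum row count
    rows: dict = field(default_factory=dict)


class ShardedStore:
    def __init__(self, key_space: int, capacity: int):
        self.shards = [Shard(0, key_space, capacity)]

    def _find(self, key: int) -> Shard:
        for shard in self.shards:
            if shard.lo <= key < shard.hi:
                return shard
        raise KeyError(key)

    def put(self, key: int, value: str) -> None:
        shard = self._find(key)
        shard.rows[key] = value
        if len(shard.rows) >= SPLIT_THRESHOLD * shard.capacity:
            self._split(shard)

    def get(self, key: int) -> str:
        return self._find(key).rows[key]

    def _split(self, shard: Shard) -> None:
        # Split at the median key so both halves keep headroom.
        pivot = sorted(shard.rows)[len(shard.rows) // 2]
        if pivot == shard.lo:
            return  # nothing would move; leave the shard as it is
        upper = Shard(pivot, shard.hi, shard.capacity)
        for key in [k for k in shard.rows if k >= pivot]:
            upper.rows[key] = shard.rows.pop(key)
        shard.hi = pivot
        self.shards.insert(self.shards.index(shard) + 1, upper)


store = ShardedStore(key_space=100, capacity=10)
for k in range(0, 100, 10):
    store.put(k, f"v{k}")
print([(s.lo, s.hi, len(s.rows)) for s in store.shards])
```

Because the split happens while every shard still has half its capacity free, there is always room to keep serving writes during the move, which is the point the quoted line is making.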


According to the article, Pinterest is spending more than $30k/month on AWS EC2 to support 18M visitors/month.

Data: $52/h (peak time, let's say 18 out of 24 hours) and $15/h (night time, let's say 6/24).

Edit: as pointed out in the comments, $30k/month would only be the EC2 costs.
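The estimate above works out as follows. The hourly rates are the ones quoted from the article; the 18/6 hour split is the guess stated in the comment, and the 30-day month is an extra assumption added here.

```python
PEAK_RATE, PEAK_HOURS = 52, 18     # $/hour and assumed peak hours per day
NIGHT_RATE, NIGHT_HOURS = 15, 6    # $/hour and assumed night hours per day
DAYS_PER_MONTH = 30                # assumption for a round monthly figure

daily_cost = PEAK_RATE * PEAK_HOURS + NIGHT_RATE * NIGHT_HOURS
monthly_cost = daily_cost * DAYS_PER_MONTH

print(f"${daily_cost}/day, ${monthly_cost}/month")
```

That comes to $1,026 per day and $30,780 per month, EC2 only.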


You are just counting the EC2 cost. The AWS cost for 410TB of S3 storage is around $39k. You would need to add the bandwidth cost on top of that.

It is also interesting that they seem to be using Akamai for a CDN instead of Cloudfront so not a completely AWS based solution.

I wish they went into what they are storing in S3. 410TB is a lot of storage. My initial guess was cached images but 80M objects breaks down to 5MB per object and that is a lot more than what is needed for image caching.


From these figures they seem to burn over 100k per month on outsourced cloud services alone. Holy shit.


If they're at 30+ staff, they've managed to keep hosting to a pretty small chunk of total budget.


Yeah...that's insane. Since they're a "pre-revenue" company, they're just burning money.


"Burning" seems like an understatement.

Looks like a switch to dedicated hardware would amortize within... 3 months.


Given their growth rates and cash in the bank, and the fact that they are still looking for a business model, it is probably better for them to focus on their key problems before working on other issues. If they were self-funded, like 37signals, and running operations where they'd tighten the screws on cost, then the focus would be different again.


I disagree.

Yes, they obviously have quite a few loose screws (read: whoever invested 100MM in that).

Either way, this is not a matter of "tightening". It's a matter of hiring an admin and having him not only pay for himself after 3 months, but for 1-2 other employees, too.

Yes, when you have 100MM in the bank then a mundane couple dozen thousand dollars a month might seem to matter less. But I can't think of a company where that kind of decadence has led to anything positive in the mid term.


That's actually right, S3 costing at $39k/month seems completely crazy.

No wonder 37signals decided to switch to their self-hosted storage solution[1].

[1] http://37signals.com/svn/posts/2483-nuts-bolts-storage


As they just raised 100 million, I wonder if putting (a lot of) effort in replicating S3's functions to save part of 40K/month is worth the trouble. And setting that up won't be free either. (their 100 million can pay for the current storage for 200 years...)
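The "200 years" figure is easy to check with the two numbers from this comment. This is a sketch that assumes the storage bill stays flat at 40K/month, which it obviously would not at their growth rate.

```python
FUNDING = 100_000_000      # dollars raised, as stated above
STORAGE_PER_MONTH = 40_000 # rounded monthly storage bill, as stated above

months_covered = FUNDING / STORAGE_PER_MONTH
years_covered = months_covered / 12

print(f"{months_covered:.0f} months, about {years_covered:.0f} years")
```

That is 2,500 months, or a bit over 208 years, so "200 years" is the right ballpark.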


It's cool that we're seeing companies reach 1 billion USD valuations with 12 employees.


It will be even cooler when we see companies reach 1 billion USD net-income with 12 employees.


I heard the Microsoft Visio team probably has fewer than 50 people working on it (granted, the Microsoft name and the Office brand provide a huge selling point).

Visio almost hit $1B according to an insider... (can't verify).


Now that's interesting! Got to dig into this little tidbit.


To be fair the employee number is as of last December, and they've grown quite a bit since then.


Based off of their team page, Pinterest is up to 31 employees.

http://pinterest.com/about/team/


Small, cohesive, goal-oriented teams with clear, well-defined goals can achieve so much.


Interesting. I'm migrating some old PHP apps over to Python and have been learning Flask + SQLAlchemy. Why would a site with so much traffic choose a full stack framework like Django that required so much modification?


Because when they started they didn't have the traffic, and it was quicker to get to a running product. It's probably easier now to cut back/change Django functionality than to rewrite.



