Sigh. The article makes it sound like these sites are doing everything themselves, including pushing the bits.
Maybe some are, but I can say from personal experience that most of your traffic, if you're smart, comes out of a CDN. The sites themselves are definitely not that interactive, which makes them simpler to publish. The pages are almost all cached, and that doesn't take much horsepower to serve up. The big video sites have ratings and comments, but they are not that big of a deal. People go to porn sites to watch porn, not to interact. Customer analytics have shown that over and over.
I know of virtually no porn company that handles their own transactions, either. They all go through billing companies that handle things like PCI compliance for them.
Most sites also use a system like NATS to do their affiliate management. You need one that the affiliates trust not to shave sign-ups from their accounts. They tend to trust NATS.
For the data on the backend, you either use a SAN or manage it on a few servers with lots of disks. If you are really at the 100TB mark, I would think you get a SAN. That's what we did. Sure, it's a lot of space, but they're big files, so managing them isn't that hard.
I'd say the largest issue that a company like YouPorn will have is the amount of data in their working set for a CDN. CDNs generally charge you for the size of your working set that they keep at each POP in their network so you want to keep it as small as possible.
At the end of the day running a large porn network is more about integrating the myriad of partners you need to run the network. The infrastructure is interesting for a while but once you have it working the business of doing deals and handling promotions and figuring out why integration point A isn't working like it should is what keeps you busy.
It's really quite easy to serve large volumes of porn. The dataset doesn't change often and 90% of your working set is the first couple pages of content. Back in 2007 when I was in the business, there was only one CDN that would touch porn (LimeLight) and they were absurdly expensive. Today there are hundreds of porn-friendly CDNs and they charge 1/10th the price (no exaggeration).
Storing a couple hundred terabytes of porn is not expensive or complicated.
Sites like YouPorn don't authenticate their content. Most of the high-volume web page content is static. Even then, you're looking at just a few page views before the user spends 5 minutes watching a video that streams from the CDN.
Payment transactions are handled by third parties, and usually abstracted through third-party affiliate software like NATS. Which, BTW, is a piece of junk and the one part of our system which we had trouble scaling.
Big bandwidth numbers sound impressive but the truth is running even a mildly successful social network with heavily personalized pages is ten times harder than running even the largest porn sites.
"YouPorn is a beast, streaming three full DVDs of video every second (900TB/day, like Netflix), handing 300K queries every second, and generating up to 15GBs of log data per hour."
Serious misconception: it's just a couple of boxes and two dudes, nothing more. It runs itself. CDN FTW!
And the only thing a CDN will help you with in this case is offloading CSS, images and JS. You can't put that much streaming content up unless you host it yourself or want to spend every penny you have.
This is utter nonsense.
It is the nature of YouPorn's UX that the vast majority of requests are for the first couple pages of data. You don't have to put all the content on the CDN, only the part that represents 80-90% of your traffic. If you have a pull-based CDN you don't even need to plan it; the CDN automatically populates itself with what it considers a reasonable working set.
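For anyone unfamiliar with how a pull-based CDN populates itself, here is a rough Python sketch. It assumes a plain LRU eviction policy, which is a simplification (real CDNs use their own policies), and `PullThroughCache`, `fetch_from_origin`, and the capacity of 2 are made-up names and values for illustration only.

```python
from collections import OrderedDict


class PullThroughCache:
    """Toy edge node: pulls from origin on a miss, evicts least recently used."""

    def __init__(self, fetch_from_origin, capacity):
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin callback
        self.capacity = capacity                    # max number of cached objects
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        self.misses += 1
        value = self.fetch_from_origin(key)  # the "pull" happens only on a miss
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # drop the least recently used object
        return value


origin_requests = []


def fetch_from_origin(key):
    """Stand-in for the origin server; records which objects were pulled."""
    origin_requests.append(key)
    return f"bytes of {key}"


edge = PullThroughCache(fetch_from_origin, capacity=2)
requests = ["front-page-1", "front-page-2", "front-page-1",
            "front-page-1", "old-video", "front-page-1"]
for key in requests:
    edge.get(key)

print(edge.hits, edge.misses, origin_requests)
```

Nobody had to decide up front what goes on the edge: the popular front-page object stays cached because it keeps getting requested, and the origin only sees one request per distinct object.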
Updated: I should add, I designed Kink.com's modern porn-serving architecture back in 2007. Prior, it ran off of 20 apache httpd boxes at 365 Main. Now it runs off of a handful of appservers, a couple MySQL boxes, and a lot of CDN capacity... on vastly more traffic. Believe me when I say there's no reason that the bulk of YouPorn's traffic couldn't be served off of one or more CDNs.
This is really in reply to powertower's statement (above or below since I couldn't reply directly to the comment) that they aren't using a CDN for their content. Here's the source domain for the content on one of their videos:
cdn1.public.youporn.phncdn.com
This domain resolves out to:
cdn1.public.youporn.phncdn.com.swiftcdn1.com
Which is hosted by a CDN company called SwiftWill.
Besides, the article you referenced says that they are using nginx to act as an external engine for static content such as css, js, etc.
According to the info, that's all YouPorn uses CDNs for (page assets minus video). That might or might not have changed recently.
I'd imagine that paying extra to shave a few tens of milliseconds off latency might not really be much of a benefit in this type of business; they are not doing VoIP phone calls. I'd imagine having fat pipes on a decent tier is #1 here.
The point of using a CDN (in this case) is not to reduce latency. The problem is that it becomes exponentially more expensive to serve high data rates out of a single data center. Basic infrastructure like switches and load balancers starts to get crazy expensive, as do the support contracts. Also, it requires a lot of fairly rare (and highly paid) expertise to set it up.
Distributed CDNs are like the RAID of content serving. Each node can be simpler, cheaper.
Another bonus of using CDNs is that you're in a great negotiating position. If you're serving 80% of traffic through one and 20% through another, you can flip it around the moment one offers to shave a percent or two off the price. I've had people in the sales department of the formerly-80% side notice the traffic drop and suddenly call up with counteroffers. In contrast, getting someone to draw fiber cables across the datacenter usually requires a lot of one-time expense and long-term contracts.
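To make the 80/20 split concrete, here is a minimal Python sketch of weighted CDN selection. It is only one way to do it (the same split is often done in DNS instead), and the hostnames, `pick_cdn`, and `flip` are hypothetical names invented for the example.

```python
import random


def pick_cdn(weights, rng=random):
    """Pick a CDN hostname with probability proportional to its weight."""
    hosts = list(weights)
    return rng.choices(hosts, weights=[weights[h] for h in hosts], k=1)[0]


def flip(weights):
    """Swap the shares of a two-CDN split (80/20 becomes 20/80)."""
    (a, weight_a), (b, weight_b) = weights.items()
    return {a: weight_b, b: weight_a}


# Hypothetical hostnames; the weights are percentages of requests.
split = {"cdn-a.example.com": 80, "cdn-b.example.com": 20}

rng = random.Random(0)  # seeded so the sample is repeatable
sample = [pick_cdn(split, rng) for _ in range(10_000)]
share_a = sample.count("cdn-a.example.com") / len(sample)

flipped = flip(split)  # what you deploy when the other side cuts its price
print(f"share of cdn-a before flip: {share_a:.2f}")
print(flipped)
```

The point is that moving traffic is a config change, not a construction project.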
I'd be really curious what kind of CDN deal they're getting.
At regular CDN rates you're looking at ballpark $150k/month for that kind of traffic (rather optimistic extrapolation from my own rates...).
Also, the figures remain mind-boggling regardless of how you slice them. 900TB/day breaks down to a healthy ~80 Gbit/s average. That's more than most mid-sized datacenter uplinks (and that conveniently ignores any bell curves they may have).
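The arithmetic behind that average, as a quick Python check. It assumes decimal terabytes (10^12 bytes) and perfectly flat traffic over the day, which real traffic is not.

```python
TB = 10**12                      # decimal terabyte in bytes (assumption)
SECONDS_PER_DAY = 24 * 60 * 60


def average_gbit_per_s(tb_per_day):
    """Average line rate in Gbit/s for a given daily transfer volume."""
    bits_per_day = tb_per_day * TB * 8
    return bits_per_day / SECONDS_PER_DAY / 10**9


avg = average_gbit_per_s(900)
print(f"{avg:.1f} Gbit/s")  # prints "83.3 Gbit/s"
```

Peak rates would be higher than this average, since the load is not spread evenly across the day.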
Yup, that seems more realistic (my estimate was too optimistic then). Works out to around 2.2ct/GB. Personally I haven't seen a CDN quote below 10ct/GB, but we also measure our traffic in TB/month, not TB/day.
My guess is that this one came from a post a while ago where someone at YouPorn wrote about how they used Redis. Obviously not for the videos - the article writer clearly didn't read that part very thoroughly, or didn't understand it.
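For reference, here is how the per-GB rates and monthly bills mentioned in this subthread relate at 900TB/day. This is a sketch assuming a 30-day month and decimal units; it only converts between the numbers already quoted and says nothing about what YouPorn actually pays.

```python
GB_PER_TB = 1000        # decimal units (assumption)
DAYS_PER_MONTH = 30     # assumption


def monthly_gb(tb_per_day):
    """Total GB transferred in a month at a given daily volume."""
    return tb_per_day * GB_PER_TB * DAYS_PER_MONTH


def cents_per_gb(monthly_usd, tb_per_day):
    """Effective rate in cents/GB implied by a monthly bill."""
    return monthly_usd * 100 / monthly_gb(tb_per_day)


def monthly_usd(rate_cents_per_gb, tb_per_day):
    """Monthly bill in USD implied by a per-GB rate."""
    return rate_cents_per_gb / 100 * monthly_gb(tb_per_day)


rate_at_150k = cents_per_gb(150_000, 900)   # the $150k/month ballpark
bill_at_2_2ct = monthly_usd(2.2, 900)       # the 2.2ct/GB figure
bill_at_10ct = monthly_usd(10, 900)         # the 10ct/GB quotes

print(f"$150k/month is about {rate_at_150k:.2f} ct/GB")
print(f"2.2 ct/GB is about ${bill_at_2_2ct:,.0f}/month")
print(f"10 ct/GB is about ${bill_at_10ct:,.0f}/month")
```

So the $150k/month ballpark corresponds to roughly half a cent per GB, while 2.2ct/GB puts the bill near $600k/month.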
Redis can store binary data -- and YouPorn's Redis cluster apparently handles 300K queries per second. Those queries obviously aren't all page views (the site only peaks at 4000 PVs per second).
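A quick check of that ratio in Python, using only the two peak figures quoted above and assuming both peaks happen at the same time:

```python
def queries_per_page_view(queries_per_s, page_views_per_s):
    """Average number of Redis queries behind each page view."""
    return queries_per_s / page_views_per_s


ratio = queries_per_page_view(300_000, 4_000)
print(ratio)  # prints 75.0
```

That is about 75 queries for every page view.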
Why can't you store video in a database? YouPorn says that Redis is its primary data store.
I think you are. That just means that you hit MySQL but doesn't necessarily imply that the data itself are served from MySQL. Filesystems are just fine for this task and as was already mentioned, most of the data are in the CDN anyway.