> When you vote, your vote isn’t instantly processed—instead, it’s placed into a queue.
I remember looking into this a while ago and was bewildered to find that when I upvoted or downvoted, there was no XHR call to the backend! There was no hidden iframe/image, no silent form post. Absolutely no network activity. Yet when I refreshed, my vote was shown correctly. I thought I was going crazy.
This was long ago, so I'm a bit fuzzy on the details, but after a bit of digging I found the most elegant data collection technique I've ever seen. Instead of sending anything over the network when I voted, a local cookie was set with the link id and vote value. Then, when I went to another page, my browser naturally sent the cookie to the server, where I believe it was processed, and a fresh cookie was sent back to my browser. I could vote on 10 links and the local cookie would grow; then, on the next page load, the backend would receive my batch of votes, process them, and send me a fresh cookie again.
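A minimal sketch of the server side of that trick, in Python. The cookie format (`link_id:direction` pairs joined by `|`) and the function names are hypothetical; the comment doesn't say what reddit actually stored:

```python
from typing import Callable, Dict

# Hypothetical cookie format: "t3_abc:1|t3_def:-1" (not reddit's real format).

def encode_votes(votes: Dict[str, int]) -> str:
    """Serialize queued votes into a single cookie value."""
    return "|".join(f"{link_id}:{direction}" for link_id, direction in votes.items())

def decode_votes(cookie_value: str) -> Dict[str, int]:
    """Parse the cookie back into {link_id: direction}, skipping malformed entries."""
    votes: Dict[str, int] = {}
    for entry in filter(None, cookie_value.split("|")):
        link_id, sep, direction = entry.rpartition(":")
        if not sep or not link_id or direction not in ("-1", "0", "1"):
            continue  # ignore anything that doesn't look like a vote
        votes[link_id] = int(direction)  # a later vote on the same link wins
    return votes

def process_cookie(cookie_value: str, apply_vote: Callable[[str, int], None]) -> str:
    """Apply every queued vote, then return the value for the fresh (empty) cookie."""
    for link_id, direction in decode_votes(cookie_value).items():
        apply_vote(link_id, direction)
    return ""
```

Since the browser attaches the cookie to every request for the domain anyway, the votes ride along for free; the server applies them and answers with a `Set-Cookie` that empties the queue.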
I don't think they do that now and I've never seen anyone do something like this. Even HN just makes an XHR call on voting. After twenty years on the web, it's not often that I am surprised so this was quite a thrill.
Maybe I'm not a 'normal' user, so it wouldn't have mattered to them, but I used to open loads of tabs at once and slowly make my way through all the stories. Then, more often than not, I'd close the browser, which clears all my cache, cookies, history and whatnot.
This would mean none of my votes would ever have made it to the server!
To an extent it might be worth letting those votes be lost to keep things simple, if only relatively few users browse that way.
There are ways around it if it were a problem though:
* Have a timeout on updates. If a vote has sat in the local cookie for some time, send an XHR request to push the current queue to the server.
* Use onunload event handlers to push any remaining queued votes when a tab closes. You could even try to maintain a count of how many tabs are currently open (in the same cookie) and only send that request when the last one closes.
* If the queue of votes reaches a certain length, also send an XHR request to register them.
Probably not perfect (I have in the past found onunload events to not be terribly reliable) but it would capture most of the otherwise lost votes you describe. It might be too much work to write/debug/maintain though, if the number of lost votes would be small anyway.
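The three fallbacks above can be modeled as one small queue. This is a Python sketch of the policy only (in the browser it would be JavaScript writing to `document.cookie`); `VoteQueue`, the 30 s age limit and the 3000-byte cap are illustrative assumptions, the cap chosen to stay under the roughly 4 KB per cookie that browsers are expected to support:

```python
import time
from typing import Callable, Dict, Optional

class VoteQueue:
    """Queue votes locally and flush them via `send` (stand-in for the XHR call)."""

    def __init__(self, send: Callable[[Dict[str, int]], None],
                 max_age_s: float = 30.0, max_bytes: int = 3000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send
        self.max_age_s = max_age_s
        self.max_bytes = max_bytes  # keep the serialized queue well under ~4 KB
        self.clock = clock
        self.votes: Dict[str, int] = {}
        self.oldest: Optional[float] = None  # when the oldest queued vote arrived

    def _size(self) -> int:
        return len("|".join(f"{k}:{v}" for k, v in self.votes.items()))

    def vote(self, link_id: str, direction: int) -> None:
        if self.oldest is None:
            self.oldest = self.clock()
        self.votes[link_id] = direction
        if self._size() >= self.max_bytes:
            self.flush()  # size threshold reached

    def tick(self) -> None:
        """Call periodically (a timer in the browser) to enforce the age limit."""
        if self.oldest is not None and self.clock() - self.oldest >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        """Also serves as the unload handler: push whatever is queued, then reset."""
        if self.votes:
            self.send(dict(self.votes))
        self.votes.clear()
        self.oldest = None
```

`flush()` doubles as the unload hook; since unload handlers aren't guaranteed to run, the age and size limits are what actually bound how many votes can be lost.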
I'd guess they'd execute this function in onbeforeunload (or the equivalent) so the functionality is called when you go to leave the page rather than when you load a fresh one.
It took 8.4 seconds to load the Reddit front page on my phone. Hacker News took 1.1 seconds. This feels like advice from the overweight gym teacher on how to do pushups.
The desktop Reddit site took 2.2 seconds over the same connection, by the way. It seems like it would be much more valuable to optimize whatever is taking up >75% of page time on mobile.
HN serves the same front page to all users, other than the bar at the top. To make a reddit front page, you have to look up the hottest results of ~100 subreddits and shuffle them all together in the correct order. Literally every time a vote is cast, that could change the front page for every single user, in a different way for each one.
The perf problems referenced are in the front end, not the back end. reddit's mobile website really does have terrible performance, resulting from a whole list of mistakes. Paul Irish did a study: https://github.com/reddit/reddit-mobile/issues/247.
It's hard to blame them for missing the point when the comment is kind of nonsense in the first place. It's a false dichotomy. Even if it weren't, and you had to choose one or the other, the choice isn't clear-cut, since the backend improvements benefit the desktop site, the mobile site and the mobile app.
To be totally honest, I'd say the comment is verging on toxicity. They're basically telling these backend engineers to piss off and save the blog post until the mobile site loads faster. It's quite likely they have little-to-no control over the mobile site, and caching improvements are one of the few indirect ways they can improve it. This is exactly the opposite of what we should see in response to content like this.
> They're basically telling these backend engineers to piss off and save the blog post until the mobile site loads faster
That's not really what I'm going for. The way they framed the blogpost just lacks perspective. Perhaps that's due to organizational structure, but maybe there's a lesson in that too.
I am a little disappointed I ended up as the top comment. The technical comments are more interesting. The upvotes on my comment probably just represent frustration.
So, approximate. Don't make the calculation exact. No one will care if the page is mostly correct, since the algorithm you've created is already just a proxy for relevance instead of actual relevance.
And you could literally NOT update everything every single vote, and the users wouldn't even know... That's one of the major points of all those queues reddit uses, isn't it?
Blind trust in such a simply verifiable case is quite naive.
You can easily open up the front page of HN as a logged-in user and see that it contains information about which stories you've voted for and which you've flagged. On top of that, you can click "hide" to hide stories. These will only be hidden for your specific account, not for every person who loads the front page.
What's more, the go-to call-to-action by HN admins during really popular stories is to have people log out, because logged in users don't get cached results. Have a look at the comment by HN admin dang on the Trump winning story. [1] The key part being "please log out to read HN today, unless you want to comment. Then we can serve you from cache".
Exactly. It's the difference between "Look up the first 100 people in this phone book not named Justin" and "Look up the 100 Justins that come first alphabetically across these 100 phone books"
As far as I know hide doesn't change the order, it just skips some results. This is a far simpler problem than merging multiple sources of information into one result.
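The difference can be sketched in a few lines of Python. Assuming each post is a hypothetical `(score, id)` pair and each subreddit listing is already sorted hottest-first, HN's case is a filter over one shared list, while a personalized reddit front page is a k-way merge (here via the standard library's `heapq.merge`) over however many subreddits the user follows:

```python
import heapq
from itertools import islice
from typing import Iterable, List, Set, Tuple

Post = Tuple[float, str]  # (hotness score, post id): a hypothetical shape

def hn_style(front_page: Iterable[Post], hidden: Set[str], n: int) -> List[Post]:
    """One shared ranked list: just skip the posts this user hid."""
    return list(islice((p for p in front_page if p[1] not in hidden), n))

def reddit_style(subreddit_listings: List[List[Post]], n: int) -> List[Post]:
    """Many per-subreddit ranked lists: k-way merge them, hottest first.

    Each listing must already be sorted by descending score.
    """
    merged = heapq.merge(*subreddit_listings, key=lambda p: p[0], reverse=True)
    return list(islice(merged, n))
```

The merge itself is cheap (roughly O(n log k) for n results from k listings); the costly part described above is that the set of listings, and therefore the result, differs per user, so one precomputed page can't be shared across users.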
I'm a long-time Reddit user (10+ years on my oldest account) and a mod of several subs, including a 50K+ one. It is with great sadness that I say this, but Reddit is rapidly moving in the direction of pulling a Digg v4.
They just released a new mod-mail in beta that is unusable, so I'm guessing this trendy but shitty new design will eventually replace the whole site. I don't know what will happen after that, but I won't stick around to see it.
Reddit is an interesting phenomenon. The home page has been a toxic mix of memes and trash for much of its life (I looked at it a few times, but don't go there any more).
The subreddits vary from highly toxic to well moderated (spacex for example), but most are so lightly moderated that they have some good links but little interesting comment content. The contrast with HN is interesting - I think the moderation here is much better and the culture is more welcoming as a result, but the home page here now just goes past too fast for it to function as a firehose on specific subjects.
HN is just more niche. HN ranks 842 on Alexa in the US; Reddit ranks 7. The criticism that reddit gets is so naive. What other site of comparable size, where the users are mostly anonymous, has a better community? Compare reddit comments with comment threads on YouTube or national newspaper sites, and recalibrate your humanity baseline.
It's also perfectly valid to criticise the criticism for having too high standards. If you expect a car to fly you to the moon, your standards are too high. If you expect Reddit to be like HN, you're also a fair bit off.
No, it's not really valid to come back saying that nobody has done better so therefore your sights are too high. If someone is complaining about the CO2 output of the lowest emissions coal plant, it's a perfectly valid criticism even if it is the best coal plant.
Reddit is a cesspool, and there is no proof that's the only thing that can happen with large online communities. There is plenty of room for innovation there. It's not like online communities have been around for thousands of years and this is clearly the best solution as proved by extensive research.
The parent is saying that the blame lies with humanity rather than any particular site's methods of operation. The fact that HN is virtually unknown to trolls is probably the only thing that keeps it solidly on the good side of the temple/cesspool gradient.
There's criticism, and there's not knowing what you're talking about. Unless one has run an online community like Reddit before, it's more than likely that that individual is playing armchair CEO.
Unless you're the CEO of Reddit, you're by definition playing armchair CEO. I don't need to be Cristiano Ronaldo to have a legitimate criticism that he needs to fall back and help the defense sometimes. I mean, he's the best, but he doesn't have to score all the goals.
This response always drives me crazy. The "well, why don't you make something better" line has to stop. If everyone had to be on the same level as what they were criticizing, there would be absolutely no feedback. I'm not a director, but I've studied film enough to know the elements of what makes a good movie. If you've never been displeased with something, then I applaud your saintliness, but please understand that the most passionate users are often the most vocal, regardless of whether they are a singer/guitarist/artist/ceo/presidentoftheus/whatever.
Because you're going to more or less make the same dumb decisions that the person you're criticizing made once you're in that same position dealing with a thousand different angles, people, and pressures your limited mind could not imagine before, and that's a fact.
I'd just like to say that, as a moderator of r/SpaceX, I appreciate your kind words.
Reddit really is a hit-and-miss barrel of communities. We (r/SpaceX mods) tend to find that a lot of our users don't even bother commenting in other subreddits. I certainly don't. Some communities and the stunts they pull are downright embarrassing, and being hosted on the same platform sometimes feels a little ugly, so having our own little island that's peaceful and courteous is a big breath of fresh air.
Thanks for the good work, I really appreciate the subreddit and the work done moderating it. People undervalue good moderation because if it's working well you don't often notice it.
The problem of all support professions. No one sees the immense amount of work needed to maintain a good status quo; only when something goes wrong do people notice those who work in the background all the time.
The problem with reddit, IMO, is that there is a unified identity as "redditors" that brings conflicting communities too close on a routine basis. That makes some sense based on the site's early history when it was basically targeted at the same audience as HN, and the unified identity has been fading more and more over the last couple of years, but it's still a large part of reddit's design.
Contrast with Facebook, Twitter, or other massively-accepted social networking sites. They start you out with ... a blank page! This not only allows but requires the user to craft an experience that will make them feel comfortable, and there's no need to confront them with things that may not be palatable.
I think you can only get away with a real "front page" if you're targeting a specific niche. Otherwise, you have to go really generic. Twitter usually recommends that new accounts follow very generic things, like late night talk show hosts, sports stars, and popular singers.
A lot of reddit's growth problem is that if a Republican or a religious person or someone over the age of 40 hears about it and decides to check out reddit.com, they're likely to leave angered, offended, shocked, or all of the above, and that's before they try to participate/comment, which is a whole 'nother can of worms.
Combine with occasional news stories about reddit's less-savory underbelly and the prevalence of pornography and profanity, and it really is no wonder that reddit struggles to find mainstream acceptance. The issue that reddit must now deal with is to become mainstream-acceptable without destroying its existing userbase in the process.
The idea of a community where anyone can create a subreddit is interesting.
Back in the day, the SomethingAwful forums were hugely popular, largely because they had subforums covering a big variety of topics. They're still somewhat popular, but Reddit has taken a huge amount of that traffic.
One big problem with Reddit though is ongoing discussion. Anything older than a few hours and you may as well not comment. Whereas traditional forums excel in that area.
I feel like there'd be a use case for a more traditional forum, but where users can create and moderate their own subforums. In fact I wonder why no-one's done that already.
That's not quite right, in that at least the person you are replying to gets an orangered notification (and how I wish HN had this), so although everyone else may disappear, you may find yourself having a prolonged conversation with that person.
> most are so lightly moderated that they have some good links but little interesting comment content.
Isn't that just Sturgeon's Law?
I have no idea how many subreddits there are, but I'd wager there are probably dozens, if not hundreds, of Hacker News equivalents - albeit tailored to different interests.
I tend to agree with you that the default homepage is a flaming trashcan, but I honestly haven't seen it in years; I just see my curated subreddits.
That's just how large pseudonymous forums work. Remember Usenet? The popular newsgroups were wastelands, but the smaller ones were often decent (unless, like comp.lang.lisp, they had their own specific diseases).
Considering that you've participated on Reddit for a decade, even with multiple accounts, I doubt that you would be able to suddenly give up your regular virtual social interactions.
So what alternative are you going to switch to? Are you talking about the distant future, or is it just a remark that you can't prove you would follow up on?
I dunno. I'm part of the 10 year club (/u/barclay is my main account). I've modded in the past. However, outside of a couple of select subs I still casually participate in (watches and bikes), the draw just isn't there for me anymore.
I think it's more like slashdot. I didn't leave slashdot for reddit or HN. I just left. It got bad. I'm thinking the same thing is about to happen for me on reddit as well.
The interesting difference, to me, is that Reddit isn't one community (like Slashdot) but rather more of a hosting service for communities, like Discourse or Slack (or Usenet or IRC.)
At some point, I might be following a whole different set of subreddits than I am today, as old ones die/jump the shark and I discover new ones. But I don't see what would specifically push me to leave the entire site, as long as there are individual active communities there I'm interested in—any more than I would choose to categorically stop visiting blogs hosted on Wordpress.com or categorically choose to never read a post on a phpBB forum again.
Of course, I might leave the whole site anyway, but not because of a push; rather because activity on the whole might drop off in favour of something else, and all those communities I like might migrate away (like many of the chatrooms I visit have migrated from IRC to Slack), leaving a hollow shell of a site that's still, in theory, a host for many communities I enjoy—but not their canonical home, just a secondary one. At that point, there'd be no reason to visit.
I think it'd be really interesting to set up a more traditional forum system (like Discourse), but in a way where anyone can create a subforum and moderate it. Like Reddit but allowing for a lot more long-term discussion on a topic.
Not sure why you think it's so difficult to quit reddit cold turkey. I know several people who have done it already just because of the culture. I'm already visiting less frequently.
HN, hubski, voat, etc. are picking up the slack. I'm also trying to spend less time socializing and more working on stuff.
Remember the HN thread for the presidential election? A single huge thread affected the server so much it had to be split up. Reddit has hundreds of threads that big.
HN is a much much much simpler website, and has orders of magnitude less traffic. From what I heard, Reddit is on a pretty tight budget, and it's a pretty small team as well.
We don't measure complexity for complexity's sake; we look at the functionality offered. The fact that HN is a much much much simpler website while offering much of the same functionality is an indictment of the complexity of the Reddit mobile site. Traffic doesn't really enter into it; that is really not the limiting factor here.
They just spent the past year relaunching "reddit - the startup" so their team is now plenty big, they just don't have much to show for it.
That's just not true. Reddit has a ton of extra functionality, including creation of subs, subscription to subs, controls of voting rules, user bots that run on reddit's servers (example: RemindMe), theming, tons of settings even for the regular users, moderators/admins interfaces, apps (example: ChangeTip), reply notifications, self-serving ad management system, image/video embeds.
I'm sure there's a lot more, these are just the obvious features that HN doesn't have.
> user bots that run on reddit's servers (example: RemindMe),
RemindMeBot runs on its creator's own server and communicates through reddit's API as if it were a regular user. AutoModerator is the only bot I know of that runs on reddit's servers.
As far as I know, Hacker News has no concept of "subreddits"; it is a subreddit, a single one. I would guess that single additional layer creates enough problems that they don't 'have the same functionality'.
This comment is kind of shitty. This is an engineering blog post by a team that very likely does not work on the mobile site. They probably read HN and a comment that completely dismisses their work probably isn't very encouraging. I don't think your criticism is incorrect, just misplaced. There are even a lot of ways to point out that the mobile site doesn't reflect those values while praising the backend team for espousing them or remaining neutral.
I'm probably overreacting but reading this really got to me. My apologies if I came off too aggressive or negative.
For starters, that's not really on-topic here as this is a post about the server side's caching, having nothing to do really with any client issues.
But regardless, what does the second page load take? What does the 3rd take?
For many people, they go on reddit and stay on there for a significant amount of time. So that first page load is much less impactful, especially if it can enable them to have faster "pageloads" while on the site. If the site is optimized for them, then having the image/text post load in a fraction of a second, and the comments in under a second (which it does for me) seems like a good bet.
That being said, they really could spend some time reducing that impact.
And just to throw my experience in with the others, I'm getting much faster load times. Timed using Chrome DevTools connected to my phone, it takes 2.76 seconds to the "load" event and about 3 seconds until it's fully usable (cache disabled, T-Mobile 4G network, high-end Android phone; obviously these will be much worse on slower phones, slower networks, etc.).
It's bad enough that it takes several seconds to load on my iPhone when the desktop version is near instant, but it's utterly broken too.
Most menus don't open for me on the first try (especially in landscape) or open in the wrong place with touch targets that are "off" so I have to rotate the phone a few times.
The old iOS 6 style site didn't age well visually, but it sure was fast. I miss the days when mobile sites were the fast and light version meant to work over 2G.
To me it's insane a non-SPA desktop site would become a SPA mobile site
> To me it's insane a non-SPA desktop site would become a SPA mobile site
Not to me. Yes, you might pay additional time on the first load, but you save massive amounts of data (and thus, due to TCP, highly latency-bound packets) when you just transfer small bits of JSON. Oh, and that saves battery too: mobile data transfer is the biggest drain on any phone's battery.
How do you "save massive amounts of data"? The content of the average Reddit page I visit is dominated by an image, a video or the comments.
All 3 of which have to get transferred with a SPA or normal site.
And mobile radios don't immediately go to their lowest power state after transferring the data, so I'd be shocked to see a SPA that dynamically loads smaller pieces as you move about the comment section save battery over a static page that loads a large amount in one shot. Sporadic small bits of data are much less efficient than large chunks on mobile in general.
Not to mention the CPU overhead of the SPA's Js which is non-zero even if radio battery usage is a larger issue.
Every single resource/asset, including logos, CSS, webfonts etc. will be fetched from the server. The server will answer with a 304 Not Modified, but still it's one round-trip per resource, and more if SSL is involved.
On a 2G or nasty 3G link all that latency adds up. Oh and please don't mention "but clients should cache stuff"... caching stuff on clients is broken beyond repair, buggy as f..k (hello iPhone, hello Firefox, hello Chrome - honestly, I haven't been able to get ALL major clients on ALL OSes to properly use caching!) and if you expect half of your userbase likely has a totally-broken-caching client (i.e. old Android phones) you better build your infrastructure to require as few requests as possible => SPA route.
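A sketch of what each of those revalidation round trips looks like on the server, using the standard `ETag`/`If-None-Match` conditional request mechanism. The hashing scheme and helper names are illustrative assumptions, weak validators (`W/...`) aren't handled, and the latency helper assumes a fixed number of requests in flight per round (browsers typically allow around six HTTP/1.1 connections per host):

```python
import hashlib
from typing import Dict, Optional, Tuple

def etag_for(body: bytes) -> str:
    """Strong ETag derived from the content (one common choice, not the only one)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, honoring a conditional If-None-Match header.

    A 304 carries no body, but the client still paid a full round trip for it.
    """
    tag = etag_for(body)
    headers = {"ETag": tag}
    if if_none_match is not None:
        candidates = [t.strip() for t in if_none_match.split(",")]
        if tag in candidates or "*" in candidates:
            return 304, headers, b""
    return 200, headers, body

def revalidation_cost_ms(resources: int, rtt_ms: float, parallel: int) -> float:
    """Rough lower bound on time spent only revalidating cached assets,
    assuming `parallel` requests in flight at once and one RTT per request."""
    rounds = -(-resources // parallel)  # ceiling division
    return rounds * rtt_ms
```

Even when every answer is a body-less 304, twelve assets at a 500 ms RTT with six in flight still cost about a second of pure waiting under this model.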
> And mobile radios don't immediately go to their lowest power state after transferring the data
Now imagine me reading a typical reddit thread on the front page. It takes me about 30 seconds before I am disgusted by some troll comment, and I go back to /r/cats. In the non-SPA case I'd have 2x approx. 10s of the radio blasting at full power, in the SPA case I'd have 1x 10s for the initial load of the app and maybe 2s for the second load of the 30 posts of /r/cats. Oh, and not to mention I, like many other people, like to reddit while on the train - which means in any case that 20 KB of data for a simple JSON response is better than 2 MB for each simple request. That small JSON can be transferred via EDGE if the signal is as bad as it usually is, but good luck with your nerves if you have to wait ages for every website.
> Not to mention the CPU overhead of the SPA's Js which is non-zero even if radio battery usage is a larger issue.
Yeah but unlike broken browser caching this one is something YOU can influence, not having to hope your users some day have enough cash to buy a phone with proper up-to-date software.
Edit: missed yet another thing. An SPA can dynamically request thumbnails matching the device's display resolution, whereas a non-SPA solution (usually!) has only one thumbnail size.
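A sketch of that selection logic, assuming the server offers a handful of pre-rendered thumbnail widths (the widths in any usage are made up) and the client knows its CSS slot width and device pixel ratio; `pick_thumbnail_width` is a hypothetical helper:

```python
from typing import Sequence

def pick_thumbnail_width(available: Sequence[int], css_width: int,
                         device_pixel_ratio: float) -> int:
    """Smallest available rendition that covers the slot at the device's pixel density.

    Falls back to the largest rendition if none is big enough.
    """
    needed = css_width * device_pixel_ratio  # physical pixels the slot spans
    big_enough = [w for w in sorted(available) if w >= needed]
    return big_enough[0] if big_enough else max(available)
```

On a 2x screen a 108 px slot asks for the 216 px rendition rather than the largest one, which is where the bandwidth saving comes from.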
HTTP/2 doesn't require a separate connection per resource if they are served from a few domains.
Also, do you really have a problem where an asset falls out of cache from one page view to the next? If so, the server wouldn't be returning a 304 in that case.
>In the non-SPA case I'd have 2x approx. 10s of the radio blasting at full power, in the SPA case I'd have 1x 10s for the initial load of the app and maybe 2s for the second load of the 30 posts of /r/cats.
Taking that example, the radio would be in its full-power state for 20 seconds in the second case and 15 seconds in the first, not 12 as you'd expect. And as soon as the connection isn't so horrible that it takes 10 seconds just to hear back that everything is cached, those 5 seconds of high-power state start to look less and less likely, and in most cases you'll be at parity, if not using more power.
The number I've optimized around (or seen others optimize around) in the past assumes it takes about 5 seconds of inactivity for the radio to go back to its low-power state (it was taken from the 3G days but is still what Google's Android documentation bases its recommendations for efficient downloads on: http://www.research.att.com/articles/featured_stories/2011_0...), so each request costs at least 5 seconds of high-power usage.
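A rough model of that tail effect, assuming the ~5 s tail mentioned above and ignoring everything else about radio power states; `high_power_seconds` is a hypothetical helper and the burst timings you feed it are up to you:

```python
from typing import List, Optional, Tuple

def high_power_seconds(bursts: List[Tuple[float, float]], tail_s: float = 5.0) -> float:
    """Seconds the radio spends in its high-power state.

    Each burst is (start, transfer duration); after the last byte the radio
    lingers for `tail_s` seconds before dropping down. Overlapping windows are
    counted once, so requests packed close together share a single tail.
    """
    windows = sorted((start, start + duration + tail_s) for start, duration in bursts)
    total = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in windows:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged window
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current window
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

Requests spaced further apart than the tail each pay for a full tail of their own; requests packed closer together share one extended tail.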
Of course this just goes back to what I meant about other content dominating load times, if your connection is so bad that getting back responses for cached items takes 10 seconds, I'm sure the download time for those images and videos will be through the roof.
Reddit solved the problem of reddit on a bad connection like that long ago with .compact. Previously, the "mobile" reddit links were .compact links (which is why there's now an annoying banner pushing you towards the SPA), .compact being the non-SPA version for mobile/low-bandwidth.
It had hardly any assets to load and did tiny thumbnails on the subreddit's page, and comment page, instead of trying to load a giant image of the content.
The moment you click on the SPA link it takes a whopping 30 seconds to load a preview image of the GIF. It also takes an entire minute just to load the code and assets for the SPA itself if it hasn't been cached. Compare that to 2 seconds for the .compact link's document + styling. If you're visiting that SPA on a bad connection you'd better hope those browser caches do their job, and that ProductionClient.js hasn't changed recently...
.compact worked well on the kind of connection where bandwidth was so low you'd want to evaluate content before committing to a 30-second wait because you clicked on "Show Comments", which used to be any mobile connection in general.
>Yeah but unlike broken browser caching this one is something YOU can influence, not having to hope your users some day have enough cash to buy a phone with proper up-to-date software.
That'd be fine if it hadn't been chugging on an iPhone 6 Plus while scrolling from its beta until now. An iPhone 6 might not be the newest hardware, but at launch (and even now), it's what most mobile developers would treat as "modern hardware" and "modern up-to-date software". Code size doesn't need to equate to how CPU-intensive a script is, but it's not exactly a mystery that the SPA chugs while .compact is fast when the SPA has three times the JS and twice the CSS just for its "core". .compact looks dated because of the few assets it uses (and its attempt to look like an iPhone app). Nothing would stop Reddit from making .compact links look modern, except they're clearly trying to drive .compact users to the SPA (they were much more aggressive about it back when it was in "beta").
Beyond performance, the entire reddit mobile experience is terrible anyway. It's advised to use the unofficial 'reddit is fun' app which is simply amazing in comparison.
I use Relay for Reddit. It's the best mobile app I've ever used, and only the second one I've ever actually paid for. There's also a free version, which I used for a long time, that puts ads at the bottom.
The interface is great. You can swipe for (almost) everything, which makes it easy to use one-handed.
If anyone here uses Sailfish or MeeGo, the Quickddit app for Reddit is excellent. It's on the Jolla store but it's also on GitHub: https://github.com/accumulator/Quickddit
On my iPhone 5 (yes, I realize that's old), the mobile website straight up doesn't load at all, just a blank white screen. I wonder how much device testing they've done...
Here's an awesome performance audit of reddit mobile by Paul Irish. It shows how completely unoptimized the code was back in 2015. Sadly it doesn't seem to have improved much since then.
Did the rewrite coincide with the re-skinning of the UI? Because any gains in performance are imperceptible, as the UI now hides all content until the browser has completely finished fetching. So even if it is 'faster' the UX is actually 'slower'.
The site was in beta at the time, so yes, no performance work had been done. The rewrite last year improved things and we are working on more performance improvements.
Another test for the skeptical. I logged out of reddit and tried loading /r/programming. That subreddit has no image submissions and no obvious customization. I loaded it once, then loaded it again while timing it. It took 6.5 seconds.
This was over a different connection, so I tried HN again. It took 1.0 seconds.
Have you tried looking at the different page elements to see which ones took longer? I've always felt that the main reddit.com domain loads reasonably fast but the rest don't.
My theory is that the mobile version of reddit has gradually evolved into slow, buggy nagware because it's designed to push you to install their app on your phone, à la Facebook removing features from its own mobile site. I've never used a site that forgets I'm logged in so often or still throws error codes if you try to upvote or post a comment while logged out. I would buy a year of reddit gold if it allowed me to permanently disable the mobile site, but for now I've pretty much just ditched the site entirely when I'm not at a desktop.
reddit has i.reddit.com (the old mobile page), which is faster than the new one. I hope they don't turn that off because I use it sometimes. The new one is just slow.
I use this too. It's also better UX in my opinion. In the new site, a lot of the buttons are either hard to see or too small to reliably hit with the finger on the first try.
I am glad to see that I am not alone in preferring the old mobile site. It's fast and simple, and it doesn't have any unnecessary whitespace and padding that just waste valuable space.
The new mobile site is slow and fails at basic functions such as preventing double posting, which happens all the time because the UI is often unresponsive.
If it 'feels' slow, that seems functionally worse than actually being slow. It implies an unresponsive site where it could be faster. A slower site, that 'feels' faster, is more functional to any human. So to echo about half these posts, it seems to be UI/UX holding it back.
That website is very heavy on JS, that's why it's unbearably slow. I remember having to use it on a mobile with 512MB of RAM once, it literally killed my browser (probably the OOM killer)
> This feels like advice from the overweight gym teacher on how to do pushups.
General guidelines: Always doubt advice from startup folks. Usually the more unicorn and hype they are, the less qualified they actually are.
Basically, they grow by recruiting a lot of people (most of them inexperienced, on average) and grow in all directions without a plan. To caricature, it's like a big playground.
The pain of static slab allocation is real! Changing usage patterns causing problems can be tricky to track down too; mcsauna looks helpful for this. Upgrading to memcached 1.4.25 and running with "slab_reassign,slab_automove,lru_crawler,lru_maintainer" was a huge improvement for our primary memcached cluster at Eventbrite.
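To see why static slab allocation hurts, here is a rough model of how memcached sizes its slab classes: chunk sizes grow geometrically by the `-f` growth factor (default 1.25), and each item lands in the smallest class whose chunk fits it. The 96-byte starting chunk, 8-byte alignment and 1 MB page size approximate memcached's defaults but are assumptions here, and the helper names are hypothetical:

```python
from typing import List

def slab_classes(min_chunk: int = 96, factor: float = 1.25,
                 page_size: int = 1024 * 1024) -> List[int]:
    """Chunk size for each slab class, growing geometrically like memcached's -f factor."""
    sizes: List[int] = []
    size = min_chunk
    while size <= page_size // 2:
        sizes.append(size)
        size = int(size * factor)
        size += (-size) % 8  # round up to 8-byte alignment
    sizes.append(page_size)  # last class: one item per page
    return sizes

def class_for(item_size: int, sizes: List[int]) -> int:
    """Index of the smallest class whose chunk fits the item."""
    for i, chunk in enumerate(sizes):
        if item_size <= chunk:
            return i
    raise ValueError("item larger than a slab page")
```

A 100-byte item occupies a 120-byte chunk. Memory is handed to a class a page at a time, and without slab_reassign/slab_automove enabled those pages stay with whichever class first claimed them, so a shift in item sizes can starve the newly popular class while the old one keeps its memory.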
The slowness of page loads mentioned by folks here is why I think caching at the HTTP level (e.g. Varnish) is much more efficient than caching at the service level (e.g. Memcached), which sits much further down the stack and is bound to be latency-sensitive. HTTP-level caching is also much less entangled in your code and deep in your infrastructure (less technical debt). A hybrid approach can work too, but only if it's light and unobtrusive.
By the way, and I'm going out on a limb with my shameful plug, I built a Varnish-as-a-Service kind of infrastructure called Cachoid ( https://www.cachoid.com ). But in my own defense, I'm putting my energy, time, and money where my mouth is.
To my understanding, Reddit serves highly customized content to each logged-in user. Can you help me understand how HTTP-level caching would solve this problem more efficiently than their service-level caching does right now?
You cache pages for logged-out users to start with. Then you cache per session (logged-in users), because you don't really need to show fresh votes right away on each visit (the article admits as much). Plus there's lots of room for ESI.
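One way to picture this, as a sketch under assumptions: a cache key that is shared by logged-out visitors and per-session for logged-in ones, plus ESI (Edge Side Includes), where a cached page shell contains `<esi:include src="..."/>` tags that get filled in with separately fetched fragments. The fragment URL and the helpers `cache_key`, `assemble` and `user_bar_for` are hypothetical; a real ESI processor (Varnish supports a subset of ESI) does the stitching at the proxy, not in application code.

```python
import re
from typing import Callable, Optional

# Matches ESI include tags such as <esi:include src="/fragments/user-bar"/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def cache_key(url: str, session_id: Optional[str]) -> str:
    """Logged-out visitors share one entry per URL; logged-in users get their own."""
    return url if session_id is None else f"{url}|session={session_id}"


def assemble(template: str, fetch_fragment: Callable[[str], str]) -> str:
    """Replace each ESI include tag with the body of the fragment it points at."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), template)


# The shared page shell can be cached once for everyone...
shell = '<main>front page listing</main><esi:include src="/fragments/user-bar"/>'


# ...while the small per-user fragment is fetched (or cached) per session.
def user_bar_for(username: str) -> Callable[[str], str]:
    def fetch(src: str) -> str:
        return f"<nav>{username}</nav>" if src == "/fragments/user-bar" else ""
    return fetch


page_for_alice = assemble(shell, user_bar_for("alice"))
print(page_for_alice)
```

Only the small fragment varies per user, so the expensive listing can stay in a shared cache entry even for logged-in traffic.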
Logged out users see a "snapshot" of the page updated every so often.
And I don't think caching pages per session would help with their load all that much. Why not just use HTTP cache headers at that point?
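For comparison, a sketch of the header-only approach, assuming illustrative TTLs of 60 and 300 seconds: per-user pages are marked `private` so only the user's own browser may reuse them, while logged-out pages are `public` with `s-maxage` so shared caches can hold a longer-lived snapshot. `cache_control_for` is a hypothetical helper; `private`, `public`, `max-age` and `s-maxage` are standard HTTP `Cache-Control` directives.

```python
def cache_control_for(logged_in: bool, max_age: int = 60, shared_max_age: int = 300) -> str:
    """Pick a Cache-Control value for a rendered page (the TTLs are illustrative)."""
    if logged_in:
        # "private": the user's browser may reuse the page for max_age seconds,
        # but shared caches (a CDN, Varnish) must not store it.
        return f"private, max-age={max_age}"
    # "public" pages may be stored by shared caches; s-maxage overrides max-age
    # for shared caches only, so the edge can keep a longer-lived snapshot.
    return f"public, max-age={max_age}, s-maxage={shared_max_age}"


print(cache_control_for(logged_in=True))
print(cache_control_for(logged_in=False))
```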
Plus, while you don't really need to show votes ASAP, logged-in users will want up-to-date comments.
> For example, when new comments are added or votes are changed, we don’t simply invalidate the cache and move on—this happens too frequently and would make the caching near useless. Instead, we update the backend store (in Cassandra) as well as the cache. Fallback can always happen to the backend store if need be, but in practice this rarely happens. In fact, permacache is one of our best hit rates—over 99%.
They basically have their application state duplicated in both places. Interesting architectural choice.
This sounds an awful lot like an old(ish?) technique termed "write-through caching". Building data structures that can take advantage of write-through caching at reddit-comment-scale does sound like it'd be an interesting problem to optimize.
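A minimal write-through sketch of the approach quoted above, assuming plain dictionaries as stand-ins for Cassandra (the durable store) and memcached (the cache): every new comment is written to the store and then applied to the cached value instead of deleting it, so reads keep hitting the cache and the store is only a fallback. `WriteThroughComments` and the `link1`/`c1` IDs are hypothetical.

```python
from typing import Dict, List


class WriteThroughComments:
    """Comment IDs per link, written to the durable store and the cache together."""

    def __init__(self) -> None:
        self.backend: Dict[str, List[str]] = {}  # stand-in for Cassandra
        self.cache: Dict[str, List[str]] = {}    # stand-in for memcached
        self.hits = 0
        self.misses = 0

    def add_comment(self, link_id: str, comment_id: str) -> None:
        # 1. Durable write first.
        self.backend.setdefault(link_id, []).append(comment_id)
        # 2. Update the cached value in place rather than invalidating it,
        #    so the next read is still a cache hit.
        if link_id in self.cache:
            self.cache[link_id].append(comment_id)
        else:
            self.cache[link_id] = list(self.backend[link_id])

    def comments(self, link_id: str) -> List[str]:
        if link_id in self.cache:
            self.hits += 1
            return list(self.cache[link_id])
        # Fallback: rebuild from the durable store (rare while writes keep the cache warm).
        self.misses += 1
        self.cache[link_id] = list(self.backend.get(link_id, []))
        return list(self.cache[link_id])


store = WriteThroughComments()
store.add_comment("link1", "c1")
store.add_comment("link1", "c2")
print(store.comments("link1"))  # served from the cache, no fallback needed
```

The sketch is single-threaded. With a shared cache, two writers doing read-modify-write on the same key can lose an update, which is the problem memcached's `gets`/`cas` (check-and-set) commands exist to prevent.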
I wonder how much better/worse the site would run if they had their own hardware like StackExchange. And I wonder how StackExchange would run if it were on AWS.
In my experience running large sites, dedicated hardware is not only cheaper but orders of magnitude faster, because you can fine-tune the hardware to very, very specific use cases, and you can have a fully private, highly optimized network with very short cable runs to effectively eliminate network latency issues.
Most of my current job involves solving problems that are caused by cloud limitations.
And if you need throughput, 40GigE / 56GigE is not that hard or expensive with your own hardware. You end up with like 4x to 10x servers to handle the same load.
I read an article about StackExchange's servers and it seemed super optimized, although I'm not an expert. I can't imagine AWS would be as optimized for exactly StackExchange's requirements. So I think StackExchange would run slower on AWS.
Reddit, on the other hand, is the worst performing out of all the top sites that I can think of.
Hard to tell. Don't know how much volume they have.
Going from the 50th site in the world in a niche topic (Stack Overflow) to the 27th worldwide targeting a general audience, it's entirely possible that traffic grows by 100x.
Anyway, who cares about money? 100k a year is peanuts. At this scale, if you have a solution that works well, you're good. Not even worth thinking about it.
I mean, it doesn't feel like it's working well, fwiw. The number of times I have seen Reddit's "down" pages, ranging from maintenance, to oopsies, to slow loads and failed requests... I just can't count them. They seem part of the normal Reddit flow.
They're better now than they were ages ago, but it still happens. Combine that with slow load times, etc.
Now, don't get me wrong, they're likely doing better than I could; it's a very difficult problem to solve. But something must be fundamentally wrong if a site whose growth has been steady, and not "overnight", has always seemed very, very slow. By "overnight" I just mean that its growth has appeared predictable, unlike the poor sites that get the HN/Reddit hug of death.
Edit: Not sure why I was downvoted, but I'll elaborate:
From a user perspective, Google never goes down; it's a big event when that happens. I've never seen Amazon down (again, personally!). I've never seen most of the big players down. Yet I've seen Reddit down more times than I can count. It's a meme on the site.
Regardless of how difficult the problem domain is, the end user doesn't care. All I'm saying is that the end-user experience is not good enough, in my opinion. I don't think that is an unfair statement.
So if this caching is what is powering the current state of things, what can be done better? From a technical perspective, is this the best that can be achieved? I surely hope not. Would you argue it's the best we can hope for?
Anecdotally, I also get way more errors/dropped connections/random slowness on Reddit than on any of the other massive sites. Plus the mobile site is beyond terrible, unusable on my phone.
I'm sure it's not an easy problem to solve, but they should be able to afford some hosting with all the reddit gold that seems to get thrown around.
For a text link site that lets you vote submissions up and down? Yes that's a lot. I'm sure a lot of the decisions at reddit corp were along the lines of, "well, we can't go back and change that now, so let's do whatever we need to fix it and move forward" AKA lots of RAM and caching all over the place.
If a company is grossing $20MM/year and stands to improve that by more than 0.75% by having a faster site, then yes, it makes sense to spend 0.75% of your revenue to do so.
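The arithmetic, as a small sketch using the comment's hypothetical figures: 0.75% of $20MM is $150,000 a year. `break_even_spend` is a hypothetical helper; working in integer dollars and basis points (75 bp = 0.75%) keeps the result exact.

```python
def break_even_spend(annual_revenue: int, uplift_bps: int) -> int:
    """Largest yearly spend that a revenue uplift of `uplift_bps` basis points covers.

    Integer dollars and basis points (1 bp = 0.01%) avoid float rounding.
    """
    return annual_revenue * uplift_bps // 10_000


# The comment's hypothetical: $20MM/year gross and a 0.75% (75 bp) uplift.
budget = break_even_spend(20_000_000, 75)
print(f"${budget:,}")  # $150,000
```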
If you think that's all Reddit is (text links with up and down votes), you need to go use it a bit more. Subreddits, comments, self-hosted images, etc. say otherwise.
That's the cost of 0.5-1 FTE. To come up with a better solution it would have to be enough cheaper to reduce the engineering hours of design and build and then take zero extra people to maintain. Feels unlikely.
Thanks for sharing the details. It's impressive to see the memory allocation and pool size for a site handling this much traffic. I would love to get some more information on Reddit's overall platform traffic volume, as I feel it would complement the discussion nicely.
I thought the same thing. I stuck with memcached for quite a while because I didn't see a real reason to switch to Redis, but after hitting some repeated annoyances with memcached, I gave Redis a try and I'm very happy I did. It's much better. Is memcached even maintained anymore?
It's hard to talk in general terms, but Memcached is threaded, so you can saturate your CPUs without running multiple instances, while Redis has more advanced features, both on the programmer-facing API side and on the operations side. But if we zoom in on Reddit itself, using data structures and changing the caching paradigm to really use Redis the proper way, that is, storing metadata in Redis and not only in their main DB, would, I suspect, provide a very big boost to Reddit. For some reason Reddit has always been an "anti Redis" shop. I'm sure they have their good reasons, and btw I love Reddit too much to complain; whatever they do to run it, I don't care as long as they provide such a wonderful service to the community :-)
But... their use case is IMHO one that you can accelerate tremendously by using Redis. I ran the most popular Italian Reddit-alike site for years, and I wrote a simple Reddit clone that uses Redis, so I had the opportunity to explore the problem a bit.
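A rough sketch of what keeping vote metadata in a Redis data structure could look like, assuming one sorted set under a hypothetical `links:by_score` key: each vote is a `ZINCRBY` on the link's score and the front page is a `ZREVRANGE`. To keep it runnable without a server, `FakeSortedSets` is an in-memory stand-in whose method signatures mirror redis-py's `zincrby(name, amount, value)` and `zrevrange(name, start, end)`. This is not the commenter's clone, and ordering by raw score alone is a simplification.

```python
from collections import defaultdict
from typing import Dict, List


class FakeSortedSets:
    """In-memory stand-in for two Redis sorted-set commands, so this runs without a server.

    Method signatures mirror redis-py: zincrby(name, amount, value), zrevrange(name, start, end).
    Only non-negative `start` is supported here.
    """

    def __init__(self) -> None:
        self.sets: Dict[str, Dict[str, float]] = defaultdict(dict)

    def zincrby(self, name: str, amount: float, value: str) -> float:
        scores = self.sets[name]
        scores[value] = scores.get(value, 0.0) + amount
        return scores[value]

    def zrevrange(self, name: str, start: int, end: int) -> List[str]:
        scores = self.sets[name]
        # Highest score first; ties ordered by member, descending, as Redis does.
        ranked = sorted(scores, key=lambda m: (scores[m], m), reverse=True)
        stop = end + 1 if end >= 0 else len(ranked) + end + 1  # Redis "end" is inclusive
        return ranked[start:stop]


SCORES_KEY = "links:by_score"  # hypothetical key name


def vote(r, link_id: str, direction: int) -> float:
    """Record an up (+1) or down (-1) vote by adjusting the link's score."""
    return r.zincrby(SCORES_KEY, direction, link_id)


def top_links(r, n: int) -> List[str]:
    """The n highest-scoring links, best first."""
    return r.zrevrange(SCORES_KEY, 0, n - 1)


r = FakeSortedSets()  # with a real server: redis.Redis(decode_responses=True)
vote(r, "a", +1)
vote(r, "b", +1)
vote(r, "b", +1)
vote(r, "c", -1)
print(top_links(r, 2))  # ['b', 'a']
```

With a real server, `vote` and `top_links` should work unchanged against a redis-py client created with `decode_responses=True`; without that flag, redis-py returns members as bytes.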
Reddit is most definitely not an "anti Redis" shop! If we were to rewrite it today, we might start with Redis. We've just put so much work into making memcached reliable and easy to understand for developers that the possible benefits of switching to Redis for many of these use cases don't outweigh the operational knowledge we have from years of running memcached at this scale.
That said - we use Redis in at least 2 (soon to be 3) capacities across the site for different services (outside of the monolith) and it works really well.
FWIW, a GET / (front page) while logged in takes 1-1.5s for me (TTFB). ~3s till load with ads blocked, ~5.5s with ads. Both numbers without caching, but that only seems to shave off about 200ms (most of the time is spent in JS and rendering the site).
Front page does indeed take that. But many links (comment pages) take around 3 seconds as I mentioned (generation time, has nothing to do at all with my browser) and pages with hundreds of comments (like most links in the default subs) can easily take over 5 seconds to generate.
They are number 19 on Alexa and are seriously understaffed for the traffic they are getting.
I think they are holding up just fine. All they monetize is Reddit gold and a few small ads, which I'm not even sure are real ads or just ads for Reddit itself.