The Infrastructure Behind Twitter: Scale (blog.twitter.com)
348 points by kungfudoi on Jan 20, 2017 | 69 comments



100,000s of servers for 100,000,000 messages/day?

I understand that half the servers aren't even doing messages, but isn't WhatsApp doing 2 orders of magnitude more messages with 3 orders of magnitude (?) fewer servers?

Is that right? I'm curious how one would justify being 10,000X worse.

So for each message, 10,000X more equipment is needed?


Also:

- WhatsApp doesn't have to allow browsing the entire history of its billions of messages;

- WhatsApp doesn't have hashtags. A tweet can go not only to 1,000,000 users, but also to any number of apps requesting updates for one tag.

- Twitter allows advanced search, where you can browse, in real time (or down through the entire history), a complex combination of people, tags, and free text, with settings such as language or date.

- WhatsApp has a list of messages, but Twitter has a graph: a message can be retweeted again and again, replied to, and liked.

- All those features have one impact or another on the way tweets are displayed to the user.

- Twitter's API is much more heavily used than WhatsApp's.


WhatsApp also does not need ad-related analytics.


That's a big leap. They don't have to actually show ads for that analytics data to be extremely valuable.


WhatsApp from the very beginning went with Erlang, and it perfectly suits their needs. You can almost map WhatsApp messages 1:1 to Erlang messages. On top of that, they optimized the hell out of their stack[1].

Twitter on the other hand is a very different problem, where you need to broadcast messages in a 1:N fashion where N can be 100,000,000 (Katy Perry, @katyperry: 95,366,810 followers). On top of that, they need extensive analytics on users so they can target them in the ad system. I'm pretty sure there is some room for optimization in their stack; not sure what percentage of these servers could be saved.

http://www.erlang-factory.com/upload/presentations/558/efsf2...


Twitter's analytics are either lossy or eventually consistent [1]. I'm sure they're resource intensive, but they're taking shortcuts that make them very amenable to saving resources (unless it's just very buggy).

In terms of the broadcast problem, it's trivially handled by splitting large follower lists into trees and introducing message reflectors. Twitter's message count is high for a public IM system, but it's not that high compared to overall messaging volume in private/internal message flows. More importantly, despite the issue of large follower counts, breaking large accounts into trees of reflectors makes the problem decompose neatly, and federating large message flows like this is a well understood problem.

I've half-jokingly said in the past that you could replace a lot of Twitter's core transmission of tweets with mail servers and off-the-shelf mailing-list reflectors, plus some code to create mailboxes for accounts and reflectors to break up large follower lists (no, it wouldn't be efficient, but the point is that distributing message transfers, including reflecting messages to large lists, is a well understood problem). Based on the mail volumes I've handled with off-the-shelf servers, I'll confidently say that 100s of millions of messages a day handled that way is not all that hard with relatively modest server counts.

Fast delivery of tweets using reflectors to extreme accounts would be the one thing that could drive the server number up high, but on the other hand, there are also plenty of far more efficient ways of handling it (e.g. extensive caching + pulling rather than pushing for the most extreme accounts).

Note, I'm not saying Twitter doesn't have a legitimate need for the servers they use - their web app does a lot of expensive history/timeline generation on top of the core message exchange, for example. And the number of servers does not say much about their chosen tradeoffs in terms of server size/cost vs. number of servers. But the core message exchange should not be where the complexity is, unless they're doing something very weird.

[1] Taking snapshots of their analytics and the API follower/following count shows they don't agree, and the analytics numbers change after the fact on a regular basis.


Ha. Love the mail server idea.

It simply proves the point that this isn't such a huge problem that it takes 10,000 times the equipment because of [search | many recipients | tags | etc.].

It reminds me of that Flickr architecture from back in the day: hopelessly complicated with tiers and shards and tiers and caching and tiers and tiers... to serve some images. But tagging!

Do people feel more important if they make a complicated solution? Where is Alan Kay?


Could you elaborate a bit on the message reflectors and using follower trees instead of lists with regard to messaging like Twitter's? I am genuinely interested in improving messaging patterns in Twitter-like scenarios (i.e., large fan-outs).


Let me start at the beginning: I have used mail servers as messaging middleware. Back around 2000 I ran an e-mail provider, and we jokingly started talking about taking our heavily customized qmail install and turning it into a queuing system for various backend services we were building. Then we decided to try it, and it worked great (we ended up using it in a reference registrar platform we built when we built the .name registry; I've used a similar solution elsewhere since).

The point is that e-mail provides the federation, has a rich ecosystem of applications, and handles things that are easy to mess up, like reliable queuing and retries, as well as providing a rich system of aliasing and forwarding.

So let's consider Twitter: you have a list of followers, and a list of people you follow. That provides two obvious ways of knitting together a timeline: push and pull. In real life it's probably most efficient to mix the two, but for the "twitter by e-mail" architecture, let's consider push only.

In its simplest form, you map Twitter IDs to internal but federated "e-mail addresses" belonging to virtual buckets. Then you use MX records to map virtual buckets to servers. On each server, you map the internal e-mail address to a mailbox.

You also map each Twitter ID to an internal "e-mail address" for reflecting tweets to that account's followers. It too maps to a virtual bucket, with MX records mapping to a server, but instead of mapping this address to a mailbox, you map it to a mailing-list processor.

When user A follows user B, in this model that means user A subscribes to user B's reflector.

To handle fanout, you can use the aliasing supported by pretty much all mail servers to remap the reflector address to a second mailing list. This second mailing list is a list of lists. Here you need "non-email" logic to manage the mailing lists on the backend.

To outline this, for user A, the above might look like this:

- Twitter handle A maps to A@virtual-bucket-56.timeline.local ("56" is arbitrarily chosen - imagine hashing the twitter handle with a suitable hash)

- MX record mapping virtual-bucket-56.timeline.local to host-215.timeline.local ("215" is also just arbitrarily chosen in this example).

- On host-215.timeline.local there is an IMAP mailbox for tweets from people this user follows.

- Twitter handle A also maps to A@virtual-bucket-56.reflectors.local, with MX record mapping that to host-561.reflector.local (the point being that the MX records can be used to remap failing hosts etc)

- On host-561.reflector.local "A" maps to a mailing-list package that accepts basic subscribe ("follow") and unsubscribe ("unfollow") options.

Here you already have the basics. The "magic" would happen once the mailing list A@host-561.reflector.local reaches some threshold, say 10k. At this point you'll want to add a level of indirection: say you rename A@host-561.reflector.local to A-sub1@host-561.reflector.local and create a new A@host-561.reflector.local with one subscriber: A-sub1@host-561.reflector.local. Then you create a new mailing list on a different server with sufficient capacity, let's say A-sub2@host-567.reflector.local, and subscribe that (you might want to indirect these two via virtual buckets) to the main list.

There's no magic here - mailing out a list of 10k is trivial. A two-level tree with 10k at each level can have 10k leaf nodes with 10k users each, for 100M users.
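To make the splitting concrete, here's a toy in-memory version in Python (the Reflector class, the 10k threshold, and the naming are all illustrative; in the mail-server version this logic lives in whatever manages the mailing lists, not in a single process):

    FANOUT_LIMIT = 10_000  # illustrative threshold from the example above

    class Reflector:
        """A mailing list that fans a message out to its subscribers.
        Members are mailboxes (plain lists here); children are sub-lists."""
        def __init__(self, name):
            self.name = name
            self.children = []  # sub-reflectors, once this list has split
            self.members = []   # direct subscribers (mailboxes)

        def subscribe(self, mailbox):
            if self.children:
                # Already a list-of-lists: find a leaf with room, or grow
                # a new one (ideally placed on a different server).
                leaf = next((c for c in self.children
                             if len(c.members) < FANOUT_LIMIT), None)
                if leaf is None:
                    leaf = Reflector("%s-sub%d" % (self.name, len(self.children) + 1))
                    self.children.append(leaf)
                leaf.members.append(mailbox)
            elif len(self.members) >= FANOUT_LIMIT:
                # Threshold reached: push the existing members down into a
                # leaf list and turn this node into a list-of-lists.
                first = Reflector(self.name + "-sub1")
                first.members, self.members = self.members, []
                self.children.append(first)
                self.subscribe(mailbox)
            else:
                self.members.append(mailbox)

        def deliver(self, tweet):
            # Two levels of 10k each reach 10k * 10k = 100M mailboxes.
            for child in self.children:
                child.deliver(tweet)
            for mailbox in self.members:
                mailbox.append(tweet)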

In practice you'd likely "cheat" and mark the top users someone is following, and do pulls against cache servers for their tweets instead of pushing them, drastically reducing the need for big fanouts. Basically, you need to spend lots of time testing to determine the right cutoffs between pull (which will potentially hit many servers on each page reload) and push (which hits many servers each time someone tweets to a large follower list).

Again, let me reiterate that while this type of setup works (I have tested it for millions of messages), it's by no means the most efficient way of handling it. The e-mail concept here is more a way of illustrating that it's a "solved problem" and "just" an issue of optimization.

For starters, you'll want to consider whether it's easy enough to reconstruct data to drop syncing to disk, using RAM-disk to speed up "deliveries" etc., and you may want to consider different types of storage backends. You may also want to consider other "tricks" like locating leaf-reflector nodes on the servers where the accounts they reflect to are located (at the cost of more complicated "mailing list" management).

The most worthwhile lesson is that if you hash the ID to a virtual bucket, and have a directory providing a mapping from virtual bucket to actual server, you gain the flexibility of easily migrating users, etc. If you in addition provide a means of reflecting messages to a set of subscribers, you have pub-sub capability. If you need to handle big fanout, you'll want a way of "pushing down" the list and inserting a fan-out reflector "above" it.

Those patterns can be applied whether you use e-mail, or zeromq, or any low-level messaging fabric for the actual message delivery (in general the [entity] => [virtual bucket] => [server] indirection is a worthwhile pattern for almost anything where you may need large-scale sharding).
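A minimal sketch of that [entity] => [virtual bucket] => [server] indirection (Python; the bucket count, directory contents, and host names are made up to match the example above):

    import hashlib

    NUM_BUCKETS = 1024  # fixed up front so existing hashes stay valid

    # The directory: virtual bucket -> current server. In the e-mail
    # version this is what the MX records encode; migrating a bucket
    # means updating one entry here, with no re-hashing of users.
    bucket_to_host = {
        56: "host-215.timeline.local",
        # ... one entry per bucket
    }

    def bucket_for(handle):
        # Stable hash of the Twitter handle to a virtual bucket.
        digest = hashlib.sha1(handle.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_BUCKETS

    def server_for(handle):
        return bucket_to_host[bucket_for(handle)]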


In WhatsApp, a typical message goes to 1 other person. On Twitter, it can go to millions of people.

When Twitter initially got their failwhaling under control, I recall reading they solved it by changing from a relational "join and merge the timelines of everyone you're following on each refresh" model to a messagebox model. If that's true, maybe that naive model is now showing its limitations (I doubt they stopped there, though; it seems like they have things under control).
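If that's accurate, the shift looks roughly like this (a toy sketch; the dict "tables" and function names are mine, not anything Twitter has published):

    # Fan-out on read (the old relational model): merge at request time.
    tweets = {}     # author -> list of (timestamp, text)
    following = {}  # user -> set of authors they follow
    inbox = {}      # user -> materialized timeline (the "messagebox")

    def timeline_on_read(user):
        merged = []
        for author in following.get(user, ()):
            merged.extend(tweets.get(author, ()))
        return sorted(merged, reverse=True)  # cost paid on every refresh

    # Fan-out on write (the messagebox model): pay the cost at tweet time.
    def post(author, followers, stamp, text):
        tweets.setdefault(author, []).append((stamp, text))
        for f in followers:
            inbox.setdefault(f, []).append((stamp, text))
        # A refresh is now just inbox[user] -- but a tweet from an
        # account with 95M followers means 95M appends, which is why
        # huge accounts are the hard case in this model.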


I suspect the writer was using the phrase "hundreds of millions" figuratively. When I worked there years ago there were already 14 billion API requests a day, iirc. (That number was public at the time, for the record.)


I believe it's in the low 100s of millions of tweets per day. I've seen that stat elsewhere.

> 14 billion API requests a day

Do you mean 14B internal, services-requesting-services API requests?

Surely you can't mean 14B API requests from the outside world, can you? I'm scratching my head over how their real user base could generate anywhere near that load.


As of several years ago, the public HTTP endpoints would easily do 1M/sec at peak times. Not just API, but web, images, et al.


You really need to read the article.

Those servers aren't just for managing the messages. They're also for the advertising and analytics platforms. And since over a third of their servers are for generic Mesos, they could be for anything, e.g. development containers.


That's only 1,000 messages per second on average. A single database + app server could handle that load. Even assuming a bunch of other stuff is happening, 500 servers sounds generous.
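(For reference: 100,000,000 messages / 86,400 seconds in a day ≈ 1,160 messages per second on average.)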

Wtf are they doing that each server can only handle one tweet every two minutes?


Actually, it's 1000-9000 messages per server per day. Or about 1 message every 10-100 seconds.

Of course, that's just the new messages inbound. They may need to distribute that single message to 100M people (who likely won't even see it, but still.)

Problems that are trivially solvable with one database don't simply scale by adding more DBs or machines. Scaling isn't easy, or they would have done it. I'm in no way disparaging their team, because I don't know what kind of constraints they had getting to this point.

Still, I'd bet it could be optimized by 2+ orders of magnitude if people sat down and re-evaluated the whole structure again at this point in time.

Regardless, is that really a priority?

They may have bigger issues on their plate now (growing revenue, growing users, making users happy). Assuming their business can generate the cash flow to overcome the inefficiencies, they may be better served to focus on growth.


The day before, Discord did a high-quality write-up on why they chose Cassandra [1], and now this post hits, explaining how one of the world's most popular and most-trafficked services has engineered its infrastructure; it's like a dream.

I'll echo the praise I wrote earlier, that insights like this aren't only some of the best content to hit HN, but become some of the most valuable resources for designers who have yet to face a scaling issue, but know they will soon.

Since you have developed custom layers on top of open-source software to fit your particular usecase and load profile, and host all this in-house, have you considered monetizing your infrastructure for outsiders who may have similar needs?

Today, one has limited, unpleasant choices: either pay through the nose for something like AWS or Google Cloud to get elastic scaling and the captive storage systems that can be made to handle these kinds of workloads, and still have to write a fair bit of custom glue to get all the pieces to play nice; or build out the servers yourself, but have to employ dedicated talent with the requisite expertise. Either way, the barriers are fairly steep; you could tap into an under-served market should you choose to sell IaaS (edit: or, more accurately, PaaS). Has this conversation come up in the past?

[1] https://news.ycombinator.com/item?id=13439725


This is a fairly technical analysis, and the terminology used in many cases is above what I know about networking. But the one quote that will stick is this.

"There is no such a thing as a temporary change or workaround: In most cases, workarounds are tech debt."


Tech debt is a choice. Sometimes you'll want to embrace some level of technical debt in order to bring something into production quickly, with the understanding that you will fix it later. It's part of a triad with Speed and Quality.

That may not be the choice Twitter has made in this instance, but it's a viable choice nonetheless. Defaulting to "tech debt = evil" is wrong, imo.


The problem with technical debt versus financial debt is that the latter has a monthly mandatory cost (interest payments) that can't be ignored and which is visible all the way up to the C-level, whereas middle-management can keep obscuring the presence of technical debt and pushing its repayment out to the right.

Essentially it's a 'free' internal debt, regardless of how often architects and developers complain of its cost.

Thus in a contest between doing something right, but expensively, versus good-enough-for-now but technically-constrained the latter will usually win.


It's not "free", it's just much harder to measure.

The cost is reduced development velocity, and perhaps reduced systems stability.


Which developers get blamed for, even though it was a managerial decision to take on the debt.

Financial debt has clear cost; technical debt doesn't. So it seems 'free' to the non-technical.


And sometimes a workaround is the best solution you'll get, because solving the problem properly might introduce new issues, which might require new workarounds.


Exactly. The operative part of "workaround" is "work".


Sometimes it seems like "technical debt" is used as a dysphemism for "refactorable." We see railing against technical debt, then tomorrow there's a "code is never finished" post that gets nods all around. A bit of a strawman, but my point is that there's a big picture of the technology lifecycle that somehow fosters disparate contexts for the same exact thing.


I really like this tweet:

"I'm the Technical Debt Fairy. If you leave technical debt under your pillowcase at night I hire away your best developers"

https://twitter.com/mipearson/status/351539310199189505


Truth. The only time I consider something temporary is when the business stakeholder asking for the temporary change has a) promised b) a specific period c) in which the cleanup happens, and d) a track record of honoring their promises.

And I encourage everybody to make that their standard. Now I never cut a corner without that. I never even offer. My default is a zero-additional-tech-debt approach, because that's the only thing I think is responsible or sustainable. If there's a legitimate business need for taking on a bit of tech debt, I will propose the deal of splitting the work into, say, "experiment" and "cleanup", but cleanup ends up in the workstream with a date attached. If the stakeholder accepts the deal but fails to honor it, I revoke their tech-debt credit card.


That makes as much sense as saying "there is no such thing as a temporary credit card purchase; in most cases, loans are debt".

Also, if we're talking of debt, Twitter ought to be more concerned about its VC debt (investment).


No, it makes much more sense than that, since there are changes that don't incur technical debt. It's only "temporary changes" and "workarounds", in other words quick kludges, that they say incur permanent technical debt.


I've been telling cow-orkers this for a long time. They're finally starting to realize that it's true.


It strikes me that so many of the components they use (e.g. under "Storage") are built in-house (several DBs, blob store, caches, etc). Is that because equivalent solutions didn't exist at the time? Is it because Twitter suffers from NIH?


I built custom storage for Twitter back in the 2010-12 period. There wasn't much off the shelf in those days that worked out-of-the-box at scale besides Cassandra, and Twitter had a well-documented attempt at using Cassandra for primary workloads that failed due to the amount of variance in IO and latency for high-volume workloads.

Most of the time, we were building custom distribution layers on top of open-source storage (e.g. memcached/mysql/redis/etc). I think blobstore was the first thing Twitter put in production that was mostly custom, followed a year or two later by Manhattan. I'm not sure there are even now good open-source competitors for those projects, largely because any reasonable smaller company uses S3 or Dynamo.

There's plenty of open-source things twitter created, or nurtured out of the existing ecosystem, from mesos to memcached to some of the hadoop/scalding/parquet stuff.


The flip side of that imo is that if you don't have Twitter-scale needs for the specific things they've optimized their infrastructure for, you probably don't need their solutions :)


I haven't used it, but one tool I've read good things about is Minio [0]. I don't know if it's able to match Twitter scale, though.

https://minio.io/


Probably a bit of both. I know the team I was on when I worked there was fairly proud of duplicating effort implementing things that other teams had already done. But Twitter does tend to build really good stuff.


In general, Twitter infrastructure didn't start by building solutions in-house. Like most early-stage companies, it started by using whatever it could find in the OSS repertoire. But a lot of problems that are non-existent, or at least tolerable, at a smaller scale become more acute when you scale up.

This is when the developers face a decision: improve the existing solution, or develop a new one? There are many factors that impact the final decision. Maturity of the technology? Community support/inclusion? How big is the necessary change? What scale can the existing solution handle by design? How does in-house talent compare to those maintaining the original? What's the focus/need of the in-house use cases compared to the broader objective of the OSS solution? It should not be a surprise that the answer differs from project to project. But if the decision is to use the existing solution, it probably won't raise many eyebrows/questions.

It is very common for larger-scale operations to write their own solutions, because a different scale can change the nature of the problem fundamentally. Google, Facebook, and whoever else has a production fleet above the 10K range tend to build a lot of in-house solutions, often after considerations similar to those mentioned above.


Sounds like it to me. Storing a crap ton of 300-byte messages is pretty common. Thousands of companies have been doing it for a decade. Anyone that does analytics or email probably stores far more of these than Twitter.

Blob store is perhaps the only forgivable custom solution. Besides the eventually expensive S3 you pretty much have to roll your own large binary storage at that scale.

For regular DB-sized workloads (1-16K bytes) they had hundreds of options to choose from. Same with caching. They're both relatively solved problems.


I know nothing about storage, so I'm a bit confused about why Twitter needed:

1. Hadoop

2. Graph

3. Redis/Memcache

4. Blobstore

5. SQL variants

(and a few others).

I do see that the post has a short snippet briefly describing what they're storing, but I'd be curious to know why (speed, cost, latency, space tradeoffs/constraints).

Also, if any more experienced folks want to chime in: Elixir/Erlang is "built for concurrency," as they say. I'd love to hear people's one-sentence simplifications of what kind of situation Hadoop/SQL/Redis/etc. should be used for (similar to how Erlang is best used where concurrency and fault tolerance are desired). In particular, is there a "Code Complete"-type book for storage?


That seems like a pretty standard stack. I don't know about Twitter, but I can say why I'd use them, in my experience:

5. SQL variants: This is where you keep the source-of-truth data needed to serve requests. It's flexible and reliable, but only moderately fast. The backbone of your infrastructure; you can consider everything else as enhancements to it to overcome specific limitations.

4. Blobstore: Your SQL database is not going to be good at storing large binary blobs such as files or images.

3. Redis/Memcache: Your SQL database may be too slow for some requests, and/or its data model may require too much work in the API layer to align with the final presentation model. So you want a blazing-fast store that you can use for that final presentation view when requests don't differ from each other (a sketch of this pattern follows the list).

2. Graph: SQL databases aren't the best for graph queries or data (I mean, you can hack it, but it's going to be slower and uglier code-wise), so if you need to look at data as a graph, something dedicated makes sense.

1. Hadoop: You're getting lots of log data from users: what they do, when they do it, and so on. You want to analyze this data, build models (whom to recommend people follow, for example), and use it to serve ads (gotta pay the bills somehow). You could put this in SQL, but it's a lot of data and you'll be doing a lot of read requests, which will end up slow and messy in SQL. Hadoop, on the other hand, is well suited for this type of data and workload: large, slow, read-heavy queries on data that is appended to but not modified in place.
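For item 3, the standard pattern is cache-aside; a minimal sketch (a plain dict stands in for Redis/Memcache, and build_view_from_sql is a hypothetical placeholder for the expensive work):

    import time

    cache = {}  # stand-in for Redis/Memcache: key -> (expires_at, value)
    TTL = 60    # seconds; tune per view

    def build_view_from_sql(user_id):
        # Placeholder for the expensive SQL + API-layer shaping work.
        return {"id": user_id, "rendered": "..."}

    def get_user_view(user_id):
        key = "user:%s:view" % user_id
        hit = cache.get(key)
        if hit is not None and hit[0] > time.time():
            return hit[1]                       # fast path: presentation-ready
        view = build_view_from_sql(user_id)     # slow path: the real query
        cache[key] = (time.time() + TTL, view)  # repopulate for the next reader
        return view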


FTFA:

> Graph: Our legacy Gizzard/MySQL based sharded cluster for storing our graphs. Flock, our social graph, can handle peaks over 9 million QPS, averaging our MySQL servers to 30k - 45k QPS.

Isn't storing a graph DB in MySQL a Bad Idea? Graph queries are extremely poorly suited to relational databases.


I don't work on the graph DB, but basically the situation is this: 1) yes, it is not a great use case for MySQL, but that's how it started, partly because Twitter needed joins on its graphs; 2) legacy systems die hard, especially at scale. Twitter is working on a better solution, but for now Flock is still what's running in production.


Thanks for explaining. I had a similar situation -- storing a graph db in postgres. The performance sucked, but we didn't have much of a choice.


Erlang was behind the NoSQL database that started the NoSQL hype train (CouchDB), a document database. The Erlang distribution contains an amazing distributed database called Mnesia.

For books, read the AOSA Chapter on Riak (http://www.aosabook.org/en/riak.html) and then read through the tutorials on Riak Core. Oh, and Erlang in Anger https://s3.amazonaws.com/erlang-in-anger/text.v1.0.3.pdf


Mnesia is a joy to use, but keep in mind it was written before the CAP theorem existed, so it doesn't behave that well in the face of network partitions.


You might try "Seven Databases in Seven Weeks". It's a few years old now but still solid.


Lots of CIO-type buzzwords and acronyms in this, including many coined by Twitter themselves. More worthy of a skim than a rigorous reading. But interesting for a bird's eye view of how complex Twitter is behind the scenes.


They mention that they used to base their geo routing on the location of the client's DNS server but have moved to BGP anycast. I've heard that there can potentially be routing issues for long-running connections using anycast to end users; is anybody else doing something like this, and do these issues happen in practice?


TCP Anycast - Don’t believe the FUD

Matt Levine (CacheNetworks), Barrett Lyon (BitGravity), Todd Underwood (Renesys)

Operational experience with TCP and Anycast

https://www.nanog.org/meetings/nanog37/presentations/matt.le...


As far as I know, using anycast for geo routing is pretty standard. E.g., I'm not aware that my employer (large e-commerce) has had any routing issues. This being said, we mostly avoid long-running connections today. With more messenger-like products becoming ubiquitous, I expect that to change, though.


> Fast forward a few years and we were running a network with POPs on five continents and data centers with hundreds of thousands of servers.

That seems high given Twitter's size and the hardware distribution pie chart they showed. Does anybody have an idea how this compares?


I'm working for a very successful Internet company with a 9-figure user count. We have 7 or 8 POPs on 4 continents and run a low 5-digit number of servers.

This being said: we do not have to deal with anything like the network effects and lifestyle-type usage that Twitter has. Once you pass a certain size and complexity of infrastructure, it happens really easily that economies of scale stop helping and the complexity of everything having to be distributed works against your efficiency big time. So I'm not shocked by their numbers.


Web companies think 12k transactions per node is fine.

It isn't fine. They just have an idiotic funding and scaling model.


| We have over 100 committers per month, over 500 modules, and over 1,000 roles.

| we were able to reduce our average Puppet runtimes on our Mesos clusters from well over 30 minutes to under 5 minutes.

This isn't just tech debt... it's poorly designed, poorly thought out, poorly architected, and poorly managed in the first place.

Is it because Twitter cannot find good talent because of its falling stock?


I've talked to/pitched (full disclosure: got a "no") some of the bigger names behind the product at Twitter. I was a little disappointed because they seem to be proud of their at-least-$15M+/month server costs, which are partly driving their company into the ground (the user-facing product hasn't improved despite the "re-engineering" of their backend, at non-substantial price differences, and the lack of innovation for consumers has made them lose everyone to Snapchat or anti-censorship sites).

From tweets alone (about 200 bytes per tweet), over the last decade, they probably have about 3 petabytes. Unknown to them (because of the aforementioned pride), they have 1.5 petabytes a month of free storage/caching they aren't even touching. They could switch to a P2P model like IPFS or (full disclosure: I work here) http://gunDB.io/ , but Twitter seems determined to stay a centralized monolith. Which is too bad, because that has now become their own undoing - the regime change is happening and decentralized services will win instead.

Edit: Compare against 100M+ messages (1,000 bytes each) for $10 a day. 2-min screencast here: https://www.youtube.com/watch?v=x_WqBuEA7s8 . Even if you multiplied the feature set 10K times, you would still be saving $12M+ a month. At this rate the Discord guys (pushing 120M+ messages a day) are doing way better - their post is on top of HN right now too; see my comment there as well. And they only have 4 backend engineers.


Why the downvote? The post contains valuable architectural links worth a discussion


Because it reads like "I tried to sell Twitter something, and now I think they're dumb because they didn't buy it."

Also because it claims that decentralized services are winning when the exact opposite is true.


No matter how biased I am, Twitter's collapse over the last half year is evidence of their demise.

Decentralized tooling (even if it runs on a centralized service) is winning; the stark contrast with Discord's engineering post (here: https://news.ycombinator.com/item?id=13439725) is proof.

I love being proved wrong though, shoot me some links backing up your claim?


1. Because it's a .js database.

2. Because the website does a poor job explaining what problem it solves.


Agreed, their website definitely turned me off.

Quick tip for them: having a section showing that a bunch of people are using and "trusting" the product is good, but yours seems mostly (4/7 from what I can see) to be the investors behind your product.

Edit: Looked at the names more, and it seems 7/7 are individual investors or VCs

https://angel.co/gun

Which is all well and good, but maybe the wording should be "Investors and Backers" rather than "Trusted By".


Sorry the website turned you off :( What recommendation do you make for balancing both enterprise and developers? Currently we have two websites; the other, for developers, is http://gun.js.org/

Better links:

(again) https://www.youtube.com/watch?v=x_WqBuEA7s8 100M+ msg a day for $10.

Doing distributed machine learning: http://myrighttocode.org/blog/artificial%20intelligence/part...

30M+ ops/sec performance testing: https://github.com/amark/gun/wiki/100000-ops-sec-in-IE6-on-2...

1 minute demo of seamless multi-server failover during a live chat app: https://youtu.be/-FN_J3etdvY

1 minute demo of recovering from a complete server meltdown: https://youtu.be/-i-11T5ZI9o

You are right - trust should be proven by demonstration, not the blind faith of others.

We're also deploying out to customers with 1.5M users in production and customers with a product being shipped to 1.5K homes. And we're nearing our production-ready v1.0 stable release.

Feedback like yours is helpful, so please hit me up with any other critiques or ideas on how we can explain ourselves better. Thanks for jumping in on the convo :)


After reading the topmost info I still don't know if this only runs in the browser, on the server or both. I don't know how data is stored, how nodes sync, what protocol it uses, what storage engine it uses, etc. The main page has paragraphs of text that tell me nothing.


Yes, it runs in the browser and server (both).

Data is stored in a graph format, currently (unfortunately) as just JSON. There are storage adapters to backup to S3, Level, SQLite, etc. Here is an article on the JSON graph structure: https://github.com/amark/gun/wiki/GUN%E2%80%99s-Data-Format-...

Sync is done using a deterministic AP (of the CAP Theorem) conflict resolution algorithm. Specifically, a hybrid vector-lexical-timestamp CRDT. An article on how it works is here: https://github.com/amark/gun/wiki/Conflict-Resolution-with-G...

And the tradeoffs are discussed here: https://github.com/amark/gun/wiki/CAP-Theorem

Custom protocol, our docs aren't up to date on it though. But it is designed to work over WebSockets, HTTP, TCP, UDP, WebRTC, etc.
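To give the flavor of "deterministic" here, a toy last-write-wins merge with a lexical tiebreak (a simplification for illustration; the real algorithm handles more cases):

    def merge(current, incoming):
        # Each value carries a timestamp; every peer applies this same
        # pure function, so all replicas converge to the same state.
        (cur_ts, cur_val), (inc_ts, inc_val) = current, incoming
        if inc_ts != cur_ts:
            return incoming if inc_ts > cur_ts else current
        # Tie: fall back to a deterministic lexical comparison.
        return incoming if str(inc_val) > str(cur_val) else current

    merge((5, "alice"), (5, "bob"))  # -> (5, "bob") on every node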


Thanks for being receptive. My original recommendation is all I have right now.

>maybe the wording should be "Investors and Backers" rather than "Trusted By"


Done! Thanks :)


Previous discussion of the Gun database: https://news.ycombinator.com/item?id=9076558


They switched to a BGP anycast model for twitter.com, which implies TCP. I'm curious how they deal with situations where route preferences change in intermediary ISPs mid-TCP stream. Does the new server reject the TCP connection or are they synchronizing TCP sessions across clusters?


No Ansible? In an older (2014) Ansible video they claim Twitter is using Ansible but I only see Puppet mentioned.


Only ever used puppet during my time there, fwiw


If Twitter can manage to operate at a profit, then I might be interested in Twitter's infrastructure.



