> This left a lot wondering what exactly was going on with all those engineers and made it seem like it was all just bloat.
I was partly expecting the rest of the article to explain to me why exactly it wasn't just bloat. But it goes on talking about this 1~3-person cache SRE team that built solid infra automation that's really resilient to both hardware and software failures. If anything, the article might actually persuade me that it was all bloat.
> the article might actually persuade me that it was all bloat
First of all, how does it persuade you of that? The article touches a really small (though incredibly important for up-time) subject.
Secondly, in any large company, the majority is 'bloat'. It's security engineers, code reviews, data architecture, HR, internal audit teams, content moderators, scrum masters and I can keep going. In a start-up many of these roles can be ignored, because growth > stability. In a large organization, part of the bloat helps ensure a certain amount of stability that's necessary to keep an organization alive.
If a product is mature enough, like Twitter seems to be, removing engineers won't instantly crash the product. It'll happen slowly. Bugs will creep in, because less time is spent on review and overall architecture. Security issues will creep in because of about the same issues and less oversight. Then, once this causes enough issues for the product to actually crash, the right people to fix it quickly might not be there anymore. That's when fixing the issues suddenly takes a lot more time.
If the current state of affairs at Twitter keeps up, it'll probably be a slow descent into chaos. Especially with Elon pushing for new features to be implemented quickly, inevitably by people who cannot fully understand the implications of said features, because 80% of knowledge is missing.
By flowing from "many people think it's bloat, I'll tell you what's really going on" to "a tiny team of 1~3 built the whole infra for a critical component".
I'm not really trying to make commentary on whether or not Twitter engineering was bloat, or whether or not I think it'll hit problems in the future. Just commenting on the fact that the article broke my expectations a little bit as a reader.
https://twitter.com/atax1a/status/1594880931042824192 provides a bit more context and nuance (albeit with a sardonic tone, but it's hard to blame someone for being sardonic who saw their entire professional community fired).
There's no doubt that OP built a great and stable automation layer on top of Mesos for caching workloads. But there are numerous other types of workloads on top of Mesos (including, I presume, mission-critical database deployments that need well-disciplined draining protocols to shift between nodes), as well as administrative needs for the Mesos-to-infrastructure level, and things running on bare metal below the Mesos level. These things all needed dedicated SREs, and the absence of these SREs could result in a scenario like the one mentioned in the Twitter thread I linked - two obscure mutually-dependent components expire and cannot be re-provisioned using documented tools.
I also think an important meta-point is that when Twitter was bringing in substantial revenue from advertising, every minute of downtime would have significant costs - costs that could make it easily worthwhile to "over-provision" SRE talent. With advertisers pausing engagement, perhaps Twitter loses less money from a day-long outage than it would save having the right talent to turn a day-long outage into a minutes-long outage.
Twitter is only judged by its profitability (namely, Musk's ability to service debt without selling more Tesla stock than he already has), while most other tech companies (both public and private) are judged by both profitability and revenue growth. If you want both, larger SRE teams, to say nothing of feature development and regulatory compliance teams, start to make a lot more sense.
> In a start-up many of these roles can be ignored, because growth > stability. In a large organization, part of the bloat helps ensure a certain amount of stability that's necessary to keep an organization alive.
It also (a) increases the bus factor, [1] and (b) allows people to take vacations and time off without having to watch their phones like a hawk.
Good point. I know this as the Mack Truck Theory. For the project I'm working on right now, there's a couple of incredibly valuable people that would cause a pretty significant issue if they disappeared.
Sensitive workplaces (which seems like most these days) have taken to calling this the Lottery Factor (as in team members quit bc they hit the lottery) to spare more delicate types the pain of imagining their peers run over in traffic accidents.
Having been in the situation where a coworker did not arrive at work one morning because someone opened a car door as they were passing on their motorcycle, I can assure you that it's not only the more delicate type of people that all of a sudden are a bit more quiet than usual.
There are advantages to both terms. Lottery factor maintains the (possible) fiction that people care about their coworkers/bosses enough that they'd help with the transition out if they had the choice. Bus factor highlights that there's no way to get that information even if the person is nice enough to help for effectively free.
I work in the EEO/AAP space and can confirm our company calls it the Lottery Factor, though it is partially because we do an office lotto pool when it gets big and the jokes bled over. However, it is much nicer for discussions. "What will we do when employee X gets hit by a bus" just does not sound good to random people walking by.
Almost lost a co-worker when he got hit by a bus five years ago. He's back at work full time now, with a substantial settlement from the transit agency (nice amount of f** you money). He wasn't an essential employee, but was important to our team in many ways. Fortunately he was always willing to joke about his accident. He'd survived, after all!
>Sounds like those people are in strong negotiation positions
You'd be surprised, but that's not necessarily the case.
One of my friends was such a person in a shoe making company as a designer. Instead of giving her a raise, they fired her.
Cue them re-hiring her a month or two later, after they found out the hard way that the less experienced subordinate really couldn't handle the job of the two of them on their own.
I know the bus factor is the morbid "how many people can get hit by a bus?" idea, but I actually like to present it as "how wide is the bus?" in terms of people being the conduits along which information and instructions flow; a wider bus provides redundancy. Aside from being more positive I think it's also more accurate for how we want teams to work. The "win the lottery and quit" metaphor is just stupid.
Have I been in a life-long bubble such that the ordinary explanation of "bus factor" seems so mild that until reading exchanges like this a couple times it never occurred to me it might be morbid enough to bother someone? Or is this one of those things where people are looking for something to worry about but it's actually entirely fine? Like, I've seen children's cartoons with jokes that were more morbid than that.
(I do also like the lottery version and use them basically interchangeably, though the shorthand is always "bus factor" for me)
I think it may be that some people have low environmental-sensation barriers.
For them, hearing "bus factor" may make it through the barrier and result in an involuntary creation of imagery in their head of someone getting gruesomely hit by a bus, and then the corresponding emotions they would feel (or simply just the emotions, without the imagery).
Phrasing it as "getting run over by the lottery" sounds like a good, humorous compromise. But "winning a bus" sounds like you're actively rooting for the person to be hit. I'd stick with the former. ))
I don't think the analogy holds in the same way. To me, the bus factor represents how many people you can lose for an extended period of time before there are no subject matter experts left for a particular topic. If you've got 8 people on the team, but only 2 of them know how to do a particular thing, the bus factor is 2.
It's not about having enough people to do the work even if someone quits, it's about having enough people that know how to do something that we aren't losing chunks of knowledge if someone quits (or dies, or gets fired, or gets sick, or etc).
It doesn't make sense to me to treat people as part of a conduit bus that are interchangeable as long as there are enough people.
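For what it's worth, that definition is easy to make concrete. Here is a minimal sketch in Python, assuming a made-up team and made-up topic names (none of them come from this thread). It treats the bus factor as the smallest number of people covering any single topic, which is one simple reading of the definition above and not the only possible one.

```python
def bus_factor(experts_by_topic: dict[str, set[str]]) -> int:
    """Return the size of the smallest expert group across all topics.

    Losing that many people (the right ones) leaves at least one topic
    with nobody who knows it. Raises ValueError if no topics are given.
    """
    return min(len(experts) for experts in experts_by_topic.values())


# Hypothetical 8-person team; names and topics are illustrative only.
team = {"ana", "ben", "cho", "dee", "eli", "fay", "gus", "hal"}
experts_by_topic = {
    "day-to-day feature work": team,             # everyone can do this
    "cache failover procedure": {"ana", "ben"},  # only two people know it
}

print(bus_factor(experts_by_topic))  # 2
```

Adding more people who can only do the first topic leaves the result at 2, which is the point: headcount alone doesn't raise the number, shared knowledge does.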
>removing engineers won't instantly crash the product. It'll happen slowly
It's amazing to me how many people following the Twitter saga, some familiar with or actually working in technology, thought that Twitter would crash within days of the engineers being fired, and that because it didn't, the job cuts are justified.
I agree those were odd takes. I've likened firing most of the engineers to taking your hands off the wheel in the car. It won't crash immediately, but it doesn't mean the car can go driverless.
With that said, there are differences between internal systems and something like Twitter on the public internet. I assume that Twitter is a system under constant attack. What happens when the next log4shell level vulnerability comes out?
If Twitter went another month without an outage, how would you adjust your opinion? How about a year?
The car analogy is amusing, but how much does it really hold up? Have we ever seen another major social media company drop this much of its staff in one go? I certainly can’t think of an example. I think we’re in somewhat uncharted waters here.
A driverless car won’t last long, we know that for a fact. I think it remains to be seen how long a bloatless twitter can last. I’m personally optimistic.
> I think it remains to be seen how long a bloatless twitter can last.
It's also really hard to define 'last' though. Does 'last' mean just for up-time? Does it mean up-time without a major security incident while maintaining the same DAU? Does it mean business as usual on all fronts except number of employees? We know that Twitter already had some security issues with their God Mode admin panel.
I really wonder, for example, whether any angry ex-employees still have access to critical systems or data. This is usually pretty well regulated in large organizations like Twitter, but since they've lost the majority of their staff, who knows when the people looking after that left?
> I think it remains to be seen how long a bloatless twitter can last.
I'm not convinced Twitter had a ton of bloat. (Most of the teams actually involved don't seem to think so). Just because Elon can't understand something, doesn't make that thing "bloat".
Twitter definitely had a few weird features that could be cut (the audio podcasting thing, for example). But calling most of Twitter's microservices "bloat" is about as dumb as calling a car's seatbelt, airbag, crumple zones, and that spare tire in the trunk "bloat" -- it's only "bloat" if you assume all people will always be perfect and no one will ever make a mistake anywhere, and nothing bad will ever happen.
I think it's well understood that when people refer to Twitter's "bloat" they're talking about its bloat of extra employees, not its extra microservices.
I've told this example before, but my close friend and roommate did a year at Twitter in 2019. He was tasked with implementing versions of simple, relaxing JS games like Tetris which would be used by Twitter's moderation staff when they felt they had accumulated too much stress and needed a break. He was making $300k USD to copy an open-source version of Tetris to be played by paid staff! This was actually one of his more engaging projects, most days he said that he wrote 0 code at all. He got great reviews and was being encouraged by his manager to pursue the senior track; from what I could observe from the outside, there was a complete misalignment of goals at pretty much all levels.
Perhaps we disagree which is totally fine, but I think this type of allocation of eng resources absolutely counts as "bloat". You may see that work as comparable to a seatbelt or crumple zone, but I personally see it as more comparable to all the other expensive, useless nonsense which plagues modern cars.
This is an interesting example but I have to wonder if it may inadvertently highlight a disconnect between the parties involved.
I'm no expert in this area but many stories have surfaced over the years regarding high levels of stress and (in extreme cases) meaningful declines in the mental health of social media content moderators.
I recently encountered a twitter thread from an early LiveJournal employee (1) @rahaeli "about Trust & Safety work, the toll it takes on you, the things you see, and the human misery, suffering, and death that happens when you fuck it up, including murder and child sex abuse." and I am inclined to believe that, while relatively simple and un-engaging, managers and end-users likely would have found it important.
> He was tasked with implementing versions of simple, relaxing JS games like Tetris which would be used by Twitter's moderation staff when they felt they had accumulated too much stress and needed a break.
Oh my god, this is incredible. Thank you for this story.
I have several college friends (developers) that ended up at Twitter. Some of them loved it, some of them hated it, but one universal thing I heard from pretty much all of them is that they typically did actual work 15-20 hours a week.
Now overall I wouldn't say this is unusual in tech. I know plenty of people at other major tech companies, including other FAANGs, that say the exact same thing. But just because it appears to be an industry-wide issue does not make it okay. It seems to me that this would signify bloat. You only have enough work for your developers to be working half time.
What's the problem with working only half of the time? Work is also a social environment, and the "not working" part of work is a really important part. If the staff worked 100% of the work time they would be fucked. Most people I know prefer good working conditions to working ultra hard for some sadistic assholes.
I don't even know where to begin with this comment? So you have no problem being paid for only half the time right? Or are you saying you deserve to steal from the company you work for?
There are a lot of assumptions baked into your position.
1. The employer pays for your time, not your expertise or output.
It'd only be stealing from the company if the company cares about hours worked over output. If we explore this concept in a theoretical sense it's clear that it doesn't hold up.
You have two candidates
One candidate has 20+ years of experience doing the exact thing you want them doing. This candidate says they'll work for you for $100k/yr, and they'll work 10 hours a week, complete all the relevant tasks, and very very rarely cause catastrophic errors or user-impacting bugs.
The other candidate is fresh out of school and says they'll work 50 hours a week. They'll complete the same amount of work as the first candidate, but they'll write more bugs, there will be more planning mistakes causing feature delays, and there's a reasonable chance of catastrophic failure due to debugging-in-prod shenanigans. They are also asking for $100k/yr.
Which candidate is better? Under the assumption that employers pay for _time_, the second candidate is better, but I'd argue most companies should prefer the first candidate.
2. More hours worked produces more or higher quality output.
There's a reasonable amount of recent research and practical anecdotes that dispute this (see companies that have gone to 32-hour, 4-day workweeks with no reduction in productivity). Enough, at least, that this point is seriously in doubt.
3. Twitter maybe pays a senior engineer $400k/yr under the expectation of their output for 40 hours, and if they get less it wasn't a fair deal.
This is a reasonable take, but Twitter (like most for-profit companies) theoretically has a performance evaluation system, managers, deadlines, etc. They're paying an engineer some amount of money for some amount of output. If that engineer produces that amount of output, Twitter is happy, the engineer is happy, there's no issue. If the engineer working 20 hours or less per week caused them to not meet their goals, then Twitter has the right to fire that employee. They don't, so that implies that they're happy with the arrangement.
4. The employee's salary is equal to their expected output/profit
If the Labor Theory of Value is correct, then companies derive their profits almost exclusively from the labor of their employees.
In order for an employee to "steal" from a company by under-producing work, they would have to earn more in salary + benefits than they earn the company from their work.
This is necessarily not the case (on average) in a for-profit company, because if the company makes a profit and uses those profits to grow or to issue dividends to shareholders, they have earned "_surplus value_" from the employees' labor (on average).
---
In any case, it's not necessarily true that you're wrong, but your comment was fairly dismissive and confrontational. There are a _lot_ of cultural and individual assumptions baked into how we exchange salary/wages for labor, and it's worth examining those before firing off moral judgments at one another for not working hard enough or working too much or whatever.
A lot of what you are saying is theoretical but doesn't hold up in the actual work place (at least in the US).
1. In the US at least we know a majority of the workers are hourly. (https://www.forbes.com/sites/johncaplan/2021/03/12/americas-...) So I would absolutely argue those people are paid for their time. I understand the scenario you laid out, but in a big corporation I do not think they would look at it like you are. They would simply look at who is costing the most. Both are costing 100k/yr, so compare their output. Anytime there are massive layoffs, lots of important people are let go. This is because oftentimes the people doing the firing do not know the employees. They are simply lines on a spreadsheet to them. So in your situation above, after a mass firing, oftentimes the candidate fresh out of school will be the one left.
2. I personally completely agree with you here. From my experience there is absolutely a burnout point. However, major corporations do not see it this way at all. They absolutely believe throwing hours at problems brings about solutions. It doesn't matter to them if it's making existing employees work 4 extra hours a day or hiring a new employee. It matters which is cheaper in the end. If the position is salaried, it's cheaper to have existing employees work more. Look at what Musk is asking at Twitter. If it is hourly, oftentimes it's easier to part ways with the burned-out workers and hire new ones. Look at Amazon's turnover rates.
3. I agree that Twitter gets to decide the expected compensation for work. But Musk now owns Twitter. He gets to decide what Twitter does and doesn't expect from its employees. He made it clear he is very unhappy with how the arrangement was, and what he expects in the future.
4. I would disagree with you here. An employee can steal from a company in many ways. For example, an employee could steal the source code of some Twitter service that is considered a company secret. In this case, Twitter is paying an employee x amount for y work. Musk has decided what y work is. If an employee decides to not fulfill y work, and still take their whole pay, how is that not a form of stealing? If they're hourly employees, they call it timesheet fraud. The idea that an employee can only steal if they are paid more than they bring in is pretty interesting, but I think we would be hard pressed to find a single major corporation that views it that way. I think close to 100% of them would go after an employee spending 50% of their time working on non-work-related tasks.
The vast majority of these assumptions are based on them being the norm in corporate America, which Twitter is a part of.
> In the US at least we know a majority of the workers are hourly.
We're talking about IT employees. Unless someone was a contractor, most IT employees (white collar workers in general) are exempt, and not tracked or paid hourly.
I guess my question is - what are they doing for the rest of the week? Are they straight up just not doing anything, or just not writing code? I honestly don't think I could be writing code 40 hours a week, but I think that the code I do write when I'm writing it is the better for my taking the rest or zone out time. I also don't know if my work would be of a decent quality if I was suddenly asked to be typing all 40 hours a week.
Cannot speak to all of them, but I know a couple work on personal side projects the rest of their time. Doesn't matter if they are in the office or at home.
No one is asking them to write code 40 hours a week. That is not realistic in coding. Most developers' jobs include a lot more than coding. They are saying they spend 15-20 hours a week on all job-related work, whether that is coding, code review, engineering, testing, research, documentation, meetings, etc.
Maybe the thinking is tear it down to just bare bones and life support and then rebuild? Although, and I say this as a semi-fan, it's really exhausting trying to understand Musk heh.
Company soft-killed the product, everybody left, they didn't hire anyone to replace them. We went from 20 engs to 3. Worst codebase ever, made by ex-FANG hotshots who thought they understood something about system architecture. Very "clever" and complicated. Data consistency issues happening every day, likely due to misuse of messaging queues. Chargebacks being ignored and mailed in physical letters every month. A couple of million going through the platform every year.
My task was to run a team of mostly juniors maintaining and adding features to that mess.
I had no clue what that codebase was doing. We just left things as they were, fixing fires as they came. Nothing too bad happened.
Slowly built a leaner replacement for some components. We simplified things over time and we even rebuilt some of the knowledge of the old platform, which helped with the daily outages.
The issues started happening less often. Eventually, I moved on from that company, again removing a big chunk of knowledge. Over time I've heard tales of other people coming in and rebuilding that knowledge, over and over.
> Worst codebase ever made by ex FANGs hotshots who thought they understood something about system architecture. Very "clever" and complicated.
I call these sort of folks (very) smart juniors. Probably can whip leetcode or whiteboard tests like few others (after some preparation). Then you let them roam wild on your product, because they're of course experts. Then some period passes, and/or they leave, and you can only cry. Complex hard-to-grok approaches to simple problems that have tons of caveats/edge cases, no documentation of work done and why it was done as it was, because they are oh-so-cool and such lowly tasks are for peasants.
Either they have no clue what a long-term sustainable company looks like or they don't care; in any case, not a good fit.
These days I go in the opposite direction from probably most here - a FAANG type of company (or cargo-culted startup) on a CV is a big fat warning sign when hiring. Unless I were working at some wannabe-FAANG startup, which I am not, I would sure as hell make sure the person can actually deliver long-term improvements for everybody and not just up their CV with another bleeding-edge technology and move on, leaving more damage than added value and making everything worse.
> If Twitter went another month without an outage, how would you adjust your opinion? How about a year?
It's a tricky one, because on one hand it would increase my trust that their system was built robustly, but at the same time the passage of time would increase the chance of unseen "wear and tear" (both figurative and literal) that might be going unaddressed, or under-addressed. But we have no view into that.
We won't really know until they suffer a major problem whether or not they have enough staff yet to keep sufficient maintenance going that such an event doesn't cascade into something much worse and/or whether or not they will be able to recover from it in a reasonable amount of time.
Horrible systems can survive, but often they survive through sheer luck.
Beyond merely surviving, Twitter has to compete with other companies and enhance their product over time.
What happens if someone like Bytedance decides to launch a Twitter clone that does everything better?
Meta takes a lot of criticism for being Meta but you have to hand it to them, they've copied every other innovative social media feature that has shown up in competitors' products.
Does Twitter have enough staff to launch a major new feature?
The problem is that Twitter is now saddled with debt. It probably can't make investments in new products, and we can speculate whether it has the necessary headcount in the sales staff needed to maintain advertiser relationships that pay 90% of its revenue. [1, 2] Enterprise clients usually expect a dedicated CSM relationship.
That's exactly what happened to Toys R Us, which slid into liquidation due to the debt saddled on it by its private equity firm rather than a fundamental issue with its business.
In 2021, Twitter had a net loss of $221.4 million, and in 2020 they lost over $1 billion. Revenue was around $5 billion in 2021, about 90% from advertisers. [1]
The company (not Elon) is now loaded with $13 billion in debt. [2]
> Last year, Twitter’s interest expense was about $50 million. With the new debt taken on in the deal, that will now balloon to about $1 billion a year. Yet the company’s operations last year generated about $630 million in cash flow to meet its financial obligations.
>
> That means that Twitter is generating less money per year than what it owes its lenders. The company also does not appear to have a lot of extra cash on hand. While it had about $6 billion in cash before Mr. Musk’s buyout, a large portion of that probably went into the cost of closing the acquisition.
>
> That gives Mr. Musk little wiggle room, Mr. Pascarella said. “They are essentially going to take all the financial resources of the company and just pour it into servicing the debt,” he said.
I personally don't see how Twitter gets out of this without bankruptcy. I think Twitter can go the next 5 years without an infrastructure outage and still likely ends up bankrupt.
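The gap in the quoted figures is simple arithmetic. Here is a rough sketch in Python using only the two numbers from the quote above (about $1 billion of annual interest against about $630 million of operating cash flow). The variable names are mine, and this ignores any change in revenue or expenses after the acquisition.

```python
# Figures as quoted above, in US dollars per year (approximate).
interest_expense = 1_000_000_000
operating_cash_flow = 630_000_000

# How much of the interest bill last year's cash flow would not cover.
shortfall = interest_expense - operating_cash_flow

# Fraction of the interest bill that cash flow does cover.
coverage = operating_cash_flow / interest_expense

print(f"shortfall: ${shortfall:,}")  # shortfall: $370,000,000
print(f"coverage: {coverage:.0%}")   # coverage: 63%
```

So on last year's numbers, something like $370 million a year has to come from cost cuts, new revenue, or cash on hand before anything is left over for investment.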
ByteDance I could see, but Meta and Google are hamstrung by the innovator’s dilemma to a degree: Both companies have the money and the technical smarts to pull it off, but their management structures would wreck it immediately because it poses a threat to other internal empires.
My loose guesstimate is that Elon has cut $2-3bn in expenses so far. I don't know the revenue hit yet, but $1bn in interest expense with $2.6bn in cash flows seems entirely sustainable
I think the income statement shows that it would be difficult for Twitter to find $2-3 billion of expenses to cut and still remain a competitive, functional company. You're talking about cutting total expenses by over 50%, including expenses that are not salary.
2021 total expenses of $4.8 billion on a revenue of $5 billion. $1.8 billion cost of revenue, $3 billion operating expense. Out of the operating expense, $1.175 billion is in Selling & Marketing Expense. $1.25 billion in R&D.
Let's say Twitter cuts R&D to $0, that's a $1.25 billion savings. How long can a social media company remain competitive putting $0 into R&D? Are there any examples of any software-adjacent company surviving that spends $0 on R&D?
If they cut SG&A, that will impact revenue negatively. Activities like marketing and sales have an ROI. You spend money to make money. Twitter was spending $600 million a year on SG&A in 2014 when they had less than half their current level of daily active users (DAUs). Twitter has a product, its daily active users. If they can't sell that product to its customer (advertisers) because it doesn't have enough sales staff to physically make the required phone calls (yes, they do that sort of thing with large advertisers), they risk entering the death spiral.
Cost of Revenue: I don't think this line item can be cut beyond a certain level. This is the direct cost of delivering the product. Anything that isn't employee salary can't be cut very easily. Twitter can't turn off servers and sell off data centers without impacting the product.
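The line items above add up as follows. This is a quick sketch in Python using the 2021 figures as stated in this comment, in millions of dollars. The "other operating expense" remainder is derived here by subtraction and is not a reported number, and the variable names are mine.

```python
# 2021 figures as stated above, in millions of US dollars.
cost_of_revenue = 1_800
operating_expense = 3_000
selling_and_marketing = 1_175
r_and_d = 1_250

total_expenses = cost_of_revenue + operating_expense
# Whatever is left of operating expense after the two named items (derived).
other_opex = operating_expense - selling_and_marketing - r_and_d

print(f"total expenses: ${total_expenses}M")      # total expenses: $4800M
print(f"other operating expense: ${other_opex}M")  # other operating expense: $575M

# Size of the proposed $2-3 billion cut relative to total expenses.
for cut in (2_000, 3_000):
    print(f"a ${cut}M cut is {cut / total_expenses:.1%} of total expenses")

# Effect of taking R&D all the way to zero.
print(f"zeroing R&D saves {r_and_d / total_expenses:.1%} of total expenses")
```

Even the extreme case of zero R&D gets to roughly a quarter of total expenses, so the rest of a $2-3 billion cut would have to come out of the other lines.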
Twitter had more than 7,500 employees in 2021.
If they cut 80% of staff, that's 5,000 employees gone (and don't forget that those cut employees will still count as 1/4 of an employee for the next year due to the severance payment).
If each employee represents $300,000 in total expense (a very generous estimate), that's only $1.5 billion in savings, and for the first year the savings is only $825 million due to the severance payments.
Okay, maybe Twitter can raise revenue with Twitter Blue. They'll need to pick up a smidge over 15 million paid Verified users in order to cover the $1 billion interest expense. Twitter currently has 400,000 Verified users, ~240 million DAUs, so they need 6.25% of their global user base to purchase a subscription at $8/month. Out of Twitter's DAUs, you'll need to omit most users from countries that won't generally pay $8/month, like India, Brazil, and Indonesia. [3]
For reference, Spotify costs $1.46/month in India.
We could make a rough guess at this by assuming that the 155 million people in the US, Japan, and UK will pay $8/month for Twitter Blue, so you'd need close to 10% of the affluent user base paying for Twitter Blue.
For a benchmark, Discord makes almost all of its revenue from Nitro subscribers. In 2020 it had 14 million DAUs, with $130 million in annual revenue, which means with the $99/annual fee Discord had 1.3 million paid subscribers. In other words, about 10% of Discord users are paid subscribers. [1]
If Twitter can get the same subscriber rate with Blue, it'll reach the $1 billion it needs to pay its loan interest, assuming Twitter Blue incurs no additional cost.
The problem is, it can't do that without that pesky R&D that we were talking about earlier. Discord is built around the concept that Nitro offers tangible benefits to paid membership. What features can Twitter build with a skeleton crew that will convince 10% of their users to pay for the service? Twitter Blue currently represents very basic functionality. [4]
I haven't even talked about the fact that I'm just talking interest payments for the debt, and the fact that Twitter wasn't profitable to begin with!
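The subscription percentages above can be checked quickly. Here is a sketch in Python that takes the 15 million subscriber target and the user counts exactly as stated in this comment (the 15 million is used as given, not re-derived). Variable names are mine.

```python
# Inputs as stated in the comment above.
subscribers_needed = 15_000_000
twitter_global_dau = 240_000_000
twitter_affluent_users = 155_000_000  # US, Japan, and UK, per the comment

discord_revenue = 130_000_000  # US dollars per year (2020)
discord_annual_fee = 99        # US dollars per subscriber per year
discord_dau = 14_000_000

# Share of Twitter's user base that would have to subscribe.
share_of_global = subscribers_needed / twitter_global_dau
share_of_affluent = subscribers_needed / twitter_affluent_users

# Discord benchmark: implied paid subscribers and their share of DAUs.
discord_subscribers = discord_revenue / discord_annual_fee
discord_share = discord_subscribers / discord_dau

print(f"share of global DAUs needed:    {share_of_global:.2%}")
print(f"share of affluent users needed: {share_of_affluent:.2%}")
print(f"implied Discord subscribers:    {discord_subscribers:,.0f}")
print(f"Discord paid share of DAUs:     {discord_share:.2%}")
```

This gives 6.25% of global DAUs, a bit under 10% of the affluent subset, and a Discord paid share a bit over 9%, which is where the "about 10%" comparison comes from.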
> Okay, maybe Twitter can raise revenue with Twitter Blue.
This made me laugh. Twitter Blue will have to be great for people to pay. Out of all your numbers, I wonder how many of those 155M users in the US actually post anything that would find value in Blue? My guess is that we're seeing Pareto in action where 80% of the Twitter content comes from 20% of the users.
I heard an interesting take today from someone in the social space. Basically as soon as they heard the price paid for Twitter, they knew it was over. It's just too much debt to both get out from under and move the company in the right direction.
I've started to come around that maybe it is that simple. Musk saw the debt payment and just cut the most easily cuttable expense in the short term - people. Bankruptcy seems inevitable.
I 100% agree with you. I don't even know how seriously Musk can try to avoid bankruptcy.
The leaked company communications are damning. Here's your brand new owner, richest guy in the world, implying the company might not survive another year or two, for a company that's been public, mature, and stable for years. It's completely insane.
As I alluded to in my comment, Discord's revenue story is essentially a best-case benchmark. It's very unlikely that Twitter or anyone else in the social media space can replicate Discord's paid user share. Discord has been designing their product around a specific niche and optimizing for non-advertising revenue since inception. Twitter is designed for advertisers. It was never intended to be a paid product. They'll be lucky if 1% of their users bite.
I told you what I think he cut. I also said I don't know what that does to their revenues, but you don't either.
If you buy the narrative that 80% of employees were bloat and you can cut them without impacting top line, then it will work. If you think he's gutted essential activities and that will sink the ship, then it won't.
> Twitter can't turn off servers and sell off data centers without impacting the product.
A friend of mine tried to tell me that Elon had some type of new computing technology that would let them turn off most of their servers because they could fit all of the tweets in less space.
He blocked me after I told him he had confused Elon Musk with gzip.
I think your friend gets at why people think Twitter's IT costs can be cut though (ignoring his idea about compression): They think Twitter's product is mostly the delivery of tweets.
But it's not. Twitter's product is ad inventory and their ad placement platform, and the engine driving engagement that boosts their ad inventory by reordering the timeline to keep people scrolling. Those are the hard parts.
"Just" delivering and storing tweets is easy. If you ignore the nasty business of the moderation. And most people have never even visited analytics.twitter.com and seen how much data is available to them about their own tweets, much less looked at ads.twitter.com and seen how precisely they can be targeted, and the precision Twitter offers in what kind of things you can pay for (engagement, follows, media views, clicks). And they've certainly not tried running ad campaigns in those categories, and seen how good Twitter are at showing your ads mostly to people doing what you're paying for.
Musk's actions make me wonder how well he understood the complexity of Twitter too. Surely he must have looked at those other aspects of Twitter before he made his bid.
> If Twitter went another month without an outage, how would you adjust your opinion? How about a year?
I think it's hard to just use time as a measure. If there are no security issues and they add no features, then things should run fine. Which, ironically, points to how solid the team that was fired was.
Of course if Twitter not only limps along, but thrives in this new setup then I'll definitely change my opinion. Being in the US, this might end up the case while Twitter for the rest of the world falls apart.
Keep in mind that I think Twitter was bloated and needed a big shakeup. Randomly dumping people and those who tried to correct me is not the heuristic I would have used.
With that said, Twitter still has 2 huge problems. No vision and saddled with an enormous amount of debt. Right now, Musk is taking the PE approach to cut and milk what's there. The problem is, there isn't much to milk.
yes, maybe a car without mechanics rather than without a driver. drivers can be really bad, too. so bad that they forget to steer or steer badly, etc. but the steering wheel is categorically the interface made for non-engineer usage.
> I've likened firing most of the engineers to taking your hands off the wheel in the car. It won't crash immediately, but it doesn't mean the car can go driverless.
This is an excellently apt analogy, in light of Twitter's new owner.
That's what people keep telling themselves. Truth is, everybody is replaceable. And usually, loss of institutional knowledge takes a while to show real effects.
Having been on the side of thinking I was absolutely irreplaceable on a team, I was more or less proven right when I left and the team failed to deliver anything at all for about 10 months, even having to shut down existing products, and is now limping along.
The truth in retrospect is that it was my fault (and my upper leadership's) that I wasn't replaceable. I created a knowledge silo around myself since I wanted to move fast and figured I could prevent the team from being bogged down in complexity if I just handled it myself and while that worked in regards to delivering out-sized results for the available bandwidth, it also was a risk that materialized as described above. So while I do believe that everyone should be replaceable and it's their responsibility to be, it's not always the case and products can live and die by it.
Saying everyone is replaceable is a first-order approximation. The next level of detail would talk about, at least: (a) how much knowledge gets lost; (b) how many person hours -and- "wall clock time" it takes to rebuild that knowledge; (c) the amortized cost (or benefit) of losing that person.
It depends on how well or poorly run the business is.
I worked at a company where everything hinged solely on one guy working from another country. When he left, loss of institutional knowledge took about three days to show real effects as things came crashing down.
I worked hard to make _myself_ replaceable for when I left, it was a pretty good exercise, but me having that degree of freedom was symptomatic of the problems of the company.
how can you possibly say everyone is replaceable while next giving an example of a case where it's not only not true but potentially so invisible in its consequences as to make it hard or impossible to satisfactorily replace someone? a "person" can be rebuilt but not replaced. companies and societies aren't actually bose-einstein condensates. everyone is replaceable is just what different types of people say as a coping mechanism. it's a disgusting thing to promulgate too. maybe if someone is equivalent to a cog they can be replaced but humans are not exactly standardized cogs. what you mean to say, and may get in the habit of repeating instead, is that we don't necessarily need a specific person even though we are accustomed to them. it still might take you 10 years to find someone who does 80% of the same things as the other person while the missing 20% is what made that person such a unique hire in the first place. i find it a disgusting phrase that is so typical of the view of a society like ours today that is losing view of the value of life.
The dangerous piece of this is executives who don't have a solid grasp of how things are operating can assume that a product still running after the departures is still successful. Validates their decision making even as things deteriorate in the background.
That said you can replace people and build back that institutional knowledge -- both loss and gain take significant amount of time.
There are fairly widespread reports of technical issues. I’ve seen a few about 2FA failing, strange undefined behavior in the ad serving platform, and others.
Of course it could go either way but the jury is currently out. It’s entirely possible that severe company-impairing technical breakdowns are already in progress and unrecoverable.
I'm still on the fence. As an engineering manager, I tend to attach faces to those "jobs". So seeing cuts, I imagine a ton of people that had to go home and tell their friends/family/etc that they no longer had a job.
On the other hand. As an engineer, we tend to attach way too much self importance to our roles. Like if we're not there entering the "numbers" 4 6 15 16 24 32 every 108 minutes, the entire business is going to crumble. So... this is one I'm going to watch with a keen eye.
One need look no further than the O'Reilly book on Microservices. First chapter I believe. We call ourselves "engineers" but the truth is, real engineers have real consequences when their designs fail. (buildings fall, bridges fall, etc). It's not a unique take. Now I'm not going to argue about engineering vs developing vs programming as that opens a massive can of worms.
>> Like if we're not there entering the "numbers" 4 6 15 16 24 32 every 108 minutes, the entire business is going to crumble
> Never have I encountered an engineer that thought that.
there are people all up and down this page saying effectively just that. well i guess i'm assuming most of these people are engineers in the software sense of the word.
It's great to hear your concern with the actual people. That tends to get lost for some reason. And I agree we tend to attach too much importance to our roles, but the flip side bears a certain amount of truth, though on a timeline more like 108 days than 108 minutes.
I definitely walk a fine line of wanting people to know how much I value them and how valuable they are to the business as a whole. But, the harsh truth is, every single one of us is replaceable. I've just seen that play out too many times in my career. People with all kinds of domain tribal knowledge walk out the door. Everyone holds their breath as if it'll be the end. And a month or so later, we're still afloat. (I still hate to see good people leave)
Nowhere do I assert that people do not have the desire to accomplish things. Perhaps you're misreading my statement about "believing the building will fall down without us"?
Yes. Your team members care about what they do, and think it's important. You don't think it's important and you view them as replaceable. Your intent is quite clear.
1. You're wrong. You have no idea the value I place on both my direct reports and other engineers at my current and previous companies.
2. I should have looked at your comment history before my first response. I really would like to assume good intent. I don't think I'm going to find it.
1. True, the only information I have about your perspective is what you've shared here.
2. I'm not sure to what you're referring?
When a manager comes on here and says "these workers think they're important but they're just shuffling numbers around" that deserves to be called out.
> The job cuts are clearly justified because of the extremely toxic "work culture"
I'm very curious in what way the stuff Elon is doing is not creating toxic work culture? >Minimum< 40 hour work week. 100% office work. Working into the night. Being fired for pushing back.
I'm not denying there may have been toxic work culture before, but I don't see how replacing it with another form of toxic work culture is to be celebrated.
Also, writing this:
> There is no active independent thought around this subject to be found. Only vitriol and bile.
And then spewing a bunch of vitriol and bile really distracts from any point you were trying to make:
> akin to witnessing a fat child throwing an epic tantrum laying flat on a mall floor
> Straight up Cartmenesque.
> clearly justified because of the extremely toxic "work culture" / cult
Up until recently 100% office work was the norm. It still is for most people. Many employers are quite adamant about it, not just Elon.
40 hours per week is also known as the standard work week. A bog standard j-o-b. Those are not reasonable examples of any sort of toxicity.
If these painfully regular employment terms are not competitive with other offers nobody is forced to pick twitter as their employer.
> but I don't see how replacing it with another form of toxic work culture is to be celebrated.
Walk on by. Find a company more suited to your temperament that will give you the pay and perks you think you deserve.
> And then spewing a bunch of vitriol and bile really distracts from any point you were trying to make
> because mommy didn't buy them an xbox
I assure you the child was quite literally stating xbox as the reason for his tantrum. Loudly. Whilst writhing on the floor. I'm simply giving an eyewitness account.
Toxic to me is joining a company under certain terms of employment only for some billionaire to come in and drastically change those working conditions for the worse. I frankly don't give a shit if others have it worse, there's always others that have it worse which is why I didn't take a job at somewhere worse.
You are basically describing every single takeover. When a company changes ownership it is unusual if there isn't a drastic shake up of things including working conditions. I don't understand why this one is such a big deal other than the employees are throwing a temper tantrum. This type of thing is happening all the time all across the country yet people are discussing it like this is some new huge thing.
If as an employee you do not like the shake up, you always have the right to seek work other places. If you are indispensable (lottery/bus test) you may have some negotiating power but most of the time when up against a big corporation you are just SOL.
Every takeover? That's funny because when my employer was bought by a bigger company they increased benefits to ensure that they wouldn't lose their best employees.
yeah i think that's a perfectly reasonable view to have. If the job changes to something other than what you signed up for and you're unhappy then I feel like you have an obligation to yourself to find something else.
The 20-hour-full-time-from-home-six-figure tech job wasn't a new paradigm or a shift in c-suite work/life balance philosophy. It was 0% interest rates coupled with asset inflation leading to tech companies having more cash than they could possibly figure out how to spend. A transient financial anomaly in a whirlwind of larger economic conditions.
Now that the Fed has been (rightfully) pulling the rug, people shouldn't be surprised if tech work goes back to the 60/70 hour grinds it was infamous for.
The projections were the work of Alan Marling, a Bay Area activist who declined to give his age or comment on his profession. He stood nearby — donning a Captain America face mask and a Sunrise Movement beanie — as passersby, many of whom were tech workers getting off work from nearby offices, stopped to take videos and photos.
> If the current state of affairs at Twitter keeps up, it'll probably be a slow descent into chaos.
It's not like Twitter was bug free before. How many times has it annoyingly refreshed the timeline while I was reading something, or shown a notification that it failed to send the DM, and when you retry it says "you've already wrote this", or you open the reply dialog, but it freezes, has no send button at all, so you have to re-open it. All of this was happening to me pretty regularly long before Elon came along.
As we all know, just hiring more people is not necessarily the solution to every problem, and to me it seems that was exactly what Twitter tried to do in the past. Now they have deconstructed it to the bare bones, which will clearly show what the core problems and requirements are. They basically turned Twitter back into a startup. And from that new starting point they can hire again to cover the needs as they arise. If they succeed it will be a huge success as they'll end up with a far more optimal team (and huge savings), and of course, if they fail to catch up with problems it will be a huge failure. We'll see how well Musk can manage it...
You're right on this potentially being the worst purchase of all time. The purchase price of $44 billion was overvalued. If the only thing Twitter is supposed to do is to move tweets around then there was a ton of bloat of developers with a massively bloated purchase price. If Twitter had quality R&D products that could compete with TikTok, etc., then there may not have been a bloated workforce nor bloated purchase price.
Twitter had 7,500 employees. Most of the roles you mention (security engineers, code reviews, data architecture, HR, internal audit teams, content moderators, scrum master) are not bloat. So the question is what are the other 7000 people doing?
I worked for a bank with 28000 total engineers. I know because part of what I did was crunching github and gitlab data and detecting double accounts (and deleting accounts of people not there anymore). I'd say 3/4th of them were just cost of doing business. Including me (my role wasn't really necessary, the stuff I did was cool and everything, but a bit useless).
At this scale, having someone take care of deleting double/expired accounts is just good hygiene and I would not consider that useless. Forgotten accounts are a security risk.
forgotten account that a former employee can re-activate or login with? I would think banks would be fined by regulators or risk their license over even a single instance of that.
Any off-boarding procedure fails occasionally - at that scale, if even one in one thousand fails, you can expect a couple of failures per year. Ensuring these get caught is part of "solid offboarding".
That's not an excuse. If a bank can make sure that they do not mistakenly deposit free money into people's accounts, they can ensure that a person who has left (possibly on bad terms) does not keep access to critical systems.
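The "couple of failures per year" estimate can be sketched as a simple expected-value calculation. The headcount and the one-in-a-thousand failure rate come from the comments above; the 10% annual attrition rate and the function name are assumptions for illustration, since the thread gives no turnover figure.

```python
def expected_offboarding_failures(
    headcount: int,
    annual_attrition: float,
    failure_rate: float,
) -> float:
    """Expected number of failed off-boardings per year.

    departures per year = headcount * annual_attrition
    failures per year   = departures per year * failure_rate
    """
    return headcount * annual_attrition * failure_rate


# 28,000 engineers and a 1-in-1,000 failure rate are from the thread;
# the 10% attrition rate is a hypothetical value, not a sourced figure.
expected = expected_offboarding_failures(
    headcount=28_000,
    annual_attrition=0.10,
    failure_rate=1 / 1000,
)
print(f"Expected failed off-boardings per year: {expected:.1f}")
```

Under that assumed attrition rate the result lands in the "a couple per year" range. A higher turnover rate scales the number linearly.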
That's an entity that specializes in RISK MANAGEMENT.
There was a Twitter thread (ironically) on the front page recently where a Reddit founder explained why anyone will always suck at it. I recommend looking it up.
You need to pay for content moderators to prevent lawsuits. Lawsuits are very expensive but less so when you can say you had thousands of people working on it.
It's just evidence that a thousand moderators (plus many, many engineers for automated moderation systems to reduce the load) is still nowhere near enough to not suck at moderation on Twitter scale.
the power law applies to any big organization. 20% of the people do 80% of the work, whilst 80% of the people are just there for "support".
whatsapp was run by a team of like 20 people or something when they got acquired for $20 billion. for a simple software product, you don't really need that many people. in fact, more people often means bad software. you just need a small group of very talented engineers to run the product and add new features when necessary.
big (and especially public) companies often times need to hire a lot, just to look like a real company.
now that twitter is private, elon has no responsibility to public investors and can focus less on looking like a real company and more on doing what needs to be done to cut bloat/costs and improve product
I've heard stories of teams that were more or less dedicated to nothing useful except policing the company, product, and users for woke violations. All these "affinity groups" to ensure voices were "being heard" when decisions were being made or some garbage like that. In essence, at some point there was a takeover and senior leadership either agreed with it or simply didn't want to get into trouble like Google did with these things, so they just went along with it all.
it showed up over night and now the phrase adult day care is everywhere. Where do y'all get your talking points from? Is it a rss feed I haven't heard about?
I'm a parent who has had children in day care for the last four years. The parallels between daycare for our toddlers and what many startups and big tech companies have provided and allowed to transpire are very clear to me and many of my peers. As a sibling comment stated, we've been using the phrase within our circles for many years. Those circles? Private group chats and friend groups. Mostly due to the climate in tech of mob mentality, groupthink, and the reaction against thoughts that run afoul of the orthodoxy of the week.
Your choice of wording suggests that you find the phrase or concept offensive, which would be understandable. But attacking the phrase as a talking point is shallow.
On the Internet, a new or rare term can suddenly start spreading like wildfire if it (a) is applicable to the topic everyone is suddenly talking about, and (b) perfectly captures something that people already recognise. It’s just life. It doesn’t mean everyone is an idiot or wrong or whatever. It’s just a word that works in the situation.
Yep, I understand how virality works. But virality works through networks. So when a word starts you can understand stuff about it through which network it originates from.
This can be something as silly as fan groups of a certain tv show, or as dangerous as propaganda from a foreign nation.
Adult day care has been used to describe tech companies for well over a decade in my circles. I usually saw it associated with start-ups in the past, but due to the recent trend of influencer-driven marketing videos (e.g. https://www.youtube.com/watch?v=AgaZsbXeddY but there are much more egregious examples) for large tech companies it has seen a resurgence.
what circles are those? I cannot seem to find any references to the term beyond this year.
> for large tech companies it has seen a resurgence.
it seems more like a bad attempt at insulting people who are trying to make work less shit. i will save my opinion at whether those attempts are more or less useful, but seems entirely another dumb culture war
I'm sorry, but allowing employees to put in a couple of barely-productive hours so THEIR work is less shit is utterly insulting to the people who strive to make companies function. Work is work -- if it wasn't, then it would be called something different. I would perhaps feel otherwise if compensation took this and a multitude of other intangible factors into account, but it doesn't so the derogative sticks for me at least.
I mean some jobs require more or less training. No one is a competent product manager at 22, so of course that girl is going through orientation and training and not doing 8 hours on her first day.
My senior PM, who has experience in several big companies and was well aware of our team, still took a couple of weeks to get up to speed with how everything worked etc.
I don't think her onboarding day counts to call Google an adult day care. And things like free lunch and treadmill desks are just tools companies like Google use to keep you longer in the office and maximise your time, not things to take care of you. It's more work, but mango flavoured.
Work is work, but the same way we think of weekends as normal, or 8 hours as a normal amount of time we could think about 4 days, or having a gym next to our work as work. Being sad at work isn't a feature in my opinion
https://twitter.com/anothercohen/status/1584636815281033216?... is a better example. It's not real, since this is a very poorly executed marketing exercise where they contracted an influencer-type, but it and videos like it are still driving the recent influx of "adult day care." Despite not being real, it's real-adjacent enough to not come across as satire. I have certainly witnessed plenty of people spend their days like this in tech companies.
Also note that I never personally said Google is adult day care, I was simply pointing to a culprit in the trend.
> still driving the recent influx of "adult day care."
perhaps we just have different definitions of adult day care. But that girl had a coffee and some snacks in the office. Which seems entirely reasonable. Being able to work from the rooftop might be a bit "fancier" but tbh I have seen dudes take work calls in rooftop bars for years and no one calls Morgan Stanley an adult day care.
I thought the adult day care was more related to companies having talking sticks and mental health days. Which sure some do, and I can find it silly sometimes, but who knows how dysfunctional that team was and whether that works for them. I am sure if I explained to my parents what Agile is, they would also laugh in my face. Our work can look silly to others but work for us.
She didn't show any of her work but I don't think a video of her adjusting roadmaps and having a call with the stakeholder of the API team their deadline is blocked by would really make for good instagram content.
I have a friend who was a mechanic, and he used to always do a drive around the block when a hard job (like an engine change) was done to make sure it was all alright. He would tell you about driving a nice car etc, but I didn't just imagine he didn't work the rest of the day, that was just the part he shared. Same reasoning with that PM. (Funny that both of them were PM though).
idk the whole "adult day care" seems needlessly antagonistic for how people make work be less garbage. Especially holding against them things like free food which are "perks" only offered to keep people working longer
One problem with a humorous (yes, I am not all serious all the time) comment like this is while there could be some degree of truth to it -- provide a source if so, please -- it smacks of exaggeration.
I think Twitter was stuck in a not-especially profitable niche. They shift into fast-mode to get out of it and find a better spot, then they can shift back into stable mode once they occupy a better equilibrium.
That said, there are lots of bugs in Twitter now, today, when they presumably had the benefit of being in stable mode for a long time. For example, Twitter regularly refreshes and loads new tweets while I'm reading them, pushing the tweet I was in the middle of reading out of view. That seems like a pretty silly bug to exist in a mature product. I regularly reach a state where I have to kill the app and relaunch it because all of the "back" commands just minimize the app instead of taking me back to the timeline. I could go on.
They replaced their somewhat productive engineering workforce with completely unproductive interest payments. I’m not convinced that this will lead to a better spot.
But regarding the bugs, I’m totally with you. Same here. I use Twitter only in the browser. Browse long enough and the page reloads as if it ran out of memory.
Yeah, but there's always the option of showing a "fallback" when loading and inserting content in the middle of a feed - one link at the top "go to last bottom of top" and one at the bottom "go to last top of bottom".
Have you implemented a system which stores hundreds of billions of pieces of media content and makes different slices of them immediately available to hundreds of millions of users?
It's bloat to devs who just want to write code. You're kinda confirming the point I was trying to make. Anyone on that list performs an important role at a company of their scale.
I'm currently the first and sole architect on a product that was built by only devs. I really know why I exist.
No but the people who made clickhouse did. I don’t think 1000s of engineers were required for that.
Where’s the 1000s of engineers for Postgres? Most stuff that works is made by a handful of people. Look at io_uring it’s basically one guy at Facebook…
There are tens of thousands of engineers maintaining and supporting postgres and clickhouse in all the organizations where they are used, the vast majority of which are not on Twitter's scale.
Tens of thousands across all the organizations where postgres and clickhouse are used, maybe (though even then probably not!).
Not tens of thousands at any one organization supporting postgres and clickhouse.
No single organization needs even hundreds of people to support these apps. You just need a good architecture and a handful of dba's, developers, and sysadmins... maybe.. depending on your scale. At many smaller orgs you can probably get away with one.
This 100%.
You don't need 10'000 engineers to run billions of queries.
It's not like the SREs are running the queries themselves.
Your job as an SRE is to design and run computer systems, not to become the computing platform yourself.
I am entirely missing how Clickhouse solves the problem Twitter is solving out of the box, do you care to explain? Sub-question: how does Clickhouse provide monetization for a product?
You focused mostly on additive bloat, there's also multiplicative bloat in the form of multiple teams focused on building separate versions of the same product to increase likelihood of success and empire building where leaders don't actually have a remit large enough to support the team size they have, but they have woven a narrative that defends the necessity nonetheless. Put everything together and teams are very easily 6x+ larger than they absolutely need to be to get a product into market.
> I am convinced you can run the entire tech stack with a team of a 100 people
Please tell us in detail about the Twitter stack.
Because I always find it fascinating how people think they can estimate the effort to maintain it whilst having next to no understanding whatsoever of the tech stack.
I assume the reason for statements like that goes something like:
1. A single person can run a mastodon instance in their spare time. Spinning up some containers for the app, a background worker and a database is quite simple.
2. Modern devops tooling makes it fairly trivial to spin up 10k instances of a container instead of 1, by just altering a number in a k8s manifest somewhere.
3. Ergo, a single person equipped with modern tooling (and sufficient funding) could spin up any number of mastodon instances.
4. Twitter is just a big mastodon instance.
5. Now that keeping everything up is sorted, add another 99 devs for feature development and you are done.
Now this is obviously faulty logic because points 3 and 4 are very false, but they look reasonable enough at first glance.
That is a straw man. There are other perfectly valid interpretations and distributions of a statement that 100 could maintain it, like this:
15 database admins
10 linux sys admins
5 kubernetes specialists
10 windows tech support
25 front end developers
15 back end developers w/ Scala
10 machine learning experts
Whether that makeup could or couldn't do it is a different question, or whether it would be a different mix; all of that is up for debate, but the 1/99 ratio is just one very specific, extreme, and laughable mix for anyone who has supported a system of any real size.
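As a quick check of the breakdown above, the listed roles can be summed in a few lines of Python. The role names and counts are taken directly from the list; the variable names are only illustrative.

```python
# Role counts exactly as listed in the comment above.
team = {
    "database admins": 15,
    "linux sys admins": 10,
    "kubernetes specialists": 5,
    "windows tech support": 10,
    "front end developers": 25,
    "back end developers w/ Scala": 15,
    "machine learning experts": 10,
}

BUDGET = 100  # the headcount under discussion

total = sum(team.values())
unallocated = BUDGET - total

print(f"Listed roles total: {total}")
print(f"Unallocated headcount: {unallocated}")
```

The listed roles add up to 90, so the mix as written leaves 10 of the 100 positions unassigned.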
How is that valid for creating and maintaining Twitter? 15 backend engineers are not even sufficient for any single facet of Twitter. Plus, there are other initiatives that Twitter takes on that are probably outside the scope of what you consider Twitter. Take a look at just their open source initiatives: https://opensource.twitter.dev/projects/. Go to, e.g., Finagle's GitHub page and look at the number of contributors and commits. While you may disagree that all of this is essential for creating and maintaining Twitter, there are problems that creep in when your platform supports billions of users that are not solved either correctly or at all by existing libraries and services; plus, open source projects serve as important recruiting collateral for engineers, especially for companies that do not have a set of widely used tools that entrench engineers (e.g. AWS, GC, Azure).
Creating a husk of an app that looks vaguely like Twitter and supports hundreds of users is a weekend project for anyone with a modicum of talent; building a platform that supports billions of users and is monetized well enough to support itself is an entirely different beast.
> 1. A single person can run a mastodon instance in their spare time. Spinning up some containers for the app, a background worker and a database is quite simple.
There are some major pieces missing from your analysis: legacy code and infrastructure, and lack of good documentation.
These can make for a massive hairball of complexity that can swell the number of people needed to support it.
This reminds me of a talk I once saw by a Netflix SRE, who showed a crazy convoluted mess of a diagram with thousands of crisscrossing lines going everywhere, and him screaming "No one understands Netflix!!!"
If you think Netflix is bad, you should see architecture diagrams for one of the major ad platforms; 30+ service sub-systems get encapsulated as a tiny block of the massive diagram. Anyone who thinks complex systems can be built and maintained by a skeleton crew hasn't worked on complex systems or hasn't been exposed to their full scope.
Also, iirc a database-based architecture was why the failwhale image was made. iirc they threw out databases and went with something more analogous to email, with tweets represented as files on file systems. That was a lonnnng time ago though; they may have reinvented the architecture a dozen times since then.
I don't know why you are being down-voted. Prematerialized flat files pushed out to heavy-duty CDNs would get you very, very far. I would think that dealing with CSAM, death-threats, and other things that would get your platform into trouble would be the more difficult problem to manage. I could very easily see the technical parts of Twitter being run with a few hundred people. I don't know what else would be involved for legal, finance, marketing, sales, etc. But I doubt it is 8,000+ people.
I couldn’t give you details, but I do know Instagram had 13 employees when Facebook bought them. Is Twitter really two-orders of magnitude more difficult to run?
Well, let’s see. IG was acquired by FB in April 2012 and had somewhere above 10M users at the time. Around the same time, Twitter had around 140M accounts in US alone, nearly 500M worldwide. Do we want to continue the apples vs oranges comparison further. Happy to keep citing scale differences. :)
Internal sources only, but topics like regionalization, localization, bespoke caching implementations, hardware-level optimizations, content moderation, policy adherence (e.g. GDPR, CCPA), long-term monetization (especially if supporting advertisers as a direct customer), 3P support, public APIs, documentation, SRE (a fledgling product doesn't need 5 9s), analytics (internal and for advertisers and 3P partners) and for the most part, security, are non-exhaustive examples of things you can mostly ignore when your product supports 10M users that are unavoidable when your product supports half of the US, a good chunk of the rest of the world, and other large businesses that consume you at large business scale.
> Well, let’s see. IG was acquired by FB in April 2012 and had somewhere above 10M users at the time. Around the same time, Twitter had around 140M accounts in US alone, nearly 500M worldwide.
So increasing your user base by one order of magnitude requires increasing the number of employees by more than two orders of magnitude? Rate of employee acquisition should probably never outpace rate of user acquisition, so I think that's a pretty clear sign that something was off.
Was instagram selling advertising to giant brands yet when they were bought? Sales staff balloon because they are really good at selling themselves to hiring staff and also because any brand that pulls in more than $10M in revenue expects to be treated like the only king of the world in pretty much every interaction and literally requires handlers, and the number of handlers required scales with the size of the brand.
You can potentially solve the handlers issue by being ruthlessly cutthroat. If they want hand-holding, they can hire a third party to manage advertising on Twitter. They probably already do in fact, so if you are a third-party, do your job and know your tools.
As for sales staff being good at selling themselves, agreed, so maybe Musk's ruthless firing spree will end up as a good thing. Maybe.
>You can potentially solve the handlers issue by being ruthlessly cutthroat
No, that just results in those businesses and brands leaving you, unless you can provide them a LARGE revenue stream that is impossible to get anywhere. A large brand will absolutely give up a little money just to spite you and your company for not treating them like god.
That's true, but also hardware and software hasn't stood still in the past decade.
I'd definitely like to hear more about the scale differences. So far at best, you've accounted for one order of magnitude. How do you explain the second?
In my mind the issue is less “is it conceivable that a small team can run a bare bones version of the Twitter app with hundreds of millions of users” and more “can a small team manage a big distributed system that was designed to be managed by many different teams, with no handoff period”.
The person I responded to was specifically talking about the tech stack. I wholeheartedly agree that the difficult part of running a service like Twitter lies in the soft problems.
But profitability requires a big tech team for ads, recommendation algorithms, etc.
And moderation requires humans, but also a big tech team for bots, known offenders, etc.
It's not exclusively a tech problem, but in a tech company tons of those responsibilities will be handled, addressed and solved by product and tech teams. And 10 people, no matter how smart, have no chance.
Child safety advocates have praised Elon for his quick action in removing inappropriate child photos from Twitter, as well as dealing with the hashtags traffickers use. This is something people have been asking for for years, and little was being done.
Unless you wish to ask for people to openly post child porn guidance you're gonna have to rely on the people actually following the issue for years. For what it's worth business insider claims they checked said hashtags and the content was gone[0]. Said user is allegedly going to release today an article in a corporate outlet that did fact check it[1] so we'll see.
I have no reason to distrust her and can't see any reason for her to lie about it.
> Musk responded to the tweet saying that the issue is "Priority #1."
Obviously it's "top priority"! Is there any other acceptable answer? This answer means nothing except that he saw the question.
The reason for distrust is that the Q people have been insisting that Trump was secretly engaged in an enormous battle with pedophile rings, and using the typical, constant, and normal arrests of pedophile rings as proof. Now, another claim of a right-wing hero bravely picking up the sword and vanquishing the forces of evil, but again, no evidence! No proof at all. We just have to take her word for it, eh?
Of course, she can't give out the hashtags and allow independent verification, yet somehow hordes of pedophiles already know these hashtags? (How are they publishing those to each other, and if they have such a channel, why aren't they using that instead of Twitter?) So who is she keeping the hashtags from then?
>Obviously it's "top priority"! Is there any other acceptable answer?
the answer means nothing, yes, the supposed good action that triggered the question is the important thing. The only relevant part of the article is them claiming to have verified it, so it's Business Insider's word added to hers.
I'm not even going to touch the conspiracy theory madness, it's irrelevant, and the person in question has zero signs of being afflicted by it, her entire existence in the platform seem to have been focused on actually working against the issue.
It's also been pretty widely publicized that Twitter has an issue with CP[0][1][2][3], and given that hashtags are the way you find things on the platform, it's natural that's the way they'd do it.
> Of course, she can't give out the hashtags and allow independent verification, yet somehow hordes of pedophiles already know these hashtags?
She is giving it out to journalists and apparently they are confirming it, it's not a tragedy that someone does not want to amplify possible child abuse, actually teaching even more pedophiles how to find it.
> How are they publishing those to each other, and if they have such a channel, why aren't they using that instead of Twitter?
Twitter is protected by its sheer scale: it is huge, which makes detection harder than on dedicated sites, and on average it is far, far more stable, safe and accessible than some darknet website they'd have to run themselves.
> So who is she keeping the hashtags from then?
From the general public, to not further humiliate the victims; from other pedophiles, because it's not some hivemind; the list goes on.
Not everything is about Trump, and you're falling for the same level of conspiracy if, out of all this, what you see behind it is pizzagate and QAnon.
I honestly don't understand how anyone reaches 2022, given all that we've experienced in the past several years, with enough faith in journalism remaining intact to pay any credence to "X happened, but we won't show you any evidence. Trust us."
For me, such a story is totally meaningless. It conveys no information about reality either way. The chances of truth or falsehood are exactly equal.
We have different epistemologies.
> from other pedophiles because its not some hivemind
Did you think critically about this? If so, for how long?
How are the pedophiles teaching the sooper-seekrit hashtags to each other? Is it in the manual "So You've Decided to Become a Pedophile" that they send to new members of the vast conspiracy?
The problem with these conspiracy theories is that any inspection of how they might actually operate, day to day in the real world, is always neglected and handwaved away.
The whole point of a hashtag is that they spread virally, or are obvious terms. The very idea of a secret hashtag is an oxymoron.
When people say this it's because they assume there's not stupid ass politics or moat building.
THEY could probably do it with 100 people, YOU cannot.
100 people is most likely within the ballpark for a group of people whose sole purpose is to write and maintain twitter's tech stack. Unfortunately, that is not NEARLY the sole purpose of most people in businesses and that adds all kinds of productivity hits.
What happens is that people like yourself become convinced that's the only way to operate.
The question is not how many people to maintain an app with the functionality of Twitter, it's how many people to maintain Twitter. Twitter has to maintain Twitter's actual app, not an app that you built to be maintainable by 100 people. You can't say whether that's possible without knowing how Twitter actually works.
Not tech but I'm convinced that just answering LEO requests from all over the world takes at least 500 people. That's without their managers, payrolls, etc.
Likewise, bringing in Ad money would be a few more hundreds, because you need to chase leads in all countries.
Getting the Ads to work? That's tech and I'd be surprised if it was less than 100 people, too.
Tracking ad conversions, targeting, refinement of models, latency reduction, data analytics, ad sales, sales ops, etc. Yeah, easily 1000 employees all in.
I was only thinking of getting ads to work (from displaying ad previews to announcers to actually delivering them). If you add all the sales, analytics and performance, yeah, that climbs quickly.
For a second I was confused how Starlink requests were relevant for Twitter. Then I realized you meant 'law enforcement officer' not 'low-earth orbit'.
As someone who worked at an SSP, that is nowhere near what is needed for an established platform. There were a handful of engineers (10 if we are being generous, including data science) keeping the lights on. The rest of the time was spent trying to prevent "senior" engineers from pushing code attempting to execute a SQL query for each of the billions and billions of daily API calls.
Is there a reason integrating ads into WhatsApp would require more than another 50 people? Twitter ads certainly do not appear very complicated. The most complicated thing about Twitter is scale, which is why the comparison is made with WhatsApp.
> recommendations,
Does Twitter have recommendations? From what I understand, the front page was actively curated - that is, a human chose stories to put there. I guess you could count the god-awful default feed ordering as "recommendations", but there is nothing advanced about it.
> bots
If WhatsApp doesn't have bots, it's the only social media/chat app I've ever heard of that doesn't. What is needed for this other than an API?
> had to provide tooling for governments, regulators, content moderators etc.
I'm sure at least some of this exists for WhatsApp. Nevertheless, how many additional employees does this have to take?
I am not sure why there is so much pushback against the idea most companies are overstaffed. For the most part, yes, everyone has "work" to do. But most of the work is fundamentally unproductive. It's this way throughout the economy, but a few tech companies probably do represent extreme cases. I think the best argument for their case is that most of them are very profitable anyway (not Twitter, somehow), and they might as well throw money at thousands of people to do stuff in case one of them accidentally does something that ends up being wildly profitable. I am fairly neutral on the whole thing; I strongly dislike Elon, but I also think Twitter was horrifically mismanaged. While I doubt Twitter will come out better than it is, the idea that firing most of such a large organization would necessarily result in the immediate collapse of a mature product does not say much about the people that were fired.
I'm more sympathetic to the idea that it would get even worse over time, but I don't think there's anything necessary about this. You could focus on resolving longstanding issues while pausing most new work and probably come out perfectly fine.
> Twitter ads certainly do not appear very complicated.
You see a couple of ads mixed in your feed; behind that there's a big machine selling that space to advertisers and mixing it into the timeline of every user based on whatever profile Twitter has created for you. Then the advertisers want to know how their ads are doing, or they'll stop buying them…and you'll probably need to have salespeople to get them to put money into your ad system in the first place.
> I guess you could count the god-awful default feed ordering as "recommendations", but there is nothing advanced about it.
Just because you don't like the ordering doesn't mean it's not advanced.
> I am not sure why there is so much pushback against the idea most companies are overstaffed.
Twitter could be overstaffed. In fact it probably was overstaffed. But it's not overstaffed to the tune of "it should be 10 people working out of a garage".
> You see a couple of ads mixed in your feed; behind that there's a big machine selling that space to advertisers and mixing it into the timeline of every user based on whatever profile Twitter has created for you. Then the advertisers want to know how their ads are doing, or they'll stop buying them…and you'll probably need to have salespeople to get them to put money into your ad system in the first place.
This is not crazily complex, bleeding-edge tech. This is something fairly well-understood and at any rate done by a lot of teams in a lot of places. (Twitter's ad profiling also seems awful. Maybe I am hard to pin down.) Probably the most complicated part is coming up with data to make advertisers think their campaign is working. (I am extremely skeptical most ad spend is actually worthwhile.)
> Twitter could be overstaffed. In fact it probably was overstaffed. But it's not overstaffed to the tune of "it should be 10 people working out of a garage".
I agree 10 is too low for anything but bare-bones keep-the-lights-on-this-month maintenance, but it seems likely you could have a great and functional Twitter run by ~200 employees. I've seen more done with less.
Just as one data point that might tell you why you are misinformed - Twitter's AI team frequently publishes at the biggest venues in AI research and do a wealth of machine learning research on the data and processes they have. Some of that is used in advertising, among other things (recommendations, anti-spam, detecting abuse).
There are very few teams doing advertising at the scale of Twitter, saying "done by a lot of teams in a lot of places" is accurate just like "programming is done at a lot of places so why is programming hard".
No doubt you can have big teams doing highly complicated work.
That doesn’t mean your AI system performs better than a simpler one. Or that the system is useful in the first place (recommendations.) I’m not saying they were sitting around twiddling their thumbs. I’m saying the vast majority of Twitter staff were not actually improving the Twitter product noticeably to users. They were doing highly complex, cutting-edge engineering that was make-work.
If Twitter tech was so advanced, why were they losing so much money?
The complexity of your product has nothing to do with whether it is profit making or not. If that was the case, you wouldn't have loss making products in the AI space nor would you have profit making products in the garden shovel space.
Advertising is a hard problem that not many companies have solved at the scale of Twitter, that is what I am trying to get at. There are not too many social media networks out there which have hundreds of millions of users and billions of data points, and it's very misleading to say that work done in such a scenario is "something fairly well-understood and at any rate done by a lot of teams in a lot of places", when literally they're the only ones with Twitter type data outside of a couple of other Chinese social networks.
> The complexity of your product has nothing to do with whether it is profit making or not.
Yes, this is my point. All this incredible AI engineering did not actually make Twitter a better product. They could have just as well not spent the money. The work was ultimately futile for Twitter, even though it might have advanced our understanding of AI and have incredibly practical applications elsewhere. Conventional measures worked fine.
How is revenue the only metric for "scale"? It sounds like you really don't know what you're talking about if when comparing technical complexity, your metric to go to is how much money something makes and not how many user accounts need to be served or the geographical complexity of running a real time view consistent across the globe. By that metric, is Walmart or Saudi Aramco's tech stack more complicated and larger scale than a software company's?
This right here is why you can discount most replies on HN right off the bat. The "I can make software X in a day" posts are 99% bullshit because the posters making them have no idea what business reality looks like. If their program gained any popularity they'd be in a panic the first time the FBI dumped a warrant in their lap and their full stack developer is now spending a week with the lawyer trying to figure out how to untangle their data while the customers that paid for ads are yelling that the metrics API went down 2 days ago.
Someone needs to interface with governments and law enforcements when they request data in criminal investigations. Someone needs to interface with lawmakers when new legislation is passed. Someone needs to handle data privacy requests from Europe. There's a lot of people working on this, or were at least.
Instagram had around 10M users when it was acquired a decade ago. IG has way, way more staff now that they have scale. Must we continue to compare apples and oranges?
Agreed, there's decent starter comparables in the space.. IG, FB, DC, Goog all have public numbers. I've cobbled these together in the past week talking through with old friends from Goog and others. Please correct!
IG 13 employees at 30m users. Couldn't find # of servers.
FB had 10k servers in 2008 and 100m users, 850 employees.
I believe Doubleclick had ~500-1000 servers for ~10b daily impressions in mid-2000s.
Those numbers are all on circa 2010 hardware, so.. divide by a decade of performance doubling every 2 years (conservatively), or ~5x fewer servers in 2020.
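Since the post asks for corrections, here is a quick sanity check of that scaling step, as a rough sketch only. The function names are made up for illustration, and it assumes server count scales inversely with per-server performance, which is a big simplification. Taken literally, doubling every 2 years for a decade is five doublings (2^5), which is a larger factor than ~5x; a ~5x gain over ten years would correspond to a doubling period of a bit over 4 years.

```python
import math

def speedup(years: float, doubling_period_years: float = 2.0) -> float:
    """Performance multiple after `years`, assuming steady doubling."""
    return 2.0 ** (years / doubling_period_years)

def servers_needed(servers_then: float, years: float,
                   doubling_period_years: float = 2.0) -> float:
    """Servers needed later for the same load, assuming count scales as 1/performance."""
    return servers_then / speedup(years, doubling_period_years)

def implied_doubling_period(years: float, total_gain: float) -> float:
    """Doubling period that would produce `total_gain` over `years`."""
    return years / math.log2(total_gain)

decade_gain = speedup(10)                    # five doublings
fb_equiv = servers_needed(10_000, 10)        # the 10k-server figure quoted above
period_for_5x = implied_doubling_period(10, 5)

print(f"gain over a decade: {decade_gain:.0f}x")
print(f"10k servers then ~ {fb_equiv:.1f} servers now (same load)")
print(f"a 5x gain implies doubling every {period_for_5x:.1f} years")
```

Either way the conclusion in the post is unchanged in direction: the same load needs far fewer machines a decade later.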
The government takedown stuff, from personal experience, is tiny on the systems side; much more about moderators and expensive legal staffing.
These are very rough estimates, but I've heard 250k servers for Twitter.. that's much more on par with Goog/Amzn/Msft serving clouds at ~1m+ machines. That's a mystery to me.
What's their core business? Losing money on a platform where people can post racist comments?
Or do they have to earn money without getting sued for being used to spread CSAM and being a platform for harassment?
Because the first is very easy with 50 people. Elon can keep sinking money into it and never earn a dime (see Truth Social, they seem to be doing well!). The last is a lot more complicated and requires an ad platform, ad sales, content moderation, documentation writers, support agents, management, scrum masters, SREs, purchasing, et cetera.
IT people are really good at ignoring everything but the tech stack. Like tech is the only thing needed to run a profitable business. (It is... But not to run a 40B valued profitable business... And Twitter wasn't even profitable at all.)
> What's their core business? Losing money on a platform where people can post racist comments?
They've been running the same exact business for years now; if what you say is true, maybe the answer is yes?
I never heard of Twitter making profit, so maybe their core business was "Losing money on a platform where people can post racist comments"
There was abundance of both, AFAIK, long before Musk
Remember when Dorsey tweeted nazi propaganda and then said his account was compromised? (which if true it means at Twitter they don't know what they are doing, if false, well...)
Remember when journalists wrote articles titled "Twitter is a Nazi haven for the same reason its CEO claims no bias" because Dorsey never actually distanced himself from the worst of the worst the platform hosted? Fearing he would be labeled as "too liberal"?
Remember when he started spamming crypto-bro propaganda?
Remember when a spy for the Saudi crown worked at Twitter helping to uncover activists using the so-called "free" platform and after he was discovered and reported to authorities, Saudi prince Alwaleed bin Talal bought 4.61% of Twitter shares?
Did the situation get better with more and more employees, or did it just revolve around banning prominent accounts? (Not that I necessarily disagree with the reasons behind it, but if that is the best solution they've found, after years of fine tuning, they could have done it before with far fewer people involved.)
> Or do they have to earn money without getting sued for being used to spread CSAM and being a platform for harassment?
it's easily provable that the 7,500 employees did not improve things on that front.
> IT people are really good at ignoring everything but the tech stack
that's a really odd proposition.
Looks to me that Twitter was in bad shape already, despite thousands of non-tech employees. Facebook is in no good shape either, despite tens of thousands of them. Basically the only thing still working as intended in those companies is the tech stack.
I guess the real question you're asking is "why are non-IT people so bad at doing their jobs"?
Not my opinion though, I never said IT over other departments, I simply said nobody is able to explain what makes WhatsApp so special that a hundred people can run it while Twitter requires 75 times that and still doesn't work as well.
Chats have a finite upper limit of participants, some accounts on Twitter have 100m+ followers. Storage is limited to buffer of not yet delivered messages, and avatars/stories.
They are entirely different technological challenges.
Sure, but nobody is upset if they see a tweet 1 min later than someone else, since most of the time you can't even tell; a lot of people would bitch about 1 min latency on chats.
So twitter can afford to deliver those tweets with higher maximum latency than WhatsApp.
And it's scaling when you need to keep low latencies, that really kills you, at least in my experience.
WhatsApp and twitter’s latency calculations are on different things.
Twitter’s latency stems from calculating what tweets should show on a given request. Even if you try to show tweets from 1 minute ago, it’s hard to cache that stuff using traditional systems because of the fan out. If an account with 50 million followers tweets, you need to update 50 million timelines. How do you do that quickly?
And you would have to define maximum latency, is it seconds, minutes, hours? because you can’t have the timelines be inconsistent for too long as that leads to some people getting news faster than others.
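To make the fan-out problem concrete, here is a minimal in-memory sketch. It is not Twitter's actual design; the `TimelineStore` class, the `celebrity_threshold` parameter and the hybrid push/pull split are all assumptions chosen for illustration. Ordinary accounts are fanned out on write (one timeline append per follower), while accounts above the threshold are merged in at read time so a single post does not trigger millions of writes.

```python
from collections import defaultdict

class TimelineStore:
    """Toy hybrid fan-out: push for small accounts, pull for large ones."""

    def __init__(self, celebrity_threshold: int = 3):
        self.followers = defaultdict(set)       # author -> users following them
        self.following = defaultdict(set)       # user -> authors they follow
        self.timelines = defaultdict(list)      # user -> precomputed tweets
        self.author_tweets = defaultdict(list)  # author -> their own tweets
        self.celebrity_threshold = celebrity_threshold
        self.timeline_writes = 0                # counts fan-out-on-write cost

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def post(self, author: str, ts: int, text: str) -> None:
        tweet = (ts, author, text)
        self.author_tweets[author].append(tweet)
        if self.is_celebrity(author):
            return  # too many followers: defer the work to read time
        for follower in self.followers[author]:
            self.timelines[follower].append(tweet)  # one write per follower
            self.timeline_writes += 1

    def read(self, user: str, limit: int = 10) -> list:
        # A set removes duplicates if an author crossed the threshold mid-way.
        tweets = set(self.timelines[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                tweets.update(self.author_tweets[author])
        return sorted(tweets, reverse=True)[:limit]  # newest first

store = TimelineStore(celebrity_threshold=3)
for fan in ("a", "b", "c"):
    store.follow(fan, "celeb")
store.follow("a", "bob")

store.post("bob", 1, "hi")       # pushed to 1 follower
store.post("celeb", 2, "news")   # 0 pushes, merged at read time

timeline_a = store.read("a")
timeline_b = store.read("b")
print(store.timeline_writes, timeline_a, timeline_b)
```

The trade-off is the one described above: pushing makes reads cheap but writes scale with follower count, while pulling makes writes cheap but every read has to merge, which is where the latency and consistency questions come from.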
Now you have to deliver them, exactly one time, to each recipient or groups of recipients, through different network topologies, with different challenges and vastly different bandwidth and latency guarantees, in exact order, while also keeping track of who is online and who's not, and distributing that information in real time, only to the edge nodes that should know about it, all of that fully E2E encrypted but stored (indefinitely?) in case the recipient is currently offline and unless that recipient blocked the sender.
let's agree that both companies solve hard problems and that it's not the technical difficulties that make the two companies sizes so different.
right, because WhatsApp is a small company that makes very little money and has virtually no users and it mostly does not work nor scale...
We could all write that in a single weekend, if only we had no family to spend time with.
It's interesting how the perspective shifts when people are told "yeah, that's impressive per se, not very impressive compared to what the others are doing"
It's like discounting Sputnik 1 because the Russians did not employ an army of people selling ads, but just the people necessary to launch a satellite in orbit, which is actually the real achievement.
Anyway, from the news: Nearly 1200 software engineers left Twitter last week
Suddenly the Twitter engineering team sounds not so capable, which clearly is not the truth. The truth is that if you have hundreds of managers, you'll end up with hundreds of small teams competing to boost the ego of the manager, usually wasting thousands of man-hours on minuscule returns (if not losses) while those PowerPoint slides will help someone else get promoted for the new project that nobody uses.
Been there, done that. I don't know why a demographic so well versed in the dichotomies of the tech industry, such as the users of HN, is so baffled by the claim that 2,000 engineers for a single company that does what Twitter does is a complete waste of human potential.
Elon Musk is a person I would never work for and I think he's not even a good entrepreneur, but one thing he does right: he calls the shots and then executes them.
He said he would fire people and he did, many helped him by leaving on their own, which left Musk with the responsibility of proving he was right.
If Twitter is still up and running in a year's time, we can be sure that there were 1,200 engineers too many working there.
because, honestly, who really believes that the "influencers" will actually leave for the fediverse, where they'll have to work hard and compete with mere mortals, while they could keep cashing from advertisers to promote shit to their already established audience?
nobody believes that.
Also because the fact that Twitter will sell fewer ads in the near future doesn't mean that advertisers won't spend that money on Twitter; they will simply pay not Twitter, but the Twitter users. For them it's exactly the same thing; for Twitter celebrities it's a giant opportunity.
1 message broadcasting to 10 silo'd people is a completely different system than ensuring a celebrity tweet gets to 100 million people within a couple minutes, which is what twitter considered their task.
Who is defining and managing the PRDs and the roadmap in this scenario? What about user testing and customer development? By “graphic design”, do you mean User Interface design? If so, who is responsible for the user experience beyond just UI?
The point I’m trying to make is that it takes some effort (beyond just the plumbing) to create an experience that folks actually want to use on an ongoing basis
> Who is defining and managing the PRDs and the roadmap in this scenario?
Mainly the project manager.
> What about user testing and customer development?
Testing could be done by everyone, mainly the project manager, although you could perhaps add a tester to the team. Don't know what customer development is.
> By “graphic design”, do you mean User Interface design? If so, who is responsible for the user experience beyond just UI?
Yes, UI design. The graphic designer would be primarily responsible for UX, but also the project manager and developers.
This is based on my experience with mobile app development. Right now, it's a fairly complex product, users often say that it's much better for many use cases than the alternative by Google, which I guess may be developed by a 10x larger team.
We are not talking future development, just maintaining status quo.
Road map is just: keep it working, or whatever EM tweeted about an hour ago :)
But I agree it's way too optimistic a number. You need at least a few for each platform, just because of bus factor; you also need people to keep in touch with Apple/Google reps, make sure bills are paid, etc.
I imagine it will mostly be just minor tweaks and no major features, you can easily do both mobile targets by cca 10 people, not working crazy hours. There are plenty of successful apps with smaller teams, that make it work.
Eh, tossing the direct comparison to Twitter out here because they seem to have gone too far the other way, this is much like saying "I can make 1,000,000 screws a day, so assembling at least 100 cars a day should be easy".
Well graphics design isn't hard. I'm sure one of the Devs could do that. And I'm sure there are tools to convert an iOS app to/from Android, so one dev can do both.
And a website is easy. You could do it with 1 person.
But Elon is such a machine, he could keep it running by himself.
You still have to read them to work out that user X saying they can't log in is the same as user Y saying authentication is failing at A stage, issue is present on B and C platforms version D.E
Further the issue of 'site using the wrong font' is not equally as important as 'site is exposing private user data'. And you're likely to get many people reporting the former, and maybe only one reporting the latter, so a sample isn't likely to catch the really important stuff.
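A back-of-the-envelope way to see why sampling misses the important reports, as a sketch with made-up numbers (the 10,000 reports and the sample of 100 are purely hypothetical): if you read a uniform random sample of the queue, the chance that a single critical report lands in it is just the sampled fraction.

```python
from math import comb

def p_sample_contains(total: int, critical: int, sample_size: int) -> float:
    """Probability that a uniform random sample (without replacement)
    includes at least one of the `critical` reports."""
    # 1 minus the probability that every sampled report is non-critical.
    return 1 - comb(total - critical, sample_size) / comb(total, sample_size)

# Hypothetical queue: 10,000 reports, exactly one about exposed private data.
p_one = p_sample_contains(total=10_000, critical=1, sample_size=100)
# The same sample against 2,000 duplicate "wrong font" reports.
p_font = p_sample_contains(total=10_000, critical=2_000, sample_size=100)

print(f"chance the sample includes the one critical report: {p_one:.2%}")
print(f"chance the sample includes a font report: {p_font:.6%}")
```

So under these assumptions the sample almost certainly surfaces the common, low-severity issue and almost certainly misses the rare, severe one, which is why someone still has to read and triage.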
Indeed. Just translating it could be a team of at least N+0.5 people, where N is the number of languages you want to translate your product to (the +0.5 accounting for translation verifications). Even if you outsource it that just slows down feature deployment IME.
The people shouting loudly about how Twitter must have been so bloated are really just shouting their obvious inexperience working at global scales or their localized ambitions.
Could there be too many employees at Twitter? Sure. Most companies have dead weight.
The number who were "extra" is probably not 9/10ths the employees though.
> I am convinced you can run the entire tech stack with a team of a 100 people.
This is because you don't see the complexity. What you see as a Twitter user is a fraction of what's actually there.
You have to build a platform for ads. Not just serving ads, but allowing advertisers to prepare their collateral, preview them, get their results, and be billed. So that's an entire content and invoicing platform separate from your main feed.
And since your platform is all user generated content, you've got to build a moderation pipeline. A place for users to make reports, but also an interface for your content moderators to view content and make decisions. Oh, and while you're there you'd better build a portal for law enforcement to make data requests, along with your DMCA takedowns. Oh yeah, DMCA - that's another whole thing you've got to worry about.
Then the EU comes along and needs you to build something to support your GDPR obligations. Then India wants something similar, but only for its citizens. Your users also want verification, so better build that platform for securely verifying accounts and awarding checkmarks.
It snowballs. Was Twitter's engineering group bloated? Probably. Most large companies are. Could you run the whole Twitter tech stack as it exists today with a hundred people? Absolutely not.
Ads isn't just serving, but targeting. The better you target, the more your ads are worth. When you have $5B of advertising, a 0.01% improvement is breakeven for $500k of fully-loaded comp. So you should add as many engineers / data scientists / etc as can generate an 0.01% annual improvement. Or maybe you want to take 3x their annual salary: that's still an 0.03% improvement in ad relevance.
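The break-even arithmetic above can be checked with a short sketch. The $5B revenue and $500k fully-loaded cost are the figures from the comment; the function name is made up for illustration.

```python
def breakeven_improvement(ad_revenue: float, cost: float) -> float:
    """Fractional revenue lift at which a hire costing `cost` pays for itself."""
    return cost / ad_revenue


AD_REVENUE = 5_000_000_000  # $5B of advertising (figure from the comment)
COMP = 500_000              # $500k fully-loaded comp (figure from the comment)

one_x = breakeven_improvement(AD_REVENUE, COMP)        # hire pays for itself
three_x = breakeven_improvement(AD_REVENUE, 3 * COMP)  # hire returns 3x their cost

print(f"break-even lift: {one_x:.2%}")
print(f"lift for a 3x return: {three_x:.2%}")
```

Under these assumptions the two lifts come out at 0.01% and 0.03%, matching the numbers in the comment.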
Separately, some commenters here are flatly delusional about the effort to ship a site, Android and iOS apps, internal mod tools, help docs, support, and legal docs in 34 supported languages. Not to mention obeying laws in all the countries that implies.
Or image and video hosting! With recoding of videos, resizing of images, and the management of what is surely petabytes of images and videos with very high reliability! That is not a 1, 2, or 3 person job to do well.
A lot of commentators are either not engineers, incredibly poor engineers, or utterly ignorant of how much work it is to manage a global company, as you've mentioned. Doing anything at the scale of Twitter is a nightmare crossing multiple languages, laws, domains and expertise, much of which we'll only see start burning when governments step in and start taking chunks out of Twitter.
Anyone who has used Twitter: have you seen any evidence they do this beyond extremely basic geographical targeting?
People keep listing off all this stuff, but we’ve all used the site and can see that, if it does have a team working on it, they’re not doing it to the level of their competitors.
>> "These are some of the interests matched to you based on your profile, activity, and the Topics you follow. These are used to personalize your experience across Twitter, including the ads you see. You can adjust your interests if something doesn’t look right. Any changes you make may take a little while to go into effect."
They're not good at identifying interests and making them targetable, but they try.
To be honest, growth. I work at a tech company of a similar size, and keeping the lights on, so to speak, from a systems perspective is something we can do with a few engineers. By far the bulk of employees at my company are in R&D, sales, business operations, infrastructure growth, etc. If you build a resilient architecture, keeping something like Twitter simply running wouldn’t take that many people.
>If you build a resilient architecture, keeping something like Twitter simply running wouldn’t take that many people.
It takes an army of engineers to build a resilient architecture at Twitter's scale.
And why are we even talking about "keeping the lights on"? Elon is claiming he's going to build a better video platform than YouTube, complete with better tools for creators, for crying out loud.
He also said we’d have robotaxis by now. I’m not sure how much faith you can put into anything he says because so little of what he has said has been true. Hell, if he told me I was fired, I probably wouldn’t believe him. Sadly, HR probably would.
I see your point, but you should always put > 0 faith in Musk IMO. No robotaxis, but even after being laughed out of the room quite literally, landing orbital-class boosters is commonplace now. I don't doubt there's the talent at Twitter required to do amazing things, and Musk seems to find ways to push and motivate talent to crazy levels of results. I'm not saying things like a more YouTube than YouTube are destiny, but don't completely write it off.
I didn't say 0 faith, I just said I wasn't sure how much. :p
SpaceX is an interesting result, but it is starting to look more like a fluke than what we'd expect from him. Perhaps rocket scientists "get him" and understand the stakes better, after all, if you screw up a rocket you could die in the aftermath.
Tesla is certainly similar stakes -- except one mistake can kill every passenger on the road instead of your fellow employees. Twitter is basically 0-stakes, and is unlikely to take itself that seriously. If they screw up, maybe a bad government won't get overthrown somewhere, or trolls will take over the internet. But mostly, people will survive -- or rather, the outcomes are far more abstract.
He doesn't seem like a good fit for this kind of environment.
A script that takes text from input field and displays it along with previous submissions in chronological order is trivial; the difficulty comes from serving millions of people.
For US only? This sounds like the comment complaining about app size when an Uber engineer showed up and talked about handling 100s of methods of worldwide payments. Many in the US seem to think the internet only exists in the US, and that US rules apply everywhere.
Googling "how many people work at stack overflow" gives a number less than 400. If true, I think you could put together and run Twitter with fewer than 400 people from an engineering standpoint. However, Twitter probably has a much higher soft-skills head count for moderation and whatnot. Though it wouldn't surprise me if a lot of that was contractors, not direct head count.
I don't see any comparison at all. Does Stack have to maintain mobile apps? Their moderation is mostly handled by users, but Twitter must write and maintain tools for handling that. I doubt anyone is trying to upload illegal porn to Stack. Latency of 30 minutes would be noticed by very few Stack users. Twitter probably really needs at least 10x the SREs of Stack.
Yeah, but that 400 is everyone at stack not just engineering. A team to put together a mobile app should be less than 10.
>Their moderation
Above I assumed their moderation team was probably larger than their engineering team, and mostly contractors. Thus I kept my estimate to the size of their engineering team.
The whole point of the article is twitter was designed to be resilient. (and it shows, twitter has great uptime). And the whole point of resiliency, beyond not negatively impacting customer experience, is to buy engineers time to fix things when stuff breaks.
What we are watching is a massive failure event right now and the question really is if there's enough time for twitter management to fill in the gaps before there's an outage.
if 80% of knowledge is missing due to 80% of people being gone then the team who built it failed to document or automate themselves out of a job, meaning even other people on the team in the good ol' days would still have faced the same issues and merely had to chase down the original authors. that doesn't sound like it would be completely true. maybe 5-20% of un-documented knowledge walked out the door? completely rough guess. but even 0.5% of knowledge can sometimes be critical.
There's something poetic about NASA sending a rocket to the moon at the same time as Elon Musk is fucking around with Twitter instead of sending a SpaceX rocket there too.
I think you're misinterpreting the comment you're replying to. They would agree with you that the tiny SRE team described in the article sounds very effective, and likely has a lot to do with why the site is still up and running currently. Work like that should continue. But if 1-3 people can have that degree of impact, what are the other 8000 doing? (Again, this is just me attempting to interpret the point made by the parent, not trying to make one myself.)
Once you get the automation going the number itself doesn't matter that much.
You might have 200 different apps (hell, we have close to that, only 3 people in ops) but a competent team will make sure they deploy in the same way and are monitored in the same way.
And once you go from "a server" to multiple servers, whether the end number ends up being 20 or 200 isn't that important till you start hitting say switching capacity, and if you're in cloud that's usually not your concern anyway.
Our biggest site (about dozen million users, a bunch of services and caching underneath, few gbits of traffic) took zero actual maintenance for 2022, "it just works", any job was implementing new stuff. It took some time to get to that state but once you do aside from hardware failures it "runs itself"
> Our biggest site (about dozen million users, a bunch of services and caching underneath, few gbits of traffic) took zero actual maintenance for 2022, "it just works", any job was implementing new stuff. It took some time to get to that state but once you do aside from hardware failures it "runs itself"
Nobody is adding changes that blow out the DB? Or adding some inefficient code that burns CPU much faster?
It's not 1-3 people. The entire SRE team globally - including the technicians and the engineers with server access - is easily going to be in the hundreds.
The SRE manager is in charge of keeping it all running. He isn't running around the world swapping out servers. He also isn't sitting back with his feet up thinking "All done - now how are my Pokemon doing?"
It's a dynamic process with quality monitoring, budgeting and reports, post-mortems, continual experiments to see if uptime can be improved, and redesigns as hardware and software change.
It's part of the backend, but is only loosely coupled to the content management and delivery system, the ad machine, moderation, marketing, and so on, all of which are going to have similarly complex structures.
it doesn't follow. The article posits that "many people think twitter headcount was bloated" then proceeds to describe a (presumably) really efficient work of a small SRE team. These two parts seem completely disconnected from each other - neither one proves, disproves or follows from the other - so it's unclear why the former was mentioned at all.
His personal experience was zero bloat. He was one deep in a critical function for the company. He isn't saying this proves without a doubt that there is no bloat, but he didn't see any in his time there. It seems like a reasonable addition to the conversation to me.
It's a shame that most of the conversation going on here consists of extrapolated arguments based on this article and other anecdotes. The problem starts when those making their points let beliefs about "how things should be" outweigh "how they really were".
No, I think the article makes it very clear what the value and function of SRE is. The point of the comment you're responding to is that the author was the only one doing this: not a team of ten, not even a team of two. This is Twitter's whole cache system! Probably the most important part of their hardware stack, in terms of "is the site performing well for users". There are other SRE needs at Twitter, but not that many. What were the other 9k people at the company? It begs the question.
software doesn't break down from heat. An app I write today will run until the hardware dies. I have a palm_os app I wrote in 1998 that still runs perfectly.
> software doesn't break down from heat. An app I write today will run until the hardware dies. I have a palm_os app I wrote in 1998 that still runs perfectly.
In an organization of any appreciable size, things change all the time.. and I'm not just talking about code (for which you could have a code freeze in an emergency situation like this), but the external systems you're connected to could change for reasons completely out of your control. Content changes can break stuff because of bugs in your code. Legacy systems could require all sorts of ongoing tweaking and maintenance. And, yes, heat can break your software if the server it's running on overheats.
Agreed.. but let's say you fire 99% of your engineers, and declare a code freeze (because there's no-one left to write code)..
Then in theory.. if you own the hardware and you've locked down the libraries... That code could keep running for a long time. Agreed it's not a Palm app, but with everything locked down, I'd argue it's safe
But now I can see third-party stuff changing. Payment processors and such. Those changes don't happen fast though, and 100% not so fast that a company the size of Twitter can't work out a sunsetting.
As for heat breaking software if the server it's running on overheats: I have a feeling Twitter has a system in place to scale out the faulty server.
My point was, comparing code to a car is silly. A car needs maintenance. Code in code freeze does not.
Software bit-rots. That app from 1998 doesn't interact with today's world. As the world evolves around us, needs change and software has to change to keep up. That's not to say there aren't companies out there that rely on some ancient Windows 98 software program running on similarly ancient hardware, because there are. But Twitter as a piece of software isn't some static thing. Its needs are constantly changing and the software has got to keep up.
Your PalmOS app doesn't run on any modern hardware except under emulation. (Which is sad, I loved my Centro and held onto it for as long as I could.) The last release of PalmOS was in 2007, 15 years ago. Most hardware from that long ago is dead, and thus your software is dead, broken down by entropy in the hardware.
Agreed. But my comment was in comparison to a car needing maintenance. If nothing changes and I drive my car for 5 years without taking a look under the hood, it will be a mess. If not a stitch of work is done on it, I'm in trouble.
If however I have an app and I don't look under the hood for 5 years, it could still run as good as it did when I locked it down. As you said some companies run on apps written for windows98. Those apps are still working as they always did.
I don't think its needs are constantly changing. Like, it could freeze for weeks/months. Leave existing bugs and put versions in lock.
I do agree that it will eventually need to change, but that's where selective hiring comes in. Oh, system X isn't great. Let's find a team for that; all else remains black-boxed.
Even discounting external changes, any reasonably complex system needs maintenance because time moves on and new interactions happen.
How many SSL certificates (internal or external) need re-issuing per month? Some of that can be automated, but in an organization as large and complex as Twitter some will be bespoke and manual, and a code freeze won't stop the clock.
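To make the "clock keeps ticking" point concrete, here is a minimal sketch of the kind of expiry check that has to keep running even during a code freeze. The inventory, host names, dates and the 30-day threshold are all hypothetical; only `ssl.cert_time_to_seconds` is a real standard-library call, and it parses the `notAfter` format that `SSLSocket.getpeercert()` returns.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True if the certificate expires within `threshold_days` (or already has)."""
    return days_until_expiry(not_after, now) < threshold_days


# Hypothetical inventory: service name -> notAfter string from its certificate.
inventory = {
    "internal-api.example": "Jan  5 09:34:43 2023 GMT",
    "metrics.example": "Jun 30 00:00:00 2023 GMT",
}

now = datetime(2022, 12, 20, tzinfo=timezone.utc)  # fixed "today" for the example
due = [name for name, not_after in inventory.items() if needs_renewal(not_after, now)]
print(due)
```

A check like this only tells you what is due; the bespoke, manual renewals the comment mentions still need a person who knows where each certificate lives.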
How many new CVEs per month apply to Twitter's services and tooling? How many race conditions or other bugs are lurking, just waiting for the right time or traffic pattern to emerge? Twitter can't freeze inbound traffic without dire consequences.
Twitter is like your car, except that it's always running.
To be honest I was very surprised to hear what a cache SRE was working on. It sounded like he had to build all of the handling of hardware issues, rack awareness and other basic datacenter stuff himself. Does it mean that every specialized team also had to do it? Why would a cache engineer need to know about hardware failures at all? It's the datacenter team's responsibility to detect and predict issues and shut down servers gracefully if possible. It should be completely abstracted from the cache SRE, like the cloud abstracts you from it. Yet he and his team spent years on automation around this stuff using a Mesos stack that they probably regret adopting by now.
I feel like in this zoomed-in case of Twitter caches, what they were working on is questionable, but the team size seems to be adequate to the task, so my takeaway is that, like any older, larger company, Twitter accumulated a fair amount of tech debt and there is no one to take a large-scale initiative to eliminate it.
>As someone paraphrased, a car without brakes and a steering wheel works just fine until you hit the first bend.
On the other hand, a car without a second and third steering wheel, 20 windscreen wipers, and an oven in the back, keeps running just fine, even after the first bend...
Have you also taken into consideration the death rate in smaller planes, which is much higher than in commercial craft? The moment you attempt that with a 'high capacity' craft you'll suddenly find yourself regulated out of existence by a book written in blood.
I'm not saying they're bloat. They're a tradeoff between cost (man hours of work) and lives. They've increased the cost of small planes tenfold but have saved tens of thousands of lives.
And it's almost inevitable. That's why the overwhelming "Elon is so dumb" narrative seems so odd to me.
"Major harm," to me, would be either bankruptcy or a competitor overtaking significant chunks of Twitter's users. Even if Elon literally has to re-hire half the roles he fired for, or Twitter is down for a few days or a week, or they struggle to get advertisers for a little while, that's nothing for the long-term. In six months, the chances that it'll look like these firings were a bad idea are minimal.
And likewise with the "the only realistic way to moderate an online platform is the way Twitter was doing it" narrative. In six months, all it takes is the ship to still be floating without the old moderation to prove it out, and that's by far the most likely outcome.
> "Major harm," to me, would be either bankruptcy or a competitor overtaking significant chunks of Twitter's users.
With the takeover, Twitter has been saddled with massive new debt and would have to become significantly more profitable to service it. Instead, they are losing ad revenue, and any savings from reduced head count will not manifest for at least 4 months because of the severance payments for the laid-off workers.
Musk could of course keep the company going as a hobby for a few more years, but he strikes me as more likely to cut his losses early, preferably by finding somebody to take the problem off his hands, but, if necessary, by bankruptcy.
> Even if Elon literally has to re-hire half the roles he fired for
I have this nagging suspicion that the kind of people who line up for an opportunity to work without equity, lousy job security, and send weekly code printouts to the CEO are not necessarily the highest quality hires in the field.
> or Twitter is down for a few days or a week
The Twitter user base has quite a bit of inertia keeping them from moving to competing platforms — lots of people I follow have made backup plans on Mastodon, but not many have moved their focus there. But Twitter becoming literally impossible to use would probably be the one thing precipitating such a process.
> they struggle to get advertisers for a little while
And after that? Are the major brands going to decide that they like advertising on a Gab-lite Twitter after all? Or is Twitter going to strike it rich with ads for dietary supplements, gold coins, non-perishable meals, and tactical pants?
> In six months, the chances that it'll look like these firings were a bad idea are minimal.
The targeted firings were bad enough (To paraphrase someone else "Imagine ranking engineers by # of lines of code added, and then keeping the ones with the HIGHEST number"), but the voluntary departures are likely to be even more consequential. After the way the workforce was treated, what are the odds that the people left are the "best", rather than the ones with a lack of alternatives (e.g. for visa reasons), or Musk fans who may or may not be qualified?
This is exactly why this thread is full of slightly insecure comments making vague predictions. I'd suggest to most of them to get off HN and back to work, now is the time to make yourself useful!
Twitter is basically a real-time database where everything is interconnected. It's one of the harder things to scale because it doesn't allow for easy segmentation.
No, it really isn't. It is perfectly normalizable.
Every twitterer is a newsletter. Most hardly ever tweet and sporadically at that.
The followers are subscribers. They hardly even see most tweets they subscribe to as the whole thing is quite ephemeral. Same as never reading all your emails (very few people are inbox zero freaks).
The timelines are just that, an email inbox. It is very soft "real-time" at best.
Tweets older than 48 hours can just be archived to a blob store and served as a static website. Most people consume it as such, logged out and from the browser.
You think brainyquote.com is hard to scale? Think of twitter.com as unbrainyquote.com.
All the other froufrou built on top of it is more complicated but 2007 style twitter is dead simple. That is its (completely accidental) genius.
There is no shortage of systems that are x10-100 times more "web scale".
The approach you're describing (fully materialized inbox model) simply doesn't scale. I say this as someone who worked on social media databases and scalability for a decade. Otherwise, every time Obama or Musk tweets, you need to do one hundred million writes. That amount of write amplification is completely ludicrous and would crush any system.
At minimum you need to do a hybrid approach which special-cases the more widely-followed users. This problem has been well-known for quite a long time. Yahoo's "Feeding Frenzy" whitepaper came out in 2010, but the concepts were definitely known before that; I remember hearing about hybrid activity feed designs in 2009 from colleagues who formerly worked on LiveJournal.
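The hybrid approach described above can be sketched roughly as follows: ordinary accounts fan out on write into followers' materialized inboxes, while widely-followed accounts are only written to their own outbox and merged in at read time. This is an in-memory toy under stated assumptions, not how Twitter or any other real system implements it; the class name, the threshold and the data layout are all invented for illustration.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # hypothetical follower cutoff for special-casing


class HybridFeed:
    def __init__(self, threshold: int = CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inboxes = defaultdict(list)   # user -> [(ts, tweet_id)], materialized
        self.outboxes = defaultdict(list)  # author -> [(ts, tweet_id)]
        self.inbox_writes = 0              # counts write amplification

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author) -> bool:
        return len(self.followers[author]) >= self.threshold

    def post(self, author, ts, tweet_id):
        self.outboxes[author].append((ts, tweet_id))
        if self.is_celebrity(author):
            return  # no fan-out: followers pull this at read time
        for follower in self.followers[author]:
            self.inboxes[follower].append((ts, tweet_id))
            self.inbox_writes += 1

    def timeline(self, user, limit: int = 20):
        # Start from the materialized inbox, then merge in celebrity outboxes.
        # A set removes duplicates if an author crossed the threshold after
        # some tweets were already fanned out. (Authors dropping back below
        # the threshold are not handled in this sketch.)
        items = set(self.inboxes[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.outboxes[author])
        return [tweet_id for _, tweet_id in sorted(items, reverse=True)[:limit]]


feed = HybridFeed(threshold=3)
for user in ("a", "b", "c"):
    feed.follow(user, "star")     # "star" reaches the toy threshold
feed.follow("a", "friend")
feed.post("friend", 1, "t1")      # one inbox write
feed.post("star", 2, "t2")        # zero inbox writes
print(feed.timeline("a"), feed.inbox_writes)
```

The trade-off is that reads get more expensive for anyone following a special-cased account, which is part of why the design question is not trivial.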
> Otherwise, every time Obama or Musk tweets, you need to do one hundred million writes. That amount of write amplification is completely ludicrous and would crush any system.
I knew somebody would bring that up. Currently there are fewer than 500k "verified users". Not that many people have 1 million+ followers and they don't tweet all that often.
> At minimum you need to do a hybrid approach which special-cases the more widely-followed users.
Great, so we are in agreement that for 99.75% of users it is all quite trivial.
I'll tell you what has happened since 2010: hardware has gotten a lot faster. 1 million iops is not a big deal anymore. Keeping that in mind you should refresh your assumptions.
We built a 2U Intel Xeon server system capable of 80 MILLION 512B random read I/O operations by combining the latest 3rd Generation Xeon Scalable Processor (code-named Ice Lake) with Intel Optane SSDs.
122M IOPS in 2U, with > 80% of the system idle. Easy.
Just to put this into perspective, at 4K random reads, this is 144GB/sec of bandwidth from storage, at 36M IOPS.
Fancy 512b random reads? You now get ~120M IOPS.
You could saturate streamed network traffic on 11-12 100Gbit NICs.
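The quoted bandwidth figures are internally consistent, as a quick check shows. The only assumption here is reading "4K" as 4,000 bytes, which reproduces the quoted 144GB/sec; with 4,096-byte blocks the result is slightly higher but still lands between 11 and 12 NICs.

```python
IOPS = 36_000_000       # 4K random-read IOPS from the quote
BLOCK_BYTES = 4_000     # assumption: "4K" taken as 4,000 bytes
NIC_GBIT = 100          # one 100Gbit NIC

bandwidth_gb_s = IOPS * BLOCK_BYTES / 1e9          # storage bandwidth in GB/s
nics_to_saturate = bandwidth_gb_s * 8 / NIC_GBIT   # GB/s -> Gbit/s -> NIC count

print(f"{bandwidth_gb_s:.0f} GB/s, about {nics_to_saturate:.1f} x 100Gbit NICs")
```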
You can't just hand-wave away those edge cases. That's not how anything works.
If your potential write amplification for a single operation is anywhere remotely near a factor of 1 million, you have a serious problem and need to completely change your approach to the problem, and use different data structures and algorithms.
Hardware hasn't gotten that much faster really. PCIe flash cards started to get used over a decade ago -- yes, modern storage is better than those were, but not by a huge multiple. And meanwhile max CPU frequencies today aren't much higher at all. What we have instead is more cores. And a lot more RAM per box. Faster networking, sure. But none of this lets you get away with massive write amplification from choosing an overly-naive algorithm.
And iops aside, even just the storage capacity from full inbox materialization (along with necessary indexing overhead) will bankrupt you, especially on that blazing fast storage you keep talking about. Keep in mind everything needs to be replicated to multiple regions / data centers for DR/HA, as well as keeping the data closer to users to lower the latency.
I'm not making "assumptions" that I need to "refresh". I've literally spent the majority of my career working on this stuff at extreme scale, both in 2010 and today, and all times in between.
> And iops aside, even just the storage capacity from full inbox materialization (along with necessary indexing overhead) will bankrupt you, especially on that blazing fast storage you keep talking about.
The twitter firehose is usually below 50MB/s. 200GB a day of tweets will bankrupt no one. A 100TB Nimbus Exadrive that does 100,000+ iops costs about $30,000. That's 1 year of tweets. Firing thousands of Twitter employees probably saves $3+ billion/year in salaries; I'm sure they have a hefty enough hardware budget.
> Keep in mind everything needs to be replicated to multiple regions / data centers for DR/HA, as well as keeping the data closer to users to lower the latency.
Does it really? I don't think so. Not with tweets.
For DR you can stream into a blob store like s3 in the background and have an automated process that stands up a fresh shadow cluster from it every couple of hours. Hardly costs anything with this volume of data. That is cold data, doesn't need the fancy blazing fast storage.
That's a single copy of all tweets. Not fanning out to up to 100 million inboxes. Completely different problems.
You keep citing raw hardware speeds of a single machine, yet we're talking about the feasibility of a distributed system being able to sustain random bursts of write amplification factor of 100 million across a decentralized database, with ideally exactly-once write semantics even if a failure occurs mid-way -- and that's all in addition to whatever the normal baseline write activity of all "normal" users with more reasonable follower counts. Again, completely different problems.
> I don't think so. Not with tweets.
So in your design, if the singular data center that maintains all users' inboxes goes offline for a long period of time, the entire product just goes down. And you think that's acceptable for a business valued in the tens of billions of dollars?
You seem absolutely convinced that a massive social network can be run on a shoestring budget with tiny staff, and no amount of evidence from someone like me (who actually worked on this stuff in depth, and posts with my real name, and expertise in profile) will convince you otherwise, so I suppose I should just stop replying to you.
> yet we're talking about the feasibility of a distributed system being able to sustain random bursts of write amplification factor of 100 million across a decentralized database, with ideally exactly-once write semantics even if a failure occurs mid-way
I think my assumptions are just a lot more relaxed than yours. This isn't a trading platform; I don't see why you need exactly-once write semantics.
There are only 6 users with 100m+ followers and they avg a lot less than a daily tweet. @BBCWorld is #50 and it drops to 38m accounts. #1000 has 2 million followers.
> And you think that's acceptable for a business valued in the tens of billions of dollars?
Elon by his own admission grossly overpaid for it. Twitter has hardly ever eked out a profit; it is not worth tens of billions of dollars. Nothing much would happen if it went down for a bit except maybe bored journalists would report on it, thus, as ever, driving even more users to the website. But that is neither here nor there.
More to the point: if Elon and Obama and Bibster tweeted in the same minute (what are the odds) you would, gasp, have to stagger the fan out of the updates. That's alright too, for Twitter. It isn't really actually real time.
Those follower counts are also grossly inflated and as you understand yourself only a small fraction of them are online using the app at the same time as the person is tweeting. By the time they do check they might never even see the tweet.
To the people offline you don't need to fan out in a timely manner.
In short I believe the write amplification is much closer to 1 million than 100 million even with the pathological cases. And beefy enough hardware can handle those peaks.
Here's another way to think about it: Elon has 118m followers and just posted that Twitter has 260m daily average users. He is a bit like Tom from MySpace: half the users on the website are subscribed to his updates (not exactly, really, but for simplicity).
I think it is perfectly alright if it takes a full minute until all those users see his latest meme. It is very unlikely that even a quarter of all his followers are using the app during that exact minute, so we're talking 30m writes in 60 seconds. Big whoop.
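Spelling out that estimate: the follower count, the one-quarter online fraction and the 60-second window are the commenter's numbers. The sketch counts only one inbox append per online follower and assumes nothing about replication, indexing, retries or how the writes are spread across machines.

```python
FOLLOWERS = 118_000_000   # follower count from the comment
ONLINE_FRACTION = 0.25    # "a quarter of all his followers" (upper bound in the comment)
WINDOW_SECONDS = 60       # acceptable delivery window from the comment

inbox_writes = FOLLOWERS * ONLINE_FRACTION
writes_per_second = inbox_writes / WINDOW_SECONDS

print(f"{inbox_writes / 1e6:.1f}m writes, about {writes_per_second / 1e3:.0f}k writes/s")
```

That is roughly 29.5m writes, or just under half a million writes per second sustained for the minute, on top of whatever the baseline write load is.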
> You seem absolutely convinced that a massive social network can be run on a shoestring budget with tiny staff
I would bet a budget of say <$1 billion/year and 100 engineers for the core functionality as is.
> no amount of evidence from someone like me (who actually worked on this stuff in depth, and posts with my real name, and expertise in profile) will convince you otherwise
Neither one of us presented any evidence, just opinions as outsiders, as part of an informal conversation. An appeal to authority isn't an impressive argument, I am also from this industry and with similar experience.
There is no need to take things personally. I think we just have a very different estimation of just how much activity twitter sees at peak and how strict the requirements are.
> I don't see why you need exactly-once write semantics.
World leaders use Twitter. It's a major international one-to-many communication platform. If tweets are lost or duplicated, it makes the platform look unreliable (because it literally would be), as well as potentially making the tweeter look incompetent for posting twice. World leaders don't like to look incompetent; that can cause really bad things to happen...
> @BBCWorld is #50 and it drops to 38m accounts. #1000 has 2 million followers.
Even a write amplification factor of 100,000 is extremely problematic for the fully-materialized inbox model. A lot of prominent twitter users have followings larger than that.
> To the people offline you don't need to fan out in a timely manner.
So now you're adding additional systems on top, in order to scale. That's good, I guess you're starting to see that the problem is more complex than just spraying out every tweet to every follower's inbox. Now consider that when you actually build and scale a system like this, you'll need to keep doing that in a bunch of different areas, and the complexity keeps snowballing.
> And beefy enough hardware can handle those peaks.
There's no way to fit every user's fully-materialized inbox feed on one machine, so we're definitely talking about a large distributed storage tier / database here. Will you use "beefy" hardware for every single shard of your inbox storage tier?
> It is very unlikely that even a quarter of all his followers are using the app during that exact minute, so we're talking 30m writes in 60 seconds. Big whoop.
Once again, this really isn't like doing 30m write ops on a single box. It's queueing the writes via RPCs across a huge storage tier, while also needing some way to handle timeouts, retries, failovers on either side of the operation. All while the "normal" background level of thousands of tweets per second is happening from everyone else.
> An appeal to authority isn't an impressive argument, I am also from this industry and with similar experience.
> There is no need to take things personally.
I've literally built a reverse-chronological social network activity feed implementation, which successfully scaled to over 110 million posts/day. (For sake of comparison, Twitter was around 500 million tweets/day at that time, so this was def smaller than Twitter, but still quite large.) It did not use an inbox model. Took many months of my life, some of the most rigorous work I've ever done. My teammates and I evaluated several alternative designs, including fully-materialized inbox, running all the numbers in depth and building several prototypes. The takeaway was that a naive fully-materialized inbox would be completely and ludicrously infeasible in terms of necessary hardware footprint.
Separately, I've also spent years working on database infrastructure at extreme scale, including one of the largest relational database footprints on earth. I have a very good sense of what this requires. Yes, I'm posting "opinions", but they are based on many years of direct personal expertise.
Scaling a social network involves a massive number of challenging problems. Faster hardware doesn't magically make these problems go away. And while I haven't worked at Twitter, up until this month I knew four infra/backend engineers working there, and they're some of the best engineers I've ever known in my 17-year career.
I'm taking your comments personally because your comments are offensive. You're blindly saying I need to "refresh [my] assumptions" about a topic I'm literally an expert in. You're claiming Twitter could use some completely asinine overly-simplistic feed model, as if no one else ever thought of that, which would strongly imply every infra engineer at Twitter must be an idiot. In another subthread on this page, you wrote "The job cuts are clearly justified because of the extremely toxic work culture / cult" and it is necessary to "replace every single person who worked there and the entire tech stack". Seriously, WTF? These are hard-working humans with lives and families, they don't deserve this shit from their employer, and certainly not from offensive pseudonymous randos who have no idea what they're talking about. Have some empathy.
A lot of things in a typical social media service can be segmented trivially. The hardest part is the feed because it requires querying many database servers at once, but then it can be cached and served quicker.
Isn’t cache invalidation the hardest part of the feed? Once a user with millions of followers tweets, doesn’t that essentially invalidate millions of caches? That doesn’t sound like a trivial problem.
Because you can't edit tweets, the database is monotonically increasing (append-only) so scaling it isn't that hard. Cache invalidation basically just adds a new item. Deleting a tweet actually would be the harder/more expensive operation, but it's also less common.
Editing a tweet is also easy because the feeds would only store its id. You'd only need to invalidate one cache where the tweet object itself is stored.
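A toy in-memory sketch of the model described in the last two comments, assuming feeds hold only tweet ids and tweet bodies live in a separate cache. The class and method names are hypothetical and chosen for illustration; this is not how Twitter's caches are actually implemented.

```python
from collections import defaultdict


class FeedStore:
    """Toy model: per-user feeds hold tweet ids, tweet bodies live elsewhere."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> set of followers
        self.feeds = defaultdict(list)      # user -> list of tweet ids
        self.tweets = {}                    # tweet id -> body
        self.feed_writes = 0                # counts per-feed mutations

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, tweet_id, body):
        self.tweets[tweet_id] = body
        for user in self.followers[author]:
            self.feeds[user].append(tweet_id)  # append-only, nothing rewritten
            self.feed_writes += 1

    def edit(self, tweet_id, body):
        self.tweets[tweet_id] = body  # one entry changes, feeds are untouched

    def delete(self, author, tweet_id):
        self.tweets.pop(tweet_id, None)
        for user in self.followers[author]:
            if tweet_id in self.feeds[user]:
                self.feeds[user].remove(tweet_id)  # must touch every feed
                self.feed_writes += 1

    def read_feed(self, user):
        # Newest first; skip ids whose body is gone.
        return [self.tweets[t] for t in reversed(self.feeds[user]) if t in self.tweets]


store = FeedStore()
for fan in ("alice", "bob", "carol"):
    store.follow(fan, "celebrity")

store.post("celebrity", 1, "first draft")
writes_after_post = store.feed_writes
store.edit(1, "fixed typo")
writes_after_edit = store.feed_writes
store.delete("celebrity", 1)
writes_after_delete = store.feed_writes
```

In this sketch a post costs one feed write per follower, an edit costs none, and a delete costs one per follower again, which matches the ordering of "cheap" and "expensive" operations argued above.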
That small team seems to have been running the caches for other teams, by using infrastructure provided by another team, in two massive datacenters operated by other teams, using monitoring tools managed by another team, and a ticketing system run by another, on hardware purchases by another team…
All just to put caching in front of services that actually do anything.
That being said, it's not like Twitter is a massively complex product with lots of different features. I can imagine you could keep it running with a skeleton team. Liaising with ad buyers excepted.
a bunch of people I talk to say it is massively complex but typically fail to explain how, especially given the super-glacial pace at which they added new features for 15 years. And yes, this article kind of doesn't disprove the bloat at all, unless every single SRE quit? but that is also not what's stated in there.
I'm pretty sure one of the most complicated things is preventing automation of content, i.e. bots, which would be an arms-race type of condition. The bots you want to prevent are the ones run by people not using Twitter's API to do their bot stuff. Why would people not use Twitter's API?
1. who would trust twitter not to change API and make code worthless
2. people who want to do stuff Twitter doesn't want you to do in an automated fashion.
Even "simple" things become complex when they need to scale to billions of users around the world, handle a high rate of traffic, deal professionally and legally with all the different jurisdictions they operate in and all the advertisers they serve, etc.
I imagine the infrastructure implemented is fairly complex, this post does outline that it's not a super simple operation. It also alludes to there being more business units than just the core application.
> I was partly expecting the rest of the article to explain to me why exactly it wasn't just bloat
Same here. I guess his header was on point in why Twitter is still up; but I was also interested in hearing about why Twitter actually needs all those people. If it can be run with 50-80% of the staff gone, that does sound like some bloat at least.
Slack space leads to innovations, like developing infrastructure automation and improving capacity planning. SRE as a practice needs slack space for operations teams to work on improvements and fixes in addition to BAU fault fixing, deployments and patching.
I'm certainly no supporter of "lean operations" with minimum staff etc, and fully agree that you need people that are well rested (for the lack of a better analogy) to do great stuff. But I do think that some of these internet giants do have too many people working there; wasn't LinkedIn 14_000 strong when Microsoft bought it?
I've always felt that the American model of doing business is based on how we optimize network traffic, i.e. double the amount of data until failure, then turn it down a bit. Fire on all cylinders until people are truly worn out, then turn down the pace a bit. Haven't worked in the US so I'm pulling this info out of thin air...
Well, half of the staff prepares for some conference, meetup, tech talk, or helps organize one, or does 20% time, or sits in unnecessary meetings, or sits in necessary but inefficient meetings, or is on PTO or on unpaid leave in some retreat.
Plus above a certain headcount the communication overhead becomes seriously large, so just to compensate for the lost velocity you need to break out into smaller more agile more autonomous teams, which further increases the coordination requirements (thus the comms overhead), but allows overall throughput to scale.
And the leading edge technologies commonly used require a large headcount to begin with. (I mean just to start running something twitter/linkedin sized requires at least 1 engineer/million users, so a few hundred folks is a given. You need someone who understands networking, from BGP to TLS to VPN to whatever, internal IT, CI/CD, ops/SRE ... at that scale if you use anything, you need an expert for it. You use Kafka with a hundred million users? You might need at least a few people who actually know what the fuck a partition means. Unless you want to just directly give all of your money to Jeff in the form of egress fees, you might need folks to set up CDNs, and whatnot.)
So without naming names, in a big password manager company (around a hundred million users?) a few years ago there was a certain rewrite project. 3+ people worked on it for 8-10 months, then it was put on hold temporarily. And then, obviously, nobody ever spoke of it again.
It happens that, thanks to inefficiencies like this, not one line from certain individuals goes into production for months.
It was bad management, yes. But if good management was easy to find then we would be talking about different things :)
I'm puzzled by this statement. Do you think of resiliency as waste? That twitter would have been fine without it?
The article makes the point that the reason Twitter is running OK on 20% of its personnel at this moment is exactly because it was built to be resilient, not because the personnel was bloated. A large part of this so-called bloat, the 80%, is responsible for Twitter running right now. Calling this bloat implies it is actually not important for Twitter to be available all the time (or at all).
>>If anything, the article might actually persuade me that it was all bloat.
Not for me
This is almost exactly like the new manager coming in, noticing that the floors and surfaces are all clean, all the systems work, the trash is emptied, etc., and so deciding that the entire maintenance staff is unnecessary and firing them.
The place doesn't become a decrepit pigsty the next morning; it slowly degrades.
Same for these systems. They were designed, built, tuned, and maintained over the course of years to go from requiring constant manual intervention to running largely unattended and with a good buffer of ready hardware and automatic failover for failures. That "largely" in "largely unattended" is doing some very heavy lifting.
The system WILL require human intervention to keep running, and more than just a skeleton crew. The only question is whether it will happen before the new crew gets up to speed to handle the inevitable degradation.
This does NOT mean that the SREs were bloat - it means that they were doing an excellent job and could safely take a break. We're now in just the two-week vacation zone - same as if the entire SRE team went on a holiday. We'd expect it to work. Now let's see what happens in two months.
In addition to the many great comments here, remember that super star engineers don't exactly fix problems from day to day. They fix the problems before they become problems.
The engineer was doing stability planning for 6 months out for the purpose of cost optimization. I guess we can assume that the cost of infrastructure is about to go up and reliability is about to go down in the coming months.
I have been under the good faith assumption that most (though definitely not all) of the employees that have departed Twitter were probably necessary and valuable to the company. I left the article with the same impression as you. This single person did this very important job, seemingly well, and didn't appear to be drowning in the work. What were the other 8-9k doing?
> But it goes on talking about this 1~3-person cache SRE team that built solid infra automation that's really resilient to both hardware and software failures.
... for the Cache component. There are many others.
Musk might have been right about some things. There probably was some degree of bloat. But to say he's badly mishandled this whole saga is a gross understatement. It is very difficult to utterly kill a site like Twitter; the fact that we're even considering that as a realistic possibility shows just how badly.
I think Musk is used to Tesla and SpaceX, which are both companies that a lot of people are (or at least were) excited to work for because they believe in the mission and what's being created. Plus there aren't many alternatives if you want to do that work. Twitter really isn't like that for most people; a Twitter developer has many other options to do similar work. Add to that the fact that he's both cranked up the intensity of the abuse and that it's more visible to everyone, and you can't expect a lot of good people to stick around. And despite the fact that it might coast for quite a while on the back of excellent work in the past, eventually you do need good people to keep a business going. (This is leaving aside the direct impacts of his actions on users and advertisers!)
>the fact that we're even considering that as a realistic possibility shows just how badly.
Do we have much to go on to "really consider it", or is it just sensationalist headlines, as Musk went against the accepted orthodoxy on Twitter and content moderation and so "he must pay"?
That would be the same news outlets that built him up as the real life Tony Stark and propped up his quite ordinary companies, one of which is building electric cars like everybody else, just doing it in style as well-off upper-middle class people's toys, and another is doing space tech that was already a thing we were pursuing 40 years ago with a little modern engineering thrown in...
>Why aren’t the other space programs/companies landing rockets?
Economics, pure and simple.
SpaceX was heavily capitalised during an era of easy money, so they have cash to burn advancing this technology. Everyone benefits. But there is no indication they are anywhere near profitability. So why would other space programs do this, when they can reap the benefits (tech and cargo) of a VC and government subsidised company?
Or do you think the SpaceX engineering team is the only one on the planet capable of doing this? I'm sure they're awesome, but that's a stretch.
Perhaps they haven't gotten as good NASA contracts and subsidies as SpaceX.
Note also how "launching rockets" or even going to the moon and back, was something we routinely did 40 to 50 years ago.
Heck, we even tried reusable rockets, and even had space shuttles. They were cancelled for political reasons, just like moon missions and NASA programs were. Not for lack of progress or technology.
The shuttle being “reusable” is ridiculous when compared with Falcon 9 or Heavy. The entire booster pack was discarded and the shuttle basically needed to be heavily refurbished…
I’m amazed people in good faith will minimise what SpaceX has done and pretend like “no one else cares”.
NASA has basically zero capability anymore (given they just launched their only rocket) - which was completely disposed of.
Do people really think all these other programs/companies think it’s “better” to throw the rocket out?
Why are rocketlab and others working on recovery and reuse then? Why has blue origin failed to get to orbit? Surely we’ve been doing this for 40 years, what’s wrong with them?
The shuttle SRBs provided most of its thrust and they were certainly recovered, refurbished, and reused. As was the main engine, given that it was attached to the shuttle.
Which signs? They launched a new feature (blue checks for $8) and had to turn it off immediately because it was bleeding money and ruining the platform, and they have less ad revenue booked for next year than they had at the same time last year.
I don't think one should judge the new twitter course yet, but "well, the site is still up" is a very bad measure of success.
It's way too soon to tell. On a dramatically smaller scale my team went through a big drop in headcount. Day 1 the impact was nil. Day 10 the impact was negligible. Day 30 some minor problems were identified. But it wasn't until about Day 90 that we had our first outage and Day 270 that we had our first lengthy outage.
>Was Musk right? All signs so far are pointing to, yes.
Huh? He's been in charge for, like, two weeks. Did you think it could implode the instant the engineers received pink slips? Let's give it a year before we say he was right.
Way too soon to declare that Musk was right. I don't even think signs are pointing there. Twitter is bleeding some of its most valuable users, the content creators, to things like Mastodon. There do appear to be cracks happening at the edges. Bots and hate speech do appear to have increased.
Thing is, I think Twitter was bloated and it needed a kick in the rear. Pre-acquisition I heard the same from many I follow. How Musk has gone about it has been the problem. Ignoring his perpetual haters, he had a decent amount of goodwill the day the deal closed. Then he squandered it with all his antics. A transparent content moderation board turns out to be a game-able Twitter poll. Blue check for all completely missed the point. No one wants a blue check for money w/o the associated verification. Verification for all would have been awesome.
Ads quality has dropped from what I've seen. It looks like people are pulling out, albeit slowly. MBAs will be studying this, but how things are going means we may look back and see this as Twitter's Yahoo/AOL moment when it sells for a few billion in a couple years.
Did Musk fire the right people? Or did he slash based on obedience, mood, whim? In what situation would one want THE senior SRE for the cache systems (which are unique and affect twitter across the board) GONE? There's a reason why managers were trying to claw back some engineers and employees after the big layoff. Sure you chop off a few fingers and keep working.... but lose a thumb?
What do you mean by "right"? He grossly overpaid for a business that isn't profitable and likely made it even less profitable. Even if it limps along and some husk of Twitter survives, it's hard to see Musk as making the right move here.
I guess you were also declaring Putin a tactical genius for reaching Kyiv so quickly after invading Ukraine? Military historians will be studying that advance for years.
And so is the cache setup. It’s permanently (and deliberately) running at less than 50% utilization to prevent an issue that comes up only once every 5 years (according to the author).
Of course it's all bloat. Software runs on computers, not engineers. The default assumption for software is that it will go on working. The state might devolve, but the software is exactly as reliable right now as it was before. I'm no friend of Elon, and I think it's hilarious to think that he can be king of twitter, but all these people talking about "code entropy" are certifiably insane.
Big tech maintains talent so that they won't use their knowledge of the system to produce an identical competitor without the technical debt or investor liability of the original.
I read it as the exact opposite: the reason Twitter is still ok* is not because all these people were just browsing reddit at work. You can't just gut Twitter to run on only a couple of hundred people and still expect the same results in the longterm.
Twitter was not the leftist commie welfare company that Musk and his fans want it to be. It was actually the fine work the SREs (amongst others) put into it that makes it still tick along as it does... for now.
* actually some things are already breaking, but it will take some time for the real damage to surface on a technical level
And seriously, the gaming industry is infamous for its "year round crunch mode". It can be done. Many sectors are "hard work" (i.e. public sector, fundamentally non-tech gigacorps with no real in-house IT competence, etc.)
Twitter was an innovator. I remember getting into Scala partly because of some open source code library from them back then, I don't know, probably around Scala 2.8.
That was absolutely unnecessary for their core business. All of this can be done in bad C++, or plain old Java 7, or whatever, for low cost, built in Bangalore.
Now that would of course likely not withstand the kind of change storm that Elon caused. But that's a different story. (Also, what Elon probably doesn't understand on a technical level (but likely guessed from a business aspect) is how flexible Twitter is. In this regard he arrived at the right time. Twitter is fundamentally the same as it was, now it's time to innovate with it, and the architecture he found is exactly the right one for that.)
I guess we have different definitions of "bloat", but how is it bloat to have a tiny team taking care of a fundamental piece of infrastructure? If the team is now gone, an issue there would mean hours of downtime. If that's acceptable then yes, it was "bloat".
I’m suspicious that most of the value in these systems comes from a small fraction of the effort and many technology jobs boil down to knowing you’re a huge cost center and putting on a performance to hide that.
This, on reflection, is the one insightful comment that should be higher up, even if it came too late. Twitter's userbase did not expand significantly in the last couple of years. Revenue increased. Why did cost increase so much?
Of course it was bloat. This whole “twitter is going to crash and burn” thing is a weird fantasy. Most likely it will just be run more efficiently by far fewer people.
Well, WhatsApp had ~50 employees and Instagram around ~15 when FB acquired them, and they were around the same order of magnitude of complexity as Twitter.
The only concern I’d have is that by having so many people, your design probably comes to rely on them whereas a smaller team would be forced to make the system easier to maintain.
Personally, if I were Elon, I’d build an entirely new backend and point the clients to that rather than trying to incrementally improve what they have.
Get 50-100 10x engineers that are loyal to Elon, with big equity stakes, and crush it
A lot of people who don't understand why Twitter owns two datacenters point to "complexity" as an argument and completely disregard scale. It turns out that massive scale adds a lot of complexity to a system, particularly around many-to-many pubsub systems (like social media). It also means a lot of features, like compliance with government regulations around the world.
WhatsApp had a Billion users and 50 employees. You can say that Twitter is incrementally more complex due to wider broadcasting, but it’s also 1/5 the user count.
Everybody stating this number of employees is necessary to maintain a Twitter scale system is simply wrong.
And as technology progresses, fewer and fewer people are needed to maintain the same size system
Many just don’t want to believe that the leverage in employees’ favor in the tech sector is fading fast, and that their labor is not going to warrant 500k comp with marginal effort anymore
Whatsapp, discord, telegram, and all other chat apps have a very easy time scaling. Using user count as a metric and comparing them to Twitter is pretty disingenuous. Instagram is a much better comparison, which had 1/50th the user count of Twitter at its acquisition.
Chat apps are well known to rely on infrastructure that scales up exceedingly well. That is why there are so many of them, and why they all have 100 tech employees or less.
Instagram at the time of the acquisition could have run on 20 servers plus S3. Today's Instagram, along with today's Twitter, does a lot of work that is super-linear in user count, and has something like 2000 engineers. Timeline building is reportedly O(n^2) in user count. The scale difference has a huge effect.
Notice that chat apps have no equivalent to "timeline building." The worst scale factor a chat app has is linear.
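A rough sketch of the scaling argument in the two comments above: it counts deliveries for one-to-one chat versus a broadcast timeline. The constant `follow_fraction` is an assumption made purely for illustration (real follower graphs are heavily skewed, and group chats are not modeled), so this shows the shape of the growth, not real numbers.

```python
def chat_deliveries(users: int, messages_per_user: int = 1) -> int:
    """One-to-one chat: each message goes to exactly one recipient."""
    return users * messages_per_user


def timeline_deliveries(users: int, follow_fraction: float, posts_per_user: int = 1) -> int:
    """Broadcast feed: each post goes to every follower of its author.

    follow_fraction is the ASSUMED share of all other users who follow
    each author; holding it constant is what makes the total quadratic.
    """
    followers_per_user = int(follow_fraction * (users - 1))
    return users * posts_per_user * followers_per_user


for n in (1_000, 10_000, 100_000):
    print(n, chat_deliveries(n), timeline_deliveries(n, follow_fraction=0.01))
```

Under that assumption, growing the user count tenfold grows chat deliveries tenfold and timeline deliveries roughly a hundredfold, which is the super-linear behavior the comment attributes to timeline building.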
Ok, multiply Instagram’s employees at time of acquisition by 50x and you still get an order of magnitude smaller employee base than Twitter.
And obviously employee count shouldn’t scale linearly with user count. No idea where you get that idea?
To argue that a 1B+ user app of moderate complexity can’t be run by a few hundred employees is simply wrong.
The end game for all web software is going to be managed services with automation built in such that it requires almost no manual intervention (absent logic bugs). This gets easier and easier to accomplish as cloud capabilities grow. And cheaper.
The long term trend for tech is obviously towards fewer employees and bigger impact per employee through leveraging higher level abstractions. Look at what you can build as an individual today vs 20 years ago and it’s quite obvious. Unless you think technological progress will suddenly halt
You are missing that you actually pay a lot for those abstractions. Once you get to Twitter's or Instagram's scale, it starts to pay to DIY the foundations of your technology, because the people to build it are cheaper than the cost to buy it. Building also has the benefit of aligning your concerns across abstraction layers, which pays huge dividends in efficiency.
I don't think it's obvious at all that the long term trend is in favor of few employees with "high impact through abstraction" when the highest-impact people I know at big tech companies are ones who build the underlying technologies behind those abstractions. They routinely bring in 8-figure cost savings per engineer per year compared to using OTS or open-source technologies.
Headcount should scale with actual complexity of the problem being solved, which is not AT ALL linked to a user's perception of "complexity." Twitter is far more complex than you are giving it credit for, and a big part of that complexity comes from its scale and the fact that it is cheaper for them to eschew "higher level abstractions" than to use them.
Everything that is happening with Twitter proves Musk is yet another wealthy idiot who doesn't know shit about shit, except how to blow his own horn. Musk is simply lucky that Twitter used to have such excellent engineers like this SRE so that the site isn't yet on fire.
Musk led teams at the only company who can land commercial rockets and the first company to profit from making electric cars. To say he is a wealthy idiot says more about you than it does about Musk.
So Musk hired the right people. I mean he made some brilliant decisions over the years. Does that mean that everything he does is a guaranteed success? There are plenty of examples of ultra successful people in history who were blinded by their own success and became a little bit unhinged over time..
At this point Musk is behaving just like a twelve year old’s idea of a mega billionaire. Hyperloop, flamethrowers, self-driving cars, now Twitter... It’s great that he’s having fun; I don’t think there is much more to it than that.