Their abstraction for job scheduling (Tupperware) is about 5 years behind something like Borg or EC2/EMR. That gap alone is a fundamental reason why FB can’t do cloud as it stands right now.
Plus, the infra teams operate like product teams - impact at all costs - which to most management translates to short-term impact over technical quality.
It would be a true 180 in eng culture if they could pull off a cloud platform. A telling example is how they bought Parse and killed it, while Firebase at Google is doing extremely well.
Having said all that, I think focusing on impact over technical quality is probably the right business decision for what they were trying to do at the time - drive engagement and revenue.
The killing of Parse definitely showed that Facebook didn't want to be in PaaS/IaaS. Now there are going to be severe trust issues with any PaaS/IaaS carrying Facebook branding.
Parse was ahead of its time: good engineering, nice documentation, and possibly the first platform to make building the backend/database for an app seamless. Even when Facebook killed it, they did a great job of open-sourcing the platform.[1]
I migrated my existing services to open-source Parse and even built a stateless application with >250,000 users on it a few years back. But if I had to build the same backend now, I'd probably do it with Go/pgSQL instead of NodeJS/MongoDB as in Parse.
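For anyone curious what the migration looked like: self-hosting mostly came down to mounting the open-source parse-server package inside an Express app pointed at your own MongoDB. A rough sketch of the older-style setup (placeholder values, and newer parse-server releases have since changed the mounting API):

```typescript
import express from "express";
import { ParseServer } from "parse-server";

// Placeholder config; a real setup also carried cloud code, file adapters, etc.
const api = new ParseServer({
  databaseURI: "mongodb://localhost:27017/myapp",
  appId: "myAppId",
  masterKey: "myMasterKey",
  serverURL: "http://localhost:1337/parse",
});

const app = express();
app.use("/parse", api); // client SDKs then point here instead of api.parse.com
app.listen(1337);
```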
Honestly this was the first thing I thought when I read the headline. No one using enterprise cloud services is going to trust Facebook with their data and mission critical systems.
Well, Google has invested more than $30 billion in the cloud over the last 3 years. This "wait until Google pulls the plug" refrain is getting a bit annoying. I welcome the competition with AWS.
Yep, I have to agree. Facebook in some ways is incredibly ahead of the game, and in other ways, is shockingly behind - I was baffled when I started working there.
From what I’ve seen, their internal dev tools are incredible - Facebook engineers constantly say that GitHub is crazy behind, and I’ve witnessed things they can do that back that up. Other times... they still have pretty old-school setups, and getting stuff provisioned can be a pain (but if you can make a compelling business case this can all change).
this is true, but a lot of resources put into a new team that was completely walled off from the rest of the company, in terms of roadmaps and eng culture, could overcome this problem.
I think someone at Amazon in the mid-2000s would not necessarily have looked at their internal stack vs. other companies' and picked them to become the breakout cloud winner. But when I was there (late 2000s), the AWS team was kept quite separate, and aside from things like S3, the retail side of the company wasn't really using much in the way of AWS infrastructure yet.
Amazon starting a cloud offering is quite different from anyone coming in later. Even though a lot of open source infra exists, expectations from devs/companies in 2015+ are sky high. If FB really wanted to have an offering, it would need to specialize in something specific - maybe Oculus game server hosting, or something tied to FB/WhatsApp/Insta features.
Not GP, but GCP's GUI is incredibly confusing and difficult to use. It is clearly a variety of products (poorly) stitched together. That has nothing to do with Compute Engine, and AWS is just as bad (if not worse).
My new projects will all be on Azure because Microsoft has pivoted to being a company with reliable investment in UI, developer-friendliness, and long-term support. Serving businesses has also been a core part of their company from the beginning, whereas Google seems unwilling to provide the human support required and AWS is clearly at odds with Amazon's primary culture.
It does seem like the underlying products in GCP are very good, but they mostly seem to replicate other offerings from Azure and AWS.
There are also those such as myself who instead feel that Azure's UI is by far the worst of the big three, that GCP is not bad at all, and that AWS is big, complex, and ugly, yet practical. The long term support story of AWS is also great.
> The long term support story of AWS is also great
I agree in general, but specific services can be a problem. New server images have broken my apps, waiting for AWS to update support for languages or RDBMS can be painful, documentation is generally confusing (mostly because there are 10 ways to do every simple thing), and I bump into bugs regularly.
Thanks for the feedback -- I'd be curious to understand what about the Azure UI you find most striking. For obvious reasons, I don't have much experience with it.
I'm generally fond of our console, particularly the tools that show both the API and CLI invocations for most UI activities. On the other hand, I work on core virtualization, so the things I want to express tend to be very simple ("make me a giant VM", "make me a gianter VM", etc. :)
Thanks for the comment -- I think (although I honestly don't know for certain) that the goal is for the names to convey something about what the service does. In particular, so that in the context of the catalog the general purposes of the various services are at least _somewhat_ obvious.
The docs are always a work in progress -- is there anything you can recall trying to do with GCP that was particularly inscrutable? (Can't fix everything in one go, but one fix at a time is better than nothing :)
Not the person you responded to, but I just joined somewhere new and it's clear Firebase was the biggest mistake / is the biggest source of technical debt. It was helpful for bootstrapping 6 years ago, but now it stands out as the worst part of our tech stack. For our small shop, we will not be able to migrate any time soon.
- Most frustrating are the outages we have no control over. It seems like every other day, the server running our firebase instance mysteriously disappears from the internet. We can ping `my-app.firebaseio.com` but receive no events from it. This lasts anywhere from 5 to 25 minutes. Usually there is no blip on the firebase status page. The main devs have given up trying to debug it.
- We are nearing the resource limits of the paid plan for a single realtime database. We're working on splitting our single database into a master instance and sharding off whatever we can.
- The particular eventual consistency and transaction semantics require workarounds everywhere. If you want an atomic transaction, you have to perform it on some common parent (see the sketch after this list), but you are also encouraged to keep your data model flat and normalize as much as possible. Our integration test suite is tiny and horrendously slow because we cannot rely on Firebase to be timely, so there are huge timeouts everywhere that regularly take ages. Every past attempt at tuning the timeouts to be shorter has eventually caused spurious failures a week later.
- Unless your data closely models the kind of chat application Firebase was built around, you end up needing a real database eventually. Not just to perform real queries with joins and complex logic, but also to essentially maintain indices that anemic mobile clients can use because the firebase filtered query semantics are limited to a single key. Now you need some kind of daemon to shuffle data in and out of your real database. Unfortunately, your real database is missing tons of foreign key constraints because that's the easiest way to handle firebase's eventual consistency.
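To make the transaction point concrete, here's a minimal sketch of the workaround I mean (the `accounts`/`balance` schema is hypothetical, not our actual data model): the Realtime Database only runs a transaction against a single location, so atomically touching two sibling records means transacting on their common parent and rewriting the whole subtree.

```typescript
import firebase from "firebase/app";
import "firebase/database";

// Hypothetical layout: /accounts/{id}/balance. Moving credits between two
// accounts atomically means transacting on the shared parent ("accounts"),
// even though only two children actually change.
function transferCredits(fromId: string, toId: string, amount: number) {
  return firebase
    .database()
    .ref("accounts")
    .transaction((accounts: any) => {
      // The first local run can see null if "accounts" isn't cached yet;
      // returning undefined aborts, which is yet another case to handle.
      if (accounts === null) return;
      if (!accounts[fromId] || accounts[fromId].balance < amount) return;
      accounts[fromId].balance -= amount;
      accounts[toId] = accounts[toId] ?? { balance: 0 };
      accounts[toId].balance += amount;
      return accounts; // the entire subtree is written back as one atomic update
    });
}
```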
That's just my take a month after joining somewhere intimately coupled to firebase.
It's a combination of a lack of developer bandwidth, fatigue, and the typically difficult Google customer service. Previously it was just a sole backend developer, who also did the server operations/SRE-type stuff, plus a couple of front-end web developers. To them it quickly became just a fact of life after repeatedly failing to find a solution. The customer service people don't like that, because these short downtimes always happen during standard business hours (the only time our service is ever used). Complaints about disrupted sales pitches or customer demos go ignored.
I'm joining as the second backend developer, so maybe while I'm still green and have the naive bravery, I will go give it another shot next time it happens. I don't know much about advanced networking, but we suspect there's some kind of partition happening frequently between our servers and our production firebase instance. Uncertain how to debug that and how to work around it. It usually affects all of our servers, which are located in a single data center. I suppose next time I will try to track down and inspect the socket being used by the firebase SDK, but I don't really know what to be looking for at that point.
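For what it's worth, one cheap thing I might try first (a sketch assuming the JS client SDK; the same idea exists in the admin SDK) is logging the SDK's own view of its connection via the special `.info/connected` location, so the silent periods can at least be correlated with real disconnect/reconnect events instead of guesses:

```typescript
import firebase from "firebase/app";
import "firebase/database";

// ".info/connected" is a built-in boolean the Realtime Database SDK maintains
// about its own connection state. Logging its transitions with timestamps
// helps tell "socket dropped and reconnected" apart from "connected but
// silently receiving no events".
firebase
  .database()
  .ref(".info/connected")
  .on("value", (snap) => {
    console.log(new Date().toISOString(), "firebase connected:", snap.val());
  });
```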
Mostly database-related issues, as far as I understand. What I’m wondering is: is this specific to Firebase, or would it apply more generally to any NoSQL database provider?
So I've been using it in production for about 4 years. I still have a couple of projects there, unfortunately.
Firebase lets you grow super easily at first, but the bigger your project gets, the more of a development problem it becomes.
The biggest problem is that the databases are extremely inadequate for anything beyond prototypes or very simple use cases. There are no relations, only very basic querying/filtering, no ACID, etc., so your logic for interacting with the database becomes super tedious. Fauna is by far a better serverless DB.
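To make the querying point concrete, here's a sketch against the Realtime Database API (the `tickets` data and field names are made up): you get one orderBy per query, so anything like a compound filter has to be faked with duplicated composite keys or done client-side.

```typescript
import firebase from "firebase/app";
import "firebase/database";

// Only one orderBy* call is allowed per Realtime Database query, so a
// compound filter like "status = 'open' AND owner = 'alice'" can't be
// expressed directly.
const openTickets = firebase
  .database()
  .ref("tickets")
  .orderByChild("status")
  .equalTo("open");

// The usual workaround is a duplicated composite key maintained on every write:
//   /tickets/{id}/owner_status = "alice_open"
const aliceOpenTickets = firebase
  .database()
  .ref("tickets")
  .orderByChild("owner_status")
  .equalTo("alice_open");
```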
There is huge vendor lock-in, and their client libraries are huge. The JS library is something like 200 kB minified and gzipped. You can in principle use REST to interact with the backend, but you lose the most interesting features (realtime, sync, etc.).
Cloud Functions are extremely tedious to work with. Most of your functions cannot be tested locally, and they take forever to upload - sometimes minutes. Sometimes the deploy gets stuck, and don't you dare cancel the upload, otherwise it will take even longer before you can upload again. With Cloudflare Workers you can do all development locally and deploys take seconds. With Zeit Now you can also do all development locally.
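For context, the functions themselves are trivial to write; the pain is the feedback loop. A minimal HTTPS function looks roughly like this (a sketch with the firebase-functions Node SDK), and every iteration meant another `firebase deploy --only functions` round trip:

```typescript
import * as functions from "firebase-functions";

// A trivial HTTPS endpoint. Iterating on even something this small means
// re-running `firebase deploy --only functions` and waiting out the upload.
export const hello = functions.https.onRequest((req, res) => {
  res.status(200).send(`hello ${req.query.name ?? "world"}`);
});
```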
The Firebase Console is a huge piece of bloated JS made in Angular 1 IIRC. It's also very basic as far as functionality goes.
Firebase Hosting is good - nothing to complain about - but all the other options are equally great (Netlify, Zeit, Cloudflare Workers Sites, etc.).
Not sure if it's still an issue, but Firestore had a GeoPoint type, and I incorrectly assumed that meant you could do geo searches. Their proposed solution was to pipe the Firestore docs into Algolia indexes and use that service instead. I did that, and I love Algolia, but don't assume anything just because it's Google.
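The workaround looks roughly like this (a sketch, not my exact code; the `places` collection, field names, and credentials are made up): a Cloud Function mirrors every Firestore write into an Algolia index, and the geo queries run against Algolia's `_geoloc` support instead of Firestore.

```typescript
import * as functions from "firebase-functions";
import algoliasearch from "algoliasearch";

// Placeholder credentials; in practice these come from functions config
// or a secret manager.
const client = algoliasearch("APP_ID", "ADMIN_API_KEY");
const index = client.initIndex("places");

// Mirror every write to /places/{placeId} into Algolia, which supports
// geo search (via the special _geoloc field) where Firestore's GeoPoint doesn't.
export const syncPlaceToAlgolia = functions.firestore
  .document("places/{placeId}")
  .onWrite(async (change, context) => {
    const placeId = context.params.placeId;
    if (!change.after.exists) {
      await index.deleteObject(placeId); // document was deleted
      return;
    }
    const data = change.after.data()!;
    await index.saveObject({
      objectID: placeId,
      name: data.name,
      _geoloc: { lat: data.location.latitude, lng: data.location.longitude },
    });
  });
```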
When you say that Tupperware is 5 years behind Borg, what do you mean? It would take 5 years to add current Borg features? It lacks features Borg had 5 years ago? Or it would take 5 years to reach parity with Borg as launched, i.e. it's actually 17+ years behind?
Also, the majority of their resources are web machines running HHVM. These machines are just serving monkeys, highly optimized for that one workload.
5 years might actually be right on the mark? I think Borg and EC2 et al. had all of this functionality 5 years ago, but I'm not sure it was all there more than 5 years ago.