I haven't heard of either of those companies. I don't even fully understand what Databricks does. But it's clear that they have no problem shutting down a production database offering with 30 days' notice, and have the gall to title this action "Investing in the Developer Experience". If this doesn't send a message that you shouldn't trust them with anything important, I don't know what would.
Just install Apache Spark, they said. It will be fun, they said.
If you have the money, having a managed Spark instance with a bunch of added features can be a big win for some. There is a lot that goes into Spark maintenance.
It also apparently includes some performance optimizations because they control both the hardware and software. Delta Lake is pretty cool too, and so is the hosted MLflow integration.
Back when I was a customer (before Photon was released, and also during), they had very good tuning: on the order of 2x faster for the workloads we had at the time (a very large graph computation and a “simple” filtering job).
It is Apache Spark. It's a framework that allows processing large amounts of data in parallel on a cluster of computers.
You can use it for batch processing, streaming, machine learning and graph jobs. You usually write your code in Scala, Java, Python or R. Spark itself is written in Scala and runs on the JVM, so that's where the work actually gets executed. For example, in Python you'd use PySpark, whose DataFrame calls are passed through to their JVM-side equivalents, which are then executed.
I mainly work in Python, so I'm going to talk about some features there. It supports DataFrames and exposes the data as Spark DataFrames. Each operation you apply adds a step to a DAG (directed acyclic graph). Nothing actually runs until you execute an action, save the data, or ask to see it; only then does Spark optimize the DAG and execute what it needs.
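To illustrate the lazy-evaluation idea without a cluster, here is a toy sketch in plain Python. It is not PySpark: the `LazyFrame` class and its methods are hypothetical. Transformations only record steps in a plan, and nothing runs until an action like `collect()` is called. Spark's real optimizer does far more (reordering steps, pushing filters down to the source); this sketch just applies the recorded steps in one pass per row.

```python
from typing import Any, Callable, Iterable, List, Tuple

class LazyFrame:
    """Hypothetical toy: records transformations, runs them only on an action."""

    def __init__(self, source: Iterable[Any], plan: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = source
        self._plan = plan  # immutable chain of (kind, function) steps

    def map(self, fn: Callable[[Any], Any]) -> "LazyFrame":
        # Transformation: returns a new frame with one more step; nothing executes.
        return LazyFrame(self._source, self._plan + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyFrame":
        # Transformation: also just recorded.
        return LazyFrame(self._source, self._plan + (("filter", pred),))

    def explain(self) -> List[str]:
        # Inspect the recorded plan without running it.
        return [kind for kind, _ in self._plan]

    def collect(self) -> List[Any]:
        # Action: only now is the source read and every step applied.
        out = []
        for row in self._source:
            keep = True
            for kind, fn in self._plan:
                if kind == "map":
                    row = fn(row)
                elif not fn(row):
                    keep = False
                    break
            if keep:
                out.append(row)
        return out

calls = []

def double(x: int) -> int:
    calls.append(x)  # track when work actually happens
    return x * 2

lf = LazyFrame(range(5)).map(double).filter(lambda x: x > 4)
print(lf.explain())  # ['map', 'filter']
print(calls)         # [] because nothing has run yet
print(lf.collect())  # [6, 8]
```

The `calls` list stays empty until `collect()` runs, which is the same "build the plan first, execute on demand" behaviour described above.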
If you need something that Spark doesn't support, you can use regular Python, but because that code doesn't get translated into Spark operations, it can end up running on only one node (the driver) and be limited accordingly. So you have to rewrite your code to fit Spark's model.
You can process data in memory, on disk, or in databases, any of which can serve as a source or a target.
A typical use case: load the raw data as it comes in, transform it into intermediate states, then write out different tables depending on what each consumer needs.
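A minimal sketch of that flow in plain Python, with in-memory lists and dicts standing in for tables. The CSV layout, stage functions, and output table names are all hypothetical; in Spark each stage would typically be a DataFrame written out to storage.

```python
from collections import defaultdict
from typing import Dict, List

def ingest(lines: List[str]) -> List[Dict[str, str]]:
    # Stage 1: load the raw data as-is ("user,country,amount" lines, hypothetical format).
    rows = []
    for line in lines:
        user, country, amount = line.split(",")
        rows.append({"user": user, "country": country, "amount": amount})
    return rows

def clean(raw: List[Dict[str, str]]) -> List[dict]:
    # Stage 2: intermediate state; normalize and cast types, drop rows that fail to parse.
    out = []
    for r in raw:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        out.append({"user": r["user"], "country": r["country"].upper(), "amount": amount})
    return out

def build_outputs(rows: List[dict]) -> Dict[str, dict]:
    # Stage 3: write different tables for different consumers.
    revenue_by_country: Dict[str, float] = defaultdict(float)
    orders_by_user: Dict[str, int] = defaultdict(int)
    for r in rows:
        revenue_by_country[r["country"]] += r["amount"]
        orders_by_user[r["user"]] += 1
    return {
        "revenue_by_country": dict(revenue_by_country),
        "orders_by_user": dict(orders_by_user),
    }

raw = ingest(["alice,us,10.5", "bob,de,oops", "alice,US,4.5"])
tables = build_outputs(clean(raw))
print(tables["revenue_by_country"])  # {'US': 15.0}
print(tables["orders_by_user"])      # {'alice': 2}
```

Keeping the raw stage untouched means the intermediate and output tables can be rebuilt later if the cleaning rules change.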
---
It's a framework that has an engine to manage code running on clusters, a language to interact with the data, abstractions and optimizations of the code, ways to store the data, checkpoints for optimizations, and other things.
It probably was an acqui-hire. If the product were growing at a VC-investable rate, they wouldn't have sunset it. Alternatively, maybe they are going to rebrand it into something that aligns with Databricks.
Going to beat a dead horse, but 30 days to migrate your database over? I hope nobody was seriously using it in production, otherwise it's going to be a fun month for them.
Reluctantly agreeing with you. So... you can’t trust a small shop because an MBA corp dev team at some enterprise shop is always lurking around the corner. But if you go to the behemoth instead, you can get equally screwed because you don’t mean anything to their bottom line (see exhibit Google). The commercial software “service” industry is really fucked. I don’t want another tech bust, but we sure as hell deserve one.
I feel like we learned 20 years ago that buying proprietary software has a bunch of problems, so we switched to open source software. But in the last 10 years, we started buying software services, and now we have all those problems back again (corporate stability, vendor lock-in, principal agent problems, etc.). Maybe we will learn how to run our own software at some point without fully staffed teams of SREs?
Because along the way the people creating open source software also have bills to pay, and found out that living outside Hotel Mama comes with its own set of caveats.
Databricks is very much like Microsoft or Oracle - it is not sold by technical merit but by sales slides for CTOs. It is unfortunate but this will not impact their bottom line at all, because technical people already overwhelmingly don't want Databricks.
Damn, 30 days is quick. I found out about https://neon.tech but then quickly ran into a major bug, and then thankfully found out about bit.io, which is what I use for https://dittoed.app.
Looks like I will have to go back to neon (they fixed the bug).
If anyone has other ideas, I'm all ears. Project is hosted on Cloudflare and they have D1 now, but Dittoed uses a little bit of PostGIS.
Neon sounds good for you. I'd wager any kind of managed database is fine, so the question is if you enjoy the features/cost savings Neon brings. Otherwise I cannot recommend using a managed DB enough because that's the best 20 bucks you're gonna spend.
We’ve been moving our workflows out of Databricks to PostgreSQL to save a ton. I wonder if what they’re going to do with this would have been handy at the time.
Same, I'm a paying customer and liked it. I don't have "production"-level demands but was doing lots of prototyping and testing of ideas. Very easy to use and reliable enough to count on.
But I also needed to read their Ansible files to understand how they manage their infra better. Those are deleted now, but luckily you can just look at the history (commit that deleted it: https://github.com/neondatabase/neon/commit/0d3d022eb1fe4a42...)
Yeah I'm not complaining - I mention it because if you're a bit.io customer that wants to migrate to another serverless Postgres solution you won't be able to do it by "just" running the linked helm charts.
(In my opinion) when someone says "here's the helm chart", I assume running "helm install $THING" would give me a running version of $THING, so I mention it mostly so that no one has the wrong expectations (like I would have).
Neon CEO. We are certainly not going to sunset Neon any time soon. We are extremely well funded and also growing super quickly. Expect some exciting announcements soon!
Neon at least has open-sourced their core offering, which provides a migration path for folks who make bigger bets on their platform. So yeah, there's every possibility they'll go away at some point, but unlike a lot of SaaS offerings, it's all Postgres over the wire and under the hood, so you have plenty of migration options (OSS, another managed Postgres vendor, Aurora, Cloud SQL, etc.)
Neon CEO here. Definitely. Of course Neon storage is a distributed system and you need to know how to run it. But a) we can help, and b) Percona is a trusted partner of ours that can support self-hosting for you.
I think there are a lot of different reasons why people may want to use a service like bit.io, but if you want a database with data in it to code against, run tests against, reproduce production related data-bugs, and run e2e tests against then check out https://www.snaplet.dev.
This is interesting... possibly a move by Databricks to try and build on their "data lakehouse" concept to counter the recent "Fabric platform" announcements at MS Build.
Databricks coined the "Delta Lake" concept and are still (just about) leading the way, but Fabric has the potential to let MS take away that market share. Databricks need to improve their "serverless SQL" offering, and add a serious "data warehouse" component alongside the lake.
Fabric may eat some of the descriptive analytics portion of Databricks’ lunch, but for core data engineering workflows there is nothing in the Fabric—or Synapse or Power BI—ecosystem that comes close.
There are other fatal flaws to the Spark implementation in Synapse that I think carried over to Fabric. Worst one is the clunkiness/inability to run multiple notebooks concurrently on a cluster.
What benefits does one get from using bit.io or other equivalents compared to the AWS built-in Aurora? Is their offering different and I'm just confused by the jargon?
Is the time it takes to spin up a database really a primary concern for anyone but a hobbyist? Hopefully one would take far more than 10 seconds to address the actual concerns of database work: backups, replication, upgrade procedures, access control, settings tuning, required extensions, etc.
If anything, companies are drowning in a proliferation of siloed datastores and most are highly motivated to fix that situation; the exact opposite concern of "quickly spin up a new database".
"Serverless"... this word is thrown around so much nowadays that it has lost its original meaning. The phrase "we're like a family" went from something beloved in the '50s, to a throwaway, meaningless line in the '90s, to being considered a red flag today when you hear it in a hiring interview. "Serverless" is in its late '90s now; give it a decade and it will become just another red flag.
The meaning is pretty clear: you don't manage compute, it scales up elastically based on demand, even all the way to zero. Ideally, it reacts quickly enough to changes in demand that you don't need to worry about it. Serverless is basically the original promise of the cloud.
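A minimal sketch of the scaling rule described above. The function and its parameters are hypothetical, not any provider's API: capacity follows demand, is capped at a maximum, and drops to zero when there is no load. A `min_units` floor above zero is exactly what stops a service from scaling to zero.

```python
import math

def desired_units(load: float, capacity_per_unit: float, max_units: int, min_units: int = 0) -> int:
    """Compute units needed to serve `load`, clamped to [min_units, max_units].

    With min_units=0 the service scales all the way to zero when idle.
    """
    if load <= 0:
        return min_units
    needed = math.ceil(load / capacity_per_unit)
    return max(min_units, min(needed, max_units))

print(desired_units(0, 100, 10))               # 0: idle, nothing running
print(desired_units(250, 100, 10))             # 3: scaled up with demand
print(desired_units(5000, 100, 10))            # 10: capped at max_units
print(desired_units(0, 100, 10, min_units=4))  # 4: a floor means never reaching zero
```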
It's not as clear as you think, because companies are watering it down. Just look at the "serverless"-branded services AWS has published in the past few years.
Take "OpenSearch Serverless" for example: They claim "you only pay for the resources consumed by the workload", but even if you have an OpenSearch Serverless collection you don't use, you pay at least ~$690/month (and that's not even accounting for stored data)!
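One way a figure like ~$690/month can arise, as rough arithmetic: assume (this is not stated in the comment) a floor of 4 OpenSearch Compute Units billed at $0.24 per OCU-hour, running around the clock for a 30-day month. The helper name is hypothetical.

```python
def monthly_floor_cost(min_units: int, price_per_unit_hour: float, hours: float = 24 * 30) -> float:
    # Cost of capacity that is billed even with zero traffic.
    return min_units * price_per_unit_hour * hours

print(round(monthly_floor_cost(4, 0.24), 2))  # 691.2 (assumed 4 OCUs at $0.24/hour, 720 hours)
```

With a 730-hour month the same assumptions give about $701, so the exact number depends on how a "month" is counted.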
Distributed apps, with no central server involved. Peer-to-peer is one example. Or each app is a server too, and the information propagates in a ripple-like style: you connect to me, we sync, then someone else connects to me, and that way the info that only you had, he now has too (and so do I, of course). That's the original serverless idea. Not this walled-garden crap with "cloud". The cloud is just a computer that isn't yours, and anything you put in there is no longer just yours (or, in most cases, when you lose the account, it's no longer yours, period! HN has plenty of horror stories about Google, Amazon, and Microsoft that shit on people and call it rain).
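The ripple-style propagation described above can be sketched as a simple gossip merge: each peer's state is a set, and syncing takes the union (a grow-only set, one of the simplest conflict-free replicated data types). The `Peer` class and names are hypothetical; real peer-to-peer systems also have to handle discovery, deletions, and conflicting updates.

```python
from typing import Optional, Set

class Peer:
    """Hypothetical peer that keeps a set of facts and merges on sync (no central server)."""

    def __init__(self, name: str, facts: Optional[Set[str]] = None):
        self.name = name
        self.facts: Set[str] = set(facts or ())

    def sync(self, other: "Peer") -> None:
        # Two-way merge: afterwards both peers know everything either one knew.
        merged = self.facts | other.facts
        self.facts = set(merged)
        other.facts = set(merged)

you = Peer("you", {"only-you-had-this"})
me = Peer("me", {"mine"})
him = Peer("him")

you.sync(me)  # you connect to me, we sync
him.sync(me)  # later, he connects to me
print(sorted(him.facts))  # ['mine', 'only-you-had-this']
```

Because union is order-independent, it doesn't matter who syncs with whom first; every peer converges once the facts have rippled through.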
> Distributed apps. No central server involved. Peer to peer is one. Or each app is a server too and the information propagates in ripple like style. You connect to me, we sync, then another connects to me and this way the info that only you had now he has it too (and me of course). That's the original serverless idea.
I don't remember those ever being called "serverless". Certainly "peer to peer" or "distributed" have a lot more traction.
Well, I for one am very happy I never found out about bit.io, which looks amazing and is something I would have used instead of Fly.io's unmanaged Postgres.
Disclaimer: I was a non-paying user and used it just to try out some code with .NET Entity Framework and PostgreSQL (at $work I only ever get to touch SQL Server, but for hobby projects I thought it would be nice to do something that doesn't require paying Microsoft).
bit.io is awesome. It just works. I mean, so does ElephantSQL, but bit.io has more storage.
I never got very far with my learning and never got to advanced DB concepts like CROSS APPLY, so it was just simple entities and tables, but it worked just fine. And the best part: no credit card required on file.
Fly sounds nice but I don't feel so good about having to give them my credit card number...
Been playing with Cockroach Labs (CockroachDB Cloud) as a cloud DB platform, and relatively happy with my testing so far. It isn't completely PG-compatible, and I do wish they'd expose a web-based query interface with better connection pooling characteristics.
That said, it has mostly PG-compatible data types, indexes and queries, scales horizontally with pay-for-what-you-use pricing, and offers free and reserved tiers.
Are there good examples where an acqui-hire works out for the acquiring company? Seems like the acquiring company's culture is almost always at odds with the company being acquired and it causes the high performing teams they paid dearly for to leave.
I don't understand what a company hopes to gain doing stuff like this as the (long term) incentives don't seem to align.
If they truly don't want the tech - meaning this is a straight-up acqui-hire - then the employees of the acquired company continue to have a job, and ideally some sort of bonus or earn out for staying N months or years. It is a nicer landing than bankruptcy.
The executives of the firm being acquired usually don't come, unless they have some skillset the acquiring company needs. But they (hopefully) get a cash bonus for the successful acquisition.
What does the acquiring company get out of this transaction though? What's the return on investment here if folks end up leaving or are completely checked out during their rest-n-vest? You can spend millions with a high end boutique consulting firm that will most likely be more accountable and productive than an acqui-hire.
Firebase (which I was part of). Dunno if you count it as an acquihire if the product survives but I am pretty sure we did what the big G hoped we would
If you have a specific task at hand, it may be easier to buy a team that works well together and has expertise in a certain area to accomplish it, without expecting them all to stay on forever.
I understand that. But bit.io gave you a postgres database. Supabase gives you a firebase alternative that just happens to be built on top of postgres.
Supabase gives you a full Postgres database; we position ourselves as a Firebase alternative because we offer a few other bells and whistles. The database is just Postgres[0], so it has more compatibility than bit.io offered[1].
I don't mean to take away from getting hired at Databricks.
But my understanding is they essentially got hired at Databricks? Maybe got a paycheck to do it?
Meanwhile they shuttered and abandoned the product and all its customers.
Is the goal really to make an MVP and plan to get noticed and acquired, rather than actually making a product, with a "customers can migrate in a month or not, we don't care" attitude?
The product they made was literally meant to be a reliable solution. 1 month for all customers to migrate away? Really? And that's assuming they even see the announcement today; it would be so easy to miss the email or the Hacker News post.