Databricks acquires serverless Postgres vendor bit.io (databricks.com)
220 points by embirico on May 30, 2023 | 117 comments



I haven't heard of either of those companies. I don't even fully understand what Databricks does. But it's clear that they have no problem shutting down a production database offering with 30 days notice, and have the gall to title this action "Investing in the Developer Experience". If this doesn't send a message that you shouldn't trust them with anything important, I don't know what would.


> what Databricks does

It's an ancient African word that means "I am because I can't install Apache Spark".


Just install Apache Spark they said. It will be fun they said.

If you have the money, having a managed Spark instance with a bunch of added features can be a big win for some. There is a lot that goes into Spark maintenance.


It also apparently includes some performance optimizations because they control both the hardware and software. And Delta Lake is pretty cool, and hosted MLFlow integration.


Databricks built a proprietary vectorized accelerator for Spark they call Photon. It's not just that they've tuned OSS Spark especially well.


Back when I was a customer (before Photon was released, and also during) they had very good tuning, on the order of 2x faster for the workloads we had at the time (very large graph computation and a “simple” filtering).


Databricks is a company by the people that built Spark.

They've extended it, and their platform does a lot now.


What is Spark?

I assume that’s Apache Spark, which is described as a “unified analytics engine for large-scale data processing”

Still not clear for me what to use it for :-/


It is Apache Spark. It's a framework that allows processing large amounts of data in parallel on a cluster of computers.

You can use batch processing, streaming, do machine learning and graph jobs. You usually use Scala, Java, Python or R to write your code. The engine itself is written in Scala and runs on the JVM, so everything ends up executing there. For example, in Python you'd use PySpark, and your DataFrame operations get translated into the equivalent JVM execution plan, which is then executed.

I mainly work in Python, so I'm going to talk about some features there. It supports dataframes and exposes the data as Spark DataFrames. You build up operations, and those gradually form a DAG. It's not until you execute, save, or request to see the data that it actually starts executing the DAG, after optimizing what it needs.

If you need something that spark doesn't support, you can use regular python, but because it won't get converted to spark, it'll run on only one node and be limited. So you have to rewrite your code optimizing for it.

You can process some data in memory, you can use disk, you can use databases. Either as source or targets.

A use case can be, load the raw data as it comes in, transform the data to your intermediary states, then write out different tables based on what they need to do.

---

It's a framework that has an engine to manage code running on clusters, a language to interact with the data, abstractions and optimizations of the code, ways to store the data, checkpoints for optimizations, and other things.
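To make the "build a DAG, run it only when asked" point above concrete, here is a minimal sketch in plain Python. It is a toy and not the Spark API: `LazyFrame`, its methods, and `double` are hypothetical names made up for illustration. In real PySpark the same split exists between lazy transformations (such as `filter` and `select`) and actions (such as `collect`, `show`, or a write) that trigger execution.

```python
# Toy illustration of lazy evaluation. This is NOT the Spark API;
# LazyFrame and its methods are hypothetical names for this sketch.
class LazyFrame:
    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # recorded steps; nothing has run yet

    def filter(self, pred):
        # Transformation: only records the step and returns a new plan.
        return LazyFrame(self._rows, self._plan + (("filter", pred),))

    def map(self, fn):
        # Transformation: only records the step and returns a new plan.
        return LazyFrame(self._rows, self._plan + (("map", fn),))

    def collect(self):
        # Action: walks the recorded plan over every row.
        out = []
        for row in self._rows:
            keep = True
            for kind, fn in self._plan:
                if kind == "filter":
                    if not fn(row):
                        keep = False
                        break
                else:
                    row = fn(row)
            if keep:
                out.append(row)
        return out


calls = []  # records when the mapped function really runs


def double(x):
    calls.append(x)
    return x * 2


lf = LazyFrame(range(6)).filter(lambda x: x % 2 == 0).map(double)
calls_before_action = len(calls)  # still 0: only a plan exists so far
result = lf.collect()             # the action triggers the actual work
```

A real engine would also optimize and distribute the plan before running it; the sketch only shows the deferral.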


Wow you are right. The blog post doesn't even mention it but the home page https://bit.io/ does.


Slight oversimplification but Apache Spark is basically the "open core" to Databricks' commercial platform.


It probably was an acqui-hire. If the product was growing at a VC-investible rate, they wouldn't have sunset the product. Alternatively, maybe they are going to rebrand it into something that aligns with Databricks.


> But it's clear that they have no problem shutting down a production database offering with 30 days notice

Maybe there is no production db left from paying customers?


The homepage suggests otherwise, but who knows: https://bit.io/


> I don't even fully understand what Databricks does

The naming is really confusing. When I brick my console it's broken. I'm not sure I want to brick my data :(


Going to beat a dead horse, but 30 days to migrate your database over? I hope nobody was seriously using it in production, otherwise it's going to be a fun month for them.


Putting business critical data on a mom and pop service called bit.io was the first mistake.


Reluctantly agreeing with you. So.. you can’t trust a small shop because an MBA corp dev team at some enterprise shop is always lurking around the corner. But if you go to the behemoth instead, you can get equally screwed because you don’t mean anything to their bottom line (see exhibit Google). The commercial software “service” industry is really fucked. I don’t want another tech bust, but we sure as hell deserve one.


I feel like we learned 20 years ago that buying proprietary software has a bunch of problems, so we switched to open source software. But in the last 10 years, we started buying software services, and now we have all those problems back again (corporate stability, vendor lock-in, principal agent problems, etc.). Maybe we will learn how to run our own software at some point without fully staffed teams of SREs?


> Maybe we will learn how to run our own software at some point without fully staffed teams of SREs?

Call them platform engineers?


Because along the way the people creating open source software also have bills to pay, and found out that living outside Hotel Mama comes with its own set of caveats.


The real concern is for those who don't get the memo until day 31


> Then, final database exports will be available for download through July 29.

Hope nobody was using bit.io as a set and forget solution...

Which I thought was the entire point of the cloud hosted databases?


> Your databases will continue to work through June 29.

This is crazy. 30 days to migrate? Hope nobody is taking a holiday in the next couple of weeks.


I'm surprised Databricks (effectively) is willing to shutter a database service with 1 month's notice.

What does that say about their own products? What if you integrate their products and are locked to their platform without any easy migration options?

If they lose interest in one of their own services, you very well may have 1 month to move, and 2 months to have a chance at keeping your data.
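For anyone checking the "1 month to move, 2 months to keep your data" arithmetic, here is a small sketch. The two deadlines are the ones quoted elsewhere in this thread (June 29 for service, July 29 for exports); treating the submission date, May 30, 2023, as the announcement date is an assumption.

```python
from datetime import date

announced = date(2023, 5, 30)     # assumption: same day as the HN submission
service_ends = date(2023, 6, 29)  # "databases will continue to work through June 29"
exports_end = date(2023, 7, 29)   # "exports will be available ... through July 29"

days_to_migrate = (service_ends - announced).days
days_to_export = (exports_end - announced).days

print(f"days to migrate: {days_to_migrate}, days to grab an export: {days_to_export}")
```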


Seriously. Well, I guess their customer roster isn't all that impressive. Sounds like they're willing to burn them.


The problem is the signal it sends to Databricks' other customers about how they will be treated in future.


Databricks is very much like Microsoft or Oracle - it is not sold by technical merit but by sales slides for CTOs. It is unfortunate but this will not impact their bottom line at all, because technical people already overwhelmingly don't want Databricks.


Is it effectively Databricks that is shutting down an infrastructure solution on short notice?

I'm not thinking about whatever legal technicalities could be debated by lawyers, but what real-world truth is.


You thought correctly.


Damn, 30 days is quick. I found out about https://neon.tech but then quickly ran into a major bug, and then thankfully found out about bit.io, which is what I use for https://dittoed.app.

Looks like I will have to go back to neon (they fixed the bug).

If anyone has other ideas, I'm all ears. Project is hosted on Cloudflare and they have D1 now, but Dittoed uses a little bit of PostGIS.


Have you tried supabase?


Neon sounds good for you. I'd wager any kind of managed database is fine, so the question is if you enjoy the features/cost savings Neon brings. Otherwise I cannot recommend using a managed DB enough because that's the best 20 bucks you're gonna spend.


AWS RDS, AWS Aurora or Retool database?


We’ve been moving our workflows out of Databricks to PostgreSQL to save a ton. Wonder if what they’re going to do with this would have been handy at the time.


Where is the saving coming from, if I may ask? Are you guys using Databricks offerings or a self-managed Spark cluster?


I'm willing to bet they are moving from databricks offerings, considering their pricing is insanity.


If your data fits into Postgres, using Databricks is just a plain waste of money.


Anything you'd be missing about Spark from doing so?

Would like to know more about the cost tradeoffs, also. Please elaborate.


Corresponding statement from bit.io here: https://blog.bit.io/whats-next-for-bit-io-joining-databricks...


2 months ago they had a blog post titled "bit.io’s new pricing Always available. Guaranteed performance. No surprises."

Surprise!


I wonder how much money is enough to give the middle finger to all your customers? Really disappointing to see.


That's a bummer, I really liked using bit.io for little experiments. That being said, I never paid for it so I can't really complain.


same, I'm a paying customer and liked it. I don't have 'production' level demands but was doing lots of prototyping and testing of ideas. Very easy to use and reliable enough to count on.

Stinky.


Enjoyed using bit.io. Excited to see what Adam, Jmo and crew do at databricks.

Was easy to export my dbs from bit.io -- did so this morning.


Get ready for your bill to go through the roof.


Are there alternatives to the database service with similar API or is it leaving a gap in the market?


Neon is an awesome serverless Postgres: https://neon.tech/


Till the time they are also bought and "sunsetted." That's the problem with all these shiny startups.


Not this one though, they are open source so someone else would start to offer new hosted instances: https://github.com/neondatabase/neon

Some Helm charts: https://github.com/neondatabase/helm-charts

It could potentially be one of their partners:

Vercel https://neon.tech/docs/guides/vercel

Hasura https://neon.tech/docs/guides/hasura


> Some Helm charts: https://github.com/neondatabase/helm-charts

For the record though, they're not enough to run neon today[0] - this has been a "problem" since neon was announced here[1].

[0]: https://github.com/neondatabase/helm-charts/issues/35#issue-...

[1]: https://news.ycombinator.com/item?id=31540691


I found their docker-compose more helpful than their chart: https://github.com/neondatabase/neon/blob/main/docker-compos...

But I also needed to read their Ansible files to understand how they manage their infra better. Those are deleted now, but luckily you can just look at the history (commit that deleted it: https://github.com/neondatabase/neon/commit/0d3d022eb1fe4a42...)


I found installation instructions for Neon including Ansible files here: https://percona.community/labs/serverless-postgresql/docs/in...


That is super helpful and also kinda weird that it's on the percona website and not somewhere in the Neon docs


Yes, you'd have to do some work of your own to set up a direct competitor from the provided pieces.

They have published a new piece which is how they vertically autoscale Postgres in Kubernetes: https://github.com/neondatabase/autoscaling


Neon CEO here.

Autoscaling with live VM migrations is quite cool. Here is a blog post on it: https://neon.tech/blog/scaling-serverless-postgres

And yes, the code is open feel free to use it!


Yeah I'm not complaining - I mention it because if you're a bit.io customer that wants to migrate to another serverless Postgres solution you won't be able to do it by "just" running the linked helm charts.

(In my opinion) when someone says here's the helm chart I assume running "helm install $THING" would give me a running version of $THING, so it's more so no one has the wrong expectations (like I would)


I keep expecting Vercel to be bought by Sitecore, given how they are replacing all the .NET stuff and pushing Vercel everywhere.


Neon CEO. We are certainly not going to sunset Neon any time soon. We are extremely well funded and also growing super quickly. Expect some exciting announcements soon!


CoW? So devs can have their ephemeral databases in seconds at no cost? Only the reads and writes they use and the “delta” storage of their thin clone?

Bitemporal support?

Smart anonymization (like Tonic.AI)?

Looking forward to hearing the announcements


Neon at least has open-sourced their core offering, which provides a migration path for folks who make bigger bets on their platform. So yeah, there's every possibility they'll go away at some point, but unlike a lot of SaaS offerings, it's all Postgres over the wire and under the hood, so you have plenty of migration options (OSS, another managed Postgres vendor, Aurora, Cloud SQL, etc.)


Neon CEO here. Definitely. Of course Neon storage is a distributed system and you need to know how to run it. But a) we can help b) Percona is a trusted partner of ours that can support self-hosting for you.


Reassuring to know about the collaboration with Percona and that professional support for self-hosting is available if ever needed.


Second Neon, they know what they're doing. It's not their first rodeo.


I don't know about serverless, but it's hard to beat Crunchydata for PostgreSQL these days. They're my goto.


Just make sure you've got your pricing structure & terms negotiated and agreed with them well in advance of putting it into prod.


Render's offering is really good. Backups, read replica, many common extensions included. Fairly cheap. https://render.com/docs/databases


[disclosure: I'm the founder of Snaplet]

I think there are a lot of different reasons why people may want to use a service like bit.io, but if you want a database with data in it to code against, run tests against, reproduce production related data-bugs, and run e2e tests against then check out https://www.snaplet.dev.


Amazon serverless Postgres aurora.


This is interesting... possibly a move by Databricks to try and build on their "data lakehouse" concept to counter the recent "Fabric platform" announcements at MS Build.

Databricks coined the "Delta lake" concept and are still (just about) leading the way, but Fabric has the potential from MS to take away that marketshare. Databricks need to improve their "serverless SQL" offering, and add a serious "data warehouse" component alongside the lake.


Of all the stupid tech terms in the world, for some reason “data lakehouse” grates horribly in my head every time I hear it.


I hope the marketer who came up with it got the lakehouse they were dreaming of.


Fabric may eat some of the descriptive analytics portion of Databricks’ lunch, but for core data engineering workflows there is nothing in the Fabric—or Synapse or Power BI—ecosystem that comes close.

There are other fatal flaws to the Spark implementation in Synapse that I think carried over to Fabric. Worst one is the clunkiness/inability to run multiple notebooks concurrently on a cluster.


I'm perusing the Fabric docs and they are using Delta Lake, Spark and Azure Databricks as part of that solution


Fabric does not use Databricks, but both Databricks and Fabric rely heavily on Delta. Let's just hope that they remain compatible.


Ah, so we're in the "extend" portion of the process


This might be a play against Snowflake Unistore (https://www.snowflake.com/blog/introducing-unistore/)


What benefits does one get from using bit.io or other equivalents compared to the AWS built in Aurora? is their offering different and I'm just confused by the jargon?


It takes < 10 seconds to go from no account to database w/ bit.io


Is the time it takes to spin up a database really a primary concern for anyone but a hobbyist? Hopefully one would take far more than 10 seconds to address the actual concerns of database work: backups, replication, upgrade procedures, access control, settings tuning, required extensions, etc.

If anything, companies are drowning in a proliferation of siloed datastores and most are highly motivated to fix that situation; the exact opposite concern of "quickly spin up a new database".


*took, as it's shutting down


Feature or liability depending on the market segment. To a ton of enterprise customers this is a nightmare.


"Serverless"... this word is thrown around so much nowadays that it has lost its original meaning. The phrase "we're like a family" went from beloved in the '50s to throwaway and meaningless in the '90s, and today it is considered a red flag when you hear it at a hiring interview. In the same way, the word "serverless" is in its late '90s now. One more decade and it will become just another red flag.


The meaning is pretty clear: you don't manage compute, it scales up elastically based on demand, even all the way to zero. Ideally, it reacts quickly enough to changes in demand that you don't need to worry about it. Serverless is basically the original promise of the cloud.


It's not as clear as you think, because companies are watering it down. Just have a look at what "serverless"-branded services AWS has published in the past years.

Take "OpenSearch Serverless" for example: They claim "you only pay for the resources consumed by the workload", but even if you have an OpenSearch Serverless collection you don't use, you pay at least ~$690/month (and that's not even accounting for stored data)!

https://aws.amazon.com/opensearch-service/pricing/
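The gap being described can be sketched as two billing models: pure pay-per-use, which scales to zero, versus metered usage with a monthly floor. This is only an illustration. The function names and the per-unit price are hypothetical, and the only figure taken from this thread is the roughly $690/month minimum mentioned above.

```python
def usage_only_cost(units_used: float, price_per_unit: float) -> float:
    """Pure pay-per-use: an idle month costs nothing."""
    return units_used * price_per_unit


def floor_cost(units_used: float, price_per_unit: float, monthly_minimum: float) -> float:
    """Metered usage with a floor: you pay at least the minimum, even when idle."""
    return max(monthly_minimum, units_used * price_per_unit)


PRICE_PER_UNIT = 0.25    # hypothetical price, for illustration only
MONTHLY_MINIMUM = 690.0  # the ~$690/month floor cited in the comment above

idle_usage_only = usage_only_cost(0, PRICE_PER_UNIT)
idle_with_floor = floor_cost(0, PRICE_PER_UNIT, MONTHLY_MINIMUM)
busy_with_floor = floor_cost(10_000, PRICE_PER_UNIT, MONTHLY_MINIMUM)

print(idle_usage_only, idle_with_floor, busy_with_floor)
```

Only the first model matches the "no fixed cost" reading of serverless; under the second, the floor dominates the bill until usage grows past it.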


What was the original meaning? When I hear "serverless" I think basically:

1. I don't have to think about or manage any servers

2. Usage is metered at a very fine-grained level (per X requests to the API/per GB of data/etc)

3. No fixed cost. You only pay for usage.

Was there a different meaning originally?


Distributed apps. No central server involved. Peer to peer is one. Or each app is a server too and the information propagates in ripple like style. You connect to me, we sync, then another connects to me and this way the info that only you had now he has it too (and me of course). That's the original serverless idea. Not this walled garden crap with "cloud". Cloud is just a computer that is not yours and anything you put in there it's no longer just yours (or in most cases when you lose the account is no longer yours, period! - HN has plenty of horror stories from Google, Amazon, Microsoft that shit on people and call it rain).


> Distributed apps. No central server involved. Peer to peer is one. Or each app is a server too and the information propagates in ripple like style. You connect to me, we sync, then another connects to me and this way the info that only you had now he has it too (and me of course). That's the original serverless idea.

I don't remember those ever being called "serverless". Certainly "peer to peer" or "distributed" have a lot more traction.


well i for one am very happy i never found out about bit.io, which looks amazing and is something i would have used instead of fly.io unmanaged postgres.


Disclaimer: I was a non paying user and used it just to try out some code in dotnet entity framework and postgresql (at $work I only ever get to touch sql server but for hobby projects I thought it would be nice to do something that doesn't require paying Microsoft).

Bit io is awesome. It just works. I mean so does elephant but bitio has more storage. I never got very far with my learning and never did advanced db concepts like cross apply though, so it was just simple entities and tables, but it worked just fine and the best part: no credit card required on file.

Fly sounds nice but I don't feel so good about having to give them my credit card number...


Been playing with CockroachLabs (CockroachDB Cloud) as a cloud db platform, and relatively happy with my testing so far. It isn't completely pg compatible, and do wish they'd expose a web based query interface with better connection pooling characteristics.

That said, mostly PG compatible data types, indexes and queries, horizontally scalable with pay for what you use, free and reserved tiers.


Is this an acqui-hire?


Assume any acquisition without public terms done over blog post is an acqui-hire. But given the market that's still an accomplishment!


Are there good examples where an acqui-hire works out for the acquiring company? Seems like the acquiring company's culture is almost always at odds with the company being acquired and it causes the high performing teams they paid dearly for to leave.

I don't understand what a company hopes to gain doing stuff like this as the (long term) incentives don't seem to align.


If they truly don't want the tech - meaning this is a straight-up acqui-hire - then the employees of the acquired company continue to have a job, and ideally some sort of bonus or earn out for staying N months or years. It is a nicer landing than bankruptcy.

The executives of the firm being acquired usually don't come, unless they have some skillset the acquiring company needs. But they (hopefully) get a cash bonus for the successful acquisition.

Everything is negotiable of course.


The only way I have seen this work out is to give the acqui-hired team a ton of equity in the new company so that they don't jump ship immediately.


What does the acquiring company get out of this transaction though? What's the return on investment here if folks end up leaving or are completely checked out during their rest-n-vest? You can spend millions with a high end boutique consulting firm that will most likely be more accountable and productive than an acqui-hire.


Firebase (which I was part of). Dunno if you count it as an acquihire if the product survives but I am pretty sure we did what the big G hoped we would


if you have a specific task at hand, it may be easier to buy a team that works well with each other and has expertise in a certain area to accomplish a specific task. not expecting them all to stay on forever


It's fairly common to not disclose the terms publicly.


bit.io is shutting down its service and telling all its customers to find a new solution, so very likely, yes. https://bit.io/


How was bit.io different from, say, supabase?


It's an actual database.



I understand that. But bit.io gave you a postgres database. Supabase gives you a firebase alternative that just happens to be built on top of postgres.

- https://supabase.com/docs/guides/getting-started/features

- https://docs.bit.io/docs/getting-started


(supabase ceo)

Supabase gives you a full Postgres database, we position ourselves as a Firebase alternative because we offer a few other bells-and-whistles. The database is just postgres[0] and so it has more compatibility than bit.io offered[1]

[0] https://github.com/supabase/postgres

[1] bit.io compatibility: https://docs.bit.io/docs/supported-sql


Just dug more and I see it.

For anyone looking, their documentation on how to connect: https://supabase.com/docs/guides/database/connecting-to-post...


Also to piggyback on this, supabase deactivates unused (unpaid) instances just like planetscale does for MySQL.


Congrats to Adam and the bit.io team!


I don't mean to take away from getting hired at databricks.

But my understanding is they essentially got hired at Databricks? Maybe got a paycheck to do it?

Meanwhile they shuttered and abandoned the product and all customers.

Is it really the goal to make an mvp and plan to get noticed and acquired vs actually making a product, customers can migrate in a month or not we don't care?

The product they made was literally meant to be a reliable solution. 1 month for all customers to migrate away? Really? That's assuming they see the announcement today too, it would be so easy to miss the email/hacker news post.


Well, it is going to be an incredible journey for customers now.



Is there a class of problems where Databricks Spark might be the best perf/cost solution?


Aha consolidation in the space

I think I've seen this before


i might have been the only person using bit.io and saw the potential there. applied for a job but got rejected. meh


Congrats Adam, Jmo & team!


a great and incredible journey it has been.



