MongoDB acquires Voyage AI (mongodb.com)
119 points by marc__1 4 months ago | 170 comments



Genuine question: I appreciate the comments about MongoDB being much better than it was 10 years ago, but Postgres is also much better today than it was then. In what situations is Mongo better than Postgres? Why choose Mongo in 2025?


Don’t choose Mongo. It does everything and nothing well. It’s a weird bastard of a database—easily adopted, yet hard to get rid of. One day, you look in the mirror and ask yourself: why am I forking over hundreds of thousands of dollars for tens of thousands' worth of compute and storage to a company with a great business operation but a terrible engineering operation, continually weighed down by the unachievable business requirement of being everything to everyone?


I have experience using both MongoDB and PostgreSQL. While pretty much everything said here is true, there is one more aspect: scalability. When a fast-moving team builds its service, it tends not to care about scalability. And PostgreSQL has many more features that prevent future scalability. It's so easy to use them when your DB cluster is young and small. It's so easy to wire them up into the service's DNA.

In MongoDB the situation is different. You have to deal with the bare minimum of a database. But in return, your data design is far more likely to survive horizontal scaling.

In the initial phase of your startup, choose MongoDB. It's easier to start and evolve with in the earlier stages. And later on, if you feel the need and have the resources to scale PostgreSQL, move your data there.


Mongo is Web scale.


Instagram uses PostgreSQL and is still web-scale (unless this was satire).



I have not watched that video since 2013 (wow!) and it is still hilarious.


They obviously didn't use vanilla Postgres, but built some custom sharding on top, which is a nontrivial task (implementation and maintenance: resharding, failover, replication, etc.).


Choose Mongo if you need web scale.


a) MongoDB has built-in, supported, proven scalability and high availability features. PostgreSQL does not. If it weren't for cloud offerings like AWS Aurora providing them, no company would even bother with PostgreSQL at all. It's 2025; these features are non-negotiable for most use cases.

b) MongoDB does one thing well. JSON documents. If your domain model is built around that then nothing is faster. Seriously nothing. You can do tuple updates on complex structures at speeds that cripple PostgreSQL in seconds.

c) Nobody who is architecting systems ever thinks this way. It is never MongoDB or PostgreSQL. They specialise in different things and have different strengths. It is far more common to see both deployed.


> It's 2025; these features are non-negotiable for most use cases.

Excuse me? I do enterprise apps, along with most of the developers I know. We run like 100 transactions per second and can easily survive hours of planned downtime.

It's 2025, computers are really fast. I barely need a database, but ACID makes transaction processing so much easier.


MongoDB has had ACID transactions for many years. I encourage folks to at least read up on the topic they are claiming to have expertise in.


They failed every single Jepsen test, including the last one [0]

granted, the failures were pretty minor, especially compared to previous reports (like the first one [1], that was a fun read), but they still had bad defaults back then (and maybe still do)

I would not trust anything MongoDB says without independent confirmation

[0] https://jepsen.io/analyses/mongodb-4.2.6

[1] https://aphyr.com/posts/284-call-me-maybe-mongodb


Reputation matters. If someone comes to market with a shoddy product or missing features/slideware, then it's a self-created problem that people don't check the product release logs every week for the next few years waiting for them to rectify it. And even once there is an announcement, people are perfectly entitled to be sceptical that it isn't a smoke-and-mirrors feature, and to not spend hours doing their own due diligence. Again, a self-created problem.


Last I checked they still didn't even implement pagination on their blog properly


100? I had a customer with 10k upserts, including merge logic for the upserts, while serving 100k concurrent reads. Good luck doing that with a SQL database trying to check constraints across 10 tables. This is what NoSQL databases are optimized for... There are some stand-out examples of companies scaling even MySQL to ridiculous sizes. But generally speaking, relational databases don't do a great job at synchronous/transactional replication and scalability. That's the trade-off you make for having schema checks and whatnot in place.


I guess I didn't make myself clear. The number was supposed to be trivially low. The point was that "high performance" is like the least important factor when deciding on technology in my context.


A) Postgres easily scales to billions of rows without breaking a sweat. After that, shard. It's definitely negotiable.


So does a text file.

Statements like yours are meaningless when you aren't specific about the operations, schema, access patterns etc.

If you have a single server, relational use case then PostgreSQL is great. But like all technology it's not great at everything.


Then use a text file.

In all seriousness, calling Postgres' scalability "non-negotiable for most use cases" is wild.


What's wild is you misrepresenting what I said which was:

"built-in, supported, proven scalability and high availability"

PostgreSQL does not have any of this. It's only good for a single server instance which isn't really enough in a cloud world where instances are largely ephemeral.


Do you mean ephemeral clients or Postgres servers?


If multiple nodes are needed, then why MongoDB and not a Postgres compatible distributed product like CockroachDB or YugabyteDB?


Thanks for these comments, I appreciate it.

Although I would point out:

> scalability [...] no company would even bother with PostgreSQL at all

In my experience, you can get pretty far with PostgreSQL on a beefy server, and when combined with monitoring, pg_stat_statements, and application-level caching (e.g. the user for this given request, instead of fetching that data on every layer of the request handling), it's certainly enough for most businesses/organisations out there.
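To make that concrete, here's a minimal sketch of the kind of pg_stat_statements check-in I mean (assumptions: psycopg 3, Postgres 13+ column names, the extension already in shared_preload_libraries, and a made-up DSN):

    import psycopg

    DSN = "postgresql://app:secret@localhost/app"  # hypothetical connection string

    with psycopg.connect(DSN) as conn:
        # The extension has to be created once per database.
        conn.execute("CREATE EXTENSION IF NOT EXISTS pg_stat_statements")

        # Top 10 statements by total execution time (PG 13+ column names).
        rows = conn.execute("""
            SELECT calls,
                   round(total_exec_time::numeric)   AS total_ms,
                   round(mean_exec_time::numeric, 2) AS mean_ms,
                   query
            FROM pg_stat_statements
            ORDER BY total_exec_time DESC
            LIMIT 10
        """).fetchall()

        for calls, total_ms, mean_ms, query in rows:
            print(f"{calls:>8} calls  {total_ms:>10} ms  {mean_ms:>8} ms/call  {query[:60]}")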


Great response. All arguments are valid and fair.


Mongo is a real distributed and scalable DB, while Postgres is a single-server DB, so the main consideration is whether you need to scale beyond a single server.



Things can still be true, even when wrapped into meme videos by haters.


Postgres has replicas. Most people use those for reads, with a single master for writes.


This can take you really damn far.

I've been playing with CloudNativePG recently and adding replicas is easy as can be, they automatically sync up and join the cluster without you thinking about it.

Way nicer than the bare-vm ansible setup I used at my last company.


Calling MongoDB a real database compared to PostgreSQL is hilarious.

MongoDB is basically a pile of JSON in comparison, no matter how much you distribute and scale it.


I think there is no distributed DB on the market with feature parity to PgSQL. Distributed systems are hard, and sacrifices need to be made.


sigh

See https://jepsen.io/analyses for how MongoDB has a tradition of incorrect claims and losing your data.

Distributed databases are not easy. Just saying "it is web scale" doesn't make it so.


Are you aware:

1. that PgSQL also has issues in Jepsen tests?

2. of any distributed DB which doesn't have Jepsen issues?

3. that this is configurable behavior for MongoDB: it can lose data and work fast, or work slower and not lose data? There are no issues of unintentional data loss in the most recent (5-year-old) Jepsen report for MongoDB.


Distributed databases are not easy. You can't simplify everything down to "has issues". Yes, I did read most Jepsen reports in detail, and struggled to understand everything.

Your second point seems to imply that everything has issues, so using MongoDB is fine. But there are various kinds of problems. Take a look at the report for RethinkDB, for example, and compare the issues found there to the MongoDB problems.


> Take a look at the report for RethinkDB

RethinkDB doesn't support cross document transactions, problem solved lol


PgSQL's only defect was an anomaly in reads which caused transaction results to appear a tiny bit later, and they even mentioned that it is allowed by the standard. No data loss of any kind.

MongoDB defects were, let's say, somewhat more severe

[2.4.3] "In this post, we’ll see MongoDB drop a phenomenal amount of data."

[2.6.7] "Mongo's consistency model is broken by design: not only can "strictly consistent" reads see stale versions of documents, but they can also return garbage data from writes that never should have occurred. [...] almost all write concern levels allow data loss."

[3.6.4] "with MongoDB’s default consistency levels, CC sessions fail to provide the claimed invariants"

[4.2.6] "even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations"

let's not pretend that Mongo is a reliable database please. Fast? likely. But if you value your data, don't use it.


In an attempt to understand your motives in this discussion, I would like to ask a question:

* why are you referring to 12-year-old reports for a very early MongoDB version?


This discussion refers to the entire history of MongoDB reports, which shows a lack of care about losing data.

If you wish to have a more recent MongoDB report, Jepsen is available for hire, from what I understand.


No, the discussion started with the question "Why choose Mongo in 2025?" So old Jepsen reports are irrelevant, and the most recent one, from 2020, is somewhat relevant.


High availability is more important than scalability for most.

On average an AWS availability zone tends to suffer at least one failure a year. Some are disclosed. Many are not. And so that database you are running on a single instance will die.

Question is do you want to do something about it or just suffer the outage.


I think the major providers offer PG services with cross-zone availability through replication.


It's sad that this was downvoted. It's literally true. MongoDB vs. vanilla Postgres is not in Postgres' favor with respect to horizontal scaling. It's the same situation with Postgres vs. MySQL.

That being said there are plenty of ways to shard Postgres that are free, e.g. Citus. It's also questionable whether many need sharding. You can go a long way with simply a replica.
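If you do get there, the Citus route looks roughly like the sketch below (assumptions: the Citus extension is installed on the coordinator, worker nodes have already been added, and the table and DSN names are invented):

    import psycopg

    with psycopg.connect("postgresql://app@coordinator/app") as conn:  # hypothetical DSN
        conn.execute("CREATE EXTENSION IF NOT EXISTS citus")
        conn.execute("""
            CREATE TABLE IF NOT EXISTS events (
                tenant_id bigint NOT NULL,
                event_id  bigint NOT NULL,
                payload   jsonb,
                PRIMARY KEY (tenant_id, event_id)  -- must include the distribution column
            )
        """)
        # Hash-distribute the table across workers on tenant_id; queries that
        # filter on tenant_id get routed to a single shard.
        conn.execute("SELECT create_distributed_table('events', 'tenant_id')")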

Postgres also has plenty of its own strengths. For one, you can get a managed solution without being locked into MongoDB the company.


Citus is owned by Microsoft.

And history has not been nice to startups like this continuing their products over the long term.

It's why unless it is built-in and supported it's not feasible for most to depend on it.


that's fair, but that's true of mongodb itself too. I wouldn't count that against either of them.


MongoDB makes money selling and supporting MongoDB.

Microsoft does not make money supporting Citus.


Simple.

Postgres is hard, you have to learn SQL. SQL is hard and mean.

Mongo means we can just dump everything into a magic box and worry about it later. No tables to create.

But there is little time, we need to ship our CRUD APP NOW! No one on the team knows SQL!

I'm actually using Postgres via Supabase for my current project, but I would probably never use straight up Postgres.


If learning SQL is hard, maybe software isn't the best choice of career.

Writing code and creating good software requires a lot of mental clarity and effort; that fact is never going to change, not even with AI.


Billions upon billions of value have been created just upon the premise that SQL is hard.

Firebase and almost every NoSQL technology are based upon this.


Postgres supports JSONB natively; you can shove unstructured JSON into it, and there are even extensions and proxies that speak the Mongo wire protocol on top of it.

It has supported this since 9.4: https://www.postgresql.org/docs/current/datatype-json.html


I don't necessarily agree with the above justifications, but in my experience this is basically why teams pick Mongo.

It's easier to get started with.


Now there's a truth about MongoDB, it's easy to get started with.

But why is that the top priority?


Because some devs and teams prioritise “get to prod” above literally all else.

Maintainability? Secondary. Security? Secondary. Data-integrity/correctness? Secondary.


It’s hard to disagree with you on that part. PG is definitely not free to get starts with and requires a bit of setup (hello pg_hba.conf).


Yes, but updating nested fields is last-write-wins; with Mongo you can update two fields separately and have both writes succeed. It's not equivalent.


Can you provide an example or documentation please?


When you write to a Postgres jsonb field, it updates the entire JSONB content, because that's how Postgres's engine works. Mongo allows you to $set two fields on the same document at the same time, for example, and have both writes win, which is very useful and removes the need for distributed locks etc. This is just like updating specific table columns in Postgres, but Postgres doesn't allow that within columns; you'd have to lock the row for update to do this safely, which is a PITA.
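A minimal sketch of what that looks like with pymongo (the database, collection, and field names are invented for illustration):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
    posts = client.crm.posts

    # Both paths are set atomically on the one document; a concurrent writer
    # can $set a different path on the same document and both updates survive.
    posts.update_one(
        {"_id": "post-1"},
        {
            "$set": {
                "meta.edited_by": "alice",           # dotted paths reach into nested fields
                "blocks.0.text": "updated headline",
            },
            "$currentDate": {"meta.updated_at": True},
        },
    )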


Even as a JSON document store I'd rather use postgres with a jsonb column.


Why is that? I found Postgres's JSONB a pill to work with beyond trivial SELECTs, and even those were less ergonomic than Mongo.


Because you get the convenience of having a document store with a schema defined outside of the DB if you want it, along with the strong guarantees and semantics of SQL.


For example: let's say you had a CRM. You want to use foreign keys, transactions, all the classic SQL stuff to manage who can edit a post, when it was made, and other important metadata. But the hierarchical stuff representing the actual post is stored in JSON and interpreted by the backend.
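As a rough sketch of that layout (psycopg 3; the table, column, and DSN names are made up, and it assumes a users table already exists): metadata and relationships live in ordinary columns, while the hierarchical body lives in a jsonb column.

    import psycopg
    from psycopg.types.json import Jsonb

    with psycopg.connect("postgresql://app@localhost/crm") as conn:  # hypothetical DSN
        conn.execute("""
            CREATE TABLE IF NOT EXISTS posts (
                id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
                author_id  bigint NOT NULL REFERENCES users(id),
                created_at timestamptz NOT NULL DEFAULT now(),
                body       jsonb NOT NULL  -- hierarchical content, interpreted by the backend
            )
        """)
        # GIN index keeps containment queries on the document fast.
        conn.execute("CREATE INDEX IF NOT EXISTS posts_body_gin ON posts USING gin (body)")

        conn.execute(
            "INSERT INTO posts (author_id, body) VALUES (%s, %s)",
            (42, Jsonb({"blocks": [{"type": "text", "text": "hello"}]})),
        )
        # Classic SQL for the metadata, jsonb operators for the body.
        row = conn.execute(
            "SELECT id, created_at FROM posts WHERE body @> %s",
            (Jsonb({"blocks": [{"type": "text"}]}),),
        ).fetchone()
        print(row)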


I thought this was sarcasm till the last sentence. Now I'm not sure.


I understand the criticisms, but in my experience, MongoDB has come a long way. Many of the earlier issues people mention have been addressed. Features like sharding, built-in replication, and flexible schemas have made scaling large datasets much smoother for me. It’s not perfect, but it’s a solid choice.


I think the number of people working on large enterprise systems here is a lot smaller than one would think.

Whenever a fly.io post about SQLite ends up here, there are a scary number of comments about using SQLite in way more scenarios than it should be used in.


True. I often have the feeling that the enterprise crowd doesn't visit Hacker News.


Why would they be here? They use Oracle.

But mainly because management hasn't worked out how to cancel their licenses without breaking their budgets.


"Should be" how?

SQLite is a lean and clean tool, it's very much a candidate for being inserted into all manner of contexts.

What beggars belief is the overly complicated, inefficient, rat's nests of trendy software that developers actually string together to get things done, totally unaware of how they are implemented or meant to work.

By comparison using SQLite outside of its "blessed (by who?) use cases" is very practical.


Why would I use anything other than sqlite?


Easy. Sometimes it's more than you need, and there's no reason to use sqlite when you can just write things to a flat text file that you can `grep` against.


Is this text file static? If not, does all grepping stop when you're updating the file?


Damn good point! didn't think about that!


Query engine is not as good.


It's still an unstructured blob of JSON, systems built with it are a major pita to maintain.


For context, I am a startup founder, an Atlas user, and not what anyone would call a "major account". I'm also in my early 30's, so I wasn't around for the whole "web scale" meme era of MongoDB.

I have personally been incredibly impressed with the way MongoDB has "looked out" for my company over the past year. I'll try to be satisfactorily specific here, with privacy in mind, so this may come out a bit fuzzy. Their technical teams, both overseas and US, have produced some of the most thorough, detailed recommendations I have ever seen and their communication/followup was excellent. I've run many designs and ideas by their team, out of habit at this point, and have always been pleased with the response. They remember who I am. All of this is really unusual for a company at my growth stage.

My use case requires full technical depth on text searching and vectorization; I use every aspect of Atlas Search that is available. A downside of building "bleeding edge" is that my tooling needs always seem to be just inches beyond what is available, so just about every release seems to have something that is "for me." It's hard to say if my feedback specifically has an impact on their roadmap - but they really do seem to build things I want. I think they reported ~50% better performance on bulkWrite() in the 8 release, but it was closer to 500% for my use case.

Speaking of, this acquisition is like providence for me, because I've shared my various solutions with them for synchronously vectorizing "stuff" for use with local LLMs. It's a reasonably hard technical problem without a lot of consensus on standards; I think a lot of people believe there are standards, yet any discussion will quickly devolve into something like the "postgres/mongo" conversations you see here (I won't be visiting that topic).

I strongly agree with the "this should be a database level feature" approach they are taking here; that's certainly how my brain wants to think about it and currently I have to do a lot of "glue"-ing to make it work the way I require.

I hope they win.


So basically you see this as helping to get working vector search in MongoDB? It sounds as if the attraction, then, is that it integrates easily with your existing Atlas usage. Or is there more?


There is more. They explain it better than I could in the roadmap links.


Bloomberg says it was a $220M cash & stock deal: https://www.bloomberg.com/news/articles/2025-02-24/mongodb-b...


I'd rather they focus on performance.

The latest MongoDB is still slower than MongoDB 3.4, an almost 10-year-old release, for both reads and writes.


Can you share more details about the conditions under which it is slow in recent versions? We moved from 3.x to 7 for our main database and after adding a few indexes we were missing we have seen at least an order of magnitude speed up.


Most regular inserts and regular selects: https://medium.com/serpapi/mongodb-benchmark-3-4-vs-4-4-vs-5...

We have internally a benchmark with MongoDB 8.x, but same pattern of disappointing results.


As someone who has run every version from 3.2 to 8 on small nodes and large clusters (~100+ nodes)...

8 is waaay faster in the real world. It's not really comparable. Your micro benchmark is comparing the few nanoseconds of the heavier query planner, but in the real world that query planner gives real benefits. Not to mention aggregations, memory management improvements, and improvements when your working set size is very large/larger than memory.


Can you share some data about this?

Here's another dataset about performance regression doing `$inc`s as fast as possible on the same object.

Mongo 3.4.24:

    332,037 stats update in 100s. (3,321 stats updates per s)
Mongo 8.0.4:

    287,553 stats update in 100s. (2,876 stats updates per s)
(higher is better)


Thanks for the data! I think I may have different use cases than are covered by your benchmarks.

Do you often do that many independent $incs (or any query) in a single second? I have gotten much better performance by using `BulkWrite` to do a bunch of small updates in a batch.

To go to a specific example from the "Driver Benchmark" on the link from your first reply:

   client[:users].insert_one(name: Digest::MD5.hexdigest(index.to_s))
I notice in this specific example that there's no separation of the hashing from the query timing, so I might try to do the hashing first and then time just the inserts. I would also accumulate a batch of `insertOne`s and then do a bulk write so I'm making far fewer queries. I will often pick some size like 1,000 queries or so and do the `bulkWrite` when I have accumulated that many queries, have surpassed some time limit (say, more than 0.5s since the last update), or when there are no more items to process. Additionally, if the order of the inserts doesn't matter, using `ordered: false` can provide an additional speedup.
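A rough sketch of that batching pattern with pymongo (the collection name and batch size are arbitrary):

    from pymongo import MongoClient, InsertOne

    client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
    users = client.bench.users

    BATCH = 1_000
    ops = []
    for i in range(100_000):
        ops.append(InsertOne({"name": f"user-{i}"}))
        if len(ops) >= BATCH:
            # ordered=False lets the server process the batch without stopping
            # at the first failure (and is often noticeably faster).
            users.bulk_write(ops, ordered=False)
            ops = []
    if ops:  # flush whatever is left over
        users.bulk_write(ops, ordered=False)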

For me the limiting factor is mostly around the performance of BulkWrite. I haven't hit any performance bottlenecks there that would merit benchmarking different ways to use it, but I would mostly be trying to fine tune things like how to group the items in a BulkWrite for optimal performance if I did.

Even in the case of one-off queries it almost always feels faster on 7+ than earlier versions. As I mentioned the one bottleneck we hit with migration was that we had some queries where we were querying on fields that were not properly indexed and in those cases performance tanked horribly to the point where some queries actually stopped working. However, once we added an index the queries were always faster than on the old version. When we did hit problems, it took only a few minutes to figure out what to index then everything was fine. We didn't have to make changes to our application or the queries themselves to fix any issues we had.


Again, this microbenchmark is useless. Don't pick databases this way. This is not the kind of operation you should be worrying about optimizing; it's not usually the bottleneck or what is slow.

Set up a clone of prod and build a tool to replay your traffic to it.

I have lots of data from Datadog and Ops Manager but am not going to take the time to publish ATM.

I just moved a 4tb deployment from 3.2 to 7. It cut max query time by about half. I actually went to instances with half the cpus, too (although I switched from ebs to ssds).


> Again, this microbenchmark is useless. Don't pick databases this way. This is not the kind of operation you should be worrying about optimizing; it's not usually the bottleneck or what is slow

It was for us. API calls that need to aggregate stats on same ID. We found a way around, but it would not have been an issue if MongoDB 8 was like 2x faster.

> I just moved a 4tb deployment from 3.2 to 7. It cut max query time by about half. I actually went to instances with half the cpus, too (although I switched from ebs to ssds).

Just the single-core performance improvement over the last 10 years might explain your outperformance.


> Just the single-core performance improvement over the last 10 years might explain your outperformance.

Nope, after migration max query time was still over a minute in some cases. What makes the biggest difference is performance tuning. After a week or so of index tuning, I got max query time below 6s. If Mongo makes each query take 2ms instead of 1ms, it literally doesn't matter to that customer or their customers, since it's just noise at that point. The old instances were M5s, so not that old.

The point is that the few nanoseconds difference you're measuring is not what you spend the most time on, usually.

Also you mentioned write performance. If you set journal commit interval to 500ms or something, then you can easily beat the old 3.2 write speeds, since if you're using 3.2 you probably don't care that much about journal commit intervals anyway ^_^


I think 8 was a release purely focused on performance, with some big improvements. Comparing to 3.4 is kinda unfair... You were fast with the tradeoff of half your data missing half the time.


That might explain the write performance degradation, but not the reads.


Consistent reads also mean additional checks. I think the WT cache is also very double-edged: it completely cripples the database when resource-constrained and not sized correctly, while giving a huge boost when the environment is "right-sized". Honestly, give 8.0 a try and tell me how it compares. I haven't touched ancient Mongo versions in a long time, so I have little intuition on how it compares.


mongodb had consistency issues before v5 if I recall, so take that for what it's worth.


10x exit in a couple years, quite nice on the VC side!

On the tech side ... no idea what Mongo's plan is ... their embedding model is not SOTA, does not even outperform the open ones out there, and reranking is a dead end in 2025.

I think the value is on Voyage's team, their user base and having a vision that aligned with Mongo's.

Congrats!


>their embedding model is not SOTA, does not even outperform the open ones out there, and reranking is a dead end in 2025.

Are you referring to the MTEB leaderboard? It's widely believed many of those test datasets are considered during the training of most open-source text embedding models, hence why you see novel + private benchmarks discussed in many launch blogs that don't exclusively refer to MTEB. There are problems there, and it would be great to see more folks in the search benchmark dataset production space like what Marqo AI has done in recent months.

Also what makes you say reranking is dead? Mongo doesn't provide it out of the box but many other search providers like ES, Pinecone, Opensearch do so it must provide some value to their customers? Maybe you're saying it's overrated in terms of how many apps actually need it?

disclosure: I work on vector search at Mongo


>Maybe you're saying it's overrated in terms of how many apps actually need it?

Yes, my comment leans more towards that, rather than suggesting it's useless.


Taking a step back, accuracy/quality of retrieval is critical as input to anything generated b/c your generated output is only as good as your input. And right now folks are struggling to adopt generative use cases due to risk and fear of how to control outputs. Therefore I think this could be bigger than you think.


Interesting take. Have you benchmarked models on your own data? Cause at this point everything is contaminated so I find it impossible to tell what proper sota is. Also - most folks still just use openai. Last time I checked, reranking always performs better than pure vector search. And to my knowledge it's still the superior fusion method for keyword and vector results.


In my experience, storing RAG chunks with a little bit of context helps a lot when doing the retrieval, then you can skip the whole "rerank" bit and halve your cost and latency.

With embedding/generative models becoming better with time, the need for a rerank step will be optimized away.


Huh? Rerank is always a boost on top of retrieval. So regardless of the chunking method or model you use, reranking with a good model will always result in higher MRR. And improvements in embedding models also will never solve the problem of merging lexical and vector search results. Rank/score fusion are flawed since both are hardly comparable and boosting only works sometimes. Whereas rerankers generally do a pretty good job at this. Performance is indeed the biggest issue here. Rerankers are slow as hell and simply not feasible for some use cases.


We've benchmarked a ton of the open models and voyage dramatically outperforms them. I think MTEB is a bad benchmark.


Looks like everyone is jumping into the AI game. Is there a bubble?


Whatever respect I had left for MongoDB just went out the window, the last thing I want in my database is AI.


Aside from the MongoDB of it all, where's the issue with adding "AI" here? As I understand it, this is just vector types, similarity searches, embedding indexes, and RAG capabilities.

All of which are just data storage/retrieval mechanics and custom types. This isn't adding some omnipotent AI agent to run/manage/optimize your DB or otherwise turn it into some blackbox gizmo.


Oh I'm pretty sure that will be the next step, given the direction we're moving in and the lack of common sense and responsibility on display.

I see GenAI as a stop gap solution at best, not really optimal for any problems; and AGI is a major distraction from finding good solutions to important problems.

The wild goose chase to apply GenAI to everything has serious consequences.

People are so excited about the fact that a computer can sort of drive a car that they don't even stop to consider that a human driver that randomly fails the same way would never get a license, and rightly so.

So excited about the fact that a computer can sort of write functional code that they don't stop to consider that any human developer that fails randomly the same way would never get a job, and rightly so.

We're already applying it to weapons/warfare, which is obviously a very bad idea.

I'm sure the technology will improve, but never to the point where it's reliable. It will fail randomly less often, but the magnitude of its failures isn't going anywhere.


It's for search/ranking functionality, not necessarily GenAI.


lol dude where have you been


Only skimmed through the release... I hope they continue supporting the API, but at least it now comes with a little higher confidence that the company behind it is not collecting all your data. Voyage has some interesting embedding models that I have been hesitant to fully utilize due to a lack of confidence in the startup behind it.



They commit to supporting the API in step 1 but it's not entirely clear to me whether that commitment continues with step 2-3...


How does MongoDB still have that much available to spend? Everyone I know moved off it years ago.


We use it a lot for a specific use-case and it works great. Mongo has come a long long way since the release over a decade ago, and if you keep it in Majority Read and Write, it's very reliable.

Also, on some things, it allows us to pivot much faster. And now with the help of LLMs, writing "Aggregation Pipelines" is very fast.


Pretending a pile of json is a database is great for pivoting, not so great for anything else.

Maintaining apps built on MongoDB is soul killing.


This


I've been using Mongo while developing some analysis / retrieval systems around video, and this is the correct answer. Aggregation pipelines allow me to do really powerful search around amorphous / changing data. Adding a way to automatically update / recalculate embeddings to your database makes even more sense.
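As a rough illustration of the sort of pipeline this enables (pymongo; the collection and field names are invented), note that plain keys name fields, while a "$"-prefixed string on the value side refers to a field's value:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
    videos = client.media.videos

    pipeline = [
        {"$match": {"status": "processed"}},            # key on the left: no $
        {"$group": {
            "_id": "$uploader_id",                      # value-side field reference: $
            "clips": {"$sum": 1},
            "total_seconds": {"$sum": "$duration_s"},   # value-side field reference: $
        }},
        {"$sort": {"total_seconds": -1}},               # key on the left: no $
        {"$limit": 20},
    ]
    top_uploaders = list(videos.aggregate(pipeline))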


Do you have any tricks for writing and debugging pipelines? I feel like there are so many little hiccups that I spend ages figuring out if that one field name needs a $ or not.


Pretty sure they achieved fiscal nirvana by exploiting enterprise brain rot. You hook em, they accumulate tech debt for years, all their devs leave, now they can't move away & you can start increasing prices. Eventually the empty husk will topple over but that's still years away.


Is it possible that they simply have a good product?


They do have a good product, but "they accumulate tech debt for years, all their devs leave, now they can't move away" is the story of the place I worked at a few years ago. The database was such a disorganized, inconsistent mess that no-one had the stomach (or budget) to try and get off it.


Basically every MongoDB I've come across; same story, different faces.


This is maybe the biggest reason to have a RDBMS, enforcing discipline to keep your data clean. Ultimately this trumps almost everything else.


Impossible! It's not based on sqlite, postgres or written in rust, so it must be terrible!


I never understood this argument, there are many great products running on Java, PHP, Ruby, JavaScript... All of these languages have a "crowd" that hates them for historic and/or esoteric reasons.

Great products are in my opinion a function of skill and care. The only benefit a "popular" tool or language gets you is a greater developer pool for hiring.


These days almost all the alternatives do 90% of what they cover in a way that’s either using your pre-existing tech stack, or just net-better.

There’s probably some extremely specific niche use-cases where it works well, but I suspect they’re pretty few-and-far-between.


Then they get acquired by BloodMoor and they squeeze every last cent out of the remaining customers.


Unironically, this.


That's what I thought, but every single candidate I interviewed mentioned MongoDB as their recent reference document database. I asked the last candidate if they were self-hosting; the answer was no, they used MongoDB's cloud.


I self host a handful of mongodb deployments for personal projects and manage self hosted mongo deployments of almost a hundred nodes for some companies. Atlas can get very expensive if you need good IO.


You can't use the embeddings/vector search stuff this refers to in self-hosted deployments anyway; it's only implemented in their Atlas cloud product. That makes it a real PITA to test locally. The Atlas Dev local container didn't work the same when I tried it earlier in the year.


Atlas has a generous free tier that is great for hobby projects.


If you're a developer you want to use MongoDB as a database, not be a MongoDB SRE and DBA.

That's the reason for using Atlas.


Precisely, and if you are enterprise, you want to have an option to request priority support and have a lot of features out of the box. Also some of the search features are only available in Atlas unfortunately.


Everyone you know put a dollar in the donation basket while moving off. Mongo collected it all and bought Voyage AI.


$2.3B in cash as of last quarter


Because they are web-scale obviously.


Well, it's referred to as a cash-and-stock deal but I can't find any more detail about how much is stock:

https://seekingalpha.com/news/4412466-mongodb-acquires-voyag...


MongoDB is a public company. Its quarterly financial reports will give you a much more accurate picture of the company's health than "everyone you know".


There are a lot of people still on it, including the place I worked at last.

It was starting to get expensive though, so we were experimenting with other document stores (dynamodb was being trialled, since we were already AWS for most things, just around the time I left)


Are they profitable, and at which point in time? How good of an investment was it? Sorry, my eyes were swimming in their financial report hosted in their domain.


Mongo Atlas (their cloud offering) is really solid (and expensive)!


This may be a shock to many HN readers, but MongoDB's revenue has been growing quite fast in the last few years (from 400M in 2020 to 1.7B in 2024). They've been pushing Atlas pretty hard in the Enterprise world. Have no experience with it myself, but I've heard some decently positive things about it (ease of set up and maintenance, reliability).



and the mongo blog post for how it will be used: https://www.mongodb.com/blog/post/redefining-database-ai-why...


How is MongoDB still a thing when there are already several ways to handle JSON in Postgres, including Microsoft's new documentdb extension:

https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...

What am I missing? Are Mongo users simply front-end folks who didn't have time to learn basic SQL or back-end architecture?


I will copy and paste a comment I wrote here previously:

"MongoDB ships with horizontal sharding out-of-the-box, has idiomatic and well-maintained drivers for pretty much every language you could want (no C library re-use), is reasonably vendor-neutral and can be run locally, and the data modeling it encourages is both preferential for some people as well as pushes users to avoid patterns that don't scale very well with other models. Whether these things are important to you is a different question, but there is a lot to like that alternatives may not have answers for. If you currently or plan on spending > 10K per month on your database, I think MongoDB is one of the strongest choices out there."

I have also run Postgres at very large scale. Postgres' JSONB has some serious performance drawbacks that don't matter if you don't plan on spending a lot of money to run your database, but MongoDB does solve those problems. This new documentdb extension from Microsoft may solve some of the pain, but this is some very rough code if you browse around, and Postgres extensions are quite painful to use over the long term.

The reality is that it is not possible to run vanilla Postgres at scale. It's possible to fix its issues with third party solutions or cobbling together your own setup, but it takes a lot of effort and knowledge to ensure you've done things correctly. It's true that many people never reach that scale, but if you do, you're willing to spend a lot of money on something that works well.


> MongoDB ships with horizontal sharding out-of-the-box

Maybe it's better than it was, but my experience with Mongodb a decade ago is that that horizontal sharding didn't work very well. We constantly ran into data corruption and performance issues with rebalancing the shards. So much so that we had a company party to celebrate moving off of Mongodb.


> my experience with Mongodb a decade ago

So before the Apple Watch was released.

Why is this relevant today ? Technology changes very quickly.


And Apple Watch still sucks as an actual watch vs a Casio.


MongoDB is not the same as Postgres and jsonb.

also, I'd challenge your thinking - ultimately the goal is to solve problems. you don't necessarily need SQL, or relations for that matter. that being said, naively modeling your stuff in mongodb (or other things like dynamodb) will cause you severe pain...

what's also true, which people forget, is naively modeling your stuff with a relational database will also cause you pain. as they sometimes say, normalize until it hurts, and then denormalize to scale and make it work

the amount of places I've seen that skip the second part and have extremely normalized databases makes me cringe. it's like people think joins are free...


Then your implementation can be as simple as CREATE TABLE documents (content JSONB);. But I suspect a PK and some metadata columns like timestamps will come in handy.


sigh - mongoDB is not the same as creating a table with jsonb. for one, you don't have to deal with handling connections. that being said, postgres is great, but it's not the same.


Postgres has ways to simplify connection management, if that is a blocker for you (pooling, pgbouncer, postgrest, etc)


I have seen a few rather large production MongoDB deployments. I don't understand how so many people chose it as the basis of their applications. There is a non-negligible number of MongoDB deployments I have seen that basically treat MongoDB as a memory dump, where they then scan from some key and hope for the best. I have never seen a MongoDB solution where I thought that it was better than if they had just chosen any SQL server.

SQL, or rather just some schema-based database, has a ton of advantages. Besides speed, there is a huge benefit for developers in being able to look at a schema and see how the relationships in the data work. MongoDB usually involves looking at a de facto schema, but with fewer guarantees on types, relations, or existence, and then trawling code for how it's used.


If you're scared of SQL, or have a massive operations team to throw infrastructure problems over the fence to, then pushing all the complexity into the application code looks like a positive, since you aren't the one paying that cost.


We use their atlas offering. It’s a bit pricey but we are very happy with it. It’s got a bunch of stuff integrated - vectors, json (obviously), search and charting along with excellent support for drivers and very nice out of the box monitoring.

Now I could possibly spend a bunch of time and do the same thing with open source DBs - but why? I have a small team and stuff to deliver. Atlas allows me to do it fast.


There’s a ton of hosted Postgres providers that do all of that and more and are just as simple to use. Neon.tech is really easy to set up and if you need more of a baas (firebase alternative), Supabase. Plus, no vendor lock in. I’ve moved vendors several times, most recently AWS RDS to Neon and it was nearly seamless. Was originally on Heroku Postgres going way back. Try getting off Atlas…


Ha - easier said than done in an enterprise, especially when nothing is wrong. Maybe the $$, but at some point the effort involved with supply chain and reengineering dwarfs any “technical” benefit.

This is why startups like to get into a single supply chain contract with an enterprise - it’s extremely hard to get it setup, but once done very easy to reuse the template.


Similar here; there are gotchas though. Some versions ago they changed their query optimization engine - some of our "slow aggregations" became "unresponsive aggregations" because suboptimal indexes were suddenly used. We had to use hints to force proper indexing. Their columnar DB offering is quite bad - I'd say if there's a need for analytical functionality, it's better to go with a different DB. The oplog changes format - and although that's expected, it still hurts me every now and then when I need to check something. Similarly, at some point they changed how nested arrays are updated in the changestream, which broke our auditing (it's not recommended to use the changestream for auditing; we still did ;) ). We've started using NVMe instances for some of our more heavily used clusters. Well, it turned out recovery of an NVMe cluster is much, much slower than a standard cluster. But all in all I really like MongoDB; if there are no relations, it's a good choice. It's also good for prototyping.


It's simply not that widespread of knowledge. Modern Postgres users would never suggest Mongo, but a generation of engineers was taught that Mongo is the NoSQL solution, even though it's essentially legacy tech.

I just ran into a greenfield project where the dev reached for Mongo, and didn't have a good technical reason for it beyond "I'm handling documents". Probably wasn't aware of alternatives. FWIW Postgres would've been a great fit for it; they were modeling research publications.


If you can learn Mongo you can learn SQL and 'back end architecture' let's be honest the basics are hardly difficult no matter what tool you're using.

Just because Postgres is good doesn't mean other things can't also be good (and better for some use cases).


Enterprise sales


Mongo is Firestore for enterprise.


Um because it must be worth 2 billion if this acquisition is worth $220 million. I know there’s rules about discussion quality on this site, so I guess we can’t question that.


Yes


what's the calculus here? if i'm a developer choosing a low-level primitive such as a database, i'm likely quite opinionated on which models i use.


If I had to guess they might see embedding models become small and optimised enough to the point that they can pull them into the DB layer as a feature instead of being something devs need to actively think about and build into their app.

Or it could just be an expansion to their cloud offering. In a lot of cases embedding models just need to be 'good enough' and cheap and/or convenient is a winning GTM approach.


Curious - do people migrate due to the price tag, ease of use, or something else?


Voyage AI basically builds embedding models for vector search
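For anyone who hasn't touched this part of the stack, the application-side shape is roughly the sketch below (pymongo against Atlas; the index name, field names, and the embed_query() helper are invented - the helper stands in for whatever embedding API you call, Voyage's included - and $vectorSearch is Atlas-only, as noted elsewhere in the thread):

    from pymongo import MongoClient

    client = MongoClient("mongodb+srv://...")  # hypothetical Atlas connection string
    chunks = client.rag.chunks

    def embed_query(text: str) -> list[float]:
        """Stand-in for the embedding call (Voyage, OpenAI, a local model, ...)."""
        raise NotImplementedError

    query_vector = embed_query("how do I rotate my API keys?")

    hits = list(chunks.aggregate([
        {"$vectorSearch": {
            "index": "chunks_vec",     # an Atlas Vector Search index defined on "embedding"
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 200,      # ANN candidates to consider
            "limit": 10,               # documents returned
        }},
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]))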


You don't hear the big AI providers talk about embeddings much, but I have to believe in the long run that companies building SOTA foundational LLMs are going to ultimately have the best embedding models.

Unless you can get to a point where you can make these models small enough that basically sit in the DB layer of an application...


That and because the embedding models are much easier to improve with at scale usage (hence why everyone has a deep search/research/RAG tool built into their AI web app now).


This is essentially my prediction; either that or something functionally equivalent.


I'm actually very disappointed by the performance of Mongo vector search after testing it. Any vector database is better than MongoDB performance-wise.


Is Voyage AI web-scale yet?





