Show HN: MongoDB Protocol for SQLite (github.com/ferretdb)
147 points by aleksi on July 4, 2023 | 68 comments



"However, as time passed, MongoDB abandoned its open-source roots; changing the license to SSPL - making it unusable for many open source and early-stage commercial projects."

This is not true, you can use MongoDB for commercial projects as long as you don't sell MongoDB services, but internally you're free to use it as your regular DB.

https://www.mongodb.com/licensing/server-side-public-license...


I was talking to the VP of Engineering at a company that got acquired by [top 10 VC/PE firm].

Apparently the firm put the company through the wringer (and more than likely bumped down their valuation during the transaction) over MongoDB’s licensing.

I have a feeling in this specific case, [top ten firm] probably found some technicality in MongoDB’s licensing that allowed them to use as negotiating leverage to bump the price down during the transaction.

Licensing is important. You especially don’t want ambiguous licenses that give lawyers wiggle room to make absurd nonsensical arguments, which unfortunately is very common even with reputable investors and firms.

Edit: Removed name of firm for legal reasons. Email me if you want specifics. bp@brandonpaton.com


In case anybody wonders, the firm in question was a16z.


It actually isn’t, but it’s a firm with similar name recognition.


Sequoia?


The SSPL is a murky license of questionable validity. The SSPL text itself is an unlicensed plagiarism of the AGPL [1]. The abuse of copyright law [2] that the SSPL explicitly demands might make it neither a valid license nor a valid contract [3]. What happens when the license covering your use of certain software turns out to be invalid? Well, in the eyes of the law, it's as if you never had a license in the first place.

That's a lot of legal risk for your commercial project.

1: https://lists.opensource.org/pipermail/license-review_lists....

2: https://lists.opensource.org/pipermail/license-review_lists....

3: https://www.processmechanics.com/2018/10/18/the-server-side-...


Copyright misuse, as per one of your links, is a defense to an infringement claim that is recognized under certain conditions in part but not all of the US - it does not invalidate the license, it invalidates the licensor’s lawsuit.

But yes, it’s still a mess of a license that creates a lot of legal uncertainty for potential licensees, especially if a particular court decides that the copyright misuse defense doesn’t apply but that unclean hands does apply to the licensee for taking a license without intent to comply (e.g. because of having already concluded that compliance was impracticable or impossible).

Definitely best to avoid the SSPL and software licensed under it.


It's a little more complicated than just having to refrain from selling MongoDB services. There is a good page which summarizes the problems with the SSPL. [1] The SSPL is crafted in a way that makes it very hard to tell whether you comply with the license or not, especially if you run it as part of a cloud hosted solution.

[1]: www.ssplisbad.com


This page is pretty wrong on a whole lot of points, and doesn't even go into enough detail on any of its claims to comprehend how it got there.


They seem to present a reasonably well-thought-out argument. I'd be interested in hearing a counter more nuanced than "nuh uh."


I mean, that's fundamentally what the website did. Claims without evidence can be dismissed without evidence. However, without any justification for the belief the author apparently has, the author asserts you cannot even expose MongoDB's API as part of your own application within the terms of SSPL, and that is neither in the letter nor spirit of the license terms, which solely relates to selling the software in question (MongoDB here) as a service offering principally as itself.

The author, of course, relies on the claimed vagueness of the license as an excuse to take every claim to the most ridiculous extent possible, even though no reasonable person would believe anything claimed here, such as the suggestion that it might entail you release the source code for your computer's BIOS.

In a section with truly atrocious spelling and grammar, the author asserts that somehow this kills real competitors (even though real competitors would presumably have their own actual product offering, not just re-ship a product offered by the SSPL software developer at a predatory pricing rate only made possible through monopolist behavior). They then make the ridiculous claim that this entirely removes the ability for the customer to choose their cloud provider. Meanwhile, in the land of reality, between either forks or licensing agreements, there is plenty of competition, but the developers have an opportunity to sustain development instead of all of the profit being skimmed off by Amazon, who contributes nothing back and does none of the work. Having to raise their prices above the original developers' due to license fees, of course, doesn't even remove the value add for choosing AWS, where the benefits are bringing it into the same datacenter and platform as the rest of your other cloud needs.

The author then tries to villainize software companies using the SSPL by pointing out how many thousands of employees and how many millions in revenue they have, without acknowledging that the sole beneficiary of making the SSPL look bad is Amazon, which brings in hundreds of billions of dollars of revenue and has over 1.5 million employees.

This train wreck is then finished up with their suggestions that these companies should just remain unsustainable and rely entirely on business models for open source we all know don't work very well.

I would say the author is arguably wasting their $10 a year domain registration on this garbage article, but considering the lack of public attribution present, I assume Amazon's paying for it.


> Claims without evidence can be dismissed without evidence.

> considering the lack of public attribution present, I assume Amazon's paying for it.

Well, that was a wild ride.


Hah. I will grant you that one. :D

Though I am absolutely fine with you dismissing my parting shot on those grounds indeed.


That condition is not in the spirit or commonly accepted definition of open source.


“Open source” tends to be a spectrum.

For example, many consider GPL to be a great OSS license, but it’s restrictive enough that some BSD projects try to avoid it.


It tends to be a spectrum, but the various, sometimes very different licenses approved by the OSI still have something in common: the open source values laid out in the Open Source Definition. [1] The SSPL did not comply with this requirement, as it discriminates against specific users and use cases. I think it is in everyone's interest to draw the line somewhere.

[1]: https://opensource.org/osd/


BSD would not be OSS by this definition, as it has no clause against discrimination of people.

OSS is IMO a very subjective term. There is a massive gray area surrounding open-sourcing derivative works, etc. and other stipulations.


Open source licenses might be a spectrum, but it's likely that the legal reality isn't. Like the famous double-slit experiment, there's probably no spectrum, only a couple of buckets of legal validity.


As far as the buckets go, I still believe there is a pretty large spectrum.

BSD vs GPL vs LGPL are pretty dramatically different, and there are many in-between that might satisfy the definition of “open-source” at face value.


Why not?


I will never understand why Mongo and Elastic get so much hate when all they did was try to protect themselves from AWS parasites.

I guess having yourself exploited and ultimately destroyed by monopolist overlords is the moral choice or something? Or do boots just taste good?


I am not sure if any hate is warranted, but it is true that coming up with a new license which does not meet the Open Source Definition, and proceeding to call it open source, definitely did not help create sympathy. There is nothing wrong with releasing something under a proprietary license, but calling this license change "Doubling down on open source" is damaging.

https://logz.io/blog/open-source-elasticsearch-doubling-down...


We would never touch that, though. The lawyers don't really grok the beginning and the end of the limitations. We can pick up MIT, Apache, MPL, BSD, and in some cases GPL (incl. v3). Probably some more, but AGPL and the like are off limits, even though we would never actually sell it as a service as such.


The keyword here is "internally." However, if you want to embed MongoDB in your open source product, then the resulting bundle will not be open source, as the SSPL is not an open source license.


If this implemented sharding, it'd be killer. I recognize that'd be a huge amount of work, but it would be amazing! I may do that someday. You could just use one PG instance as the config server, and this takes the place of the mongos.


Maintainer here. That's a very good idea! Let us know if you ever have some time to work with us on this.


You should speak to the PayloadCMS guys. It could be mutually beneficial (publicity, more users, etc.). It needs Mongo, but if it works with Ferret backed by SQLite (and others), that would be a game changer.

https://payloadcms.com/

Great project of yours! Thank you

* will try to hack docker-compose for this if time allows


Thank you, we will take a look!


Awesome! This might be my new Sunday project once I'm done with the current one.

Mongo's sharding is nice but I'm really tired of their support. They force you to run smaller machines in the contract so they can charge more for # of nodes, and it adds complexity to your architecture so you're swayed into buying Atlas. I could replace it all with a few PG shards with bigger machines.


Yes this is one of the reasons we started FerretDB. Atlas is very easy to use, but it is nearly impossible to move away from later on, as there is no alternative (unless you are ready to rewrite your app). We think that most of its killer features, like easy sharding, can be done with Postgres and/or SQLite.


I agree. I'd love to work on this full time if I could.


Might also be an interesting idea to use SQLite's in-memory feature.


Right, we are going to add support for it.
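For anyone curious what the in-memory mode looks like at the SQLite API level, here is a minimal sketch in plain Python `sqlite3` (not FerretDB's actual Go code; the table and data are made up for illustration). Opening `":memory:"` gives a private, process-local database that disappears when the connection closes.

```python
import sqlite3

# ":memory:" creates a database that lives only in this process --
# handy for tests or ephemeral workloads, at the cost of durability.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO docs (body) VALUES (?)", ('{"name": "ferret"}',))
row = conn.execute("SELECT body FROM docs").fetchone()
print(row[0])  # -> {"name": "ferret"}
conn.close()   # the whole database vanishes here
```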


Honest question: why doesn't Couchdb receive any love on HN?

It's such a perfect tool for my side projects and most projects that will never reach Twitter scale


I had a bad experience with it a decade ago: I inherited some code that used it to test-drive the technology, and it was a failed experiment.

In the case of this code, CouchDB was simply being used as a dumb key-value lookup. We weren't using any special features of CouchDB.

Of course, a lot has changed since, so the problems I encountered might not exist.

Basically, CouchDB needed to periodically run a garbage-collection-like operation. (If I remember correctly, records were not deleted, but instead hidden until you ran the cleanup.) The problem was that the cleanup process was so unoptimized that CouchDB copied the entire database, minus the deleted records, to a new file. In our case, the disk was about 90% occupied, so we couldn't run the cleanup.

So why was CouchDB a failed experiment:

First: The problem with handling deletions was rather serious for a database. It's quite painful for a database to need 2x your anticipated storage, and for that database to continue accepting new records even when it can no longer run something like garbage collection. This led me to conclude that the database could have other bugs lurking. I need to trust that a database won't lose data or become inoperable due to its own bugs, or due to beginner mistakes with unfamiliar technologies.

Second: CouchDB offered no features that I needed for my project that I couldn't get elsewhere. I just needed simple key-value lookups.

Third: Learning curve. Everyone and their dog knows SQL. This was handling a niche use case, so whoever touched the code next was most likely going to be very unfamiliar with it. Picking a well-known SQL database helps keep that learning curve in check.


That data cleanup caveat is an important lesson — thank you so much for sharing your hard lessons learned.

I actually don't know SQL very well (coming from design), so something like Couch feels more intuitive to set up, as it looks and feels like JS / JSON, with neat additions like views.

I think I'll keep using Couchdb for my tiny projects, with the caveat that "if something gets bigger and more well-defined, start learning SQL" (or at least properly learn Supabase)


> I actually don't know SQL very well

That's a huge, huge gap in your skillset. SQL databases are generally the one-size-fits-all of databases. There are good reasons (very good reasons) to use alternative databases, but they tend to sit outside the primary use cases of an application.

One of the very important features of SQL is that how you query your data isn't tied to the structures that are convenient for a particular use case.

Basically, the problem with using CouchDB and MongoDB as a general purpose database is that you're storing and querying data structures, or documents. It's easy early in a project, when your use cases are small.

The problem comes later in the project, when a requirement comes to query a data structure that's deeply nested inside of a document. This can incur a rather substantial performance penalty, unless you denormalize your data.

For example, when I used to go to MongoDB conferences, they used the example of a message board, where all messages in a post were stored in the same document. (For example, imagine that all messages in this thread are stored in a single document.) The problem then comes when you need to support a page like https://news.ycombinator.com/threads?id=gwbas1c, that lists all posts that I've made.

In the case of SQL, supporting a page like https://news.ycombinator.com/threads?id=gwbas1c is merely writing a new query and updating an index. It doesn't require denormalization or scanning many documents.
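To make that comparison concrete, here is a sketch of the normalized approach, using Python's built-in `sqlite3` with a hypothetical two-table schema (table and column names are invented for illustration). The new "all messages by one user" requirement becomes a single indexed query, with no denormalization and no document scans:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized message-board schema: messages live in
    -- their own table instead of being nested inside a per-thread document.
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES posts(id),
        author TEXT,
        body TEXT
    );
    -- The new requirement is satisfied by an index plus a query.
    CREATE INDEX idx_messages_author ON messages(author);
""")
conn.execute("INSERT INTO posts VALUES (1, 'Show HN: FerretDB')")
conn.executemany(
    "INSERT INTO messages (post_id, author, body) VALUES (?, ?, ?)",
    [(1, "gwbas1c", "first"), (1, "someone", "second"), (1, "gwbas1c", "third")],
)
# "All messages by one user, across every thread" -- one indexed lookup.
rows = conn.execute(
    "SELECT body FROM messages WHERE author = ? ORDER BY id", ("gwbas1c",)
).fetchall()
print([r[0] for r in rows])  # -> ['first', 'third']
```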

> I actually don't know SQL very well

Another reason why it's important to understand SQL basics comes from being able to write code that performs well, within reason. I constantly inherit code that uses an ORM (library that translates between SQL and objects), and the code just can't perform.

The original authors didn't know SQL at all, and just assumed that the ORM would do the heavy lifting for them.

> I actually don't know SQL very well

You don't really need to know much SQL to get by: Just four commands, "select, insert, update, delete," (and upserts) and then understand joins. Understand transactions, and you're all set. You can leave the more advanced features to when you need an advanced database feature.
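Those basics fit on one screen. A minimal sketch with Python's built-in `sqlite3` (table and data are made up for illustration), showing the four everyday commands wrapped in a transaction so they either all apply or none do:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# "with conn" opens a transaction: it commits on success
# and rolls everything back if an exception is raised.
with conn:
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    conn.execute("INSERT INTO users (name) VALUES (?)", ("bob",))
    conn.execute("UPDATE users SET name = ? WHERE name = ?", ("carol", "bob"))
    conn.execute("DELETE FROM users WHERE name = ?", ("alice",))

names = [r[0] for r in conn.execute("SELECT name FROM users")]
print(names)  # -> ['carol']
```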


It's a cool idea but everything is just a little funky. At least that's how I remember it when I tried it a while back. That plus all the problems of coding in your DB were the reasons I didn't use it again.

Shame too because the sync between couch & pouch was super cool. Feels like the way things should work.


The Couch / Pouch thing is really drawing me in... can't wait to learn some hard lessons about the funkiness myself...


MongoDB has a similar sync to Realm, which seems a lot more capable than Couch sync.


But only through their opaque sync service right?


It did for a while but fell out of favor. Couch doesn’t have any current major endorsements and many companies that used it have moved onto better solutions. Couchbase had most of the momentum and it turned into a different product.


For that scale postgres works very well. Going to need very good reasons to not use it.


Postgres is harder to set up. CouchDB can run completely in-memory and in IndexedDB as a prototyping tool (as PouchDB), then be set to sync with a hosted DB somewhere else. For apps under 10,000 users, I'd want this flexibility, I think.

If I have more than 10k users, I'd have more problems


> Postgres is harder to setup

???

When I used Heroku, Postgres was a checkbox. Can't get any easier than that.

I understand that it's more work to set up a schema, as opposed to just dumping data structures into a data store. But, as I described elsewhere in this thread, the flexibility to query any table will help dramatically as your requirements change throughout the lifetime of a product.


Wow, I have nothing to add except this is exactly what I was looking for prior to abandoning my last project... time to dig it back up again. Amazing stuff.


Has anyone tried running the Ubiquiti Unifi controller against this?

Even stories of it being successful with the Postgres backend would be really helpful.


Maintainer here. We will try in the next few days, and I will let you know. I have a lot of Unifi stuff myself.


Let me know if there’s something I can do to help.

Running Unifi on NixOS is a bit of a pain since the license means that there is no binary cache for the package, resulting in frequent rebuilds.

I’ve also not had a great time with the mongo upgrades in the past that caused my controller to not start.


Is there any documentation on the native schema used to store the documents? Is it sane enough that you could manually roll your own JSON1 queries in SQL to facilitate more relational, join-like features?


Some time ago, we wrote a blog post about it: https://blog.ferretdb.io/pjson-how-to-store-bson-in-jsonb/ A few details changed after that (for example, type information is not mixed with values anymore), but the general idea is the same. We probably need to document it better.

Yes, querying it with SQL should be possible.
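As a rough illustration of what such hand-rolled queries could look like, assuming a simplified one-column-of-JSON layout (the real FerretDB schema differs, per the blog post above; the table and fields here are hypothetical), SQLite's JSON1 functions let you filter on document fields directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one TEXT column holding each document as JSON.
conn.execute("CREATE TABLE coll (doc TEXT)")
conn.executemany("INSERT INTO coll VALUES (?)", [
    ('{"_id": 1, "city": "Berlin", "pop": 3700000}',),
    ('{"_id": 2, "city": "Tallinn", "pop": 437000}',),
])
# json_extract pulls individual fields out of the stored documents,
# so ordinary SQL predicates (and joins) can run over document data.
rows = conn.execute("""
    SELECT json_extract(doc, '$.city')
    FROM coll
    WHERE json_extract(doc, '$.pop') > 1000000
""").fetchall()
print([r[0] for r in rows])  # -> ['Berlin']
```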


Great addition of an SQLite backend. I would however love prebuilt binaries for Mac, so that I can test it with Marmot (https://github.com/maxpert/marmot sorry for the shameless plug). I wanted to build it myself, but then the instructions in the README discouraged it; not sure why, it's not like I am going to train an LLM, so it can't be that complex.


Thanks for the very relevant plug, we are going to test with Marmot.


I started looking into replacing the MongoDB instance used by the Unifi Controller software with this, I need to revisit it.


Relevant comment from someone involved with the project.

https://news.ycombinator.com/item?id=36596036


It would be interesting to see some benchmarks, especially comparing the same queries between Mongo and Ferret. Also, while the readme does say it's meant to be a drop-in replacement, it does not say whether feature parity has been achieved. I would very much like to see how it fares with an ODM layer.


Feature parity progress with apps:

https://github.com/FerretDB/FerretDB/issues/5

I presume that once there is feature parity, someone could run some benchmarks.


Great project!

It is intriguing that MySQL takes a comparable approach, functioning as a NoSQL database, specifically a document database, owing to its multi-paradigm design.

Starting from version 5.7.12, MySQL provides the "MySQL Document Store" feature, which enables data access through a document interface using the proprietary X Protocol. It is important to mention that this protocol is incompatible with MongoDB's. The fascinating aspect is that the data can be modified interchangeably through both the X Protocol and SQL.


MongoDB was originally an eye-opening technology for many of us developers, empowering us to build applications faster than using relational databases.

Something something HN and pitchforks.


But is it still web scale?


Not yet, but let us turn off fsync…


This is fun! Looks like SQLite is getting all kinds of network protocols recently. What's next? Will SQLite learn to speak MySQL? Oracle?


Man this is exactly what I need at a high level but not writing this in C is a complete show-stopper. All that Docker effluvium adds insult to injury. Sad!


FerretDB is written in Go, can be used as a Go library, and you could call it from C via CGo… But I would not do that :)


Haha. The thought process behind using Go was probably like: "Great! SQLite is written in C. That means I can write my library in Go! And use microservices!"


FWIW, we use a version of SQLite transpiled into Go to avoid CGo problems: https://gitlab.com/cznic/sqlite


You can run the underlying single static binary as a service any non-docker way you like.

Your applications would be talking to it over a socket using a known and documented protocol so the implementation language isn't something the client code should be expected to need to care about at all.

Given those two facts, I genuinely don't understand what your concerns are here?


It's mostly meant as a service. This is a great application for Go.



