Postgres wire compatible SQLite proxy (github.com/benbjohnson)
288 points by ithkuil on April 1, 2022 | 81 comments



Postlite author here. I'm glad to see the interest in this project. It really only has one use case: connecting to a remote SQLite database using GUI database administration tools like DBeaver. The idea came out of this Twitter thread:

https://twitter.com/benbjohnson/status/1508927916561743872

It's a recurring theme with SQLite. Many developers are on board with using it as a database, but when you tell them they also have to SSH in and use a CLI exclusively, they're turned off. Postlite is meant to alleviate that pain.


Heh. When I saw this project, I wondered what it would be written in. Using jackc/pgproto3 almost feels like cheating... it's such an unreasonably great protocol library for Postgres.

I wrote a PgBouncer alternative a while back using that library, and in the span of only about 1200 SLoC, I had a fully functional alternative that...

- benchmarked better than PgBouncer for me

- avoided the need for the annoying session, transaction, statement modes by just Doing The Right Thing. If you prepare an anonymous statement, it holds that connection for you until you execute the prepared statement. If you open a transaction, it holds that connection for you until you commit or rollback that transaction.

- offered control over whether to set application_name for the DB connection whenever one is acquired by a client connection, and whether to also clear application_name or not when the connection is released.

It's relatively straightforward to detect when a SELECT query comes through, so I had planned to add transparent read replica support to route SELECT queries to read replicas if you aren't in a transaction. I also never got around to implementing TLS support, but... that should be trivial in Go.
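
For a sense of what that detection looks like, here's roughly the kind of check I mean, using the jackc/pgproto3 v2 message types (the helper name and the inTxn bookkeeping are hypothetical; this is a sketch, not code from my project):

    package proxy

    import (
        "strings"

        "github.com/jackc/pgproto3/v2"
    )

    // isReplicaSafe reports whether a frontend message is a plain SELECT issued
    // outside a transaction, and could therefore be routed to a read replica.
    // inTxn would be tracked by the proxy, e.g. by watching the TxStatus byte
    // on ReadyForQuery messages coming back from the server.
    func isReplicaSafe(msg pgproto3.FrontendMessage, inTxn bool) bool {
        if inTxn {
            return false
        }
        q, ok := msg.(*pgproto3.Query) // simple query protocol only
        if !ok {
            return false
        }
        sql := strings.ToUpper(strings.TrimSpace(q.String))
        return strings.HasPrefix(sql, "SELECT")
    }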

Longer term, I also thought it would be cool to implement some extensions to the Postgres wire protocol, like end-to-end compression with zstd. You might have an application in one datacenter querying a Postgres database in another, and depending on the size of the datasets you're getting back from the database, compression could make a huge difference. You could also imagine implementing a "double proxy" where a proxy is running both locally and in the remote datacenter, and it would implement the non-standard Postgres extensions behind the scenes, presenting a purely standard wire protocol to any client that connects.
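
To make the double proxy idea a bit more concrete, the compressing hop between the two proxies could be little more than this (a hypothetical sketch assuming github.com/klauspost/compress/zstd; real code would need to flush per message to keep latency down):

    package proxy

    import (
        "io"
        "net"

        "github.com/klauspost/compress/zstd"
    )

    // tunnel pumps bytes between a local plaintext connection and the remote
    // proxy over a zstd-compressed stream. Both proxies run this, so the client
    // and the real Postgres server only ever see the standard wire protocol.
    func tunnel(local, remote net.Conn) error {
        enc, err := zstd.NewWriter(remote)
        if err != nil {
            return err
        }
        dec, err := zstd.NewReader(remote)
        if err != nil {
            return err
        }
        defer dec.Close()

        go func() {
            io.Copy(enc, local) // compress outbound traffic
            enc.Close()         // flush and finish the zstd stream
        }()
        _, err = io.Copy(local, dec) // decompress inbound traffic
        return err
    }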

Since this was just a fun side project that was never proven in production, I've never gotten around to open sourcing it. I also wish it actually had some tests... but those haven't happened yet. I don't want to mislead people into thinking I consider this side project to be production ready, but I don't remember any obvious problems. If people were actually interested, I could open up the repo. This comment is less about self-promotion and more about how impressed I've been with that particular wire protocol library, though I admittedly do enjoy talking about side projects. If anyone needs to do something with Postgres's protocol, I would highly recommend it.


Yeah, jackc/pgproto3 is great. When I first started the project, I was digging through the protocol documentation and expecting to write message parsers. But pgproto3 handles all that. Almost all the real code in Postlite is in a 500 LOC file[1]. I was surprised how easy it was.

[1]: https://github.com/benbjohnson/postlite/blob/main/server.go
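
To give a sense of how little ceremony pgproto3 leaves you with, the accept-and-serve loop is roughly this shape (a sketch based on the pgproto3 v2 examples, not Postlite's actual server.go; most error handling omitted):

    package proxy

    import (
        "net"

        "github.com/jackc/pgproto3/v2"
    )

    func serveConn(conn net.Conn) {
        backend := pgproto3.NewBackend(pgproto3.NewChunkReader(conn), conn)

        // Read the startup message (database name, user, options, ...).
        if _, err := backend.ReceiveStartupMessage(); err != nil {
            return
        }

        // Accept the connection without auth and signal readiness.
        backend.Send(&pgproto3.AuthenticationOk{})
        backend.Send(&pgproto3.ReadyForQuery{TxStatus: 'I'})

        for {
            msg, err := backend.Receive()
            if err != nil {
                return
            }
            switch m := msg.(type) {
            case *pgproto3.Query:
                // m.String holds the SQL text; translate it, run it against
                // SQLite, and stream RowDescription/DataRow messages back.
                _ = m.String
            case *pgproto3.Terminate:
                return
            }
        }
    }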


at least half the code is error-checking or error-handling


Please open source your project. It sounds like a lot of effort (designing as well as coding) went into making it, and you don’t want to see all that work go to waste.

Do not worry about it being buggy, or even bad (design or code). Let others do that work for you :-)


I went ahead and made it public here: https://github.com/coder543/roundabout

It took a few minutes since I had to add a license and README, plus I did a quick test of it locally to make sure it still worked, which helped me discover that SASL authentication needed to be implemented, so I added that.

I agree other people could be interested in contributing; I'm just not sure how much interest there actually is in a PgBouncer alternative.


Thank you for opening it up to the public. Much appreciated!

Do you know of a package/application/library that can be used to validate just the FEBE protocol of Postgres, and possibly stress/performance test it as well? Such a test suite would be great for independently testing the various implementations of Postgres wire-compatible projects and products, including your roundabout.


I’m not sure what exists as far as protocol validation goes, but I benchmarked mine against PgBouncer using pgbench, and it had no problems.


I think the following could be an interesting project:

A pair of programs. One to emulate a Postgres server, another to emulate a Postgres client. These two programs would assume they are just talking to each other, and know exactly what to expect from the other side.

Then we can inject a protocol implementer (the system under test, or SUT) between these two programs and see if the SUT can make both these programs believe that they are still talking just with each other. This way we can validate the level of protocol support by the SUT. And if we can run this whole setup, multiple clients, a server, and the SUT, under controlled conditions, we can also evaluate the performance of each such protocol implementation.

The Postgres client emulator would send a predefined set of commands, and would know exactly what the response should be. The Postgres server emulator would know exactly what commands to expect, and the hard-coded responses to send for each incoming command.

The client emulator's knowledge of the responses would make it easy to catch any errors/bugs introduced by the SUT.

The Postgres server emulator would _not_ implement any server-side logic (command parsing, planning, etc.), to ensure peak performance for each command it processes.
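
To make that concrete, the client-emulator half might be little more than this (an entirely hypothetical sketch: replay pre-encoded frontend messages and compare the raw bytes coming back against golden captures):

    package febetest

    import (
        "bytes"
        "fmt"
        "io"
        "net"
    )

    // step is one scripted exchange: raw frontend bytes to send, and the exact
    // backend bytes the emulator expects in response.
    type step struct {
        send []byte
        want []byte
    }

    // runScript replays the script against the system under test and fails on
    // the first response that differs from the golden capture.
    func runScript(conn net.Conn, script []step) error {
        for i, s := range script {
            if _, err := conn.Write(s.send); err != nil {
                return fmt.Errorf("step %d: write: %w", i, err)
            }
            got := make([]byte, len(s.want))
            if _, err := io.ReadFull(conn, got); err != nil {
                return fmt.Errorf("step %d: read: %w", i, err)
            }
            if !bytes.Equal(got, s.want) {
                return fmt.Errorf("step %d: response differs from golden capture", i)
            }
        }
        return nil
    }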

I was thinking of implementing such a Postgres server emulator back in around 2014, but for a different reason. IIRC, I was thinking of calling it Black Hole Postgres, to test the performance of my TPC-C implementation, DBYardstick [1].

[1]: https://github.com/DBYardstick/TPC-C


Ben, you have been one of the most influential creators I have known. I have been following you since the boltdb days. I used boltdb in a project that helped scale the back-office systems of many counties in tier-2 cities of the US, creating huge impact. With Litestream and now this, thanks for creating projects that are literally changing the way new developers are building systems.


Thank you, Debarshi! That means a lot. I really appreciate it.


Do you think this technique can be adapted for integration testing? So you wouldn't need a running PostgreSQL server to run tests that connect to a PostgreSQL database, or maybe it would make tests easier to run in parallel by using different SQLite files.


Tests that run against a different relational database from production make me really nervous.

The Django ORM has provided the ability to test against SQLite and deploy against PostgreSQL with the same code base (and the same tests) for years - and while it works incredibly well, I still won't use that in any system that I build. How your database behaves is such a crucial component of your application!

These days I find spinning up a real PostgreSQL database to run the tests is so easy there's essentially no reason not to do it. I use GitHub Actions for my PostgreSQL testing insurance, and it ends up just being a few extra lines in the YAML file.


I agree with Simon. I think you could run tests against SQLite locally so they're quick but then run them against Postgres in CI to ensure it works against the database you're running on. There's a lot of subtle differences in how the two databases work even though they both support a lot of the same SQL syntax.

You could also run SQLite in production. Then you don't have to test against a different database locally. :)
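
If it helps, here's roughly what that local/CI split can look like in a Go test suite (just a sketch; the driver names assume mattn/go-sqlite3 and lib/pq, and the TEST_POSTGRES_DSN variable is made up):

    package myapp

    import (
        "database/sql"
        "os"
        "testing"

        _ "github.com/lib/pq"           // Postgres driver (used in CI)
        _ "github.com/mattn/go-sqlite3" // SQLite driver (used locally)
    )

    // openTestDB returns an in-memory SQLite database by default, or a real
    // Postgres database when TEST_POSTGRES_DSN is set (e.g. in CI).
    func openTestDB(t *testing.T) *sql.DB {
        driver, dsn := "sqlite3", "file::memory:?cache=shared"
        if pg := os.Getenv("TEST_POSTGRES_DSN"); pg != "" {
            driver, dsn = "postgres", pg
        }
        db, err := sql.Open(driver, dsn)
        if err != nil {
            t.Fatal(err)
        }
        t.Cleanup(func() { db.Close() })
        return db
    }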


Why not just use separate schemas on the same Postgres server?


I normally scp the db files down to my machine and use TablePlus locally.

Works well if you are not trying to watch real-time processes as they write to the db.




For the stated use case, I have found DB Browser to be an invaluable tool to graphically interact with SQLite databases:

https://sqlitebrowser.org/

You can do basically anything you would need to do to a database with it, and it's very performant and cross-platform.


DB Browser for SQLite is a great tool but I don't believe it supports connecting to remote databases, does it?


I run it with ssh -X to run the remote GUI. It's a bit sluggish, but it's a good way to quickly browse a DB to see what's there.


Why should it need to? sshfs should work fine, probably.


I don't believe sshfs is safe with SQLite with concurrent use: https://github.com/libfuse/sshfs/issues/204#issue-561260021


I have to say I have a dislike for what appears to be the current trend of marketing things as "postgres (wire) compatible". CockroachDB is another example that does the same.

The implication is that "if you can do it in postgres, you can do it in X". But the reality is, of course, different.

If you are using databases properly, not just as a black-box dumping ground, then if you use Postgres on the wire, it's because you are talking to Postgres and using Postgres-specific features (like you should).

Unless you have feature parity, don't tell me I can connect to another database using my Postgres libraries. It's pointless. It's meaningless. Why would I want to do that if I can't use the features?

Rant over. ;-)


This particular one is described as a way to use existing Postgres GUI tooling (DBeaver, Postico 2) with SQLite. Building a compatibility shim for existing tooling is sometimes a more efficient way to solve this class of problem than adapting the tooling to support multiple backends.

I'm not saying it's what I would've done (I have little use for GUI database tooling), but I can see why someone else might.


Postlite author here. Yes, this is exactly the reason. There's no standard for connecting to a remote SQLite database so this is simply a shim to help support existing tooling. I don't use GUI tooling either but it's the most common complaint I hear when developers try switching to SQLite.


DBeaver supports SQLite directly. Its homepage states:

> Supports all popular databases: MySQL, PostgreSQL, SQLite, Oracle, DB2, SQL Server, Sybase, MS Access, Teradata, Firebird, Apache Hive, Phoenix, Presto, etc.


I think calling something "PostgreSQL wire compatible" is completely honest: you're not saying it's PostgreSQL compatible, the word "wire" is there for a very good reason.


The purpose of this software is to allow remote management of sqlite databases using existing postgres GUI tools.

> Postlite is a network proxy to allow access to remote SQLite databases over the Postgres wire protocol. This allows GUI tools to be used on remote SQLite databases which can make administration easier.

The readme does not seem to advertise this software as being a drop-in replacement for Postgres by any means, although I can understand your frustration with database vendors that market their software as such. I suspect they used Postgres as the wire format because otherwise they would need to invent one from scratch and then work to integrate with existing tools. Leveraging the Postgres wire format means they can integrate with existing GUI tools on day one, and seems to be a very pragmatic choice.


Largely for driver coverage. Postgres drivers are available in tons of languages so by being wire compatible you can skip all that work.


Yes. I don't see the harm in the PostgreSQL wire protocol becoming a de facto standard for sending SQL statements over a network connection and getting results back.

SQL itself is only a vague standard these days, as almost every single database has its own dialect. The only ones that are the same are MariaDB and MySQL... and that doesn't count. [1] And I don't see this situation changing in the medium term, or ever, really.

[1] ... because the first is a fork of the second.


There's a small amount of harm because Postgres doesn't currently support request multiplexing and so its wire protocol doesn't have any facility for it. This pushes other databases to inherit one of Postgres's few true weak points.


You're gonna have some similar thing no matter what gets settled on, unless there's a flawless protocol out there, but those are pretty rare in my experience.

I think the comparison is more like "is having an ad-hoc semi-standard wire protocol built around the postgres one better than having none at all?" and I think the answer is yes.

But it's not like there's an authority making this choice for everyone. If you think something else is better then use it in your tools and push for it. Postgres is ancient, open source, widely used and adapted for many different use cases. I think those all make it hard to beat in just a practical "worse is better" type way, but I also think people saying "no. we can do better" is good too, esp when they demonstrate it.


I agree with everything you said! I was reacting to "I don't see the harm" specifically. Ultimately, I think that with a few improvements to the Postgres protocol, settling on it wouldn't even be a "worse is better" outcome.


If I’m not wrong, I think this situation has changed with the latest Postgres release:

https://www.postgresql.org/docs/current/libpq-pipeline-mode....


Interesting! I was not aware of this, thanks for sharing. I'll have to look into it in detail to determine whether it's actually pipelining or multiplexing because pipelining is still problematic, since it's subject to head-of-line blocking (one expensive request blocks everyone from getting their responses until it's completed).


it's not multiplexing afaik. you will still have to process commands in the order they were put on the wire.


I take it that opening multiple database connections to the same server is not usually viable? I realize there are usually limits to open sockets / file descriptors and such, and that multiple connections will consume more memory on each end. Are there problems beyond that?


It's usually (but not always) viable and it's ultimately what one ends up doing. You're right about the drawbacks. One additional drawback is the setup time for opening new connections (network round-trip, re-authenticating, etc.).


This. I’ll take the pg protocol over OCI and especially ODBC any day.


My understanding is that SQLite tries to maintain compatibility with Postgres in the subset of SQL that it supports. So if this program reaches a reasonable level of maturity I would love to be able to use it with pre-existing products that were designed to use Postgres as a backend, to substitute SQLite as a drop-in replacement. One might hope that this could work for products that don't use Postgres's more cutting-edge features.


SQLite's lack of type safety makes me hesitate to use it as a drop-in replacement for Postgres.


I wouldn't use SQLite as a drop-in replacement either but, in case you haven't seen it, SQLite did add "strict mode" for better enforcement of type safety: https://sqlite.org/src/wiki?name=StrictMode


They fixed that a couple of versions ago: https://www.sqlite.org/stricttables.html
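
For anyone who hasn't tried it, here's a minimal illustration of what STRICT buys you (a sketch; it assumes a SQLite 3.37+ build, e.g. via mattn/go-sqlite3, and the table is made up):

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/mattn/go-sqlite3" // bundles a recent SQLite with STRICT support
    )

    func main() {
        db, err := sql.Open("sqlite3", ":memory:")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        if _, err := db.Exec(`CREATE TABLE events (id INTEGER PRIMARY KEY, count INTEGER, label TEXT) STRICT`); err != nil {
            log.Fatal(err)
        }

        // Without STRICT, type affinity would quietly store 'oops' as TEXT in
        // the INTEGER column; with STRICT the insert is rejected instead.
        if _, err := db.Exec(`INSERT INTO events (count, label) VALUES ('oops', 'first')`); err != nil {
            log.Println("rejected as expected:", err)
        }
    }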


> Postgres drivers are available in tons of languages

So are SQLite drivers (and MySQL and all the other top-10/top-20 databases).

Why add unnecessary abstraction?

Why add an unnecessary layer with bugs and edge cases that you will subsequently waste hours of your life debugging?

Plus, if we are talking about drivers, then surely if you're going to be using SQLite, you should be using SQLite drivers because they will be inherently simpler and smaller than the Postgres ones because of the much reduced feature-set of SQLite.

> What SQLite3 driver out there allows you to talk over a network connection to a remote database?

(Replied here because I can't reply to your point below)

Maybe that's because, as per the SQLite website[1], SQLite was never intended to be used as a client/server database!!!

I would argue that by attempting to "square peg round hole" SQLite into being a network database, you are only opening yourself up for unknown troubles down the road.

There are many excellent client/server databases out there (including the postgres referred to here). If you need that functionality you should be using the right tool for the job, there's no need to shoehorn SQLite into it.

[1] https://www.sqlite.org/whentouse.html


> Plus, if we are talking about drivers, then surely if you're going to be using SQLite, you should be using SQLite drivers because they will be inherently simpler and smaller than the Postgres ones because of the much reduced feature-set of SQLite.

What SQLite3 driver out there allows you to talk over a network connection to a remote database? Not any one that I've ever used.


> I would argue that by attempting to "square peg round hole" SQLite into being a network database, you are only opening yourself up for unknown troubles down the road.

It depends a lot on what you're using the remote connectivity for.

I think I have a pretty good understanding of SQLite's strengths+capabilities, enough to understand when it can replace a traditional RDBMS for a back-end service. I wouldn't build a solution where multiple clients are connecting to the DB remotely - but sometimes I want the convenience of connecting to the DB remotely for observability/troubleshooting. This is something that's trivial with traditional RDBMSs, and a bit painful with SQLite.

Exposing the SQLite DB over the network for occasional interactive use seems quite useful to me.


It's also useful if you initially built an application on SQLite and are transitioning to Postgres.


That's the meaning of "wire" or "protocol" compatible. It's saying that the language is the same, not the features. It's similar to how Redis uses RESP which has been implemented by other systems.

You can even move the abstraction down a level with non-relational databases like Cassandra and Elastic offering SQL interfaces. They don't offer relational features, but let you express queries using a relational language.


Seems to me that something like this would be good for on-the-fly interchange between SQLite and Postgres, without the need for dumping and re-importing data.


Without query translation you won't be able to do much date querying since date formatting and getting now() are very different in postgres and sqlite.

Also, sqlite doesn't support INTERVAL which I personally use heavily in postgres.

Curious how the author plans to deal with this since they mentioned they're interested in supporting BI tools that presumably wouldn't issue sqlite queries if they think they're talking to postgres.
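
To make the gap concrete, the same "last 7 days" filter reads quite differently in the two dialects (assuming the SQLite side stores ISO-8601 text timestamps; the table and column names are made up):

    package queries

    // Postgres: interval arithmetic on now().
    const pgLastWeek = `SELECT * FROM events WHERE created_at > now() - interval '7 days'`

    // SQLite: no INTERVAL type; date modifiers on datetime('now') instead.
    const sqliteLastWeek = `SELECT * FROM events WHERE created_at > datetime('now', '-7 days')`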


Postlite author here. Patrick DeVivo mentioned using BI tools against Postlite, which sounded really interesting, but I think you're right. Those tools are going to heavily rely on dates which will not translate well between Postgres & SQLite.

I don't have any plans for query translation except for some basic client startup query handling (e.g. SET, SHOW). Postlite is really just meant as a thin shim to allow GUI tools to work remotely on SQLite databases.


This is a cute idea but it sounds headed for pain, as most typical client environments will come with bindings for multiple db's including sqlite. So if you want to use pg bindings, that probably means you want to use pg features, which aren't present in sqlite. There might be a use case where you want to use sqlite remotely as a client/server db, but even then, is this such a great approach? Sqlite has its own special features that likely don't fit the pg wire protocol that well.


It's literally in the first paragraph...

> Postlite is a network proxy to allow access to remote SQLite databases over the Postgres wire protocol. This allows GUI tools to be used on remote SQLite databases which can make administration easier.


Yes, I'm saying if the gui tools expect postgres, they are likely to use postgres features, which expands the amount of work this thing needs. If the tools are written specifically for this proxy, then ok, but why bring postgres into it at all? Write something sqlite-specific that exposes sqlite's features.


> Write something sqlite-specific that exposes sqlite's features

This is about using SQLite with existing tools that don't have support for it. It's very common since it's easier to support multiple remote database servers (MySQL, Postgres) vs adding the in-process model for SQLite only.

Also you need to take the context into account - Fly.io has SQLite databases distributed across their nodes (see their post about replacing Consul [1]) and most likely want to be able to manage those remotely. Since SQLite doesn't have a server protocol, tools that have bindings for it only support working with local databases.

[1] https://fly.io/blog/a-foolish-consistency/


> Yes, I'm saying if the gui tools expect postgres, they are likely to use postgres features,

Unlikely. A lot of such tools are mainly for displaying the data that's present; they might have support for displaying e.g. GIS types but they're not going to need to write GIS data into the database, and certainly not to the point where you can't run them at all if that's not possible.


All of these tools will at least run an introspection query to get the available tables and columns to display the data. But these queries will fail because SQLite doesn't have these tables.

So the idea is very nice, but any existing tool for PostgreSQL will not work. You need to either work with the psql terminal application or write a new client.


This is explicitly accounted for, as per the README:

> Many Postgres clients also inspect the pg_catalog to determine system information so Postlite mirrors this catalog by using an attached in-memory database with virtual tables.



I'd imagine the biggest use case will be to simplify unit tests and local dev.


Postlite author here. The primary use case is to administer remote SQLite databases using existing GUI database tools like DBeaver. It's just a thin shim to translate from the Postgres wire protocol to SQLite transactions, so I don't think it would help with unit tests or local dev.


Awesome to see work in the DB wire compatible space. On the MySQL side, there was MySQL Proxy (https://github.com/mysql/mysql-proxy), which was scriptable with Lua, with which you could create your own MySQL wire compatible connections. Unfortunately it appears to have been abandoned by Oracle and IIRC doesn't work with 5.7 and beyond. I used it in the past to hack together a MySQL wire adapter for Interana (https://scuba.io/).

I guess these days the best approach for connecting arbitrary data sources to existing drivers, at least for OLAP, is Apache Calcite (https://calcite.apache.org/). Unfortunately that feels a little more involved.


NYTimes/DBSlayer (2007) wraps MySQL in JSON: https://open.nytimes.com/introducing-dbslayer-64d7168a143f

ODBC > Bridging configurations: https://en.wikipedia.org/wiki/Open_Database_Connectivity#Bri...

awesome-graphql > tools: security: https://github.com/chentsulin/awesome-graphql#tools---securi...

... W3C SOLID > Authorization and Access Control: https://github.com/solid/solid-spec#authorization-and-access...

"Hosting SQLite Databases on GitHub Pages" (2021) re: sql.js-httpvfs, DuckDB https://news.ycombinator.com/item?id=28021766

[edit] TIL the MS ODBC 4.0 spec is MIT Licensed, on GitHub, and supports ~challenge/response token auth: "3.2.2 Web-based Authentication Flow with SQLBrowseConnect" https://github.com/microsoft/ODBC-Specification/blob/master/...

> Applications that are unable to allow drivers to pop up dialogs can call SQLBrowseConnect to connect to the service.

> SQLBrowseConnect provides an iterative dialog between the driver and the application where the application passes in an initial input connection string. If the connection string contains sufficient information to connect, the driver responds with SQL_SUCCESS and an output connection string containing the complete set of connection attributes used.

> If the initial input connection string does not contain sufficient information to connect to the source, the driver responds with SQL_NEED_DATA and an output connection string specifying informational attributes for the application (such as the authorization url) as well as required attributes to be specified in a subsequent call to SQLBrowseConnect. Attribute names returned by SQLBrowseConnect may include a colon followed by a localized identifier, and the value of the requested attribute is either a single question mark or a comma-separated list of valid values (optionally including localized identifiers) enclosed in curly braces. Optional attributes are returned preceded with an asterix (*).

> In a Web-based authentication scenario, if SQLBrowseConnect is called with an input connection string containing an access token that has not expired, along with any other required properties, no additional information should be required. If the access token has expired and the connection string contains a refresh token, the driver attempts to refresh the connection using the supplied refresh token.

Also TIL ODBC 4.0 supports: https://github.com/microsoft/ODBC-Specification :

> Semi-structured data – Tables whose schema may not be defined or may change on a row-by-row basis

> Hierarchical Data – Data with nested structure (structured fields, lists)

> Web Authentication model


On that note, are there any hosting providers that support range updates for files?

If there were, you could basically query and write to a SQLite database entirely over HTTP.


You also need locking then. WebDAV implements locking over HTTP but I doubt most providers will implement it... (let alone according to spec and/or whatever an MS backend expects)

https://www.sqlite.org/howtocorrupt.html


You can query and write to SQLite over HTTP with [1] (I wrote it), but it's not through the method you describe (range for files).

https://github.com/subzerocloud/blue-steel


Well now I want a virtual table interface to pgsql so I can query Postgres databases from SQLite too!


It would be neat to have a library that has the same API as Sqlite3, but then acts as a network proxy for a remote PostgreSQL database.

You could then use that plus the above network proxy to move a Sqlite3 database off of the local system to a remote system, without having to change the existing codebase.

This would give you multiple remote clients as well.


You could also join tables from different databases! Imagine the performance!


One could also use sshfs to access a remote SQLite DB file.

https://github.com/libfuse/sshfs


I don't believe sshfs is safe to use with concurrent access on the database: https://github.com/libfuse/sshfs/issues/204#issue-561260021


I use usql. It uses a different approach, but I am able to use the same psql commands to query multiple different DBs: Postgres, SQLite, MySQL, etc. My only gripe with it is its dependency on glibc, which could have been avoided given that it is a Go application.


Postlite author here. I haven't used usql but it looks really interesting. I was looking for a way to connect existing GUI tools to remote SQLite databases which I don't believe usql supports.


Very cool! I think the most useful use case would be managing remote SQLite dbs using existing database GUI tools. One question - does it support custom collations? That might prevent us from using it if it doesn't...


Postlite author here. Yes, remote administration with existing GUI tools is the primary (and probably only) use case for Postlite. It's just a slim passthrough layer between Postgres and SQLite, so there's nothing explicitly supported surrounding collations. What kind of support are you thinking of?


I don't remember the details, but the framework I use applies some custom collation when creating tables in a SQLite db, and when a tool like SQLite Studio performs certain actions it complains with something like "collation not recognized/supported"...


I was looking for something like this! Thanks for the project.


This is great!

Thanks for this, it never occurred to me that I could improve on ssh + sqlite3 on the cli, but here it is.

Great for those times when I want to check some data but don’t want to ssh in


Waiting for the first benchmarks; we might have a surprise.


Postlite author here. This proxy is not meant to be the primary database connection for a SQLite database. Its only use case is to support remote administration of SQLite databases using existing GUI tools like DBeaver.



