Indeed. The whole point is to extend it with application-specific functions since the database is typically hosted in-process.
This is what we have done. We have a lot of domain-specific UDFs available in our SQLite scopes. Both scalar and aggregate types. Note that not all functions must be pure. You can quickly repurpose SQLite into a full-blown DSL by going down this path.
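For readers who haven't done this: in Python's built-in sqlite3 module, registering both kinds looks roughly like the sketch below (the function names and the geometric-mean aggregate are made up for illustration, not from the parent's codebase):

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Scalar UDF: marking it deterministic lets SQLite reuse results
# (the deterministic= keyword needs Python 3.8+ and SQLite 3.8.3+).
conn.create_function("reverse", 1, lambda s: s[::-1], deterministic=True)

# Aggregate UDF: any class exposing step() and finalize()
class GeoMean:
    def __init__(self):
        self.log_sum = 0.0
        self.n = 0

    def step(self, value):
        self.log_sum += math.log(value)
        self.n += 1

    def finalize(self):
        return math.exp(self.log_sum / self.n) if self.n else None

conn.create_aggregate("geomean", 1, GeoMean)

conn.execute("CREATE TABLE t(x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(2.0,), (8.0,)])
print(conn.execute("SELECT geomean(x) FROM t").fetchone()[0])  # ≈ 4.0
```

The C API (sqlite3_create_function) follows the same scalar/step/finalize shape.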
I often use SQLite for data analysis. Put all my raw data in, query out parts, and graph them, etc. It's quite annoying to have to dig up another tool just to get some common functions.
I love SQLite, but recently I've found that a Jupyter notebook with duckdb and pandas is a pretty powerful solution. They're well integrated and duckdb sql can refer to dataframes in scope as if they were tables.
Honest question: if I have PostgreSQL containerized in Docker such that it's trivially easy to spin up a new database, what then is the use case for SQLite? I frequently hear people talk about it in such a rosy tone, but I just don't see the reason to actually set it up when the alternatives today are so easy to use. What am I missing?
You'd be surprised how many programs running on things that aren't web application servers end up using relational queries.
I remember my first encounter with it under the hood. It was a huge list of all my location data stored on my early-generation iPhone; you had to pull some tricks to extract the file off your phone, and then boom, I got a map of where I had been around 2009, better than even I remembered it. I think there was a congressional hearing over this. Now they just store that stuff off of the device.
I mean, why reach for the cordless screwdriver when all you need is a couple turns of a Phillips head? It’s cool that it’s easy to spin up databases but that’s a lot of complexity that may not be necessary in all cases.
I'm assuming you are plotting in python or R, is there an advantage here using sql queries instead of subsetting a flat file with some function and plotting these subsets?
gnuplot. Yes, like I mentioned JOIN between two data sets is terrifically useful and quite a pain to write iteratively. Do you have another approach for doing joins?
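To make the comparison concrete, here's the kind of join that's a few lines of SQL but a pain to hand-roll over two flat files (tables and values invented for the example; shown via Python's sqlite3 rather than the gnuplot pipeline):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensors(id INTEGER, name TEXT);
    CREATE TABLE readings(sensor_id INTEGER, value REAL);
    INSERT INTO sensors VALUES (1, 'temp'), (2, 'humidity');
    INSERT INTO readings VALUES (1, 21.5), (2, 40.0), (1, 22.0);
""")

# One declarative JOIN replaces a nested loop plus a lookup table.
rows = conn.execute("""
    SELECT s.name, AVG(r.value)
    FROM readings r JOIN sensors s ON s.id = r.sensor_id
    GROUP BY s.name
""").fetchall()
print(rows)
```

Doing the same iteratively means building a dict keyed on sensor id, accumulating sums and counts, then dividing — all of which the engine handles here.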
What language do you write your extensions in? What language is your codebase? And what are some non-deterministic UDFs you've written? Are they just randomness-related, or are there other types of non-deterministic UDFs that you use?
The "lack of functions" is what allows SQLite to be what it is: a small, efficient SQL database that works on everything from mainframes to a watch. The fact it doesn't have every possible feature you want or need is not a deficiency of SQLite: if you need tons of features then, by definition, you're outside the niche of what SQLite was built to handle and you need a large, full-featured database server.
SQLite isn't that small, about 1.3M compiled as a shared library on my system, and the "amalgamation file" is 240k lines of code, or 8.2M (160k lines w/o comments). I don't think a bunch of functions will add that much to it; it's probably more about developer time, maintenance burden, etc.
> it's probably more about developer time, maintenance burden, etc.
I mean... SQLite is famous for having hundreds of lines of test code per line of application code. So yeah, I think they view the functions not just in terms of implementing them, but also in terms of a commitment to robust maintenance.
> it's probably more about developer time, maintenance burden, etc.
That's a large part of it, yes. The sqlite project has only 2 full-time core developers and, AFAIK, only 3 current part-timers who tend to work on more "outer rim" features like the shell app, the .Net bindings, and the new WASM/JS parts.
Richard (sqlite architect/lead) has an explicit goal of supporting sqlite through at least 2050, and sqlite has a long history of strong backwards compatibility guarantees. Every feature (e.g. new SQL-facing function) added to it may well have to be maintained until 2050 by that small group of people. Thus Richard is necessarily extremely selective about what goes in.
TFA says a bunch of math functions are included but aren’t built by default.
I'm wondering whether they have a prohibition on new utility functions hidden behind feature flags (so they would only have to be written once and everyone would benefit), or whether nobody has bothered to send in a patch.
Also wondering if there’s a benefit to having the database call the functions over just doing it yourself in whatever language you are using? Not a db person so don’t know.
TFA is kinda wrong about them being disabled by default. They are behind a define macro, but that macro is enabled by default in the makefiles that ship with SQLite. I'd certainly expect them to be available in a normal build of SQLite.
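For a by-hand build from the amalgamation, the SQLite docs' compile recipe with the math-function macro passed explicitly looks something like this (flags per the "How To Compile SQLite" page; adjust for your toolchain):

```shell
# SQLITE_ENABLE_MATH_FUNCTIONS is already on in the shipped makefiles;
# passing it explicitly just makes the intent visible in a manual build.
gcc -DSQLITE_ENABLE_MATH_FUNCTIONS shell.c sqlite3.c \
    -lpthread -ldl -lm -o sqlite3
```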
> if nobody has bothered to send in a patch
Quoting from SQLite:
> SQLite is open-source, meaning that you can make as many copies of it as you want and do whatever you want with those copies, without limitation. But SQLite is not open-contribution.
They pretty tightly control who can contribute to the code base; for a few reasons, they don't accept patches from the outside.
SQLite also has some weird (non-standard?) behavior when aggregating. If you forget to include a GROUP BY, SQLite will just pick an arbitrary value from the non-aggregate columns. For example:
SELECT col1, COUNT(1) FROM table1;
happily returns a single row of data.
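A quick sketch with Python's sqlite3 shows the behavior (table and data made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1(col1 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("a",), ("b",), ("c",)])

# No GROUP BY, yet SQLite returns one row: the COUNT is correct,
# and col1 is an arbitrary value from the column rather than an error.
row = conn.execute("SELECT col1, COUNT(1) FROM table1").fetchone()
print(row)
```

The docs call the chosen value "arbitrary" — in practice it's often from the last row scanned, but nothing guarantees which one you get.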
I struggle to understand why that is preferable to throwing an error.
MySQL also does this. Personally I find it quite useful - I can avoid indexing costs or table scanning on things where either A: I know from an application level that any value is the same as all others for my query, or B: when any value is equally acceptable (e.g. to show an example result).
Thanks for the rabbit hole. The FAQ-linked golang CGo issue [0] features `mattn/go-sqlite3` too. It appears there is a particular impact for repeatedly calling any C code that takes longer than ~20µs, but that issue doesn't really try to quantify general CGo impact.
In my entirely unscientific Go test on 100000 incremented integers, it looks like "some" impact.
If Sqlite is missing functionality for your particular use-case, there is probably an extension that adds it. If not, you can write your custom extension.
If you are desperate for greater functionality, you can install the free Oracle XE database, then write whatever functions you want in pl/SQL.
Oracle "external tables" will allow you to see your CSVs directly, without creating a physical table within the database or otherwise using resources. If you actually have SQLite databases, you can join them in Oracle directly with dg4odbc inserted into your listener.
Note that the SQL/PSM standard is based on Oracle pl/SQL, and many databases implement it; SQLite doesn't.
"SQL/PSM is derived, seemingly directly, from Oracle's PL/SQL. Oracle developed PL/SQL and released it in 1991, basing the language on the US Department of Defense's Ada programming language."
EnterpriseDB sells a version of Postgres with deep Oracle compatibility, but this emulation is not included in the free version. Many years ago, IBM bundled this code into db2, and it remains jointly developed as far as I know.
The EnterpriseDB sales page for this product would be a good start on what's missing.
Okay, but that wasn't really my question. You recommend Oracle XE; I never used Oracle DB so I'm wondering what technical reasons there are to choose Oracle over PostgreSQL?
The general attitude towards Oracle tends to be quite negative on HN. Maybe for good reason, but I'd like to hear a different perspective and this seemed like a good opportunity.
Oracle XE will have the deepest SQL/PSM implementation of any freely-available database, without question. The list of standard functions, procedures and packages is enormous.
This is a double-edged sword. If you use these (proprietary) features and you outgrow XE, then you have to buy a license. XE also doesn't get quarterly patches - it's updated every couple of years (hardening is essential).
Oracle Enterprise retails for $47,500/CPU core. The stripped-down SE2 falls to about $18k.
Definitely beware of the lock-in, but XE likely remains the best for procedural programming in SQL.
For the extremely Oracle-averse, there still may be some value in running it alongside Postgres (with the SQL/PSM extension) and MySQL to verify that your code is portable.
EDB Postgres Advanced Server is an enhanced version of PostgreSQL that offers compatibility with Oracle in the following areas:
- Oracle-specific and syntax-compatible database object types
- Oracle-specific data types
- Oracle-specific SQL extensions
- Oracle PL/SQL support as a built-in native procedural language
- Oracle data dictionary views (i.e., ALL_, DBA_, USER_ views)
- Oracle built-in PL/SQL packages
EDB Postgres Advanced Server doesn't fully implement all Oracle-specific features in these areas. Nevertheless, the compatibility is extensive and covers many of the most commonly used Oracle constructs.
You can directly attach SQLite databases into an Oracle instance via dg4odbc. At this point, you have a much more expansive SQL dialect that you can use to query your data (I've never tried this in the XE edition, and when it works it can be somewhat brittle and quick to throw errors).
Granted, most databases can do this. However, if you are specifically interested in SQL/PSM, you really can't do better than Oracle.
Fyi, Microsoft/Sybase do not implement SQL/PSM at all; they use Transact-SQL. I've read that IBM db2 is the best implementation outside Oracle, but EnterpriseDB is likely identical.
IBM db2 udb for Unix/Windows happens to have a community edition.
See: https://www.sqlite.com/appfunc.html
And: https://docs.microsoft.com/en-us/dotnet/standard/data/sqlite...