Indeed. The whole point is to extend it with application-specific functions since the database is typically hosted in-process.
This is what we have done. We have a lot of domain-specific UDFs available in our SQLite scopes. Both scalar and aggregate types. Note that not all functions must be pure. You can quickly repurpose SQLite into a full-blown DSL by going down this path.
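For readers who haven't done this: in Python's built-in sqlite3 module, registering both kinds looks roughly like the sketch below (the function names and the geometric-mean aggregate are made up for illustration, not from the parent's codebase):

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Scalar UDF: marking it deterministic lets SQLite reuse results
# (the deterministic= keyword needs Python 3.8+ and SQLite 3.8.3+).
conn.create_function("reverse", 1, lambda s: s[::-1], deterministic=True)

# Aggregate UDF: any class exposing step() and finalize()
class GeoMean:
    def __init__(self):
        self.log_sum = 0.0
        self.n = 0

    def step(self, value):
        self.log_sum += math.log(value)
        self.n += 1

    def finalize(self):
        return math.exp(self.log_sum / self.n) if self.n else None

conn.create_aggregate("geomean", 1, GeoMean)

conn.execute("CREATE TABLE t(x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(2.0,), (8.0,)])
print(conn.execute("SELECT geomean(x) FROM t").fetchone()[0])  # ≈ 4.0
```

The C API (sqlite3_create_function) follows the same scalar/step/finalize shape.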
I often use SQLite for data analysis. Put all my raw data in, query out parts, and graph them, etc. It's quite annoying to have to dig up another tool just to get some common functions.
I love SQLite, but recently I've found that a Jupyter notebook with duckdb and pandas is a pretty powerful solution. They're well integrated and duckdb sql can refer to dataframes in scope as if they were tables.
Honest question: if I have PostgreSQL containerized in Docker such that it's trivially easy to spin up a new database, what then is the use case for SQLite? I frequently hear people talk about it in such a rosy tone, but I just don't see the reason to actually set it up when the alternatives today are so easy to use. What am I missing?
You'd be surprised how many programs running on things that aren't web application servers end up using relational queries.
I remember my first encounter with it under the hood. It was a huge list of all my location data stored on my early-generation iPhone; you had to pull some tricks to extract the file off your phone, and then boom, I got a map of where I had been around 2009, better than even I remembered it. I think there was a congressional hearing over this. Now they just store that stuff off of the device.
I mean, why reach for the cordless screwdriver when all you need is a couple turns of a Phillips head? It’s cool that it’s easy to spin up databases but that’s a lot of complexity that may not be necessary in all cases.
I'm assuming you are plotting in python or R, is there an advantage here using sql queries instead of subsetting a flat file with some function and plotting these subsets?
gnuplot. Yes, like I mentioned JOIN between two data sets is terrifically useful and quite a pain to write iteratively. Do you have another approach for doing joins?
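To make the comparison concrete, here's the kind of join that's a few lines of SQL but a pain to hand-roll over two flat files (tables and values invented for the example; shown via Python's sqlite3 rather than the gnuplot pipeline):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sensors(id INTEGER, name TEXT);
    CREATE TABLE readings(sensor_id INTEGER, value REAL);
    INSERT INTO sensors VALUES (1, 'temp'), (2, 'humidity');
    INSERT INTO readings VALUES (1, 21.5), (2, 40.0), (1, 22.0);
""")

# One declarative JOIN replaces a nested loop plus a lookup table.
rows = conn.execute("""
    SELECT s.name, AVG(r.value)
    FROM readings r JOIN sensors s ON s.id = r.sensor_id
    GROUP BY s.name
""").fetchall()
print(rows)
```

Doing the same iteratively means building a dict keyed on sensor id, accumulating sums and counts, then dividing — all of which the engine handles here.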
What language do you write your extensions in? What language is your codebase? And what are some non-deterministic UDFs you've written? Are they just randomness-related, or are there other types of non-deterministic UDFs that you use?
The "lack of functions" is what allows SQLite to be what it is: a small, efficient SQL database that works on everything from mainframes to a watch. The fact it doesn't have every possible feature you want or need is not a deficiency of SQLite: if you need tons of features then, by definition, you're outside the niche of what SQLite was built to handle and you need a large, full-featured database server.
SQLite isn't that small, about 1.3M compiled as a shared library on my system, and the "amalgamation file" is 240k lines of code, or 8.2M (160k lines w/o comments). I don't think a bunch of functions will add that much to it; it's probably more about developer time, maintenance burden, etc.
> it's probably more about developer time, maintenance burden, etc.
I mean... SQLite is famous for having hundreds of lines of test code per line of application code. So yeah, I think they view the functions not just in terms of implementing them, but also in terms of a commitment to robust maintenance.
> it's probably more about developer time, maintenance burden, etc.
That's a large part of it, yes. The sqlite project has only 2 full-time core developers and, AFAIK, only 3 current part-timers who tend to work on more "outer rim" features like the shell app, the .Net bindings, and the new WASM/JS parts.
Richard (sqlite architect/lead) has an explicit goal of supporting sqlite through at least 2050, and sqlite has a long history of strong backwards compatibility guarantees. Every feature (e.g. new SQL-facing function) added to it may well have to be maintained until 2050 by that small group of people. Thus Richard is necessarily extremely selective about what goes in.
TFA says a bunch of math functions are included but aren’t built by default.
I'm wondering whether they have a prohibition on new utility functions hidden behind feature flags (so they would only have to be written once and everyone would benefit), or whether nobody has bothered to send in a patch.
Also wondering if there’s a benefit to having the database call the functions over just doing it yourself in whatever language you are using? Not a db person so don’t know.
TFA is kinda wrong about them being disabled by default. They are behind a define macro, but that macro is enabled by default in the makefiles that ship with SQLite. I'd certainly expect them to be available in a normal build of SQLite.
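For a by-hand build from the amalgamation, the SQLite docs' compile recipe with the math-function macro passed explicitly looks something like this (flags per the "How To Compile SQLite" page; adjust for your toolchain):

```shell
# SQLITE_ENABLE_MATH_FUNCTIONS is already on in the shipped makefiles;
# passing it explicitly just makes the intent visible in a manual build.
gcc -DSQLITE_ENABLE_MATH_FUNCTIONS shell.c sqlite3.c \
    -lpthread -ldl -lm -o sqlite3
```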
> if nobody has bothered to send in a patch
Quoting from SQLite:
> SQLite is open-source, meaning that you can make as many copies of it as you want and do whatever you want with those copies, without limitation. But SQLite is not open-contribution.
They pretty tightly control who can contribute to the code base; for a few reasons, they don't accept patches from the outside.
SQLite also has some weird (non-standard?) behavior when aggregating. If you forget to include a GROUP BY, SQLite will just pick an arbitrary value from the non-aggregate columns. For example:
SELECT col1, COUNT(1) FROM table1;
happily returns a single row of data.
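A quick sketch with Python's sqlite3 shows the behavior (table and data made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1(col1 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("a",), ("b",), ("c",)])

# No GROUP BY, yet SQLite returns one row: the COUNT is correct,
# and col1 is an arbitrary value from the column rather than an error.
row = conn.execute("SELECT col1, COUNT(1) FROM table1").fetchone()
print(row)
```

The docs call the chosen value "arbitrary" — in practice it's often from the last row scanned, but nothing guarantees which one you get.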
I struggle to understand why that is preferable to throwing an error.
MySQL also does this. Personally I find it quite useful - I can avoid indexing costs or table scanning on things where either A: I know from an application level that any value is the same as all others for my query, or B: when any value is equally acceptable (e.g. to show an example result).
Thanks for the rabbit hole. The FAQ-linked golang CGo issue [0] features `mattn/go-sqlite3` too. It appears there is a particular impact for repeatedly calling any C code that takes longer than ~20µs, but that issue doesn't really try to quantify general CGo impact.
In my entirely unscientific Go test on 100000 incremented integers, it looks like "some" impact.
If Sqlite is missing functionality for your particular use-case, there is probably an extension that adds it. If not, you can write your custom extension.
If you are desperate for greater functionality, you can install the free Oracle XE database, then write whatever functions you want in pl/SQL.
Oracle "external tables" will allow you to see your CSVs directly, without creating a physical table within the database or otherwise using resources. If you actually have SQLite databases, you can join them in Oracle directly with dg4odbc inserted into your listener.
Note that the SQL/PSM standard is based on Oracle pl/SQL, and many databases implement it; SQLite doesn't.
"SQL/PSM is derived, seemingly directly, from Oracle's PL/SQL. Oracle developed PL/SQL and released it in 1991, basing the language on the US Department of Defense's Ada programming language."
EnterpriseDB sells a version of Postgres with deep Oracle compatibility, but this emulation is not included in the free version. Many years ago, IBM bundled this code into db2, and it remains jointly developed as far as I know.
The EnterpriseDB sales page for this product would be a good start on what's missing.
Okay, but that wasn't really my question. You recommend Oracle XE; I never used Oracle DB so I'm wondering what technical reasons there are to choose Oracle over PostgreSQL?
The general attitude towards Oracle tends to be quite negative on HN. Maybe for good reason, but I'd like to hear a different perspective and this seemed like a good opportunity.
Oracle XE will have the deepest SQL/PSM implementation of any freely-available database, without question. The list of standard functions, procedures and packages is enormous.
This is a double-edged sword. If you use these (proprietary) features and you outgrow XE, then you have to buy a license. XE also doesn't get quarterly patches - it's updated every couple of years (hardening is essential).
Oracle Enterprise retails for $47,500/CPU core. The stripped-down SE2 falls to about $18k.
Definitely beware of the lock-in, but XE likely remains the best for procedural programming in SQL.
For the extremely Oracle-averse, there still may be some value in running it alongside Postgres (with the SQL/PSM extension) and MySQL to verify that your code is portable.
EDB Postgres Advanced Server is an enhanced version of PostgreSQL that offers compatibility with Oracle in the following areas:
- Oracle-specific and syntax-compatible database object types
- Oracle-specific data types
- Oracle-specific SQL extensions
- Oracle PL/SQL support as a built-in native procedural language
- Oracle data dictionary views (i.e., ALL_, DBA_, USER_ views)
- Oracle built-in PL/SQL packages
EDB Postgres Advanced Server doesn't fully implement all Oracle-specific features in these areas. Nevertheless, the compatibility is extensive and covers many of the most commonly used Oracle constructs.
You can directly attach SQLite databases into an Oracle instance via dg4odbc. At this point, you have a much more expansive SQL dialect that you can use to query your data (I've never tried this in the XE edition, and when it works it can be somewhat brittle and quick to throw errors).
Granted, most databases can do this. However, if you are specifically interested in SQL/PSM, you really can't do better than Oracle.
Fyi, Microsoft/Sybase do not implement SQL/PSM at all; they use Transact-SQL. I've read that IBM db2 is the best implementation outside Oracle, but EnterpriseDB is likely identical.
IBM db2 udb for Unix/Windows happens to have a community edition.
See: https://www.sqlite.com/appfunc.html
And: https://docs.microsoft.com/en-us/dotnet/standard/data/sqlite...