Hey all, author here. I didn't expect this to reach the front page of HN; I came here to submit it and it was here already! I'm looking for more ideas to experiment with. Here's one someone on another forum gave me: exploring recursive queries and using SQLite's random() functions to do the insertions.
Another crazy idea is to learn the SQLite file format and just write the pages to disk directly.
libeatmydata generally shouldn't be used in production environments, as it trades safety for speed ("lib-eat-my-data") by disabling fsync and related calls for the process running under it. Disabling those calls reduces I/O pressure, but comes with the risk that the program believes it has written data safely and durably when that may not be true. It essentially stops programs that were written to be crash-proof from actually being crash-proof, which, under the circumstances the author is operating in, you almost certainly don't care about.
I've used this when reloading database replicas from a dump from master before, as it drastically speeds up operations there.
Overlayfs offers libeatmydata-like functionality in recent Linux kernels without having to intercept libc, so it should also work for things that make raw syscalls.
Also great for stuff like package/container builds where you're not worried about a consistent intermediate state: it either succeeds and you get the end result you want, or it fails and you only care about the logs.
Didn't go very far down that track though, as the mostly single-threaded nature of writes in SQLite seemed to prevent it from achieving much. Well, I _think_ that's why it didn't really help. ;)
This is a heavily subsetted implementation of SQLite3 that can read/write databases (presumably on SD cards) from very small microcontrollers.
It presumably doesn't have the same ACID compliance properties, but with a single <1.5k source file, it may be a particularly efficient way to rapidly learn the internals if you happen to want to directly manipulate the files on disk.
Now I'm thinking it could actually be interesting to see what drh thinks of this implementation (and any gotchas in it) because of its small size and accessibility.
I did similar Rust/Python record generation experiments, and I can relate to these numbers.
However, I think your threaded Rust trial might be a little off in subtle ways. I would expect it to perform several times better than single-threaded Rust and async Rust (async is generally slower on these workloads, but still faster than Python).
Edit: After reading your Rust code, I think you have room for some improvements:
- don't rely on random(), simply cycle through your values
- pre-allocate your vecs with `with_capacity` (see the sketch after this list for these first two points)
- don't use channels, prefer deques...
- ...or even better, don't use synchronization primitives at all and open one connection per thread (not sure if that works with SQLite?)
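A rough sketch of the first two points, assuming the post's (area, age, active) schema; the function name and the specific cycled values are made up for illustration:

    // Sketch only: pre-generate rows by cycling through allowed values
    // instead of calling a RNG per row, with pre-allocated storage.
    fn generate_rows(n: usize) -> Vec<(i64, i64, i64)> {
        let areas = [100_000_i64, 210_000, 320_000, 430_000, 540_000, 650_000];
        let ages = [5_i64, 10, 15];
        let mut rows = Vec::with_capacity(n); // one allocation up front
        for i in 0..n {
            rows.push((
                areas[i % areas.len()], // any six-digit code will do
                ages[i % ages.len()],   // 5, 10 or 15
                (i % 2) as i64,         // active: 0 or 1
            ));
        }
        rows
    }

    fn main() {
        let rows = generate_rows(1_000_000);
        println!("generated {} rows", rows.len());
    }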
I am also interested in writing the SQLite or PostgreSQL file format straight to disk as a faster way to do ETL. I'd be curious to hear if anyone has actually tried this, and just as curious if you end up trying it too.
I really like SQLite, but I'm curious how fast it would be to write the data to Parquet files and read them with DuckDB. I've done something similar before, and in my case DuckDB was faster than SQLite.
I'm curious how you'd fare if you pregenerated the data, with periodic COMMITs, and then just slung it at the sqlite3 binary over STDIN to see how it handled that.
Certainly if I was trying to do the same thing in Pg my first thought would be "batched COPY commands".
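A rough sketch of the sqlite3-over-stdin idea, assuming the sqlite3 CLI is on PATH and the post's schema; the row count, batch size, and file name are arbitrary, and the rows are generated inline here for brevity rather than fully pregenerated:

    use std::io::Write;
    use std::process::{Command, Stdio};

    fn main() -> std::io::Result<()> {
        // Spawn the sqlite3 CLI and stream plain SQL text into it.
        let mut child = Command::new("sqlite3")
            .arg("test.db")
            .stdin(Stdio::piped())
            .spawn()?;
        let stdin = child.stdin.as_mut().expect("piped stdin");

        writeln!(stdin, "CREATE TABLE IF NOT EXISTS user (id INTEGER PRIMARY KEY, area INTEGER, age INTEGER, active INTEGER);")?;
        writeln!(stdin, "BEGIN;")?;
        for i in 0..1_000_000_u64 {
            writeln!(
                stdin,
                "INSERT INTO user (area, age, active) VALUES ({}, {}, {});",
                100_000 + i % 900_000, // six-digit area code
                5 * (1 + i % 3),       // 5, 10 or 15
                i % 2                  // active flag
            )?;
            if i % 50_000 == 49_999 {
                writeln!(stdin, "COMMIT; BEGIN;")?; // periodic COMMITs
            }
        }
        writeln!(stdin, "COMMIT;")?;
        drop(child.stdin.take()); // close stdin so sqlite3 can exit
        child.wait()?;
        Ok(())
    }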
You can go even faster if you can organize your problem such that you simply start from a copy of a template SQLite database each time (it is just a file/byte sequence).
We do SQL evaluation for a lot of business logic throughout our product, and we have found that starting from a template database (i.e. one with the schema predefined and canonical values populated) can save a lot of time when working in tight loops.
An SQLite database is just a file. You can build the empty database with the schema and base values ahead of time, save it to a file (or an in-memory byte buffer), and then every time you want to create a new database you just copy that file. No expensive initialization queries are needed that way. If raw high-speed throughput is needed, skipping that step can make a significant difference.
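A minimal sketch of the copy-a-template idea, assuming the rusqlite crate and the post's user schema; the function names and file paths are made up:

    use rusqlite::Connection;

    // Build the template database once, up front.
    fn make_template(path: &str) -> rusqlite::Result<()> {
        let conn = Connection::open(path)?;
        conn.execute_batch(
            "CREATE TABLE IF NOT EXISTS user (
                 id INTEGER NOT NULL PRIMARY KEY,
                 area CHAR(6),
                 age INTEGER NOT NULL,
                 active INTEGER NOT NULL
             );",
        )
    }

    // Every "new" database is just a byte-for-byte copy of the template file.
    fn fresh_db_from_template(template: &str, target: &str) -> std::io::Result<u64> {
        std::fs::copy(template, target)
    }

    fn main() {
        make_template("template.db").expect("create template");
        fresh_db_from_template("template.db", "run-1.db").expect("copy template");
    }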
I guess if you wanted the fastest creation you could make a custom backend format for SQLite and use that, especially if query speed was not important.
Assuming SQLite has any internal counters or indexes, let alone B-trees or anything fancy, this approach won't work. It works for a raw record format (CSV, JSON, Protobuf, an mmapped array, etc.), but the author wants to interact with a real SQLite database. Generating a billion rows in some non-SQLite format still leaves the problem of loading that format into SQLite, which isn't really a reduction of the original problem.
Yes. Start from known-good database and then update or insert with needed deltas. You could even have more than you need in the template file and truncate/drop what is not relevant for the current context. Depending on the situation, it might be faster to start from a database containing all of the possible things.
This makes sense. Just do a copy or copy-on-write of an existing database file; you could even keep the actual bytes of an empty (or pre-DDL'ed) SQLite file in memory in your app, rather than needing to do a disk copy.
Database optimization posts are always interesting, but it's really hard to do any apples to apples comparison. Your performance is going to depend mostly on your hardware, internal database settings and tunings, and OS level tunings. I'm glad this one included some insight into the SQLite settings disabled, but there's always going to be too many factors to easily compare this to your own setup.
For most SQL systems, the fastest way to do inserts is always just going to be batched inserts. There are maybe some extra tricks to reduce network costs/optimize batches [0], but at its core you are still essentially inserting into the table through the normal insert path. You can then basically only try to reduce the amount of work done on the DB side per insert, or optimize your OS for your workload.
Some other DB systems (more common in NoSQL) let you actually do real bulk loads [1] where you are writing direct(ish) database files and actually bypassing much of the normal write path.
Oracle, DB2, MySQL, SQL Server, and PostgreSQL all support bulk insert. Two obvious use cases are QA databases and the L part of ETL, which pretty much require it.
This is an important comment. It's not a typo in the article, even the source code only does 100M rows, not 1B. The author definitely does not hit the target of 1B rows in a minute.
The author doesn’t say a billion is 100 million. They say they’d like to be able to insert a billion, and say they’re able to insert 100 million. It’s not a contradiction.
Hey, sorry for the misleading title. I started with 'Fast SQLite Inserts' and it went through many iterations. With the title, I wanted to convey that I want to insert 1 billion rows in under a minute on my machine. I thought the current title was fine, since I got an LGTM on earlier drafts. The 'on my machine' detail is also important, since mine is a two-year-old laptop and all the measurements were done on it.
Also, I got other feedback that the title should indicate that it is a test database and emphasise that it is not durable.
I'm still wondering about the right way to convey all of this in the title while keeping it short.
Thanks for the explanation. I think it only needs a minor tweak. Maybe prefix with “trying to” or something like that. I am empathetic to the challenge of naming things concisely and accurately.
I agree. While the title reflects the eventual goal of the effort, the goal has yet to be achieved (and may or may not be achievable at all). I think it's a bit irresponsible to use a title like that for a post that hasn't achieved what the title describes.
For other languages it's a big "it varies", though the second definition seems to be the most common. Honestly, the term "billion" is as ambiguous as using "06-03" for a date.
Also note that, historically, English also followed the second definition, so for old literature it's also confusing.
Yeah, that's my interpretation of a billion too. I vaguely recall that India or Britain interprets a billion differently though. Maybe that's what they're thinking?
I think it's the same in UK English as well. But in some (most?) European languages billion actually means 1 000 000 million (so a thousand times more). And we use "milliard" for 1000 million.
He hasn't reached the goal yet, and is currently doing 100 million in ~34 seconds.
> Looking forward to discussions and/or collaborations with curious souls in my quest to generate a billion record SQLite DB quickly. If this sounds interesting to you, reach out to me on Twitter or submit a PR.
Just about everyone in the UK defines a billion to be 10^9. I've heard about the 10^12 definition but never encountered anyone who uses it - I think it must have been an older usage that fell out of favour.
In 1974, the UK government abandoned the long scale (million, milliard, billion, billiard) in favor of the short scale (million, billion, trillion, quadrillion) used in the US. The long scale is still used in languages like Dutch, German, and French.
I have to assume this was rejected as a valid approach, but if not, this whole thread gets 10x more interesting than it already was. I hope the author responds.
Well, even if you just insert zeros instead of random values, it takes 9 seconds on my computer to insert 100M rows, so even that is not 1B rows per minute.
And I think INSERT INTO ... SELECT is the fastest way to bulk insert data into sqlite.
Also, I have tried using the carray SQLite feature, which allows sharing memory with SQLite, together with a recursive CTE to query it, but it is slower. Though that way you can pass values generated from Rust instead of using random().
My run on NixOS 21.05 with sqlite 3.35.2, without randomness, on an i7-7500U:
Invocation:
command time sqlite3 ':memory:' '
create table IF NOT EXISTS user
(
id INTEGER not null primary key,
area CHAR(6),
age INTEGER not null,
active INTEGER not null
);
INSERT INTO user (area, age, active) SELECT 0, 1, 2 FROM generate_series(1, 100000000, 1);
'
command time sqlite3 ':memory:' '
PRAGMA journal_mode = OFF;
PRAGMA synchronous = 0;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY;
create table IF NOT EXISTS user
(
id INTEGER not null primary key,
area CHAR(6),
age INTEGER not null,
active INTEGER not null
);
INSERT INTO user (area, age, active) SELECT 0, 1, 2 FROM generate_series(1, 100000000, 1);
'
Since it is only 100M rows, it takes 1.8 GB on disk, so I've used tmpfs for this, which is essentially a ramdisk. But I have a Gen4 PCIe NVMe SSD that can reliably write at 4 GB/s sequentially, so writing takes ~500ms for 100M rows; it is not the bottleneck here. random() takes about half of the insert time. Generating those values in Rust, for example, is faster, but sharing that data with SQLite takes more time than generating it with random().
Maybe implementing a custom virtual table in C or Rust, like the built-in generate_series but one that produces the user table fields directly, would be faster, but that is significantly more effort than my query.
This query with random() and generate_series executed in the sqlite3 CLI takes a whopping 8 MB of RAM, so you don't even have to close all Electron-based applications to run it on a computer with 8 GB of RAM.
I was actually just working on SQLite speed for Objective-S (http://objective.st), partly as a driver for getting some of the more glaring inefficiencies out.
Using a "to do list" schema, I currently get the 100M rows out in 56 seconds, which is around half the speed of the Rust example given here, but 3 times the speed of PyPy and almost 10x faster than Python.
This is from an interpreted script that not only does the inserts and creates the objects to insert in the first place, but also defines the actual class.
The lower-level code is written in Objective-C, like the rest of Objective-S.
Class definition:
class Task {
    var <int> id.
    var <bool> done.
    var <NSString> title.
    -description { "<Task: title: {this:title} done: {this:done}>". }
    +sqlForCreate {
        '( [id] INTEGER PRIMARY KEY, [title] NVARCHAR(220) NOT NULL, [done] INTEGER );'.
    }
}.
Code to insert a computed array of tasks 10 times:
It is a good question, but I'm guessing that the whole reason for creating the database in the first place is to test interactions with the database, in which case you'd have to mock up all the interactions being tested (selects, deletes, joins, etc.).
There are some domains like building cluster managers where having a relational view of the cluster state and being able to query/manipulate it declaratively has significant perks. See for example: https://www.usenix.org/system/files/osdi20-suresh.pdf (disclaimer: I'm an author)
SQLite provides a full-text search module (FTS5) with a lot of nice search features. I have used it numerous times to build serverless APIs over large static datasets for frontend analytical experiences, like data tables, without setting up Elasticsearch. That's one use case.
Another is the closures.c extension, which is very nice for rapid queries on tree-structured data locally in memory. The JSON1 extension is also nice for rapidly querying/reshaping deeply nested JSON data. There's also spellfix1, which can provide fuzzy-matching capabilities. If you need any of these with low latency and in memory, it's a great choice.
SQLite is great for rapidly building low-latency services that serve static data to frontend experiences. Something I'm exploring now is combining SQLite with Streamlit to rapidly build data exploration UIs.
How many times have you wanted to quickly add fuzzy matching or full-text search to some program? You can use fuzzywuzzy, but it's pretty slow; SQLite provides performant implementations of this stuff that are super simple to set up.
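For instance, a minimal FTS5 sketch, assuming rusqlite and an SQLite build with FTS5 compiled in; the table and content are made up:

    use rusqlite::Connection;

    fn main() -> rusqlite::Result<()> {
        let conn = Connection::open_in_memory()?;
        conn.execute_batch(
            "CREATE VIRTUAL TABLE docs USING fts5(title, body);
             INSERT INTO docs VALUES
                 ('intro', 'full text search with sqlite'),
                 ('notes', 'serverless apis over static datasets');",
        )?;
        // Standard FTS5 MATCH query; ranking and prefix queries also work.
        let mut stmt = conn.prepare("SELECT title FROM docs WHERE docs MATCH ?1")?;
        let titles = stmt
            .query_map(["search"], |row| row.get::<_, String>(0))?
            .collect::<Result<Vec<_>, _>>()?;
        println!("{:?}", titles);
        Ok(())
    }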
What do you think a relational database does to provide you the querying flexibility? It builds the same exact data structures, but does it automatically and dynamically, while offering much more usability.
Most attempts to query using raw data structures just means you end up rebuilding a (very poor) relational database with none of the features.
If the set of queries you want to run is fixed, you can probably beat SQLite with a lot of work; if the queries are dynamic, I doubt you can do much better than SQLite without herculean efforts. An in-memory database is thus a good option if you either don't care too much about the runtime of your queries and want to save a bunch of development time, or if you don't know enough about your workload to beat a general-purpose solution.
>you can =probably= beat SQLite with =a lot of= work,
Actually, I am beyond certain. When it's all about memory, no database comes even remotely close to properly picked data structures and a well-chosen structure/object layout.
If I need a transaction log plus persistence, databases have a decent application.
In more than 20 years, I have never had the case: "Yay, I can use relational structures in memory because I don't know what I am going to do with the data."
For the case where I want the ability to run one-off analytics queries, having access to the usual set of join/filter/summarize/etc. operations is a lot more convenient and less error-prone than having to manually write them. But dplyr [1] is my go-to rather than SQLite personally for in-memory data sizes.
For something like redis, but with more structure.
But a pretty good use case (IMO) is testing. If you want to do an integration test with an SQL database, and you want to test large numbers, this might be a good fully functional stub to run locally.
Redis is in pretty much the same category. Testing is sort of a valid case, if you are committed to writing pure SQL with minimal use of any dialect specifics (but even CREATE TABLE syntax might differ). When it comes to databases, running on anything but the real thing is close to no replacement.
Many databases have Docker images nowadays, so it isn't hard to run them locally. Likely at some point you'd want to optimize the SQL itself, rendering the tests incompatible.
It's very likely it wouldn't. (Decent) analytical DBMSes have a whole lot up their sleeves that just choosing one plain-vanilla data structure for your table doesn't even approach in terms of performance. The benefit may be well upwards of 10x in many real-world scenarios.
Of course, if you have one single query which you know in advance, carefully build a data structure to cater just to that, and you know your data structures beyond just the basics, then yes, a DBMS would be overkill. But it still won't be a walk in the park.
If you don't do any joins, ClickHouse is worth a try. Otherwise, MonetDB (or even MonetDB embedded) if you want FOSS. Commercial offerings: Actian Vector, or maybe HyperDB. The big corporate ones are probably pretty good, but it's always difficult to tell because there's this taboo on any proper benchmarking against them.
If you just want to browse what other people are using, rather than go by the recommendations of a random commenter, try:
If you insert 4 rows into a table and join it with itself without any join criteria, the result is a cartesian product of 16 rows (SELECT * FROM a, a).
Do that cartesian join three more times, and you have over 4 billion results.
Then you simply need to use the random() function in conjunction with division, rounding, and CASE statements to get the desired random values.
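A rough sketch of the self-join idea, assuming rusqlite; it uses modulo arithmetic instead of division/rounding/CASE for brevity, and only an 8-way join (4^8 = 65,536 rows) so it finishes instantly. Add more aliases to reach billions:

    use rusqlite::Connection;

    fn main() -> rusqlite::Result<()> {
        let conn = Connection::open_in_memory()?;
        conn.execute_batch(
            "CREATE TABLE seed(x INTEGER);
             INSERT INTO seed VALUES (0), (1), (2), (3);
             CREATE TABLE user(
                 id INTEGER PRIMARY KEY,
                 area INTEGER,
                 age INTEGER,
                 active INTEGER
             );
             INSERT INTO user (area, age, active)
             SELECT
                 100000 + abs(random()) % 900000,  -- six-digit area code
                 5 * (1 + abs(random()) % 3),      -- 5, 10 or 15
                 abs(random()) % 2                 -- 0 or 1
             FROM seed a, seed b, seed c, seed d, seed e, seed f, seed g, seed h;",
        )?;
        let n: i64 = conn.query_row("SELECT count(*) FROM user", [], |r| r.get(0))?;
        println!("inserted {} rows", n);
        Ok(())
    }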
Yep - I discovered most of this a while back when I needed to do a bunch of exploring of ~200m rows of data (like 50GB). I chucked it into SQLite and was done with everything in a day, it was absolutely wonderful. This post is a pretty good summary of what to change and why.
Prepared statements should be considered the norm for SQLite: they have pretty major performance benefits, and any decent ORM or query engine can probably do it for you implicitly or with a single flag, so it's practically free for many applications. I'm always surprised how often I see applications, benchmarks, or "how to be fast with SQLite" guides not using them, so I'm definitely glad to see them covered here :)
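For reference, the usual shape with rusqlite looks something like this (a sketch, not the author's code): prepare once, bind many times, and wrap the batch in a single transaction.

    use rusqlite::{params, Connection};

    fn insert_batch(conn: &mut Connection, rows: &[(i64, i64, i64)]) -> rusqlite::Result<()> {
        let tx = conn.transaction()?;
        {
            // Prepared once; only the bound values change per row.
            let mut stmt =
                tx.prepare("INSERT INTO user (area, age, active) VALUES (?1, ?2, ?3)")?;
            for (area, age, active) in rows {
                stmt.execute(params![area, age, active])?;
            }
        }
        tx.commit()?; // one fsync-worth of work for the whole batch
        Ok(())
    }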
I've been dealing with lots of data in SQLite, and my takeaways are:
- the language has very little to do with it, since the bottleneck will most likely be the way you insert data
- prepared statements are indeed useful, but performance didn't change much once I used long transactions and committed every few thousand rows
- having lots of rows in your table is fine, but certain queries, like aggregations over many rows, are not what SQLite is great at
- ClickHouse can easily ingest that much and more on a laptop, without even needing a scripting language
/fast-sqlite3-inserts (master)> time make busy-rust
Sun Jul 18 17:04:59 UTC 2021 [RUST] busy.rs (100_000_000) iterations
real 0m9.816s
user 0m9.380s
sys 0m0.433s
________________________________________________________
Executed in 9.92 secs fish external
usr time 9.43 secs 0.20 millis 9.43 secs
sys time 0.47 secs 1.07 millis 0.47 secs
fast-sqlite3-inserts (master)> time make busy-rust-thread
Sun Jul 18 17:04:48 UTC 2021 [RUST] threaded_busy.rs (100_000_000) iterations
real 0m2.104s
user 0m13.640s
sys 0m0.724s
________________________________________________________
Executed in 2.33 secs fish external
usr time 13.68 secs 0.20 millis 13.68 secs
sys time 0.78 secs 1.18 millis 0.78 secs
I'm probably doing something wrong. Or am I actually hitting the pace needed for the billion?
# Assumed setup, not in the original snippet: 100 batches of 1M rows each.
set loop 100
set increment 1000000

puts {CREATE TABLE user(
  pk INTEGER PRIMARY KEY,
  area INTEGER,
  age INTEGER,
  active INTEGER
);}
for {set i 0} {$i < $loop} {incr i} {
  puts "
WITH RECURSIVE tmp(pk,area,age,active) AS
(
  SELECT [expr {$i * $increment}],
         500001,
         5,
         0
  UNION ALL
  SELECT pk+1,
         10000*(abs(random())%9+1)+abs(random())%100000,
         5*(abs(random()%3)+1),
         abs(random()%2)
  FROM tmp
  WHERE pk < [expr {($i+1)*$increment-1}]
)
INSERT INTO user
SELECT * FROM tmp;"
}
I'm pretty new to this sort of optimization stuff, so forgive me if this is ridiculous for any reason, but reading this I'm wondering about two things:
1. Is it useful/feasible to set a sort of "lower bound" for these optimizations by profiling the raw write time for a dataset of a given size?
2. Assuming that writes are ultimately the limiting factor, could you gain performance by calculating the minimum batch size at which write time intersects in-memory generation time, and as soon as you hit that size, spawn a thread to write that batch to disk, then emit some async signal when the write is done to trigger the next batch of writes? Is it possible to "stitch together" these batches after the writes?
From my armchair I thought something similar. I think I read that 1) inserting into the same table has some speed limitations and 2) one of the author's ideas was to spin up 4 processes…
I wonder if you could insert into 10 different tables from 10 threads or processes, then
INSERT INTO single_table (…) SELECT (…) UNION.
No idea if INSERT ... SELECT does the trick or not, but you're almost at the point of partitioning here. If the application called for it, you could do some sort of poor man's partitioning to write all the data and then push some complexity onto the read path.
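A sketch of the merge step under that scheme, assuming rusqlite and hypothetical user_shard_N tables (one per writer thread). Note that plain UNION would dedupe rows, so UNION ALL or one INSERT ... SELECT per shard, as below, is what you'd want:

    use rusqlite::Connection;

    // Merge per-thread shard tables into the final table, one shard at a time.
    fn merge_shards(conn: &Connection, shards: usize) -> rusqlite::Result<()> {
        for i in 0..shards {
            conn.execute(
                &format!(
                    "INSERT INTO user (area, age, active)
                     SELECT area, age, active FROM user_shard_{}",
                    i
                ),
                [],
            )?;
        }
        Ok(())
    }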
I also wondered what impact, if any, the ID primary key had on inserting at this level of frequency.
Just tried inserting 100M KV pairs of the same data format into a KV store. It takes about 26s on a Xeon Silver 4210, including fully writing the data down to the filesystem/SSD.
>The generated data would be random with following constraints: The area column would hold six digits area code (any six digits would do, no validation). The age would be any of 5, 10, or 15. The active column is either 0 or 1.
That is about 6 million combinations, which is not such a big space that it would be impractical to precompute all of them. I wonder if hashing into such a precomputed table might help. It might not if the overhead of maintaining the lookup table is too high.
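A sketch of what the precomputed table could look like, assuming six-digit area codes mean 100000 through 999999 (roughly 5.4M entries, in line with the "about 6 million" estimate):

    // Precompute every (area, age, active) combination once; rows can then be
    // drawn by indexing into this table instead of generating each field.
    fn build_lookup() -> Vec<(u32, u8, u8)> {
        let mut combos = Vec::with_capacity(900_000 * 3 * 2);
        for area in 100_000_u32..1_000_000 {
            for &age in &[5_u8, 10, 15] {
                for &active in &[0_u8, 1] {
                    combos.push((area, age, active));
                }
            }
        }
        combos
    }

    fn main() {
        let lookup = build_lookup();
        println!("{} combinations", lookup.len()); // 5,400,000
    }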
Webapps (or webapp frameworks) like Nextcloud and Django come with SQLite by default. That makes it easy to start, but they all warn you: when you get serious, get a "real" DB like MariaDB or PostgreSQL... And with stuff like this, I wonder: is that really necessary?
I really like SQLite: it's well supported in Python, and backup/restore is very simple. What is the real difference (probably application-dependent) between SQLite and a "real" DB engine?
Hmm, never thought of this :) I come at it more from a self-hosting angle, and my Django website is only a prototype so far… But it is a point I never considered, so thanks.
I wonder at what scale this starts to matter (in human-readable units ;))
(It would likely be faster for other DB engines, because there is network overhead in the communication between the program and the DB; no such thing for SQLite.)
I would recommend looking into dtrace or other observability tools to figure out what to optimize next. So far you have basically had to guess where the slowness is and optimize that.
Lastly, how long does it take to make a copy of an already prepared 1B-row SQLite file? That seems easier than generating a new one.
I wonder how much I could speed up my test suites (that don't rely on transaction rollbacks) by disabling journalling. Those milliseconds per query add up to seconds per tests and minutes per PR.
I tried using tmpfs to create and exercise a file-based SQLite DB, and it performed significantly worse than a RAM-based DB (opened with an empty string or ':memory:').
Maybe, maybe not. I imagine writing directly to memory will always be faster than writing to a filesystem, even if it’s ultimately backed by RAM. Might be a negligible difference though.
> Recently, I ran into a situation where I needed a test database with lots of rows and needed it fast. So I did what any programmer would do: wrote a Python script to generate the DB.
The author must have a rather low opinion of programmers... generating data with a script is fine for 1,000 rows, maybe 10,000. 100,000 is probably stretching it. Beyond that, the slowness becomes meaningful.
Anyway, if the "test database" data is given in advance, you want an in-memory DBMS that can load data from files. If it isn't, use an in-memory DBMS that supports generated columns, or apply your random-number-generating function to a trivial column.
(MonetDB standalone/embedded would be a decent option; maybe the columnar storage mechanism in MariaDB? And of course there are commercial offerings.)
> Unfortunately, it was slow. Really slow. So I did what any programmer would do: went down the rabbit hole of learning more about SQLite...
That's the wrong approach. SQLite isn't called "lite" for nothing. It's not what you would use for bulk operations on large tables.
it oversells itself a bit, but still says its (only) advantages for data analysis are: "easier to install and use and the resulting database is a single file that can be written to a USB memory stick or emailed."
I suspect these opinions would be less likely to get downvoted if you could back them up with data, specifically:
- Assuming you do have a test database in advance that's represented in a file, but for some reason not a SQLite file or other database, how long does it take to load 100M rows in a database of your choice, and how does that compare to the author's results?
- The author does not in fact have a test database in advance, and is generating random data, which I think is what you're saying by "apply your random-number-generating function on a trivial column" - how long would your approach take to generate 100M rows?
- If you generated a database of about the same size in a database of your choice, how do bulk operations on large tables perform in comparison to SQLite? (My understanding is that it's "lite" because it's not client/server, not because its performance is poor.)
I don't think any of these experiments would be terribly hard to run.