D1: Our SQL database (cloudflare.com)
592 points by elithrar on May 11, 2022 | 228 comments



For a Cloudflare article, this one is surprisingly light on technical details. And for the product where it most matters.

I'm guessing this is a single master database with multiple read replicas. That means it's not consistent anymore (the C in ACID). Obviously reads after a write will see stale data until the write propagates.

I'm a bit curious how that replication works. Ship the whole db? Binary diffs of the master? Ship the SQL statements that did the write and reapply them? Lots of performance and other tradeoffs here.

What's the latency like? This likely doesn't run in every edge location. Does the database ship out on the first request and get cached with an expiry? Does the request itself move to the database instead of running at the edge - like maybe this runs on a select subset of locations?

So many questions, but no details yet.


I agree -- this blog post is light on details. To me the value Cloudflare believes they are offering is mostly ease-of-use, particularly setup. With minimal work you can have a stateful, relational store available to your code. But in terms of actual database functionality, they are not offering anything particularly novel. Of course, I might be missing something.

In fact, I don't see anything D1 is doing that is not already offered by something like rqlite[1], which is also a super-easy-to-use distributed database built on SQLite. Of course Cloudflare will run the database for you, which is a great help -- they take care of the uptime, monitoring, backups, etc. And that's important, obviously, because in the real world, databases must be operated.

Disclaimer: I am the creator of rqlite.

[1] https://github.com/rqlite/rqlite


I’ve been looking at rqlite for some time and it’s really great to track the project on GitHub.

I believe that the power of what Cloudflare offers here isn’t in the actual database. It’s the packaging and how it sits in their serverless world. Even with rqlite, I still need IP addresses to run a resilient system. As someone who sometimes needs a table here and there, I really, really don’t want a server. I want a table to store a thousand records in and that’s it. This is where I would very much enjoy using something like D1.

A combo of D1, R2 and Workers is a serious contender for over-the-top serverless distributed apps. This is great.


> It’s the packaging and how it sits in their serverless world. [...] As someone who sometimes needs a table here and there, I really, really don’t want a server. I want a table to store a thousand records in and that’s it.

Sorry, but I don't get it -- WTF does "serverless" even mean here?

I mean, sorry for jumping on your comment specifically, I know that wasn't primarily what you were talking about here, but... You seem to know what you're talking about, effortlessly encompassing "their serverless world" etc.

The article even mentions that <<SQLite was so ahead of its time, it dubbed itself “serverless” before the term gained connotation with cloud services, and originally meant literally “not involving a server”.>> That makes sense to me; "serverless" means "not having a server". So then you have a local DB; be it SQLite or a DBF or Paradox or MS Access file or whatever. Or even a local DB software "service"; Firebird or MySQL or what have you.

But the term, as it's been bandied about online for the last decade(?) or so (including in this article), seems to pretty obviously actually be about... Remote servers (that's what it talks about replicating between, right?). So what's "serverless" about that???

I've been wondering for a good while now. Anyone who has a short explanation, or link to such, please jump in and enlighten me.

(Otherwise I'll have to conclude it's like "the Cloud", a.k.a. "Someone else's computer". "Serverless" = Someone else's server? :-)

[Edit: Typo.] [Edit: Sigh... Two of them.]


Sure, let me explain my thinking. I think it’s time to get off of the „it’s someone else’s computer” bandwagon. Sure, internet services don’t live in the void and there’s always a server out there. At the end of the day, three things in life are certain: taxes, death, and there is a server.

However, with Cloudflare specifically, you write apps that deal with individual requests. You, as a dev, never ever get exposed to a server. This abstraction goes way further than even AWS Lambda goes. There are no local caches, no temp directories, no mucking around with a shutdown of a function. Every instance of an app deals with exactly one request.

That’s what I mean by „their serverless world”.


Thanks!


Small nitpick, but that's still consistent as in ACID. I think what you mean is it wouldn't be consistent in the CAP sense (it wouldn't be linearizable).

TFA does say that read-replicas will be present at every edge location, which makes sense for a product like Workers. But it doesn't mention writes at all.


Yes, that's true.


> I'm guessing this is a single master database with multiple read replicas. That means it's not consistent

Single master with read replicas is fully consistent if commits don't return until propagated to and acknowledged by replicas (the expense here being commit latency.)


You've basically described rqlite [1], which uses Raft to coordinate changes through the Leader and then replicate them across some number of Followers. The write won't be acked until a quorum has persisted the change and committed it to the underlying SQLite database.

Disclaimer: I am the creator of rqlite.

[1] https://github.com/rqlite/rqlite
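For the curious, here's a minimal sketch of what talking to an rqlite node looks like over its HTTP API (the /db/execute and /db/query endpoints and the JSON statement format follow the rqlite docs; the host, table, and data are placeholders):

    // Writes can be sent to any node; rqlite forwards them to the Raft
    // leader, which acks only once a quorum has persisted the change.
    const RQLITE = "http://localhost:4001"; // placeholder host

    async function addUser(name: string): Promise<void> {
      const res = await fetch(`${RQLITE}/db/execute`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify([["INSERT INTO users(name) VALUES(?)", name]]),
      });
      if (!res.ok) throw new Error(`write failed: ${res.status}`);
    }

    // Reads can trade freshness for latency: level=none serves possibly
    // stale local data, while level=strong routes the read through Raft.
    async function listUsers(level: "none" | "weak" | "strong" = "weak") {
      const q = encodeURIComponent("SELECT id, name FROM users");
      const res = await fetch(`${RQLITE}/db/query?q=${q}&level=${level}`);
      return res.json();
    }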


rqlite also supports read-only nodes, so in theory you can have more nodes at the edge, just like D1 -- but these nodes won't participate in the distributed consensus process. Those nodes will keep up to date with changes, even catching up in the event of a temporary disconnection.


I would say the expense is both latency and availability because if one node doesn't ack within the timeframe then you have to drop it from the cluster. Requests that go there would need to be routed elsewhere to avoid being unavailable. If there's a network partition preventing that, then you have partial downtime. If enough nodes fail then you have full downtime across the whole cluster.


Yeah, nothing about WAL mode, which is what most users will want for web apps.

SQLite accessed via a socket? That defeats the whole purpose of using SQLite.

Many here are mentioning using one SQLite file per customer, but that sounds like a nightmare for migrations and analytics.

SQLite is great, and all these new services and articles are nice, but they intentionally gloss over a lot of complexity.


Going to be very interesting to see how they glue together R2, edge workers and SQLite. They can manage replication using R2 and make the SQLite process aware of this for eventual consistency. Having edge compute with edge data on a globally consistent data model is the dream.



SQLite is great but it's way overhyped and abused on HN. People are very eager to turn SQLite into a durable, distributed database and it's really not meant for that, and by going down that road instead of using something like MySQL or Postgres you're missing out on lots of important functionality and tooling.

I only say this because I have made this mistake at my previous startup. We built these really cool distributed databases on top of a similar storage engine (RocksDB) plus Kafka, but it ended up being more trouble than it was worth. We should have just used a battle-tested relational database instead.

Using SQLite for these applications is really fun, and it seems like a good idea on paper. But in practice I just don't think it's worth it. YMMV though.


So you didn't use SQLite then? Because RocksDB + Kafka is not similar at all.

Also databases all use the same fundamental primitives and it's up to you to choose the level of abstraction you need. For example, FoundationDB is a durable distributed database that uses SQLite underneath as the storage layer but exposes an ordered key/value API, but then allows you to build your own relational DB on top.

If you just needed distributed SQL because a single instance wasn't enough then there are already plenty of choices like CockroachDB/Yugabyte/TiDB/Memsql/etc that can serve the purpose instead of building your own.


It's actually quite similar. Both are embedded storage engines that are designed for a single node.

Actually, the case for RocksDB backing a distributed data store is probably much stronger than for SQLite, given that it supports multiple concurrent writers.

SQLite lacks many important characteristics that one would expect a distributed data store to have. Row-level locking is one obvious feature that's super important in a highly concurrent context (as mentioned, RocksDB has this). Want to back up your production DB? You're going to need to block all writes until the backup completes.

Additionally, features like profiling and replication are nonexistent or immature with SQLite. Rqlite and Litestream are super new relative to tools like Postgres and MySQL and you can't find a lot of people that know how to run them.

Also, you can't autoscale your app since your processes are now highly stateful. Sure, this is a problem with MySQL/Postgres too, but I can pay AWS or Google Cloud for a managed version that will abstract this problem away from me.

Most of these problems are solvable with enough net new software on top of SQLite. But... why? I think the only reason you'd subject yourself to such an architecture is because you want to learn (great!) or you're gunning for that next promotion and need to show off your system design skills :P


> So you didn't use SQLite then? Because RocksDB + Kafka is not similar at all.

I can see the connection in the sense that, just like SQLite, RocksDB is an embedded store, while Kafka can be used to build a replicated log (log device).

> If you just needed distributed SQL because a single instance wasn't enough then there are already plenty of choices...

Well, that was GP's point, too? In addition, they mention that existing DBMSs like Postgres have way more breadth and depth than a replicated SQLite can ever hope to have (which isn't really a controversial assertion at all, tbh).


I accept that you learned a lot about the limits of combining RocksDB with Kafka, especially in the exact way you combined them.

This might have limited utility if the goal were to combine RocksDB with something else. And even less for SQLite and something else.

The big push of interest in SQLite server-side isn't driven by people who have never set up pgbouncer, but rather by developers who have both read the SQLite docs very carefully and used the library extensively, and know what it's good for.


I'm not sure why you concluded that SQLite is the problem when you built a "really cool distributed database" with Kafka. Distributed databases are complicated, Kafka's complicated.

If you're saying that a replicated Postgres setup would be simpler than what you built, I agree; but SQLite+Litestream probably would be too.


Litestream is too much work if you're not using S3: replication over SFTP. Even Fossil has nicer, no-nonsense replication done over HTTP(S). It's way easier to set up MySQL with replication than to manage Unix accounts and public keys.


Is this any good? https://github.com/rqlite/rqlite

I've been looking for a turn key solution that is better than me running a single node Postgres instance "bare metal" or in a container.

postgres-operator seems cool but... k8s, pretty heavy I guess.


It’s the default storage engine for FoundationDB - not sure many would agree that isn’t a “durable, distributed database”.


For one thing, they're ripping it out because of its poor write parallelism https://youtu.be/nlus1Z7TVTI?t=271

But that's orthogonal to my point. As a user of FoundationDB, you're not programming directly against SQLite, so you aren't going to run into these issues as much since FoundationDB exposes different semantics and coordinates concurrency across many SQLite instances in parallel.

I think it's best to think of SQLite as a replacement for your filesystem, rather than a replacement for your relational DBMS.


SQLite has been cool forever. It was the underlying data store for my machine learning email filter POPFile 20 years ago!

https://en.wikipedia.org/wiki/POPFile https://getpopfile.org/browser/trunk/engine/POPFile/Database...


It's high-quality software too. It's well-commented and exceptionally well tested.[1][2]

> As of version 3.33.0 (2020-08-14), the SQLite library consists of approximately 143.4 KSLOC of C code. ... By comparison, the project has 640 times as much test code and test scripts - 91911.0 KSLOC.

I don't usually place much stock in that sort of count, but 640x is notable.

It makes sense considering the wide variety of use-cases, from embedded devices to edge computing and everything in between.

[1]: https://www.sqlite.org/testing.html [2]: https://sqlite.org/src/dir?ci=trunk


I used POPFile!! It was awesome.


SQLite was originally great for desktop applications.

Problem is, there's still a huge market for these kinds of apps, but everything has moved to the web (no one is making desktop apps anymore). A full-blown RDBMS is overkill for this kind of app, and now SQLite is starting to fill these web-app needs.

@sqlite - if you are reading this, any word on merging WAL2 and BEGIN CONCURRENT into main? There is clearly a new class of needs in this world that has completely moved over to web app development (which introduces concurrency problems never experienced on the desktop). Any thoughts of focusing more on these web-related needs for SQLite (or maybe even forking your own code base into a more enhanced SQLite version targeted at web needs)?


I think it’s long overdue. While SQLite certainly has its limitations, it’s a winner in many categories. Even for sites with mild traffic using ordinary SQLite in PHP a decade ago, it was always nice to use for its simplicity, and the performance was totally acceptable. In comparison, the memory usage of typical relational database servers was high enough to make it hard to fit on a single low-end VPS with the same data and traffic. (I found myself tuning MySQL, but I never needed to tune SQLite.)


The main thing for tuning SQLite is how you open it: e.g. in write-ahead log (WAL) mode, with foreign keys turned on (this needs to be enabled manually), and with a busy timeout so it waits for a database lock on slower hardware before giving up. There are also some gotchas: if you mark an ID column as primary key, it'll use the rowid as the key - which can be reused if a row is removed. So you need to explicitly set PRIMARY KEY AND AUTOINCREMENT, else you're going to have a bad time. (https://www.sqlite.org/autoinc.html)
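For illustration, here's roughly what that setup looks like from Node using the better-sqlite3 driver (the driver choice is mine for the sketch; any driver that can issue PRAGMAs works the same way):

    import Database from "better-sqlite3";

    const db = new Database("app.db");

    db.pragma("journal_mode = WAL");  // readers no longer block on writers
    db.pragma("foreign_keys = ON");   // off by default; set per connection
    db.pragma("busy_timeout = 5000"); // wait up to 5s for a lock, then fail

    // AUTOINCREMENT guarantees ids are never reused after a DELETE;
    // a bare INTEGER PRIMARY KEY is just the rowid, which can be recycled.
    db.exec(`
      CREATE TABLE IF NOT EXISTS users (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT NOT NULL UNIQUE
      )
    `);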


If you define a table with an integer primary key, you get autoincrement by default, at least in newer versions.


With the mileage (and attention) those new products are getting out of using SQLite, I think Richard Hipp deserves a lot more acknowledgement for creating such an amazing piece of software.


New products getting a lot of mileage out of SQLite is old hat at this point. It's one of those rare evergreen techs - pay attention for a while and this latest round of attention will die down for 6-12 months, then someone else will start another round of "look how cool SQLite is".

At least that's been my observation since I started coming around here.


I'm wondering if we'll see some similar energy around non-SQL embedded databases like LevelDB or RocksDB.


Right! SQLite is great, but those two are great as well. It seems like the energy should be around "hey, you should consider a local, maybe even in-memory, database for some things!" more so than specifically "SQLite is great" (though it is).


Well, I don't think it's a good fit for a regular service. Exactly how do you handle two replicas of the same service talking to the same DB?

The fact that it's just a file on disk limits the usage.


Projects such as litestream and rqlite have this figured out.


rqlite author here, happy to answer any questions.


Multiple writers on the same SQLite?


Transactions, locks, queues, etc. No different than multiple app instances changing the same row in other databases.

Any state mutation is ultimately ordered in time, and how that ordering is accomplished depends on the abstractions you're using: in your app, network layer, database, etc.
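Concretely, the usual single-node SQLite pattern for competing writers is a busy timeout plus an immediate transaction; a sketch with the better-sqlite3 driver (the driver choice is illustrative):

    import Database from "better-sqlite3";

    const db = new Database("app.db");
    db.pragma("busy_timeout = 5000"); // queue briefly behind other writers

    // BEGIN IMMEDIATE takes the write lock up front, so two processes
    // can't both read a value and then race to write it back.
    const upvote = db.transaction((postId: number) => {
      db.prepare("UPDATE posts SET votes = votes + 1 WHERE id = ?")
        .run(postId);
    });

    upvote.immediate(42); // runs the function inside BEGIN IMMEDIATE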


Why would you use SQLite once you start dealing with the network? Just use MySQL or PG.

It's just reinventing the wheel badly. I need to read the details, but basically you're using a tool (SQLite) that was not designed to be used outside of a single-app use case.


What context are you talking about here?

For Cloudflare, they're offering it because it's simple and lightweight, and they already have their Durable Objects product which serves as the transaction ordering mechanism and takes care of writes.

If you're doing it yourself then sure it's probably not the best fit but that's up to you to decide.


SQLite was not meant to be used by multiple processes, so you have to build the missing parts yourself; 100% those will have more limitations and issues than a regular RDBMS that was built for it.


I think one way to think about this is to have one database tied to one replica (replicas could handle more than one database), where (importantly) the idea would be one database for each user. You horizontally scale for the number of users, but each user is only using one end node.

It’s interesting because you have to consider how to scale your database as well as your application. The fact that you don’t have one central database opens up more possibilities. But it doesn’t work for all cases (such as a shared read-write data source for all users). For example, this approach wouldn’t work for something like Twitter (at least the original architecture).



R2 is 3x more expensive than B2 (storage) https://www.backblaze.com/b2/cloud-storage-pricing.html

Am I missing something? Is there no bandwidth cost at all?


Yep, you're not charged for egress.


B2 to Cloudflare also does not incur egress fees: https://www.backblaze.com/blog/backblaze-and-cloudflare-part...

Backblaze B2 customers will be able to download data stored in B2 to Cloudflare for zero transfer fees. This happens automatically once Cloudflare is configured to distribute your B2 files.


I did the Backblaze-via-Cloudflare setup.

I really don't care about the cost of storage. In my case it's the bandwidth costs that were killing me.

If it was available at the time, I would use R2 if only for simplicity.

If I was using Cloudflare Workers it would be another reason to use R2: I assume that it's easier to use and faster to use than any other storage system, since it's on the same network and written by the same people.

Also, exposing Backblaze via Cloudflare has its issues. I ran into Cloudflare caching 404 responses from Backblaze and Backblaze being slow to make writes visible.

So I would write into Backblaze and try to access that key via the Cloudflare proxy. While the write was acknowledged to my client, it wasn't yet visible via the HTTP endpoint, so Cloudflare would cache a 404 response. I would have to clear the cache to fix it, and then I added a 5 min delay "just in case" to work around this.


Only to Cloudflare though, right? If users download, it'll be billable.

With R2, user downloads are free, aren't they?


Does anyone remember when we had Net Neutrality?


Yes, like it was yesterday (or today). It was a strange time where the term was often used for things that had nothing to do with the original meaning of the term.


Does R2 provide synching between regions? Maybe that's why it's so much more expensive? You're getting regional failover?


Presumably under the hood it'll be nicely distributed, as per https://blog.cloudflare.com/introducing-r2-object-storage/.

"Our vision for R2 includes multi-region storage that automatically replicates objects to the locations they’re frequently requested from."


Latency


Wow, this looks potentially very interesting. Since this is sort of fresh in my mind from the recent Fly post about it:

* How exactly is the read replication implemented? Is it using litestream behind the scenes to stream the WAL somewhere? How do the readers keep up? Last I saw you just had to poll it, but that could be computationally expensive depending on the size of the data (since I thought you had to download the whole DB), and could potentially introduce a bit of latency in propagation. Any idea what the metrics are for latency in propagation?

* How are writes handled? Does it do the Fly thing about sending all requests to one worker?

I don't quite know what a "worker" is but I'm assuming it's kind of like a Lambda? If you have it replicated around the world, is that one worker all running the same code, and Cloudflare somehow manages the SQL replicating and write forwarding? Or would those all be separate workers?


First, I'm very excited. Sure, SQLite has some limitations compared to Postgres, esp. regarding the type system and concurrency. But we get ACID compliance and SQL.

But it is really hard getting some useful information from this article. I can't even tell if it is not there or just buried in all this marketing hot air.

So, what is it really? Is there one Write-Master that is asynchronously replicated to all other locations? Will writes be forwarded to this master and then replicated back?

I'm very curious about how it performs in real life. Especially considering the locking behavior (SQLite always has the isolation level 'serializable', iirc). The more you put in a transaction, or the longer you have to wait for another process to finish its writes, the more likely you are to have to deal with stale data.

But overall I'm very excited. Also by the fly.io announcement, of course. Lots of innovation and competition. Good times for customers.


>So, what is it really? Is there one Write-Master that is asynchronously replicated to all other locations? Will writes be forwarded to this master and then replicated back?

Not a lot of detail, but that is mentioned:

"But we're going further. With D1, it will be possible to define a chunk of your Worker code that runs directly next to the database, giving you total control and maximum performance—each request first hits your Worker near your users, but depending on the operation, can hand off to another Worker deployed alongside a replica or your primary D1 instance to complete its work."


Very cool! Glad to see all the love for SQLite recently.

One thing I've noticed that many commenters miss about read-replicated SQLite is assuming that the only valid model is having one giant, centralized database with all the data. Let's be honest with ourselves: the vast majority of applications hold personal or B2B data and don't need centralized transactions, and at scale will use multi-tenant primary keys or manual sharding anyway. For private data, a single SQLite database per user / business will easily satisfy the write load of all but the most gigantic corporations. With this model you have unbounded compute scaling for new users, because they very likely don't need online transactions across multiple databases at once.
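To make that concrete, here's a sketch of the routing logic; openDatabase() is a hypothetical helper standing in for however a given platform hands out a database by name:

    interface TenantDb {
      run(sql: string, ...params: unknown[]): Promise<unknown>;
    }

    // Hypothetical: resolves a named SQLite database for this tenant.
    declare function openDatabase(name: string): Promise<TenantDb>;

    // Each tenant gets its own database, so one hot tenant never contends
    // with another tenant's writes, and tenants shard trivially.
    async function recordLogin(tenantId: string, email: string) {
      const db = await openDatabase(`tenant-${tenantId}`);
      return db.run("INSERT INTO logins(email) VALUES (?)", email);
    }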

Some questions:

Will D1 be able to deliver this design of having many thousands of separate databases for a single application? Will this be problematic from a cost perspective?

> since we're building on the redundant storage of Durable Objects, your database can physically move locations as needed

Will D1 be able to easily migrate the "primary" at will? CockroachDB described this as "follow the sun" primary.


I guess the answer to the first question is: similar to Durable Object limits (unlimited databases / 50 GB total), since they alluded to those abilities more than to a simple file stored on R2 (that's only for backups).


Love the Northwind Traders reference! However, for a demo, I suggest a slightly larger and more complex data set, [data-generator-retail](https://www.npmjs.com/package/data-generator-retail).

The demo is also a bit buggy: orders are duplicated as many times as there are products, but clicking on the various lines of the same order leads to the same record, where the user can only see the first product...

I also think the demo would have more impact if it wasn't read-only (although I understand that this could lead to broken pages if visitors mess with the data).

Anyway, kudos to the CloudFlare team!


I was thinking the same. The dataset is way too small.


Fixed the orders table. Good catch.


This looks amazing!

I see Cloudflare people are on this post; any chance to compare D1 vs Postgres in terms of DB features?

Insert ... Returning

Stored procedures and triggers

Etc etc

Would be really helpful to get a comparison like cockroachDB did here https://www.cockroachlabs.com/docs/stable/postgresql-compati...

Or even better, a general sql compatibility matrix like this https://www.cockroachlabs.com/docs/stable/sql-feature-suppor...

Kudos to the cloudflare team!


Well, it's sqlite... so presumably you will get most of the capabilities sqlite has.

RETURNING is covered.

Stored procedures are indirectly there by running your own code "next to the database", as mentioned in the post. Which is arguably much nicer than having to use some database specific language, given that you can run WASM on workers.


There is a layer on top of Sqlite here, so I imagine it's something less than all the capabilities sqlite has, at least initially. Plus the upsides and downsides from their approach to have a master and read replicas.


Yes, I was thinking the same. Nice to see some people here actually understood the question, thank you.


> Stored procedures are indirectly there by running your own code "next to the database",

"indirectly" is a keyword here, because running code when data is modified potentially won't replace triggers since they'll probably execute outside the running transaction.


Listen/notify


The announcement - if you read it before posting - says it's sqlite, so that's something you can punch into google.

Long story short, don't expect anything fancy. Support for alter table is limited, and concurrency can be an issue.


It is indeed SQLite, but it could have modifications or additions. Please be considerate and think a little more before commenting.


All this recent hype around sqlite...

SQLite is a great embedded database, and thanks to its use by browsers and on mobile it is the most used database in the world, by orders of magnitude.

But it also comes with lots of limitations.

* there is no type safety, unless you run with the new strict mode, which comes with some significant drawbacks (e.g. limited to the handful of primitive types)

* very narrow set of column types, and overall functionality in general

* the big one for me: limited migration support, requiring quite a lot of ceremony for common tasks (e.g. rewriting a whole table and swapping it out)

These approaches (like fly.io's) with read replication also (apparently?) seem to throw away read-after-write consistency. Which might be fine for certain use cases and even desirable for resilience, but can impact application design quite a lot.

With SQLite you have to do a lot more in your own code because the database gives you fewer tools. Which is usually fine because most usage is "single writer, single or a few local readers". Moving that to a distributed setting with multiple deployed versions of code is not without difficulty.

This seems to be mitigated/solved here though by the ability to run worker code "next to the database".

I'm somewhat surprised they went this route. It probably makes sense given the constraints of Cloudflares architecture and the complexity of running a more advanced globally distributed database.

On the upside: hopefully this usage in domains that are somewhat unusual can lead to funding for more upstream sqlite features.


* the big one for me: very limited migration support, requiring quite a lot of ceremony for common tasks (eg rewriting a whole table and swapping it out)

I don't know where this idea of having to swap a whole table in SQLite came from, but it simply isn't true. Over the last 13 years I have upgraded production HashBackup databases at customer sites a total of 35 times without rewriting and swapping out tables by using the ALTER statement, just like other databases:

https://www.sqlite.org/lang_altertable.html

For the most recent upgrade, I upgraded to strict tables, which I could also have done without a rebuild/swap. I chose to do a rebuild/swap this one time because I wanted to reorder some columns. Why? Because columns stored with default or null values don't have row space allocated if the column is at the end of the row.


For a long time sqlite did not have DROP COLUMN and RENAME COLUMN support, which are both pretty essential.

I'm embarrassed to admit that I didn't realize RENAME COLUMN was actually added in 3.25, almost four years ago.

DROP COLUMN was only just added last year in 3.35.

I'm surprised a database schema lasted 9/12 years without ever renaming or dropping a column.

This changes things! But even now, ALTER TABLE is not transactional. So especially with many concurrent readers there can definitely be situations where you'd still want to rewrite.


I'm not sure what you mean by "not transactional". SQLite implements transaction support at the "page" level, and builds all other database operations on top of it, which means anything that touches the bytes of the database file is transaction-safe. You can verify this for yourself:

    sqlite> CREATE TABLE foo(a,b,c);
    sqlite> INSERT INTO foo VALUES (1,2,3);
    sqlite> BEGIN;
    sqlite> ALTER TABLE foo DROP COLUMN b;
    sqlite> SELECT * FROM foo;
    1|3
    sqlite> ROLLBACK;
    sqlite> SELECT * FROM foo;
    1|2|3
It's of course still subject to SQLite's normal restrictions on locking, which means a long-running ALTER statement will block concurrent writers (and probably also concurrent readers if you're not running in WAL mode).


> I'm surprised a database schema lasted 9/12 years without ever renaming or dropping a column.

I did have a couple of columns that were no longer needed and would have dropped them, but instead I just set them to null and ignored them. Nulls only take 1 byte of space in a row. I dropped them when DROP COLUMN was added.


It would really help if SQLite3 had a `MERGE`, or, failing that, `FULL OUTER JOIN`. In fact, I want it to have `FULL OUTER JOIN` even if it gains a `MERGE`.

`FULL OUTER JOIN` is the secret to diff'ing table sources. `MERGE` is just a diff operation + insert/update/delete statements to make the target table more like the source one (or even completely like the source one).

`FULL OUTER JOIN` is essential to implementing `MERGE`. Granted, one could implement `MERGE` without implementing `FULL OUTER JOIN` as a public feature, but that seems silly.

Sadly, the SQLite3 dev team specifically says they will not implement `FULL OUTER JOIN`[0].

Implementing `MERGE`-like updates without `FULL OUTER JOIN` is possible (using two `LEFT OUTER JOIN`s), but it's an O(N log N) operation instead of O(N).

The lack of `FULL OUTER JOIN` is a serious flaw in SQLite3. IMO.

[0] https://www.sqlite.org/omitted.html
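For reference, the two-LEFT-JOIN emulation looks like this (src and dst are placeholder names for the two table sources being diffed):

    // Emulating FULL OUTER JOIN on SQLite versions without native
    // support; the second scan is the extra work mentioned above.
    const diffSql = `
      SELECT src.k AS k, src.v AS src_v, dst.v AS dst_v
      FROM src LEFT OUTER JOIN dst ON src.k = dst.k
      UNION ALL
      SELECT dst.k, NULL, dst.v
      FROM dst LEFT OUTER JOIN src ON src.k = dst.k
      WHERE src.k IS NULL
    `;
    // In the result: dst_v IS NULL means the row needs an INSERT,
    // src_v IS NULL a DELETE, and src_v <> dst_v an UPDATE -- exactly
    // the three cases a MERGE statement automates.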


RIGHT and FULL JOIN are on the trunk branch of SQLite and will (very likely) appear in the next release. Please grab a copy of the latest pre-release snapshot of SQLite (https://sqlite.org/download.html) and try out the new RIGHT/FULL JOIN support. Report any problems on the forum, or directly to me at drh at sqlite dot org.


This is fantastic news, I'm very glad to hear that this is appearing soon! Thanks!


SWEEEEET!

Finally!

Thank you so much for this Mr. Hipp!

EDIT: Don't forget to edit the `omitted.html` page when you ship it!


Migrations have gotten better recently, but there are still cases where you need to follow the 12 steps very carefully: https://www.sqlite.org/lang_altertable.html#otheralter

Prisma Migrate can automatically generate these steps, removing most of the pain. I'm sure other migration tools can do this as well.



D1 does not throw away consistency. It’s built on top of Durable Objects which is globally strongly consistent.


"D1 will create read-only clones of your data, close to where your users are, and constantly keep them up-to-date with changes."

Sounds like there will be no synchronous replication and instead there will be a background process to "constantly keep [read-only clones] up-to-date". This means that a stale read from an older read replica can occur even after a write transaction has successfully committed on the "primary" used for writes.

So, while the consistency is not "thrown away", it's no longer a strong consistency? Anyway, Kyle from Jepsen will figure it out soon, I guess :)


Yeah, so you can always opt-in to strong consistency by transferring execution to the primary (see the "Embedded Compute" section of the blog). Then it's pretty much exactly the same as a DO.


Just clarifying - D1 without read replicas is strongly consistent. If you add read replicas, those can have replication lag and will not be strongly consistent.

Disclaimer: I work at Cloudflare :)


Thanks for the clarification, that is what I would expect.

Does SQLite support some kind of monotonic transaction id that can be used as a cache coherency key? Say a client writes a new record to the database which returns `{"result": "ok", "transaction_id": 123}`, then to ensure that subsequent read requests are coherent they provide a header that checks that the read replica has transaction_id >= 123 and either waits for replication before serving or fails the request. (Perhaps a good use for the embedded worker?)


Since it's a relational DB, and supports transactions, you can have a journal table right?

I know of a very important system at AWS that did this with MySQL :D
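A sketch of that journal-table pattern (the Db interface and schema here are hypothetical; the point is bumping a version counter in the same transaction as the write):

    interface Db {
      exec(sql: string, ...params: unknown[]): Promise<void>;
      get<T>(sql: string, ...params: unknown[]): Promise<T>;
    }

    // Primary: the version bump commits atomically with the write and
    // goes back to the client as a freshness token.
    async function writeWithToken(db: Db, email: string): Promise<number> {
      await db.exec("BEGIN");
      await db.exec("INSERT INTO users(email) VALUES (?)", email);
      await db.exec("UPDATE journal SET version = version + 1");
      const row = await db.get<{ version: number }>(
        "SELECT version FROM journal");
      await db.exec("COMMIT");
      return row.version;
    }

    // Replica: serve the read only if replication has caught up to the
    // client's token; otherwise wait, or forward to the primary.
    async function isFresh(db: Db, token: number): Promise<boolean> {
      const row = await db.get<{ version: number }>(
        "SELECT version FROM journal");
      return row.version >= token;
    }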


Yes, you could do it manually, but it would be nice if the solution didn't require carefully managing update queries (so the journal addition isn't missed), and didn't increase write amplification by manually updating a journal table, when that information probably already exists somewhere in the WAL implementation.


Yup sorry about that. I missed the entire "read replica" bit when reading that blog post.


Interesting that D1 is built on top of Durable Objects. Does this mean that it would be practical for a single worker to access multiple D1 databases, so it could use, for example, a separate database for each tenant in a B2B SaaS application? Edit: And could each database be in a different primary region?


Yes, exactly!


That is interesting. I wish CF would give us some more information as I've assumed that there must be a lack of strong consistency which would be a major drawback.

Edit: But that would mean that durable objects can't be replicated asynchronously? That would mean a big latency hit. Then what's the difference to a central DB in one datacenter?


I’m not familiar with Durable Objects. When D1 does replication to read replicas, if it’s not doing synchronous replication, then it’s not strongly consistent, is that correct?


I wish the post had provided some more technical details.

It's more of a "quickstart" than a peek under the hood.


I'd like to see some up-front D1 & R2 benchmarks (read/write/IOPS). I can't judge invocation cost value until I can judge my use case's performance. Here's hoping it's NVMe RAID 10 or better under the hood of D1, as some big SQLite reads suffer under slow storage.


Are you guys using Litestream or a similar approach? E.g. storing WAL frames in a Durable Object.


What types are missing from strict that you need?


Has anyone tried to write a new modern SQLite?


Why? Yes, SQLite doesn't have all the features Postgres has. Postgres doesn't have all the features SQLite has either. What's wrong with having different tools with different sets of tradeoffs? It's a different shape of Lego and that's fine - some things call for a 1/3-height 2x2 and others call for a full-height 1x8.


I think the most successful attempt would be Realm.

https://realm.io/


DuckDB comes to mind, but I can't speak to its differences from SQLite.

https://duckdb.org/


I haven't tried DuckDB, but I have been googling about it. I think I saw a discussion where it was mentioned that DuckDB isn't a replacement for SQLite. It is an OLAP database [0], which makes its ingestion time slower than SQLite's, I think. So it is meant for analytics, but not as a full-fledged replacement for SQLite.

[0]: https://en.wikipedia.org/wiki/Online_analytical_processing

Duckdb on HN: https://news.ycombinator.com/item?id=23287278


Close! DuckDB has very fast bulk insert speeds, but slower individual row insertion/update speeds. (Disclaimer: I write docs for DuckDB)


DuckDB is OLAP SQLite. The vector engine is dope. But most of the innovation is in the OLAP stuff.


why do you consider sqlite to not be modern?

all the hip service providers seem to be all over it which would indicate pretty good modernity to me at least.


Not clear from reading the post if the SQLite C library is embedded and linked in the Worker runtime (which would mean no network roundtrip) or if each query or batch of queries is converted to a network request to a server embedding the SQLite C library.

That's important to understand because that's one of the key advantages of SQLite compared to the usual client-server architecture of databases like PostgreSQL or MySQL: https://www.sqlite.org/np1queryprob.html


This is really interesting; basing it on SQLite is exactly what I was expecting CloudFlare to do for their first DB.

It's perfect for content-type sites that want search and querying.

Anyone from CF here, is it using Litestream (https://litestream.io) for its replication or have you built your own replication system?

I assume this first version is somewhat limited on write performance, having a single "main" instance and SQLite lacking concurrent writes? It seems to me that using SQLite sessions[0] would be a good way to build an eventually consistent replication system for SQLite; it would be perfect for an edge-first SQL database, maybe D2?

0: https://www.sqlite.org/sessionintro.html


1. No, it's not built on Litestream. Operating a massive network and shuttling data around is kind of our thing.

2. We are going all in on databases and D2 sounds like a cool name for something...


R2, D2. I see what you did there!


Have any of the problems that led people to use Postgres instead of SQLite actually been solved? Are we doomed to repeat the same mistakes?

Also, any plans to support PATCH x-update-range so SQLite can be used entirely in the browser via SQLite.js?

Can someone enlighten me with the types of use cases this would be better for vs say Postgres?


It isn't so much that folks who need Postgres features are moving to SQLite just because it is cool; it's folks who don't want those Postgres features moving to SQLite, because the latter has just the features they ever really need.


SQLite made sense as an embedded database on, say, a desktop or phone, because there’s generally only a single person writing to it. The perfect use case.

I don’t understand how it will be usable at all in a website with multiple users. Is the idea to make your site so every user gets their own database? How do you stop SQL injection?

Once you solve all of these problems aren’t you better off just using Postgres?


> I don’t understand how it will be usable at all in a website with multiple users

With WAL mode enabled, the database is locked during writes only, and concurrent writes are queued, but you can still perform reads concurrently. If you keep your write transactions small and consider that a lot of apps aren't writing a lot, it can give perfectly good performance for a lot of use cases.

> Is the idea to make your site to every user gets their own database?

You can do... I know of B2B apps that give each billable customer their own database.

> How do you stop SQL injection?

In the exact same way you do in all other flavours of SQL - with parameterized queries (see the sketch after this comment).

> Once you solve all of these problems aren’t you better off just using Postgres?

Not necessarily. Postgres gives you a different set of problems and limitations to consider and work around.
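On the SQL-injection point, a parameterized query in SQLite looks like it does everywhere else; sketched here with the better-sqlite3 driver (the driver choice is illustrative):

    import Database from "better-sqlite3";

    const db = new Database("app.db");

    // Bad: interpolating user input into the SQL text is injectable:
    //   db.exec(`SELECT * FROM users WHERE email = '${email}'`);

    // Good: bound parameters travel separately from the SQL text, so
    // user input can never change the shape of the query.
    function findUser(email: string) {
      return db
        .prepare("SELECT id, email FROM users WHERE email = ?")
        .get(email);
    }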


Most of the problems with Postgres are a result of it not being embedded. If you're using SQLite in a non-embedded fashion, I don't see how you don't inherit the same problems.


Which problems were you thinking of?

Cloudflare and fly.io both promise hassle free read replicas and backup. They will both offer only a single node capable of writes, because that’s how SQLite rolls.

This is a pretty good fit for a read heavy load that requires SQL and very low latency.


I guess I’m not understanding what the benefit is vs hosted Postgres. Also, latency can be low and setup equally trivial - see Supabase for example.


Biggest benefit over hosted PostgreSQL is that you get SELECT queries that are measured in microseconds, because SQLite avoids needing network overhead per query.

https://www.sqlite.org/np1queryprob.html


Wouldn’t D1 introduce network overhead?


Yes for writes, but it shouldn't for reads: it looks like it works by replicating the full database down to each edge location where the code is running.


Hope this can give you some concrete answers: https://www.sqlite.org/whentouse.html


The important drawback is async replication and therefore the lack of full consistency. On the other hand, this is the big advantage of hosted Postgres and the like.

Those offerings are great for use-cases that don't need that kind of consistency, which are many.


No and no. I think this is great for Edge computing, where there is currently no solution. So, it's better than nothing.

It all depends on the use-case, of course. A traditional hosted Postgres or MySQL database or cluster is certainly the go-to solution for all who need advanced features or full consistency, which only synchronous replication could provide.


What problems? Both are for different use cases albeit overlapping.


Concurrent writes, for one.


To the person from Cloudflare I complained to in last year's thread about putting your money where your mouth is on serverless databases:

You weren't lying, and this is super cool - the SQLite hype train also seems to be in full force.


It's interesting to see a relatively old technology get hyped.


:-)


I'm buying Cloudflare stocks right now.

In 2-3 years from now, these services will be so mature and strong they will be crushing the cloud market.

They're turning dreams into reality, one after another.


Cloud business is driven by enterprise generally. Would enterprise be using SQLite?


They should be using SQLite more often than they are.


Why? What use cases are better with SQLite vs Postgres, MySQL, etc?


Some "pros" that many find appealing:

* Copying the database around is a file copy in SQLite. Each database is its own single file. (There's also WAL stuff that you get control of.)

* No extra service to deploy, manage, and/or optimize. I don't fully agree with the following, but I had a colleague who used to say "If you don't have multiple app servers writing to the db, postgres is a waste of effort".

* Embedded means way lower data latency - even lower if the dataset is in the fs cache; no waiting on network round-trips.

I've frequently chosen it over PG in cases where I needed basic relational data operations. In one case we ingested a large dataset (a few GBs of measurements) once an hour. Then we did some initial analytics on those measurements and threw the results in the same db file. After that step was done, the data was read-only for several other systems, and we just copied the db file to each of the systems that needed the data on demand. A couple of the systems did additional analytics and effectively imported the db file into a different db (one was PG, another was a graph db - neo4j). A couple of the systems just used the db file directly. It worked out really well.


Fewer things to go wrong. This gives you benefits all the way up the dev stack. Changing an integration test from needing its own db server installed to just a couple of files on disk is a big difference in complexity. You can probably run that test with just a local disk almost infinitely, entirely deterministically; conversely, as soon as you go onto the network, all bets are off.

Granted, if you already have the tooling it's less of a big deal, but if you don't then you need the tooling, e.g. your build and test machines need access to some SQL installation somewhere, and the process of maintaining that can be a flaky one.


If you consider cost... I would imagine a fair few. From the article:

> We will ensure that D1 costs less and performs better than comparable centralized solutions.


If SQLite gets you excited, I'm building a Firebase alternative based on SQLite. I'm betting hard on SQLite, so this gets me super excited!!

https://javascriptdb.com

CF people around, I would love to chat, if anyone is interested please reach out at: jp@javascriptdb.com

I'll be applying to this beta for sure!


Super interesting! I really like the idea. I'll join the beta, email sent :)


Any feedback on what you find interesting would be awesome :) thanks!!


Any current or planned support for existing ORMs, such as Prisma or TypeOrm?

Also, I wonder how hard it will be to migrate existing PostgreSQL databases and SQL statements. Of course, I understand if Cloudflare is focused on greenfield applications.


Prisma won't work with D1 out of the box. The primary limitations are:

- SQLite is traditionally embedded in an application, so Prisma interacts with it by mounting a file. Workers does not have a local filesystem, and D1 is exposed over the network through an API accessible from a Worker. Prisma will have to create a specific connector for D1.

- Workers have a script size limit which is currently 1MB. My understanding is that Cloudflare will be increasing this in the future. We also have specific work to decrease the size of Prisma. Both of those will have to happen before Prisma can be used with D1.

Note that Prisma already supports querying Postgres, MySQL, SQL Server and MongoDB from Cloudflare Workers through the Prisma Data Proxy, which will see a GA release next month.

We are also very excited about D1 as a way to bring a subset of data closer to users in order to deliver faster experiences. We hope this will be a way to bring the benefit of edge computing to larger organisations who cannot simply rearchitect everything to run on Workers.


> We are also very excited about D1 as a way to bring a subset of data closer to users in order to deliver faster experiences. We hope this will be a way to bring the benefit of edge computing to larger organisations who cannot simply rearchitect everything to run on Workers.

I am also excited about this :)


Before you consider using an ORM, try using regular SQL and some tooling first; your future self will thank you. Just write the code, it's only volume and it's not so bad.


I took this advice on my last project and ended up re-writing the whole thing to use Prisma later. I launched and had a successful event with raw sql but it quickly became unwieldy. Prisma gives me type safety throughout my app (written in Typescript) and would have prevented a number of bugs/pain points as my app grew. And I'm only 1 developer, this gets worse if you have multiple people working on it. I still write raw sql for reporting/aggregation (Prisma's features here only work for basic examples in my experience) and I'm not "scared of raw sql" but I can move much faster when I have the guardrails of types.


Totally agree.

Source: someone who avoided learning SQL for 20 years.


+1 to this as well.


We are definitely interested in ORMs. Want to make it easy to use. I hope someone creates the next Rails using Workers. And having other models on top of our SQL offerings will be important. Get in contact and let us know what you'd like.


> I hope someone creates the next Rails using Workers

I too am eagerly waiting for a good serverless nodejs framework that is "batteries included". I've deployed on Lambda using the "Serverless Framework" but once your app grows to a certain size everything starts to fall apart and you lose some of the magic. Unfortunately, most of the things that advertise themselves as serverless/lambda/worker nodejs frameworks are monoliths and/or an existing monolith framework that "supports" lambda (with a billion asterisks after that). There is absolutely nothing wrong with monolith frameworks, I love them, but just not for lambda, I want to deploy a single endpoint as a single function (or as a cron, or queue listener, etc), not all of my code for every function (you hit size limits quick with this method).

I want express/nestjs/etc-type routes that I define with code or annotations that result in /only/ that function (endpoint) being bundled up and deployed. I ended up rolling my own "framework" on top of Serverless Framework (uses serverless.ts config file that scans my directories for a special file that defines the routes defined in that directory) but Serverless Framework is pretty shaky ground. Their documentation is a mess, Serverless Components appears dead, and they seem to be busy with their own "cloud" so I don't know how much longer I can keep building on top of them.

When it works it's like magic but there are a ton of walls you run headfirst into: Cloud formation entity limits, package size limits, typescript/bundling support, clear disregard for medium/large projects ("Just use multiple services", this leads to a terrible dev experience), and long deploy times.

I wish CF Workers had been out when I first started building my current project, I might have gone in that direction instead, I still might.


Hey Josh,

I'm building a serverless firebase alternative that uses SQLite. If CF gives me access I will totally support D1 & workers.

Check it out: javascriptdb.com


Thanks! I'll check it out.


Does Cloudflare Workers now support a large number of workers under a single domain without having to use an expensive pricing tier?


I'm not sure, I've not done the full research into CF Workers since I'm on AWS Lambda right now and don't have the capacity to evaluate alternatives. I just like a lot of the CF products and their general ethos/vibe so I'm interested in it. Who knows, it might have a whole new set of issues (most likely) but I don't know if those issues are worse or better than what I'm dealing with now.


You might want to consider adding Deno [1] to the language examples: https://developers.cloudflare.com/workers/platform/languages...

Deno can compile to wasm, so it can plug in through that vertical. But it's just TS on the frontend.

I'm mainly a python programmer, but Deno's been the most alluring development in the JS ecosystem since typescript for me. Might be helpful to you all to capture some steam from source.

[1]: https://deno.land/


I'm building an open source firebase alternative using sqlite. I'll be reaching out, I was thinking to build the distribution & durability part myself, but I would rather use D1!

I guess it would count as a client focused ORM :)

I'll be reaching out from jp@javascriptdb.com

Great addition, congrats!


Will not any existing ORM that supports SQLite support D1? I looked in the post for details on how it extends SQLite (is the query language different or extended, semantics very different, etc.) but didn't notice anything.


I think the main issue will be with ORMs that are tightly coupled to a specific SQLite driver, such as Prisma.


They should.


This should have a virtual file system. CF should write it so each user doesn't have to load a JS abstraction and it has better performance.


Before you consider using an ORM, try using regular SQL and some tooling first; your future self will thank you. Just write the code, it's only volume and it's not so bad. What is bad is learning a 3rd language on top of SQL and JS/TS that you somehow have to manually map to SQL.


This is so cool!

From the blog post it says read-only replicas are created close to users and kept up to date with the latest data.

- How should I think about this in terms of CAP? If there's a write and I query a replica what happens?

- How are writes handled? Do they go to a single location or are they handled by various locations?

I'm excited to try this. It's so cool to see databases being distributed "on CDNs" for lack of a better term.


I think they're replicated asynchronously, so reading directly from the replica may return old data. That's why they've added the ability to deploy special workers that "live" closer to the primary:

> Embedded compute

> But we're going further. With D1, it will be possible to define a chunk of your Worker code that runs directly next to the database, giving you total control and maximum performance — each request first hits your Worker near your users, but depending on the operation, can hand off to another Worker deployed alongside a replica or your primary D1 instance to complete its work.


"With D1, it will be possible to define a chunk of your Worker code that runs directly next to the database...each request first hits your Worker near your users, but depending on the operation, can hand off to another Worker deployed alongside a replica or your primary D1 instance to complete its work."

That's interesting to me. It opens the door for Cloudflare to offer something more like a "normal" serverless offering. One that can run containers, or at least natively run Python/Golang/Java/etc, like AWS Lambda does. And with this ecosystem described above, that can conditionally route between the lighter edge Workers and the heavier central serverless functions. To me, that's the tipping point where they start to threaten larger portions of AWS.


Big fan of Cloudflare but I wish they would stick to descriptive product names.

Good: Workers, KV, Durable Objects, Cron Triggers

Bad: Spectrum, Zaraz, R2, D1


Naming is hard.

> Zaraz

That's the name of the company they acquired. Though, I do agree that more descriptive naming is nice.

E.g.

Zaraz = SafeXSS

D1 = LDS (light database system)

R2 = ObjectStore

Spectrum = Reverse Proxy


The API for this is currently the only thing I wish I could grok a bit better. It seems like it would be hard to make it work with existing libraries that can access SQLite, which is kind of a shame.

I'm thinking of sqlx in Rust (or any other language binding / ORM for that matter), which has compile time schema safety. This is a nice capability, and because this interface seems non-standard (possibly for good reason), I guess we are being asked to give some of those things up.

I am getting a bit ahead of myself on the Rust part (presumably that will eventually be supported as part of workers-rs), but I think the feelings still stand if you consider the JS ecosystem.

Edit: I may actually be wrong, but presumably the entire surface isn't covered because there's no file opening, etc.


There might be an `env.DB.url` (e.g. the JDBC URL) which you could pass into an existing library.


I'm kinda willing to make a bet that this rides on top of what looks like HTTP to the Javascript engine. That's how their worker-to-worker and worker-to-durable-object protocols are.

(It's not really HTTP as in it might never cross a TCP socket, just get shuffled from one V8 isolate to another, but it looks like a `fetch` call to the Javascript.)

It's also worth remembering that SQLite itself has no wire protocol; it's a library. And there is no such thing as a "SQL wire protocol". It sure isn't gonna be the Postgres wire protocol either.

From the article:

> D1’s API includes batching: anywhere you can send a single SQL statement you can also provide an array of them, meaning you only need a single HTTP round-trip to perform multiple operations. This is perfect for transactions that need to execute and commit atomically:
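Going by that quoted description alone, a batched call from a Worker might look something like this (the binding name and method shapes are guesses, not a confirmed API):

    export default {
      async fetch(request: Request, env: { DB: any }): Promise<Response> {
        // One round-trip; the batch executes and commits atomically.
        await env.DB.batch([
          env.DB.prepare(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?")
            .bind(25, "alice"),
          env.DB.prepare(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?")
            .bind(25, "bob"),
        ]);
        return new Response("ok");
      },
    };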


Interesting thought! Would love to see more details.


Best Effort Writes[1] are an opportunity here. Non-transactional: write to the local replica (ensure foreign keys, constraints, valid data, etc...) and then try to write to the main write-enabled DB. Caching should work without changes since the local replica is updated. This could be cheaper (send binary diffs) and more resilient to brief network issues.

The key is to let the user decide what really needs ACID and what doesn't. If someone wants to make the next Facebook or Reddit they'll need huge write throughput and if some votes or updates are lost, that may be a good trade-off.

[1] You could add a BEW file (like the WAL file) to SQLite for Best Effort Writes.
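From a Worker's point of view it might look roughly like this (everything here is hypothetical; it's the idea, not an existing API; Fetcher as in @cloudflare/workers-types):

    // Best-effort write: apply locally first so constraints are checked and
    // caches see fresh data, then try the primary without failing the request.
    async function bestEffortWrite(local: D1Like, primary: Fetcher, sql: string, params: unknown[]) {
      await local.prepare(sql).bind(...params).run(); // local replica write
      try {
        await primary.fetch("https://primary/write", {
          method: "POST",
          body: JSON.stringify({ sql, params }),
        });
      } catch {
        await enqueueForRetry(sql, params); // hypothetical durable retry queue
      }
    }

    // Minimal shape for the local handle; real D1 bindings may differ.
    interface D1Like {
      prepare(sql: string): { bind(...params: unknown[]): { run(): Promise<unknown> } };
    }

    async function enqueueForRetry(sql: string, params: unknown[]) {
      /* elided: persist somewhere durable and retry with backoff */
    }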


All this hype around SQLite recently, and I am still confused.

* How do you replicate it consistently?

* Who has the master privilege (or masters if sharded)? What's the failover story?

I am guessing a blob store is involved, but I have gaps in my understanding here.


SQLite has a write-ahead log (WAL) journal mode. If you write that log to a store that is already replicated (S3, Cloudflare Durable Objects, Kafka?), then the concept of a 'master' becomes less important.
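That's essentially Litestream's trick. A very rough sketch of the shipping side (Node-style; in real code the page size is read from the WAL header, and all names here are made up):

    // Tail the SQLite "-wal" file and push new frames to a store that is
    // already replicated (S3, R2, a Durable Object, a Kafka topic, ...).
    import { statSync, openSync, readSync, closeSync } from "node:fs";

    const PAGE_SIZE = 4096;            // assumed; the 32-byte WAL header records it
    const FRAME_SIZE = 24 + PAGE_SIZE; // 24-byte frame header + one page of data
    let shipped = 32;                  // start past the WAL file header

    async function shipNewFrames(walPath: string, put: (key: string, body: Buffer) => Promise<void>) {
      const size = statSync(walPath).size;
      const fd = openSync(walPath, "r");
      try {
        while (shipped + FRAME_SIZE <= size) {
          const frame = Buffer.alloc(FRAME_SIZE);
          readSync(fd, frame, 0, FRAME_SIZE, shipped);
          await put("wal/" + shipped, frame); // replicas replay frames in order
          shipped += FRAME_SIZE;
        }
      } finally {
        closeSync(fd);
      }
    }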


Not an expert on the DOM or JavaScript, so be kind ;)

One thing I hope to see in the future is a better product-filtering experience. When I worked on a jQuery product filter, I realized DOM bloat was the main problem.

I wonder if D1 can help devs build instant product-filtering pages that don't require the full reload that Micro Center or Newegg do.

E.g. https://www.newegg.com/p/pl?d=hdmi+cable&N=-1&SortType=8


At any sufficient scale, it is difficult to do filtering on the client. Yes, it can be done, but with 10,000+ potential records, you don’t want to ship that to the client for each query. (Note: I’m thinking Newegg scale for “hdmi cable” here. There are certainly situations where you can ship the entire database to the client for filtering.)

It’s not DOM bloat… it’s too many records. If you’re building a DOM node for each record, that’s bloat, but you still have the problem even if the results are stored in a JSON object and dynamically queried on the client side.

So, for each new filter or new query you need to hit the server anyway. Whether that's an asynchronous query that returns a JSON blob or a full refresh doesn't really matter much, IMHO. Either way, you're rebuilding a large portion of the DOM with the new results. The only thing that skews things in favor of an async call is if the rest of the page is so heavyweight that reloading it takes a significant amount of time. This is probably what you're talking about.

Having a SQLite db close to your worker node really isn’t going to affect this problem all that much.


It's probably better - especially for more advanced search engines - to have an elasticsearch instance or whichever is the more recent example handle product search and filtering like that.


So can we assume that D2 will be Postgres/MySQL?


It sounds like you're making a simile, but I don't understand it. The article did literally state D1 is based on SQLite.


The opening paragraph reads "Today, we're excited to announce D1, our first SQL database." read: first

And, well, R2 and D2 would make for a great naming scheme.


Ah sorry, I missed that you said D2, not D1.


dang i was hoping for postgres so i can use postgis

edit: maybe one day! this looks cool regardless


I'm right there with you. I wonder if this is an SQLite-compatible API on top of their own solution, or actual SQLite under the hood with custom replication.

If the latter, and anyone from Cloudflare is here: is there any chance of having SpatiaLite enabled?

https://www.gaia-gis.it/fossil/libspatialite/index


Seconding a vote for Spatialite support! I came here just to make that same request.


No need to call dang!


lol i'm from a place where "dang" is a natural part of our vocab


What write throughput and latency can we expect from this database?

Are there any limitations, for example on the number of tables or size of the database?


With this we can probably switch our infrastructure off AWS and entirely onto Cloudflare.


So where are the databases running? In the same regions as workers?

Is the data replicated to all regions?


This is convenient: I've been building an app that uses SQLite, and I want to deploy it to Cloudflare Pages. I expected I was going to have to switch to a hosted Postgres instance somewhere, but this could be perfect.


So I assume we'll see a nice big donation to the sqlite coffers, then?


Unless I missed it by skimming, where are the deets? Is this strongly or eventually consistent? What are the max table sizes, and do they become partitioned? Are there cross-partition joins?


This looks awesome. I was thinking about creating a custom version of this to live behind a CF Worker. Much better to have an official version!


Too bad you probably can't use this to store data about EU citizens. Phone numbers like they show in the demo are considered PII, right?


why?


Will they seriously challenge Azure, AWS and GCP eventually? Cloudflare is very innovative and what they are doing is really exciting.


The unique thing about Cloudflare's product offerings is how global-first they are; traditional cloud providers (AWS to DigitalOcean) have a very region-oriented domain model, with select christened services allowed or architected to be global (e.g. AWS CloudFront, IAM, Route 53; that's about it). That's their disaster/failure model, but all it really does is force cross-regional architecture onto the customer. Most customers don't bother.

In comparison, everything at CF is global. And it's not just "global" from an AWS perspective of "we've got 14 regions and your stuff runs in all of them"; it's global from 300+ points of presence, within 50ms of something like 98% of all humans. CDN for compute, databases, etc.

CF has a way to go on DevEx for many of their products. For example, Workers, being based on V8 isolates, is a pain to use even compared to e.g. Lambda. It's a battle of figuring out what's possible and what isn't within the runtime. But I'm sure it'll be improved!


Glad to hear it. I was considering moving to Deno Deploy + Supabase because KV was not good for relational data.


How does this work when developing locally? Is it SQLite for local development?


I was expecting this to be using https://en.wikipedia.org/wiki/D_(data_language_specification... given the name.


Is this going to be open sourced? It seems to be building on the shoulders of a particular giant that could use a somewhat wider ecosystem.


Any word on pricing =)?


It's a bold strategy, Cotton. Sounding a bit like they want to compete with AWS.


Our first database … I like it. I wonder what’s next


First, super excited to have Cloudflare offer an RDBMS (can SQLite be called that?).

This enables entirely new classes of applications where everything can now be hosted by Cloudflare.

Questions:

a. To help with concurrent writes, will Cloudflare be using WAL2 and BEGIN CONCURRENT branches of SQLite?

b. How is Cloudflare replicating the data cross region? Will it be Litestream.io behind the scenes?

c. Will our Worker code need to be written differently to ensure only a single-writer is writing to SQLite database?

d. How do data persistence and database file size get factored in? I have to imagine there is a limit to how much storage can be used, whether or not that storage is local to the Worker machine, and whether it's persistent.
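On (c): the usual Workers answer today would be to funnel all writes through a single Durable Object, since each object instance runs in one place at a time. A hedged sketch (the wiring to the actual database is elided and hypothetical):

    // A Durable Object that serializes writes: every Worker sends its write
    // here, and this one instance applies them in order.
    export class DbWriter {
      constructor(private state: DurableObjectState) {}

      async fetch(request: Request): Promise<Response> {
        const { sql, params } = (await request.json()) as { sql: string; params: unknown[] };
        await applyToDatabase(sql, params); // hypothetical: whatever handle D1 exposes here
        return new Response("ok");
      }
    }

    async function applyToDatabase(sql: string, params: unknown[]) {
      /* elided: execute against the single writable database */
    }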


[flagged]


While the blog does use "primary", people are saying "master"/replica, much like how redis refers to their architecture[0]. In fact, ctrl-f "slave" only brings up your comment.

0: https://redis.io/docs/manual/replication/


"Master", however, is as prevalent as "primary" in this discussion "slave" is the implied counterpart.

It would be better if we completely abandoned that terminology.


Out of all the comments that use the term "master" as of right now[1-5], the first three specifically use the term "replica" while the 4th and 5th use the verb "replicate" (in the present and past tense).

1: https://news.ycombinator.com/item?id=31341392

2: https://news.ycombinator.com/item?id=31341661

3: https://news.ycombinator.com/item?id=31340318

4: https://news.ycombinator.com/item?id=31340169

5: https://news.ycombinator.com/item?id=31342145


It wouldn't be better, it would be arbitrary


No one really means anything ill and this political correctness madness needs to stop.


[flagged]


Because it's not painful to others and intent always matters.

These words are everywhere in the language; you're not really changing anything with these antics other than derailing the subject to appease those who assume offense on behalf of an imagined group of people that can't distinguish context.


Most of us moved on to better terminology 4+ years ago. The only ones derailing conversations are grumps like yourself who refuse to get with the program. Why is this so important to you?


Who are you and what's this "program" you deem to impose on others?

No thanks, I'll stick with the actual majority that have mastered using relevant language and rational context in discussions without being slaves to performative social constructs.

Instead of assuming what's actually important to me, perhaps some introspection of why you immediately think of slavery in a computing context would be more helpful.


I think the question is: is it painful or harmful? The early discussions only seem to reference its potential problems, with the 2014 Django PR[0]'s reasoning being "those terms may carry racially charged meanings to users" and the 2018 Python bug[1] referencing "for diversity reasons". Maybe there are scholarly papers on this issue?

0: https://github.com/django/django/pull/2692

1: https://bugs.python.org/issue34605


How do you know it's painful for others? Are you assuming it's painful for others?


It is most probably not meaningfully painful for people, but it does seem to deflect responsibility from actually helping marginalized people, while giving status to assholes who enjoy harassing others instead of contributing anything.

In some cases, such as the git branch name change, it also cost us many hours.


It is painful to me and many others to give up innocuous technical terms like master/slave, man-in-the-middle attack, motherboard, mount, whitelist, blacklist, etc., just to satisfy the whims of a pedantic minority.

But I know my opinion doesn’t matter to those people anyway.


Because it is being used for manipulation, falsely. "Oh, oh, this hurts me! I cannot bear it!"

Recollect Emory University, in 2016, when someone wrote "TRUMP 2016" in chalk on the sidewalk. "Traumatized" protesters shouted "You are not listening! Come speak to us, we are in pain!" It's manipulation, pretending to be hurt. These people weren't locked in some kind of perpetual seizure for the four years of the Trump presidency, wailing in continual agony. It was a show.


Can you explain what incentive someone has to "manipulate" us into no longer using the term slave, if not genuine discomfort?


It makes petty people feel they have power over you. It's not like they are that wrong. We like to reward norm enforcement, but we aren't sufficiently inoculated against exploitative enforcers. The end game can be seen in any big religion.


Well, why did the histrionics at the campus occur? Do you feel their discomfort was genuine? Bonus question: if it was genuine, what would you expect the outcome to be for them during the Trump presidency?


> If we can avoid terminology that is painful for others, why shouldn't we?

Because other people's feelings shouldn't be treated as being as important as they have (relatively recently) been elevated to be, particularly in the very dysfunctional and broken US culture.

Ideally we'd start stinging the soft US culture with a lot more political incorrectness. It's not a benefit to society to be so weak about such trivial matters as offensive terminology. And that's exactly what will follow this era of political correctness: inevitably the pendulum will swing back the other way. We're due for an aggressive era of rebellion against the censorship and political correctness brigades.


People want to change chess to red/blue. I'll continue to play with white and black pieces.

This is a slippery slope toward destroying society by being hypersensitive about things that no one actually means. I absolutely hate this; it fills me with disgust that people obsess over such petty things.

Life is beautiful. Enjoy it. Be kind to others who have zero intention of offending you.


The person you're replying to is very kind and considerate.

They're not assuming people using the problematic terminology do so with the intention to harm. They are, however, mindful of its negative effects, hence the gentle reminder.


The person I am replying to is very kind indeed. But the rest of society, obsessed with this political correctness madness, is extremely hostile, engages in shaming, and gives off sadistic vibes of oppressing people with their moral superiority, which has zero basis beyond having discovered a new offending word to pile onto others.

This needs to stop, IMO.


The behavior you're describing exists, but I think it's a minority behavior, generally exhibited by people who often suffer directly from the use of said terminology. I don't condone violence, but anger is understandable in that case.

Also, the fact that an idea has some idiotic proponents doesn't mean it's inherently false.

Minorities are in the process of gaining cultural power and influence, to the detriment of the majority. That is unpleasant when you are on the receiving side, but IMO necessary right now.


I prefer to make the world better through actions.

Not pointlessly redefining words that carried no ill intent in the first place, and harassing people who don't use the words you decided were proper.


Just a thought, what if we made the world a better place through clearer terminology? It's not a major improvement, but it helps reduce friction in communications.

Ignoring connotations, master/slave is pretty unclear to me and needs more explanation. I've seen it used to describe:

* a coordinator with a worker pool in which the coordinator sends jobs to the workers for actual computation

* a designation of where on the bus a device sits

* a designation of which copy is the source of truth/gets writes (master) compared to the read only copies

* a designation of which instance is currently doing the work vs. which is the standby in an active-passive failover

Yes those are all similar, but it's annoying to hear someone say "this is configured in a typical master/slave pattern" only to be left wondering which of the above applies.

I guess we could make some sort of purity argument about which of these is the one true use for master/slave. Or we could differentiate the cases with more applicable and descriptive words. As a bonus, some folks aren't offended.


> Ignoring connotations, master/slave is pretty unclear to me and needs more explanation. I've seen it used to describe:

> * a designation of where on the bus a device sits

I see, you're worried people will get this mixed up with Rosa Parks?

Honestly, it feels like the PC brigade is really having to bend over backwards to find arguments. That should tell them something.

> Or we could differentiate the cases with more applicable and descriptive words.

So you get Primary/Secondary or Main/Secondary... for all of the above cases. I don't quite see how that differentiates them.


> I see a lot of use of the "master/slave" terminology in this thread.

Hey, some people are still hosting their databases on clunky old IDE hard drives! Let's not be insensitive to their lived experience for the sake of technical novelty.


Now is this a Cloudflare ($NET) buy signal? I think you know the answer.

Maybe they will announce a Hashicorp competitor in their next reveal. Who knows.



