D1: Our SQL database (cloudflare.com)
592 points by elithrar on May 11, 2022 | 228 comments



For a Cloudflare article, this one is surprisingly light on technical details. And for the product where it most matters.

I'm guessing this is a single master database with multiple read replicas. That means it's not consistent anymore (the C in ACID). Obviously reads after a write will see stale data until the write propagates.

I'm a bit curious how that replication works. Ship the whole db? Binary diffs of the master? Ship the SQL statements that did the write and reapply them? Lots of performance and other tradeoffs here.

What's the latency like? This likely doesn't run in every edge location. Does the database ship out on the first request and get cached with an expiry? Does the request itself move to the database instead of running at the edge - like maybe this runs on a select subset of locations?

So many questions, but no details yet.


I agree -- this blog post is light on details. To me the value Cloudflare believes they are offering is mostly ease-of-use, particularly setup. With minimal work you can have a stateful, relational store available to your code. But in terms of actual database functionality, they are not offering anything particularly novel. Of course, I might be missing something.

In fact, I don't see anything D1 is doing that is not already offered by something like rqlite[1], which is also a super-easy-to-use distributed database built on SQLite. Of course Cloudflare will run the database for you, which is a great help -- they take care of the uptime, monitoring, backups, etc. And that's important, obviously, because in the real world, databases must be operated.

Disclaimer: I am the creator of rqlite.

[1] https://github.com/rqlite/rqlite


I’ve been looking at rqlite for some time and it’s really great to track the project on GitHub.

I believe that the power of what Cloudflare offers here isn’t in the actual database. It’s the packaging and how it sits in their serverless world. Even with rqlite, I still need IP addresses to run a resilient system. As someone who sometimes needs a table here and there, I really, really don’t want a server. I want a table to store a thousand records in and that’s it. This is where I would very much enjoy using something like D1.

A combo of D1, R2 and Workers is a serious contender for over-the-top serverless distributed apps. This is great.


> It’s the packaging and how it sits in their serverless world. [...] As someone who sometimes needs a table here and there, I really, really don’t want a server. I want a table to store a thousand records in and that’s it.

Sorry, but I don't get it -- WTF does "serverless" even mean here?

I mean, sorry for jumping on your comment specifically, I know that wasn't primarily what you were talking about here, but... You seem to know what you're talking about, effortlessly encompassing "their serverless world" etc.

The article even mentions that <<SQLite was so ahead of its time, it dubbed itself “serverless” before the term gained connotation with cloud services, and originally meant literally “not involving a server”.>> That makes sense to me; "serverless" means "not having a server". So then you have a local DB; be it SQLite or a DBF or Paradox or MS Access file or whatever. Or even a local DB software "service"; Firebird or MySQL or what have you.

But the term, as it's been bandied about online for the last decade(?) or so (including in this article), seems to pretty obviously actually be about... Remote servers (that's what it talks about replicating between, right?). So what's "serverless" about that???

I've been wondering for a good while now. Anyone who has a short explanation, or link to such, please jump in and enlighten me.

(Otherwise I'll have to conclude it's like "the Cloud", a.k.a. "Someone else's computer". "Serverless" = Someone else's server? :-)

[Edit: Typo.] [Edit: Sigh... Two of them.]


Sure, let me explain my thinking. I think it’s time to get off of the „it’s someone else’s computer” bandwagon. Sure, internet services don’t live in the void and there’s always a server out there. At the end of the day, three things in life are certain: taxes, death, and there is a server.

However, with Cloudflare specifically, you write apps that deal with individual requests. You, as a dev, never ever get exposed to a server. This abstraction goes way further than even AWS Lambda goes. There are no local caches, no temp directories, no mucking around with a shutdown of a function. Every instance of an app deals with exactly one request.

That’s what I mean by „their serverless world”.


Thanks!


Small nitpick, but that's still consistent as in ACID. I think what you mean is it wouldn't be consistent in the CAP sense (it wouldn't be linearizable).

TFA does say that read-replicas will be present at every edge location, which makes sense for a product like Workers. But it doesn't mention writes at all.


Yes, that's true.


> I'm guessing this is a single master database with multiple read replicas. That means it's not consistent

Single master with read replicas is fully consistent if commits don't return until propagated to and acknowledged by replicas (the expense here being commit latency.)


You've basically described rqlite [1], which uses Raft to coordinate changes through the Leader and then replicate them across some number of Followers. The write won't be acked until a quorum has persisted the change and committed it to the underlying SQLite database.

Disclaimer: I am the creator of rqlite.

[1] https://github.com/rqlite/rqlite
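For the curious, here's a minimal sketch of what talking to an rqlite node looks like over its HTTP API (the /db/execute and /db/query endpoints and the JSON statement format follow the rqlite docs; the host, table, and data are placeholders):

    // Writes can be sent to any node; rqlite forwards them to the Raft
    // leader, which acks only once a quorum has persisted the change.
    const RQLITE = "http://localhost:4001"; // placeholder host

    async function addUser(name: string): Promise<void> {
      const res = await fetch(`${RQLITE}/db/execute`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify([["INSERT INTO users(name) VALUES(?)", name]]),
      });
      if (!res.ok) throw new Error(`write failed: ${res.status}`);
    }

    // Reads can trade freshness for latency: level=none serves possibly
    // stale local data, while level=strong routes the read through Raft.
    async function listUsers(level: "none" | "weak" | "strong" = "weak") {
      const q = encodeURIComponent("SELECT id, name FROM users");
      const res = await fetch(`${RQLITE}/db/query?q=${q}&level=${level}`);
      return res.json();
    }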


rqlite also supports read-only nodes, so in theory you can have more nodes at the edge, just like D1 -- but these nodes won't participate in the distributed consensus process. Those nodes will keep up to date with changes, even catching up in the event of a temporary disconnection.


I would say the expense is both latency and availability because if one node doesn't ack within the timeframe then you have to drop it from the cluster. Requests that go there would need to be routed elsewhere to avoid being unavailable. If there's a network partition preventing that, then you have partial downtime. If enough nodes fail then you have full downtime across the whole cluster.


Yeah, nothing about WAL mode, which is what most users will want for web apps.

SQLite accessed via a socket? That defeats the whole purpose of using SQLite.

Many here are mentioning using one SQLite file per customer, but that sounds like a nightmare for migrations and analytics.

SQLite is great, and all these new services and articles are nice, but they intentionally gloss over a lot of complexity.


Going to be very interesting to see how they glue together R2, edge workers and SQLite. They can manage replication using R2 and make the SQLite process aware of this for eventual consistency. Having edge compute with edge data on a globally consistent data model is the dream.



SQLite is great but it's way overhyped and abused on HN. People are very eager to turn SQLite into a durable, distributed database and it's really not meant for that, and by going down that road instead of using something like MySQL or Postgres you're missing out on lots of important functionality and tooling.

I only say this because I have made this mistake at my previous startup. We built these really cool distributed databases on top of a similar storage engine (RocksDB) plus Kafka, but it ended up being more trouble than it was worth. We should have just used a battle-tested relational database instead.

Using SQLite for these applications is really fun, and it seems like a good idea on paper. But in practice I just don't think it's worth it. YMMV though.


So you didn't use SQLite then? Because RocksDB + Kafka is not similar at all.

Also databases all use the same fundamental primitives and it's up to you to choose the level of abstraction you need. For example, FoundationDB is a durable distributed database that uses SQLite underneath as the storage layer but exposes an ordered key/value API, but then allows you to build your own relational DB on top.

If you just needed distributed SQL because a single instance wasn't enough then there are already plenty of choices like CockroachDB/Yugabyte/TiDB/Memsql/etc that can serve the purpose instead of building your own.


It's actually quite similar. Both are embedded storage engines that are designed for a single node.

Actually, the case for RocksDB backing a distributed data store is probably much stronger than for SQLite, given that it supports multiple concurrent writers.

SQLite lacks many important characteristics that one would expect a distributed data store to have. Row-level locking is one obvious feature that's super important in a highly concurrent context (as mentioned, RocksDB has this). Want to back up your production DB? You're going to need to block all writes until the backup completes.

Additionally, features like profiling and replication are nonexistent or immature with SQLite. Rqlite and Litestream are super new relative to tools like Postgres and MySQL and you can't find a lot of people that know how to run them.

Also, you can't autoscale your app since your processes are now highly stateful. Sure, this is a problem with MySQL/Postgres too, but I can pay AWS or Google Cloud for a managed version that will abstract this problem away from me.

Most of these problems are solvable with enough net new software on top of SQLite. But... why? I think the only reason you'd subject yourself to such an architecture is because you want to learn (great!) or you're gunning for that next promotion and need to show off your system design skills :P


> So you didn't use SQLite then? Because RocksDB + Kafka is not similar at all.

I can see the connection in the sense that, just like SQLite, RocksDB is an embedded store, while Kafka can be used to build a replicated log (log device).

> If you just needed distributed SQL because a single instance wasn't enough then there are already plenty of choices...

Well, that was GP's point, too? In addition, they mention that existing DBMSs like Postgres have way more breadth and depth than a replicated SQLite can ever hope to have (which isn't really a controversial assertion at all, tbh).


I accept that you learned a lot about the limits of combining RocksDB with Kafka, especially in the exact way you combined them.

This might have limited utility if the goal were to combine RocksDB with something else. And even less for SQLite and something else.

The big push of interest in SQLite server-side isn't driven by people who have never set up pgbouncer, but rather by developers who have both read the SQLite docs very carefully and used the library extensively, and know what it's good for.


I'm not sure why you concluded that SQLite is the problem when you built a "really cool distributed database" with Kafka. Distributed databases are complicated, Kafka's complicated.

If you're saying that a replicated Postgres setup would be simpler than what you built, I agree; but SQLite+Litestream probably would be too.


Litestream is too much work if you're not using S3: replication over SFTP. Even Fossil has nicer, no-nonsense replication done over HTTP(S). It's way easier to set up MySQL with replication than to manage Unix accounts and public keys.


Is this any good? https://github.com/rqlite/rqlite

I've been looking for a turn key solution that is better than me running a single node Postgres instance "bare metal" or in a container.

postgres-operator seems cool but... k8s, pretty heavy I guess.


It’s the default storage engine for FoundationDB - not sure many would agree that isn’t a “durable, distributed database”.


For one thing, they're ripping it out because of its poor write parallelism https://youtu.be/nlus1Z7TVTI?t=271

But that's orthogonal to my point. As a user of FoundationDB, you're not programming directly against SQLite, so you aren't going to run into these issues as much since FoundationDB exposes different semantics and coordinates concurrency across many SQLite instances in parallel.

I think it's best to think of SQLite as a replacement for your filesystem, rather than a replacement for your relational DBMS.


SQLite has been cool forever. It was the underlying data store for my machine learning email filter POPFile 20 years ago!

https://en.wikipedia.org/wiki/POPFile https://getpopfile.org/browser/trunk/engine/POPFile/Database...


It's high-quality software too. It's well-commented and exceptionally well tested.[1][2]

> As of version 3.33.0 (2020-08-14), the SQLite library consists of approximately 143.4 KSLOC of C code. ... By comparison, the project has 640 times as much test code and test scripts - 91911.0 KSLOC.

I don't usually place much stock in that sort of count, but 640x is notable.

It makes sense considering the wide variety of use-cases, from embedded devices to edge computing and everything in between.

[1]: https://www.sqlite.org/testing.html [2]: https://sqlite.org/src/dir?ci=trunk


I used POPFile!! It was awesome.


SQLite was originally great for desktop applications.

Problem is, there's still a huge market for these kinds of apps, but everything has moved to the web (no one is making desktop apps anymore). A full-blown RDBMS is overkill for this kind of app, and now SQLite is starting to fill these web-app needs.

@sqlite - if you are reading this, any word on merging WAL2 and BEGIN CONCURRENT into main? There is clearly a new class of needs in this world that has completely moved over to web app development (which introduces concurrency problems never experienced on the desktop). Any thoughts of focusing more on these web-related needs for SQLite (or maybe even forking your own code base into a more enhanced SQLite version targeted at web needs)?


I think it’s long overdue. While SQLite certainly has its limitations, it’s a winner in many categories. Even for sites with mild traffic using ordinary SQLite in PHP a decade ago, it was always nice to use for its simplicity, and the performance was totally acceptable. In comparison, the memory usage of typical relational database servers was high enough to make it hard to fit on a single low-end VPS with the same data and traffic. (I found myself tuning MySQL, but I never needed to tune SQLite.)


The main thing for tuning SQLite is how you open it: e.g. in write-ahead log (WAL) mode, with foreign keys turned on (this needs to be enabled manually), and with a busy timeout so it waits for a database lock on slower hardware before giving up. There are also some gotchas: if you mark an ID column as primary key, it'll use the rowid as the key - which can be reused if a row is removed. So you need to explicitly set PRIMARY KEY AND AUTOINCREMENT, else you're going to have a bad time. (https://www.sqlite.org/autoinc.html)
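For illustration, here's roughly what that setup looks like from Node using the better-sqlite3 driver (the driver choice is mine for the sketch; any driver that can issue PRAGMAs works the same way):

    import Database from "better-sqlite3";

    const db = new Database("app.db");

    db.pragma("journal_mode = WAL");  // readers no longer block on writers
    db.pragma("foreign_keys = ON");   // off by default; set per connection
    db.pragma("busy_timeout = 5000"); // wait up to 5s for a lock, then fail

    // AUTOINCREMENT guarantees ids are never reused after a DELETE;
    // a bare INTEGER PRIMARY KEY is just the rowid, which can be recycled.
    db.exec(`
      CREATE TABLE IF NOT EXISTS users (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT NOT NULL UNIQUE
      )
    `);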


If you define a table with an integer primary key, you get autoincrement by default, at least in newer versions.


With the mileage (and attention) those new products are getting out of using SQLite, I think Richard Hipp deserves a lot more acknowledgement for creating such an amazing piece of software.


New products getting a lot of mileage out of SQLite is old hat at this point. It's one of those rare evergreen techs - pay attention for a while and this latest round of attention will die down for 6-12 months, then someone else will start another round of "look how cool SQLite is".

At least that's been my observation since I started coming around here.


I'm wondering if we'll see some similar energy around non-SQL embedded databases like LevelDB or RocksDB.


Right! SQLite is great, but those two are great as well. It seems like the energy should be around "hey, you should consider a local, maybe even in-memory, database for some things!" more so than specifically "SQLite is great" (though it is).


Well, I don't think it's a good fit for a regular service. Exactly how do you handle two replicas of the same service talking to the same DB?

The fact that it's just a file on disk limits the usage.


Projects such as litestream and rqlite have this figured out.


rqlite author here, happy to answer any questions.


Multiple writers on the same SQLite?


Transactions, locks, queues, etc. No different than multiple app instances changing the same row in other databases.

Any state mutation is ultimately ordered in time, and how that ordering is accomplished depends on the abstractions you're using: in your app, network layer, database, etc.
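Concretely, the usual single-node SQLite pattern for competing writers is a busy timeout plus an immediate transaction; a sketch with the better-sqlite3 driver (the driver choice is illustrative):

    import Database from "better-sqlite3";

    const db = new Database("app.db");
    db.pragma("busy_timeout = 5000"); // queue briefly behind other writers

    // BEGIN IMMEDIATE takes the write lock up front, so two processes
    // can't both read a value and then race to write it back.
    const upvote = db.transaction((postId: number) => {
      db.prepare("UPDATE posts SET votes = votes + 1 WHERE id = ?")
        .run(postId);
    });

    upvote.immediate(42); // runs the function inside BEGIN IMMEDIATE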


Why would you use SQLite once you start dealing with the network? Just use MySQL or PG.

It's just reinventing the wheel badly. I need to read the details, but basically you're using a tool (SQLite) that was not designed to be used outside of a single-app use case.


What context are you talking about here?

For Cloudflare, they're offering it because it's simple and lightweight, and they already have their Durable Objects product which serves as the transaction ordering mechanism and takes care of writes.

If you're doing it yourself then sure it's probably not the best fit but that's up to you to decide.


SQLite was not meant to be used by multiple processes, so you have to build the missing parts yourself; 100% those will have more limitations and issues than a regular RDBMS that was built for it.


I think one way to think about this is to have one database tied to one replica (replicas could handle more than one database), where (importantly) the idea would be one database for each user. You horizontally scale for the number of users, but each user is only using one end node.

It’s interesting because you have to consider how to scale your database as well as your application. The fact that you don’t have one central database opens up more possibilities. But it doesn’t work for all cases (such as a shared read-write data source for all users). For example, this approach wouldn’t work for something like Twitter (at least the original architecture).



R2 is 3x more expensive than B2 (storage) https://www.backblaze.com/b2/cloud-storage-pricing.html

Am I missing something? Is there no bandwidth cost at all?


Yep, you're not charged for egress.


B2 to Cloudflare also does not incur egress fees: https://www.backblaze.com/blog/backblaze-and-cloudflare-part...

Backblaze B2 customers will be able to download data stored in B2 to Cloudflare for zero transfer fees. This happens automatically once Cloudflare is configured to distribute your B2 files.


I did the Backblaze-via-Cloudflare setup.

I really don't care about the cost of storage. In my case it's the bandwidth costs that were killing me.

If it was available at the time, I would use R2 if only for simplicity.

If I was using Cloudflare Workers it would be another reason to use R2: I assume that it's easier to use and faster to use than any other storage system, since it's on the same network and written by the same people.

Also, exposing Backblaze via Cloudflare has its issues. I ran into Cloudflare caching 404 responses from Backblaze and Backblaze being slow to make writes visible.

So I would write into Backblaze and try to access that key via the Cloudflare proxy. While the write was acknowledged to my client, it wasn't yet visible via the HTTP endpoint, so Cloudflare would cache a 404 response. I would have to clear the cache to fix it, and then I added a 5 min delay "just in case" to work around this.


Only to Cloudflare though, right? If users download, it'll be billable.

With R2, user downloads are free, aren't they?


Does anyone remember when we had Net Neutrality?


Yes, like it was yesterday (or today). It was a strange time where the term was often used for things that had nothing to do with the original meaning of the term.


Does R2 provide synching between regions? Maybe that's why it's so much more expensive? You're getting regional failover?


Presumably under the hood it'll be nicely distributed, as per https://blog.cloudflare.com/introducing-r2-object-storage/.

"Our vision for R2 includes multi-region storage that automatically replicates objects to the locations they’re frequently requested from."


Latency


Wow, this looks potentially very interesting. Since this is sort of fresh in my mind from the recent Fly post about it:

* How exactly is the read replication implemented? Is it using litestream behind the scenes to stream the WAL somewhere? How do the readers keep up? Last I saw you just had to poll it, but that could be computationally expensive depending on the size of the data (since I thought you had to download the whole DB), and could potentially introduce a bit of latency in propagation. Any idea what the metrics are for latency in propagation?

* How are writes handled? Does it do the Fly thing about sending all requests to one worker?

I don't quite know what a "worker" is but I'm assuming it's kind of like a Lambda? If you have it replicated around the world, is that one worker all running the same code, and Cloudflare somehow manages the SQL replicating and write forwarding? Or would those all be separate workers?


First, I'm very excited. Sure, SQLite has some limitations compared to Postgres, esp. regarding the type system and concurrency. But we get ACID compliance and SQL.

But it is really hard getting some useful information from this article. I can't even tell if it is not there or just buried in all this marketing hot air.

So, what is it really? Is there one Write-Master that is asynchronously replicated to all other locations? Will writes be forwarded to this master and then replicated back?

I'm very curious about how it performs in real life. Especially considering the locking behavior (SQLite always has the isolation level 'serializable', iirc). The more you put in a transaction, or the longer you have to wait for another process to finish its writes, the more likely you are to have to deal with stale data.

But overall I'm very excited. Also by the fly.io announcement, of course. Lots of innovation and competition. Good times for customers.


>So, what is it really? Is there one Write-Master that is asynchronously replicated to all other locations? Will writes be forwarded to this master and then replicated back?

Not a lot of detail, but that is mentioned:

"But we're going further. With D1, it will be possible to define a chunk of your Worker code that runs directly next to the database, giving you total control and maximum performance—each request first hits your Worker near your users, but depending on the operation, can hand off to another Worker deployed alongside a replica or your primary D1 instance to complete its work."


Very cool! Glad to see all the love for SQLite recently.

One thing I've noticed that many commenters miss about read-replicated SQLite is assuming that the only valid model is having one giant, centralized database with all the data. Let's be honest with ourselves: the vast majority of applications hold personal or B2B data and don't need centralized transactions, and at scale will use multi-tenant primary keys or manual sharding anyway. For private data, a single SQLite database per user / business will easily satisfy the write load of all but the most gigantic corporations. With this model you have unbounded compute scaling for new users, because they very likely don't need online transactions across multiple databases at once.
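To make that concrete, here's a sketch of the routing logic; openDatabase() is a hypothetical helper standing in for however a given platform hands out a database by name:

    interface TenantDb {
      run(sql: string, ...params: unknown[]): Promise<unknown>;
    }

    // Hypothetical: resolves a named SQLite database for this tenant.
    declare function openDatabase(name: string): Promise<TenantDb>;

    // Each tenant gets its own database, so one hot tenant never contends
    // with another tenant's writes, and tenants shard trivially.
    async function recordLogin(tenantId: string, email: string) {
      const db = await openDatabase(`tenant-${tenantId}`);
      return db.run("INSERT INTO logins(email) VALUES (?)", email);
    }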

Some questions:

Will D1 be able to deliver this design of having many thousands of separate databases for a single application? Will this be problematic from a cost perspective?

> since we're building on the redundant storage of Durable Objects, your database can physically move locations as needed

Will D1 be able to easily migrate the "primary" at will? CockroachDB described this as "follow the sun" primary.


I guess the answer to the first question is: similar to Durable Object limits (unlimited databases / 50 GB total), since they alluded to those abilities more than to a simple file stored on R2 (that's only for backups).


Love the Northwind Traders reference! However, for a demo, I suggest a slightly larger and more complex data set, [data-generator-retail](https://www.npmjs.com/package/data-generator-retail).

The demo is also a bit buggy: orders are duplicated as many times as there are products, but clicking on the various lines of the same order leads to the same record, where the user can only see the first product...

I also think the demo would have more impact if it wasn't read-only (although I understand that this could lead to broken pages if visitors mess with the data).

Anyway, kudos to the CloudFlare team!


I was thinking the same. The dataset is way too small.


Fixed the orders table. Good catch.


This looks amazing!

I see Cloudflare people are on this post; any chance to compare D1 vs Postgres in terms of DB features?

Insert ... Returning

Stored procedures and triggers

Etc etc

Would be really helpful to get a comparison like cockroachDB did here https://www.cockroachlabs.com/docs/stable/postgresql-compati...

Or even better, a general sql compatibility matrix like this https://www.cockroachlabs.com/docs/stable/sql-feature-suppor...

Kudos to the cloudflare team!


Well, it's sqlite... so presumably you will get most of the capabilities sqlite has.

RETURNING is covered.

Stored procedures are indirectly there by running your own code "next to the database", as mentioned in the post. Which is arguably much nicer than having to use some database specific language, given that you can run WASM on workers.


There is a layer on top of Sqlite here, so I imagine it's something less than all the capabilities sqlite has, at least initially. Plus the upsides and downsides from their approach to have a master and read replicas.


Yes, I was thinking the same. Nice to see some people here actually understood the question, thank you.


> Stored procedures are indirectly there by running your own code "next to the database",

"indirectly" is a keyword here, because running code when data is modified potentially won't replace triggers since they'll probably execute outside the running transaction.


Listen/notify


The announcement - if you read it before posting - says it's sqlite, so that's something you can punch into google.

Long story short, don't expect anything fancy. Support for alter table is limited, and concurrency can be an issue.


It is indeed SQLite, but it could have modifications or additions. Please be considerate and think a little more before commenting.


All this recent hype around sqlite...

SQLite is a great embedded database, and thanks to its use by browsers and on mobile it is the most used database in the world, by orders of magnitude.

But it also comes with lots of limitations.

* there is no type safety, unless you run with the new strict mode, which comes with some significant drawbacks (e.g. limited to the handful of primitive types)

* very narrow set of column types, and overall functionality in general

* the big one for me: limited migration support, requiring quite a lot of ceremony for common tasks (e.g. rewriting a whole table and swapping it out)

These approaches (like fly.io's) with read replication also (apparently?) seem to throw away read-after-write consistency. Which might be fine for certain use cases and even desirable for resilience, but can impact application design quite a lot.

With SQLite you have to do a lot more in your own code because the database gives you fewer tools. Which is usually fine because most usage is "single writer, single or a few local readers". Moving that to a distributed setting with multiple deployed versions of code is not without difficulty.

This seems to be mitigated/solved here though by the ability to run worker code "next to the database".

I'm somewhat surprised they went this route. It probably makes sense given the constraints of Cloudflares architecture and the complexity of running a more advanced globally distributed database.

On the upside: hopefully this usage in domains that are somewhat unusual can lead to funding for more upstream sqlite features.


* the big one for me: very limited migration support, requiring quite a lot of ceremony for common tasks (eg rewriting a whole table and swapping it out)

I don't know where this idea of having to swap a whole table in SQLite came from, but it simply isn't true. Over the last 13 years I have upgraded production HashBackup databases at customer sites a total of 35 times without rewriting and swapping out tables by using the ALTER statement, just like other databases:

https://www.sqlite.org/lang_altertable.html

For the most recent upgrade, I upgraded to strict tables, which I could also have done without a rebuild/swap. I chose to do a rebuild/swap this one time because I wanted to reorder some columns. Why? Because columns stored with default or null values don't have row space allocated if the column is at the end of the row.


For a long time sqlite did not have DROP COLUMN and RENAME COLUMN support, which are both pretty essential.

I'm embarrassed to admit that I didn't realize RENAME COLUMN was actually added in 3.25, almost four years ago.

DROP COLUMN was only just added last year in 3.35.

I'm surprised a database schema lasted 9/12 years without ever renaming or dropping a column.

This changes things! But even now, ALTER TABLE is not transactional. So especially with many concurrent readers there can definitely be situations where you'd still want to rewrite.


I'm not sure what you mean by "not transactional". SQLite implements transaction support at the "page" level, and builds all other database operations on top of it, which means anything that touches the bytes of the database file is transaction-safe. You can verify this for yourself:

    sqlite> CREATE TABLE foo(a,b,c);
    sqlite> INSERT INTO foo VALUES (1,2,3);
    sqlite> BEGIN;
    sqlite> ALTER TABLE foo DROP COLUMN b;
    sqlite> SELECT * FROM foo;
    1|3
    sqlite> ROLLBACK;
    sqlite> SELECT * FROM foo;
    1|2|3
It's of course still subject to SQLite's normal restrictions on locking, which means a long-running ALTER statement will block concurrent writers (and probably also concurrent readers if you're not running in WAL mode).


> I'm surprised a database schema lasted 9/12 years without ever renaming or dropping a column.

I did have a couple of columns that were no longer needed and would have dropped them, but instead I just set them to null and ignored them. Nulls only take 1 byte of space in a row. I dropped them when DROP COLUMN was added.


It would really help if SQLite3 had a `MERGE`, or, failing that, `FULL OUTER JOIN`. In fact, I want it to have `FULL OUTER JOIN` even if it gains a `MERGE`.

`FULL OUTER JOIN` is the secret to diff'ing table sources. `MERGE` is just a diff operation + insert/update/delete statements to make the target table more like the source one (or even completely like the source one).

`FULL OUTER JOIN` is essential to implementing `MERGE`. Granted, one could implement `MERGE` without implementing `FULL OUTER JOIN` as a public feature, but that seems silly.

Sadly, the SQLite3 dev team specifically says they will not implement `FULL OUTER JOIN`[0].

Implementing `MERGE`-like updates without `FULL OUTER JOIN` is possible (using two `LEFT OUTER JOIN`s), but it's an O(N log N) operation instead of O(N).

The lack of `FULL OUTER JOIN` is a serious flaw in SQLite3. IMO.

[0] https://www.sqlite.org/omitted.html
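For reference, the two-LEFT-JOIN emulation looks like this (src and dst are placeholder names for the two table sources being diffed):

    // Emulating FULL OUTER JOIN on SQLite versions without native
    // support; the second scan is the extra work mentioned above.
    const diffSql = `
      SELECT src.k AS k, src.v AS src_v, dst.v AS dst_v
      FROM src LEFT OUTER JOIN dst ON src.k = dst.k
      UNION ALL
      SELECT dst.k, NULL, dst.v
      FROM dst LEFT OUTER JOIN src ON src.k = dst.k
      WHERE src.k IS NULL
    `;
    // In the result: dst_v IS NULL means the row needs an INSERT,
    // src_v IS NULL a DELETE, and src_v <> dst_v an UPDATE -- exactly
    // the three cases a MERGE statement automates.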


RIGHT and FULL JOIN are on the trunk branch of SQLite and will (very likely) appear in the next release. Please grab a copy of the latest pre-release snapshot of SQLite (https://sqlite.org/download.html) and try out the new RIGHT/FULL JOIN support. Report any problems on the forum, or directly to me at drh at sqlite dot org.


This is fantastic news, I'm very glad to hear that this is appearing soon! Thanks!


SWEEEEET!

Finally!

Thank you so much for this Mr. Hipp!

EDIT: Don't forget to edit the `omitted.html` page when you ship it!


Migrations have gotten better recently, but there are still cases where you need to follow the 12 steps very carefully: https://www.sqlite.org/lang_altertable.html#otheralter

Prisma Migrate can automatically generate these steps, removing most of the pain. I'm sure other migration tools can do this as well.



D1 does not throw away consistency. It’s built on top of Durable Objects which is globally strongly consistent.


"D1 will create read-only clones of your data, close to where your users are, and constantly keep them up-to-date with changes."

Sounds like there will be no synchronous replication and instead there will be a background process to "constantly keep [read-only clones] up-to-date". This means that a stale read from an older read replica can occur even after a write transaction has successfully committed on the "primary" used for writes.

So, while the consistency is not "thrown away", it's no longer a strong consistency? Anyway, Kyle from Jepsen will figure it out soon, I guess :)


Yeah, so you can always opt-in to strong consistency by transferring execution to the primary (see the "Embedded Compute" section of the blog). Then it's pretty much exactly the same as a DO.


Just clarifying - D1 without read replicas is strongly consistent. If you add read replicas, those can have replication lag and will not be strongly consistent.

Disclaimer: I work at Cloudflare :)


Thanks for the clarification, that is what I would expect.

Does SQLite support some kind of monotonic transaction id that can be used as a cache coherency key? Say a client writes a new record to the database which returns `{"result": "ok", "transaction_id": 123}`, then to ensure that subsequent read requests are coherent they provide a header that checks that the read replica has transaction_id >= 123 and either waits for replication before serving or fails the request. (Perhaps a good use for the embedded worker?)


Since it's a relational DB, and supports transactions, you can have a journal table right?

I know of a very important system at AWS that did this with MySQL :D
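A sketch of that journal-table pattern (the Db interface and schema here are hypothetical; the point is bumping a version counter in the same transaction as the write):

    interface Db {
      exec(sql: string, ...params: unknown[]): Promise<void>;
      get<T>(sql: string, ...params: unknown[]): Promise<T>;
    }

    // Primary: the version bump commits atomically with the write and
    // goes back to the client as a freshness token.
    async function writeWithToken(db: Db, email: string): Promise<number> {
      await db.exec("BEGIN");
      await db.exec("INSERT INTO users(email) VALUES (?)", email);
      await db.exec("UPDATE journal SET version = version + 1");
      const row = await db.get<{ version: number }>(
        "SELECT version FROM journal");
      await db.exec("COMMIT");
      return row.version;
    }

    // Replica: serve the read only if replication has caught up to the
    // client's token; otherwise wait, or forward to the primary.
    async function isFresh(db: Db, token: number): Promise<boolean> {
      const row = await db.get<{ version: number }>(
        "SELECT version FROM journal");
      return row.version >= token;
    }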


Yes, you could do it manually, but it would be nice if the solution didn't require carefully managing update queries (so the journal addition isn't missed), and didn't increase write amplification by manually updating a journal table, when that information probably already exists somewhere in the WAL implementation.


Yup sorry about that. I missed the entire "read replica" bit when reading that blog post.


Interesting that D1 is built on top of Durable Objects. Does this mean that it would be practical for a single worker to access multiple D1 databases, so it could use, for example, a separate database for each tenant in a B2B SaaS application? Edit: And could each database be in a different primary region?


Yes, exactly!


That is interesting. I wish CF would give us some more information as I've assumed that there must be a lack of strong consistency which would be a major drawback.

Edit: But that would mean that durable objects can't be replicated asynchronously? That would mean a big latency hit. Then what's the difference to a central DB in one datacenter?


I’m not familiar with Durable Objects. When D1 does replication to read replicas, if it’s not doing synchronous replication, then it’s not strongly consistent, is that correct?


I wish the post had provided some more technical details.

It's more of a "quickstart" than a peek under the hood.


I'd like to see some up-front D1 & R2 benchmarks (read/write/IOPS). I can't judge invocation cost value until I can judge my use case's performance. Here's hoping it's NVMe RAID 10 or better under the hood of D1, as some big SQLite reads suffer under slow storage.


Are you guys using Litestream or a similar approach? E.g. storing WAL frames in a Durable Object.


What types are missing from strict that you need?


Has anyone tried to write a new modern SQLite?


Why? Yes, SQLite doesn't have all the features Postgres has. Postgres doesn't have all the features SQLite has either. What's wrong with having different tools with different sets of tradeoffs? It's a different shape of Lego and that's fine - some things call for a 1/3-height 2x2 and others call for a full-height 1x8.


I think the most successful attempt would be Realm.

https://realm.io/


DuckDB comes to mind, but I can't speak to its differences from SQLite.

https://duckdb.org/


I haven't tried DuckDB, but I have been googling about it. I think I saw a discussion where it was mentioned that DuckDB isn't a replacement for SQLite. It is an OLAP database [0], which makes its ingestion time slower than SQLite's, I think. So it is meant for analytics, but not as a full-fledged replacement for SQLite.

[0]: https://en.wikipedia.org/wiki/Online_analytical_processing

Duckdb on HN: https://news.ycombinator.com/item?id=23287278


Close! DuckDB has very fast bulk insert speeds, but slower individual row insertion/update speeds. (Disclaimer: I write docs for DuckDB)


DuckDB is OLAP SQLite. The vector engine is dope. But most of the innovation is in the OLAP stuff.


why do you consider sqlite to not be modern?

all the hip service providers seem to be all over it which would indicate pretty good modernity to me at least.


Not clear from reading the post if the SQLite C library is embedded and linked in the Worker runtime (which would mean no network roundtrip) or if each query or batch of queries is converted to a network request to a server embedding the SQLite C library.

That's important to understand because that's one of the key advantages of SQLite compared to the usual client-server architecture of databases like PostgreSQL or MySQL: https://www.sqlite.org/np1queryprob.html


This is really interesting; basing it on SQLite is exactly what I was expecting CloudFlare to do for their first DB.

It's perfect for content-type sites that want search and querying.

Anyone from CF here, is it using Litestream (https://litestream.io) for its replication or have you built your own replication system?

I assume this first version is somewhat limited on write performance, having a single "main" instance and SQLite lacking concurrent writes? It seems to me that using SQLite sessions[0] would be a good way to build an eventually consistent replication system for SQLite; it would be perfect for an edge-first SQL database, maybe D2?

0: https://www.sqlite.org/sessionintro.html


1. No, it's not built on Litestream. Operating a massive network and shuttling data around is kind of our thing.

2. We are going all in on databases and D2 sounds like a cool name for something...


R2, D2. I see what you did there!


Have any of the problems that led people to use Postgres instead of SQLite actually been solved? Are we doomed to repeat the same mistakes?

Also, any plans to support PATCH x-update-range so SQLite can be used entirely in the browser via SQLite.js?

Can someone enlighten me with the types of use cases this would be better for vs say Postgres?


It isn't so much that folks who need Postgres features are moving to SQLite just because it is cool; it's folks who don't want those Postgres features moving to SQLite, because the latter has just the features they ever really need.


SQLite made sense as an embedded database on, say, a desktop or phone, because there’s generally only a single person writing to it. The perfect use case.

I don’t understand how it will be usable at all in a website with multiple users. Is the idea to make your site so every user gets their own database? How do you stop SQL injection?

Once you solve all of these problems aren’t you better off just using Postgres?


> I don’t understand how it will be usable at all in a website with multiple users

With WAL mode enabled, the database is locked during writes only, and concurrent writes are queued, but you can still perform reads concurrently. If you keep your write transactions small and consider that a lot of apps aren't writing a lot, it can give perfectly good performance for a lot of use cases.

> Is the idea to make your site to every user gets their own database?

You can do... I know of B2B apps that give each billable customer their own database.

> How do you stop SQL injection?

In the exact same way you do in all other flavours of SQL - with parameterized queries (see the sketch after this comment).

> Once you solve all of these problems aren’t you better off just using Postgres?

Not necessarily. Postgres gives you a different set of problems and limitations to consider and work around.
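On the SQL-injection point, a parameterized query in SQLite looks like it does everywhere else; sketched here with the better-sqlite3 driver (the driver choice is illustrative):

    import Database from "better-sqlite3";

    const db = new Database("app.db");

    // Bad: interpolating user input into the SQL text is injectable:
    //   db.exec(`SELECT * FROM users WHERE email = '${email}'`);

    // Good: bound parameters travel separately from the SQL text, so
    // user input can never change the shape of the query.
    function findUser(email: string) {
      return db
        .prepare("SELECT id, email FROM users WHERE email = ?")
        .get(email);
    }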


Most of the problems with Postgres are a result of it not being embedded. If you're using SQLite in a non-embedded fashion, I don't see how you don't inherit the same problems.


Which problems were you thinking of?

Cloudflare and fly.io both promise hassle free read replicas and backup. They will both offer only a single node capable of writes, because that’s how SQLite rolls.

This is a pretty good fit for a read heavy load that requires SQL and very low latency.


I guess I’m not understanding what the benefit is vs hosted Postgres. Also, latency can be low and setup equally trivial - see Supabase for example.


Biggest benefit over hosted PostgreSQL is that you get SELECT queries that are measured in microseconds, because SQLite avoids needing network overhead per query.

https://www.sqlite.org/np1queryprob.html


Wouldn’t D1 introduce network overhead?


Yes for writes, but it shouldn't for reads: it looks like it works by replicating the full database down to each edge location where the code is running.


Hope this can give you some concrete answers: https://www.sqlite.org/whentouse.html


The important drawback is async replication and therefore the lack of full consistency. On the other hand, this is the big advantage of hosted Postgres and the like.

Those offerings are great for use-cases that don't need that kind of consistency, which are many.


No and no. I think this is great for Edge computing, where there is currently no solution. So, it's better than nothing.

It all depends on the use-case, of course. A traditional hosted Postgres or MySQL database or cluster is certainly the go-to solution for all who need advanced features or full consistency, which only synchronous replication could provide.


What problems? Both are for different use cases albeit overlapping.


Concurrent writes, for one.


To the person from Cloudflare I complained to in last year's thread about putting your money where your mouth is on serverless databases:

You weren't lying, and this is super cool - the SQLite hype train also seems to be in full force.


It's interesting to see a relatively old technology get hyped.


:-)


I'm buying Cloudflare stocks right now.

In 2-3 years from now, these services will be so mature and strong they will be crushing the cloud market.

They're turning dreams into reality, one after another.


Cloud business is driven by enterprise generally. Would enterprise be using SQLite?


They should be using SQLite more often than they are.


Why? What use cases are better with SQLite vs Postgres, MySQL, etc?


Some "pros" that many find appealing:

* Copying the database around is a file copy in SQLite. Each database is its own single file. (There's also WAL stuff that you get control of.)

* No extra service to deploy, manage, and/or optimize. I don't fully agree with the following, but I had a colleague who used to say "If you don't have multiple app servers writing to the db, postgres is a waste of effort".

* Embedded means way lower data latency - even lower if the dataset is in the fs cache; no waiting on network round-trips.

I've frequently chosen it over PG in cases where I needed basic relational data operations. In one case we ingested a large dataset (a few GBs of measurements) once an hour. Then we did some initial analytics on those measurements and threw the results in the same db file. After that step was done, the data was read-only for several other systems, and we just copied the db file to each of the systems that needed the data on demand. A couple of the systems did additional analytics and effectively imported the db file into a different db (one was PG, another was a graph db - neo4j). A couple of the systems just used the db file directly. It worked out really well.


Fewer things to go wrong. This gives you benefits all the way up the dev stack. Changing an integration test from needing its own db server installed to just a couple of files on disk is a big difference in complexity. You can probably run that test with just a local disk almost infinitely, entirely deterministically; conversely, as soon as you go onto the network, all bets are off.

Granted, if you already have the tooling it's less of a big deal, but if you don't then you need the tooling, e.g. your build and test machines need access to some SQL installation somewhere, and the process of maintaining that can be a flaky one.


If you consider cost... I would imagine a fair few. From the article:

> We will ensure that D1 costs less and performs better than comparable centralized solutions.


If SQLite gets you excited, I'm building a Firebase alternative based on SQLite. I'm betting hard on SQLite, so this gets me super excited!!

https://javascriptdb.com

CF people around, I would love to chat, if anyone is interested please reach out at: jp@javascriptdb.com

I'll be applying to this beta for sure!


Super interesting! I really like the idea. I'll join the beta, email sent :)


Any feedback on what you find interesting would be awesome :) thanks!!


Any current or planned support for existing ORMs, such as Prisma or TypeOrm?

Also, I wonder how hard it will be to migrate existing PostgreSQL databases and SQL statements. Of course, I understand if Cloudflare is focused on greenfield applications.


Prisma won't work with D1 out of the box. The primary limitations are:

- SQLite is traditionally embedded in an application, so Prisma interacts with it by mounting a file. Workers does not have a local filesystem, and D1 is exposed over the network through an API accessible from a Worker. Prisma will have to create a specific connector for D1.

- Workers have a script size limit which is currently 1MB. My understanding is that Cloudflare will be increasing this in the future. We also have specific work to decrease the size of Prisma. Both of those will have to happen before Prisma can be used with D1.

Note that Prisma already supports querying Postgres, MySQL, SQL Server and MongoDB from Cloudflare Workers through the Prisma Data Proxy, which will see a GA release next month.

We are also very excited about D1 as a way to bring a subset of data closer to users in order to deliver faster experiences. We hope this will be a way to bring the benefit of edge computing to larger organisations who cannot simply rearchitect everything to run on Workers.


> We are also very excited about D1 as a way to bring a subset of data closer to users in order to deliver faster experiences. We hope this will be a way to bring the benefit of edge computing to larger organisations who cannot simply rearchitect everything to run on Workers.

I am also excited about this :)


Before you consider using an ORM, try using regular SQL and some tooling first; your future self will thank you. Just write the code, it's only volume and it's not so bad.


I took this advice on my last project and ended up re-writing the whole thing to use Prisma later. I launched and had a successful event with raw sql but it quickly became unwieldy. Prisma gives me type safety throughout my app (written in Typescript) and would have prevented a number of bugs/pain points as my app grew. And I'm only 1 developer, this gets worse if you have multiple people working on it. I still write raw sql for reporting/aggregation (Prisma's features here only work for basic examples in my experience) and I'm not "scared of raw sql" but I can move much faster when I have the guardrails of types.


Totally agree.

Source: someone who avoided learning SQL for 20 years.


+1 to this as well.


We are definitely interested in ORMs. Want to make it easy to use. I hope someone creates the next Rails using Workers. And having other models on top of our SQL offerings will be important. Get in contact and let us know what you'd like.


> I hope someone creates the next Rails using Workers

I too am eagerly waiting for a good serverless nodejs framework that is "batteries included". I've deployed on Lambda using the "Serverless Framework" but once your app grows to a certain size everything starts to fall apart and you lose some of the magic. Unfortunately, most of the things that advertise themselves as serverless/lambda/worker nodejs frameworks are monoliths and/or an existing monolith framework that "supports" lambda (with a billion asterisks after that). There is absolutely nothing wrong with monolith frameworks, I love them, but just not for lambda, I want to deploy a single endpoint as a single function (or as a cron, or queue listener, etc), not all of my code for every function (you hit size limits quick with this method).

I want express/nestjs/etc-type routes that I define with code or annotations that result in /only/ that function (endpoint) being bundled up and deployed. I ended up rolling my own "framework" on top of Serverless Framework (uses serverless.ts config file that scans my directories for a special file that defines the routes defined in that directory) but Serverless Framework is pretty shaky ground. Their documentation is a mess, Serverless Components appears dead, and they seem to be busy with their own "cloud" so I don't know how much longer I can keep building on top of them.

When it works it's like magic but there are a ton of walls you run headfirst into: Cloud formation entity limits, package size limits, typescript/bundling support, clear disregard for medium/large projects ("Just use multiple services", this leads to a terrible dev experience), and long deploy times.

I wish CF Workers had been out when I first started building my current project, I might have gone in that direction instead, I still might.


Hey Josh,

I'm building a serverless firebase alternative that uses SQLite. If CF gives me access I will totally support D1 & workers.

Check it out: javascriptdb.com


Thanks! I'll check it out.


Does Cloudflare Workers now support a large number of workers under a single domain without having to use an expensive pricing tier?


I'm not sure, I've not done the full research into CF Workers since I'm on AWS Lambda right now and don't have the capacity to evaluate alternatives. I just like a lot of the CF products and their general ethos/vibe so I'm interested in it. Who knows, it might have a whole new set of issues (most likely) but I don't know if those issues are worse or better than what I'm dealing with now.


You might want to consider adding Deno [1] to the language examples: https://developers.cloudflare.com/workers/platform/languages...

Deno can compile to wasm, so it can plug in through that vertical. But it's just TS on the frontend.

I'm mainly a python programmer, but Deno's been the most alluring development in the JS ecosystem since typescript for me. Might be helpful to you all to capture some steam from source.

[1]: https://deno.land/


I'm building an open source firebase alternative using sqlite. I'll be reaching out, I was thinking to build the distribution & durability part myself, but I would rather use D1!

I guess it would count as a client focused ORM :)

I'll be reaching out from jp@javascriptdb.com

Great addition, congrats!


Will not any existing ORM that supports SQLite support D1? I looked in the post for details on how it extends SQLite (is the query language different or extended, semantics very different, etc.) but didn't notice anything.


I think the main issue will be with ORMs that are tightly coupled to a specific SQLite driver, such as Prisma.


They should.


This should have a virtual file system. CF should write it so each user doesn't have to load a JS abstraction and it has better performance.


Before you consider using an ORM, try using regular SQL and some tooling first; your future self will thank you. Just write the code, it's only volume and it's not so bad. What is bad is learning a 3rd language on top of SQL and JS/TS that you somehow have to manually map to SQL.


This is so cool!

From the blog post it says read-only replicas are created close to users and kept up to date with the latest data.

- How should I think about this in terms of CAP? If there's a write and I query a replica what happens?

- How are writes handled? Do they go to a single location or are they handled by various locations?

I'm excited to try this. It's so cool to see databases being distributed "on CDNs" for lack of a better term.


I think they're replicated asynchronously, so reading directly from the replica may return old data. That's why they've added the ability to deploy special workers that "live" closer to the primary:

> Embedded compute

> But we're going further. With D1, it will be possible to define a chunk of your Worker code that runs directly next to the database, giving you total control and maximum performance — each request first hits your Worker near your users, but depending on the operation, can hand off to another Worker deployed alongside a replica or your primary D1 instance to complete its work.


"With D1, it will be possible to define a chunk of your Worker code that runs directly next to the database...each request first hits your Worker near your users, but depending on the operation, can hand off to another Worker deployed alongside a replica or your primary D1 instance to complete its work."

That's interesting to me. It opens the door for Cloudflare to offer something more like a "normal" serverless offering. One that can run containers, or at least natively run Python/Golang/Java/etc, like AWS Lambda does. And with this ecosystem described above, that can conditionally route between the lighter edge Workers and the heavier central serverless functions. To me, that's the tipping point where they start to threaten larger portions of AWS.


Big fan of Cloudflare but I wish they would stick to descriptive product names.

Good: Workers, KV, Durable Objects, Cron Triggers

Bad: Spectrum, Zaraz, R2, D1


Naming is hard.

> Zaraz

That's the name of the company they acquired. Though, I do agree that more descriptive naming is nice.

E.g.

Zaraz = SafeXSS

D1 = LDS (light database system)

R2 = ObjectStore

Spectrum = Reverse Proxy


The API for this is currently the only thing I wish I could grok a bit better. It seems like it would be hard to make it work with existing libraries that can access SQLite, which is kind of a shame.

I'm thinking of sqlx in Rust (or any other language binding / ORM for that matter), which has compile time schema safety. This is a nice capability, and because this interface seems non-standard (possibly for good reason), I guess we are being asked to give some of those things up.

I am getting a bit ahead of myself on the Rust part (presumably that will eventually be supported as part of workers-rs), but I think the feelings still stand if you consider the JS ecosystem.

Edit: I may actually be wrong, but presumably the entire surface isn't covered because there's no file opening, etc.


There might be an `env.DB.url` (e.g. the JDBC URL) which you could pass into an existing library.


I'm kinda willing to make a bet that this rides on top of what looks like HTTP to the Javascript engine. That's how their worker-to-worker and worker-to-durable-object protocols are.

(It's not really HTTP as in it might never cross a TCP socket, just get shuffled from one V8 isolate to another, but it looks like a `fetch` call to the Javascript.)

It's also worth remembering that SQLite itself has no wire protocol; it's a library. And there is no such thing as a "SQL wire protocol". It sure isn't gonna be the Postgres wire protocol either.

From the article:

> D1’s API includes batching: anywhere you can send a single SQL statement you can also provide an array of them, meaning you only need a single HTTP round-trip to perform multiple operations. This is perfect for transactions that need to execute and commit atomically:
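Going by that quoted description alone, a batched call from a Worker might look something like this (the binding name and method shapes are guesses, not a confirmed API):

    export default {
      async fetch(request: Request, env: { DB: any }): Promise<Response> {
        // One round-trip; the batch executes and commits atomically.
        await env.DB.batch([
          env.DB.prepare(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?")
            .bind(25, "alice"),
          env.DB.prepare(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?")
            .bind(25, "bob"),
        ]);
        return new Response("ok");
      },
    };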


Interesting thought! Would love to see more details.


Best Effort Writes[1] are an opportunity here. Non-transactional: write to the local replica (ensure foreign keys, constraints, valid data, etc...) and then try to write to the main write-enabled DB. Caching should work without changes since the local replica is updated. This could be cheaper (send binary diffs) and more resilient to brief network issues.

The key is to let the user decide what really needs ACID and what doesn't. If someone wants to make the next Facebook or Reddit they'll need huge write throughput and if some votes or updates are lost, that may be a good trade-off.

[1] You could add a BEW file (like the WAL file) to SQLite for Best Effort Writes.
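From a Worker's point of view it might look roughly like this (everything here is hypothetical; it's the idea, not an existing API; Fetcher as in @cloudflare/workers-types):

    // Best-effort write: apply locally first so constraints are checked and
    // caches see fresh data, then try the primary without failing the request.
    async function bestEffortWrite(local: D1Like, primary: Fetcher, sql: string, params: unknown[]) {
      await local.prepare(sql).bind(...params).run(); // local replica write
      try {
        await primary.fetch("https://primary/write", {
          method: "POST",
          body: JSON.stringify({ sql, params }),
        });
      } catch {
        await enqueueForRetry(sql, params); // hypothetical durable retry queue
      }
    }

    // Minimal shape for the local handle; real D1 bindings may differ.
    interface D1Like {
      prepare(sql: string): { bind(...params: unknown[]): { run(): Promise<unknown> } };
    }

    async function enqueueForRetry(sql: string, params: unknown[]) {
      /* elided: persist somewhere durable and retry with backoff */
    }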


All this hype around SQLite recently, and I am still confused.

* How do you replicate it consistently?

* Who has the master privilege (or masters if sharded)? What's the failover story?

I am guessing a blob store is involved, but I have gaps in my understanding here.


SQLite has a write-ahead log (WAL) journal mode. If you write that log to a store that is already replicated (S3, Cloudflare Durable Objects, Kafka?), then the concept of a 'master' becomes less important.
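That's essentially Litestream's trick. A very rough sketch of the shipping side (Node-style; in real code the page size is read from the WAL header, and all names here are made up):

    // Tail the SQLite "-wal" file and push new frames to a store that is
    // already replicated (S3, R2, a Durable Object, a Kafka topic, ...).
    import { statSync, openSync, readSync, closeSync } from "node:fs";

    const PAGE_SIZE = 4096;            // assumed; the 32-byte WAL header records it
    const FRAME_SIZE = 24 + PAGE_SIZE; // 24-byte frame header + one page of data
    let shipped = 32;                  // start past the WAL file header

    async function shipNewFrames(walPath: string, put: (key: string, body: Buffer) => Promise<void>) {
      const size = statSync(walPath).size;
      const fd = openSync(walPath, "r");
      try {
        while (shipped + FRAME_SIZE <= size) {
          const frame = Buffer.alloc(FRAME_SIZE);
          readSync(fd, frame, 0, FRAME_SIZE, shipped);
          await put("wal/" + shipped, frame); // replicas replay frames in order
          shipped += FRAME_SIZE;
        }
      } finally {
        closeSync(fd);
      }
    }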


Not an expert on the DOM or JavaScript, so be kind ;)

One thing I hope to see in the future is a better product-filtering experience. When I worked on a jQuery product filter, I realized DOM bloat was the main problem.

I wonder if D1 can help devs build instant product-filtering pages that don't require the full reload that Micro Center or Newegg do.

E.g. https://www.newegg.com/p/pl?d=hdmi+cable&N=-1&SortType=8


At any sufficient scale, it is difficult to do filtering on the client. Yes, it can be done, but with 10,000+ potential records, you don’t want to ship that to the client for each query. (Note: I’m thinking Newegg scale for “hdmi cable” here. There are certainly situations where you can ship the entire database to the client for filtering.)

It’s not DOM bloat… it’s too many records. If you’re building a DOM node for each record, that’s bloat, but you still have the problem even if the results are stored in a JSON object and dynamically queried on the client side.

So, for each new filter or new query you need to hit the server anyway. Whether that's an asynchronous query that returns a JSON blob or a full refresh doesn't really matter much, IMHO. Either way, you're rebuilding a large portion of the DOM with the new results. The only thing that skews things in favor of an async call is if the rest of the page is so heavyweight that reloading it takes a significant amount of time. This is probably what you're talking about.

Having a SQLite db close to your worker node really isn’t going to affect this problem all that much.


It's probably better - especially for more advanced search engines - to have an elasticsearch instance or whichever is the more recent example handle product search and filtering like that.


So can we assume that D2 will be Postgres/MySQL?


It sounds like you're making a simile, but I don't understand it. The article did literally state D1 is based on SQLite.


The opening paragraph reads "Today, we're excited to announce D1, our first SQL database." read: first

And, well, R2 and D2 would make for a great naming scheme.


Ah sorry, I missed that you said D2, not D1.


dang i was hoping for postgres so i can use postgis

edit: maybe one day! this looks cool regardless


I'm right there with you. I wonder if this is an SQLite-compatible API on top of their own solution, or actual SQLite under the hood with custom replication.

If the latter, and anyone from Cloudflare is here: is there any chance of having SpatiaLite enabled?

https://www.gaia-gis.it/fossil/libspatialite/index


Seconding a vote for Spatialite support! I came here just to make that same request.


No need to call dang!


lol i'm from a place where "dang" is a natural part of our vocab


What write throughput and latency can we expect from this database?

Are there any limitations, for example on the number of tables or size of the database?


With this we can probably switch our infrastructure off AWS and entirely onto Cloudflare.


So where are the databases running? In the same regions as workers?

Is the data replicated to all regions?


This is convenient: I've been building an app that uses SQLite, and I want to deploy it to Cloudflare Pages. I expected I was going to have to switch to a hosted Postgres instance somewhere, but this could be perfect.


So I assume we'll see a nice big donation to the sqlite coffers, then?


Unless I missed it by skimming, where are the deets? Is this strongly or eventually consistent? What are the max table sizes, and do they become partitioned? Are there cross-partition joins?


This looks awesome. I was thinking about creating a custom version of this to live behind a CF Worker. Much better to have an official version!


Too bad you probably can't use this to store data about EU citizens. Phone numbers like they show in the demo are considered PII, right?


why?


Will they seriously challenge Azure, AWS and GCP eventually? Cloudflare is very innovative and what they are doing is really exciting.


The unique thing about Cloudflare's product offerings is how global-first they are; traditional cloud providers (AWS to DigitalOcean) have a very region-oriented domain model, with select christened services allowed or architected to be global (e.g. AWS CloudFront, IAM, Route 53; that's about it). That's their disaster/failure model, but all it really does is force cross-regional architecture onto the customer. Most customers don't bother.

In comparison, everything at CF is global. And it's not just "global" from an AWS perspective of "we've got 14 regions and your stuff runs in all of them"; it's global from 300+ points of presence, within 50ms of something like 98% of all humans. CDN for compute, databases, etc.

CF has a way to go on DevEx for many of their products. For example, Workers, being based on V8 isolates, is a pain to use even compared to e.g. Lambda. It's a battle of figuring out what's possible and what isn't within the runtime. But I'm sure it'll be improved!


Glad to hear it. I was considering moving to Deno Deploy + Supabase because KV was not good for relational data.


How does this work when developing locally? Is it SQLite for local development?


I was expecting this to be using https://en.wikipedia.org/wiki/D_(data_language_specification... given the name.


Is this going to be open sourced? It seems to be building on the shoulders of a particular giant that could use a somewhat wider ecosystem.


Any word on pricing =)?


It's a bold strategy, Cotton. Sounding a bit like they want to compete with AWS.


Our first database … I like it. I wonder what’s next


First, super excited to have Cloudflare offer an RDBMS (can SQLite be called that?).

This enables entirely new classes of applications where everything can now be hosted by Cloudflare.

Questions:

a. To help with concurrent writes, will Cloudflare be using WAL2 and BEGIN CONCURRENT branches of SQLite?

b. How is Cloudflare replicating the data cross region? Will it be Litestream.io behind the scenes?

c. Will our Worker code need to be written differently to ensure only a single-writer is writing to SQLite database?

d. How do data persistence and database file size get factored in? I have to imagine there is a limit to how much storage can be used, whether or not that storage is local to the Worker machine, and whether it's persistent.
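On (c): the usual Workers answer today would be to funnel all writes through a single Durable Object, since each object instance runs in one place at a time. A hedged sketch (the wiring to the actual database is elided and hypothetical):

    // A Durable Object that serializes writes: every Worker sends its write
    // here, and this one instance applies them in order.
    export class DbWriter {
      constructor(private state: DurableObjectState) {}

      async fetch(request: Request): Promise<Response> {
        const { sql, params } = (await request.json()) as { sql: string; params: unknown[] };
        await applyToDatabase(sql, params); // hypothetical: whatever handle D1 exposes here
        return new Response("ok");
      }
    }

    async function applyToDatabase(sql: string, params: unknown[]) {
      /* elided: execute against the single writable database */
    }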


[flagged]


While the blog does use "primary", people are saying "master"/replica, much like how redis refers to their architecture[0]. In fact, ctrl-f "slave" only brings up your comment.

0: https://redis.io/docs/manual/replication/


"Master", however, is as prevalent as "primary" in this discussion "slave" is the implied counterpart.

It would be better if we completely abandoned that terminology.


Out of all the comments that use the term "master" as of right now[1-5], the first three specifically use the term "replica" while the 4th and 5th use the verb "replicate" (in the present and past tense).

1: https://news.ycombinator.com/item?id=31341392

2: https://news.ycombinator.com/item?id=31341661

3: https://news.ycombinator.com/item?id=31340318

4: https://news.ycombinator.com/item?id=31340169

5: https://news.ycombinator.com/item?id=31342145


It wouldn't be better, it would be arbitrary


No one really means anything ill and this political correctness madness needs to stop.


[flagged]


Because it's not painful to others and intent always matters.

These words are everywhere in the language; you're not really changing anything with these antics other than derailing the subject to appease those who assume offense on behalf of an imagined group of people that can't distinguish context.


Most of us moved on to better terminology 4+ years ago. The only ones derailing conversations are grumps like yourself who refuse to get with the program. Why is this so important to you?


Who are you and what's this "program" you deem to impose on others?

No thanks, I'll stick with the actual majority that have mastered using relevant language and rational context in discussions without being slaves to performative social constructs.

Instead of assuming what's actually important to me, perhaps some introspection of why you immediately think of slavery in a computing context would be more helpful.


I think the question is: is it painful or harmful? The early discussions only seem to reference its potential problems, with the 2014 Django PR[0]'s reasoning being "those terms may carry racially charged meanings to users" and the 2018 Python bug[1] referencing "for diversity reasons". Maybe there are scholarly papers on this issue?

0: https://github.com/django/django/pull/2692

1: https://bugs.python.org/issue34605


How do you know it's painful for others? Are you assuming it's painful for others?


It is most probably not meaningfully painful for people, but it does seem to deflect responsibility from actually helping marginalized people, while giving status to assholes who enjoy harassing others instead of contributing anything.

In some cases, such as the git branch name change, it also cost us many hours.


It is painful to me and many others to give up innocuous technical terms like master/slave, man-in-the-middle attack, motherboard, mount, whitelist, blacklist, etc., just to satisfy the whims of a pedantic minority.

But I know my opinion doesn’t matter to those people anyway.


Because it is being used for manipulation, falsely. "Oh, oh, this hurts me! I cannot bear it!"

Recollect Emory University, in 2016, when someone wrote "TRUMP 2016" in chalk on the sidewalk. "Traumatized" protesters shouted "You are not listening! Come speak to us, we are in pain!" It's manipulation, pretending to be hurt. These people weren't locked in some kind of perpetual seizure for the four years of the Trump presidency, wailing in continual agony. It was a show.


Can you explain what incentive someone has to "manipulate" us into no longer using the term slave, if not genuine discomfort?


It makes petty people feel they have power over you. It's not like they are that wrong. We like to reward norm enforcement, but we aren't sufficiently inoculated against exploitative enforcers. The end game can be seen in any big religion.


Well, why did the histrionics at the campus occur? Do you feel their discomfort was genuine? Bonus question: if it was genuine, what would you expect the outcome to be for them during the Trump presidency?


> If we can avoid terminology that is painful for others, why shouldn't we?

Because other people's feelings shouldn't be treated as being as important as they have (relatively recently) been elevated to be, particularly in the very dysfunctional and broken US culture.

Ideally we'd start stinging the soft US culture with a lot more political incorrectness. It's not a benefit to society to be so weak about such trivial matters as offensive terminology. And that's exactly what will follow this era of political correctness: inevitably the pendulum will swing back the other way. We're due for an aggressive era of rebellion against the censorship and political correctness brigades.


People want to change chess to red/blue. I'll continue to play with white and black pieces.

This is a slippery slope toward destroying society by being hypersensitive about things that no one actually means. I absolutely hate this; it fills me with disgust that people obsess over such petty things.

Life is beautiful. Enjoy it. Be kind to others who have zero intention of offending you.


The person you're replying to is very kind and considerate.

They're not assuming people using the problematic terminology do so with the intention to harm. They are, however, mindful of its negative effects, hence the gentle reminder.


The person I am replying to is very kind indeed. But the rest of society, obsessed with this political correctness madness, is extremely hostile, engages in shaming, and gives off sadistic vibes of oppressing people with their moral superiority, which has zero basis beyond having discovered a new offending word to pile onto others.

This needs to stop, IMO.


The behavior you're describing exists, but I think it's a minority behavior, generally exhibited by people who often suffer directly from the use of said terminology. I don't condone violence, but anger is understandable in that case.

Also, the fact that an idea has some idiotic proponents doesn't mean it's inherently false.

Minorities are in the process of gaining cultural power and influence, to the detriment of the majority. That is unpleasant when you are on the receiving side, but IMO necessary right now.


I prefer to make the world better through actions.

Not pointlessly redefining words that carried no ill intent in the first place, and harassing people who don't use the words you decided were proper.


Just a thought, what if we made the world a better place through clearer terminology? It's not a major improvement, but it helps reduce friction in communications.

Ignoring connotations, master/slave is pretty unclear to me and needs more explanation. I've seen it used to describe:

* a coordinator with a worker pool in which the coordinator sends jobs to the workers for actual computation

* a designation of where on the bus a device sits

* a designation of which copy is the source of truth/gets writes (master) compared to the read only copies

* a designation of which instance is currently doing the work vs. which is the standby in an active-passive failover

Yes those are all similar, but it's annoying to hear someone say "this is configured in a typical master/slave pattern" only to be left wondering which of the above applies.

I guess we could make some sort of purity argument about which of these is the one true use for master/slave. Or we could differentiate the cases with more applicable and descriptive words. As a bonus, some folks aren't offended.


> Ignoring connotations, master/slave is pretty unclear to me and needs more explanation. I've seen it used to describe:

> * a designation of where on the bus a device sits

I see, you're worried people will get this mixed up with Rosa Parks?

Honestly, it feels like the PC brigade is really having to bend over backwards to find arguments. That should tell them something.

> Or we could differentiate the cases with more applicable and descriptive words.

So you get Primary/Secondary or Main/Secondary... for all of the above cases. I don't quite see how that differentiates them.


> I see a lot of use of the "master/slave" terminology in this thread.

Hey, some people are still hosting their databases on clunky old IDE hard drives! Let's not be insensitive to their lived experience for the sake of technical novelty.


Now is this a Cloudflare ($NET) buy signal? I think you know the answer.

Maybe they will announce a Hashicorp competitor in their next reveal. Who knows.



