JSONB has landed (sqlite.org)
669 points by nalgeon 10 months ago | 205 comments



Lots of confusion on what JSONB is.

To your application, using JSONB looks very similar to the JSON datatype. You still read and write JSON strings; your application never sees the raw JSONB content. The same SQL functions are available, with a different prefix (jsonb_). Very little changes from the application's view.

The difference is that the JSON datatype is stored to disk as JSON, whereas the JSONB is stored in a special binary format. With the JSON datatype, the JSON must be parsed in full to perform any operation against the column. With the JSONB datatype, operations can be performed directly against the on-disk format, skipping the parsing step entirely.

If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick. If you're querying or manipulating the data using SQL, JSONB will be the best pick.
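As a minimal sketch of the difference (table and column names here are made up; jsonb(), json() and json_extract() are from the draft docs):

    -- TEXT column holding text JSON vs. BLOB column holding JSONB
    CREATE TABLE docs(id INTEGER PRIMARY KEY, body_text TEXT, body_blob BLOB);

    INSERT INTO docs(id, body_text, body_blob)
    VALUES (1, '{"user":"alice","age":30}', jsonb('{"user":"alice","age":30}'));

    -- both columns work with the same functions; the BLOB skips re-parsing the text
    SELECT json_extract(body_text, '$.user'),  -- parses the text JSON first
           json_extract(body_blob, '$.user')   -- reads the binary format directly
    FROM docs;

    -- convert back to text JSON when the application needs the full document
    SELECT json(body_blob) FROM docs;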


TFA doesn't say that this is _the_ JSONB of PostgreSQL fame, but I assume it must be. JSONB is a brilliant binary JSON format. What makes it brilliant is how arrays and objects (which, when serialized, are a sort of array) are encoded: N-1 lengths of values, then one offset to the Nth value, then another N-1 lengths, then another offset, and so on. The idea is that lengths tend to be similar while offsets never are, so using lengths makes JSONB more compressible; but offsets are needed for random access, so by storing mostly lengths and the occasional offset you get O(1)-ish random access while still retaining compressibility.

Originally JSONB used only offsets, but that made it incompressible. That held up a PostgreSQL release for some time while they figured out what to do about it.


Ah, https://sqlite.org/draft/jsonb.html says:

> The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but they have wildly different internal representations and are not in any way binary compatible.


> TFA doesn't say that this is _the_ JSONB of PostgreSQL fame, but I assume it must be.

Definitely is not: Hipp states

> JSONB is also slightly smaller than text JSON in most cases (about 5% or 10% smaller)

Whereas in Postgres jsonb commonly takes 10~20% more space than json.


Link? My reading last time I looked at this was that the sqlite and postgres “jsonb” were different.


This is the first time that SQLite3 is getting JSONB. Idk if it's the same as PG's. TFA doesn't say. I assume they are the same or similar because I seriously doubt that D.R. Hipp is unaware of PG's JSONB, but then for that reason I am surprised that he didn't say in TFA.


https://sqlite.org/draft/jsonb.html references postgres:

> The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but they have wildly different internal representations and are not in any way binary compatible.


https://sqlite.org/draft/jsonb.html says they're NOT the same.


> If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick.

There is no JSON data type! If you are just storing JSON blobs, then the BLOB or TEXT data types will be the best picks.


In context, what is clearly meant is, "If you're just reading and writing full JSON blobs, use `insert ... ($id,$json)`; if you're primarily querying the data, use `insert ... ($id, jsonb($json))`, which will convert the text into JSONB before storing it."


> There is no json data type

Why not? I feel like a database should allow for a JSON data type that stores only valid JSON or throws an exception. It would also be nice to be able to access subfields of the JSON.

SELECT userid, username, some_json_field["some_subfield"][0] FROM users where ...

Not sure where to give feature suggestions so I'm just leaving this here for future devs to find


SQLite has a surprisingly small number of types: null, integer, real, text, and blob. That’s it.

https://www.sqlite.org/datatype3.html


> It would also be nice to be able to access subfields of the JSON.

Not that it's not _useful_ sometimes, but it amuses me that this is a huge violation of 1NF and people are often ok with it. It really depends on whether you're treating the JSON object as an atomic unit on its own, regardless of contents, or using the JSON to store more fine-grained information.

I guess the same argument can be made for XML data types, and everyone's OK with it too.


The principles of relational algebra aren't always compatible with real applications.

They are successful beyond anything that I can imagine people expecting when creating them. But they are not that complete silver bullet that solves every problem humanity will ever need solved.


Database design is, unfortunately, a lost art. Up through the early 2010's I remember having design reviews for database schemas, etc. That isn't "agile"... so you just fix it in the next sprint, as you explain to someone what a unique constraint is and why a table is now full of duplicates.


1NF is a theory construct that makes no accommodation for real-world performance. I'm shuddering even thinking how many tables and joins I would need to store some of these 3rd party JSON things I need to import & refine.


The H2 database uses almost that exact syntax:

  create table foo(my_json json);
  insert into foo values ('{"a":{"b":{"c":"d"}}}' FORMAT JSON);
  select (my_json)."a"."b"."c" from foo; -- where the () around the field is mandatory
https://h2database.com/html/grammar.html#field_reference


SQLite has very few actual data types. There is a function that validates JSON and you can use it as a check on your `TEXT` or `BLOB` column.

You can access subfields using `json_` and `jsonb_` functions.
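For example, the parent's query could be written with the json1 functions or the -> / ->> path operators (the table and column names here are just the parent's hypothetical ones):

    SELECT userid,
           username,
           json_extract(some_json_field, '$.some_subfield[0]') AS first_item,
           some_json_field ->> '$.some_subfield[0]'            AS same_thing_via_operator
    FROM users;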


Indeed. Those who use Postgres are already familiar with the difference.

Rule of thumb: if you are not sure, or do not have time for the nuances, just use JSONB.


There’s a huge nuance worth mentioning: with JSONB you lose key ordering in objects.

This may be desired - it makes two effectively-equal objects have the same representation at rest!

But if you are storing human-written JSON - say, configs from an internal interface, where one might collocate a “__foo_comments” key above “foo” - their layout will be lost, and this may lead to someone visually seeing their changes scrambled on save.


> There’s a huge nuance worth mentioning: with JSONB you lose key ordering in objects.

JSON never guarantees any ordering. From json.org:

> An object is an unordered set of name/value pairs.

If you depend on a given implementation doing so, you're depending on a quirk of that implementation, not on a JSON-defined behavior.


Using a function to query a fact about a string does not imply that the string should change, regardless of what guarantees are present within the format that the string is in. If you expect to get out the same string you put in, then there's nothing implementation-quirk about that.


> Using a function to query a fact about a string does not imply that the string should change, ... then there's nothing implementation-quirk about that.

Fair enough when considering only the string value. My ill-expressed point was more about JSON as a data exchange format - it may pass through any number of JSON parser/storage implementations, any of which may use an arbitrary order for the keys.


You're not storing a string, you're storing a JSON object. If you want to store a string use TEXT or VARCHAR.


That is what this post chain is about. The nuance between choosing to store your JSON as a string in a TEXT column or as JSONB in a BLOB column.


What about SQLite JSONB? Does it maintain order?

UPDATE: I think SQLite JSONB does maintain order. For example:

    select json(jsonb('{"a": 1, "b": 2, "c": 3, "d": 4}'));
    -- {"a":1,"b":2,"c":3,"d":4}
And it does maintain order when adding new keys:

    select json(jsonb_insert(jsonb('{"a": 1, "b": 2, "c": 3, "d": 4}'), '$.e', 99));
    -- {"a":1,"b":2,"c":3,"d":4,"e":99}
(tested on https://codapi.org/sqlite/)


JSON processors are not required nor expected to retain object key ordering from either input or object construction order.


The question is whether people think of their database as a processor of data or a storage location for data, exactly as it was provided. Given that a lot of people use SQLite as the latter, it's a worthwhile caveat to make people aware of.

Additionally, Javascript environments (formalized in ES2015), your text editor, and your filesystem all guarantee that they'll preserve object key order when JSON is evaluated. It's not unreasonable that someone would expect this of their database!


If you're storing it exactly as provided then surely until you know it's actually valid json you want a string, not a json/jsonb column.


> The question is whether people think of their database as a processor of data or a storage location for data

If you use the database's JSON type(s) and/or functions, then yes, your database is a JSON processor.

And, yes, your database is not a dumb store, not if it's an RDBMS. The whole point of it is that it's not a dumb store.


> Javascript environments ... guarantee that they'll preserve object key order when JSON is evaluated

Yes, but only for non-integer keys.


Is there such a thing as a non-string JSON key to be evaluated? https://www.json.org/ suggests not.


Yes, in the context of JSON handling by JavaScript. I meant string keys that can be interpreted as non-negative numbers. Those are traversed first, the rest of the keys are coming in insertion order. For example:

    > JSON.stringify({'one': 1, '3': 3, 'two': 2, '2': 2, 'three': 3, '1': 1})
    '{"1":1,"2":2,"3":3,"one":1,"two":2,"three":3}'


Thanks for explaining. That behaviour is... interesting!


That compounds really badly with another lack of specificity in JSON: parsers can treat duplicate keys in arbitrary ways, e.g. keep only the first or last value, combine values somehow, error out, etc.

When keys are not unique, their order can matter again, and can matter differently to different parsers.


Yes, but it's not impossible to achieve it in practice.

For example, JavaScript JSON.stringify and JSON.parse have well-defined behaviour:

> Properties are visited using the same algorithm as Object.keys(), which has a well-defined order and is stable across implementations

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

> The traversal order, as of modern ECMAScript specification, is well-defined and consistent across implementations. Within each component of the prototype chain, all non-negative integer keys (those that can be array indices) will be traversed first in ascending order by value, then other string keys in ascending chronological order of property creation.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

Similarly, the Python json module guarantees that "encoders and decoders preserve input and output order by default. Order is only lost if the underlying containers are unordered." Since 3.7, dict maintains insertion order.

https://docs.python.org/3/library/json.html

https://docs.python.org/3/library/stdtypes.html#dict

Yes, JS and Python behaviour are not the same for all cases, however, non-integer keys do maintain order across Python and JavaScript.


> Yes, but it's not impossible to achieve it in practice.

For any one implementation. But there's a very large number of implementations. You just can't count on object key order being preserved, so don't.


There are situations where keeping order is useful. For example, human-editable json config files.

I also gave an example of two implementations that are compatible for a useful subset of keys. By the way, SQLite JSONB keeps object keys in insertion order, similar to Python: https://news.ycombinator.com/item?id=38547254


> There are situations where keeping order is useful. For example, human-editable json config files.

You might have to just normalize every time you want to edit that.


You can count on object key order being preserved when you control the implementation. Which is why it's useful to know whether a given implementation preserves key order.


Slight caveat. It seems that you should default to JSONB for all cases except for values directly returned in the query result, since (IIUC) you will get a JSONB blob back and will be responsible for parsing it on the client.

So use it for writes:

    UPDATE t SET col = jsonb_*(?)
Also use it for filters if applicable (although this seems like a niche use case, I can't actually think of an example).

But if returning values you probably want to use `json_*` functions

    SELECT json_extract(col1, '$.myProp') FROM t
Otherwise you will end up receiving the binary format.


Wait is it really true that if you use SQLite's JSONB format, `select my_json_column from foo` becomes unreadable to humans? That seems.. unacceptable. One would expect it to convert its internal format back to JSON for consumption.


Is there any downside to storing JSON-B even if you’re not planning to query it? For example, size on disk, read/write performance?


There’s processing to be done with JSONB on every read/write, which is wasted if you’re always reading/writing the full blob.


Which occurs with JSON as well (SQLite doesn't have a dedicated JSON nor JSONB type). The only actual cost would be the conversion between JSONB and JSON.


No, SQLite's "JSON" is just TEXT, there's no overhead with reading/writing a string.


That's what I said I think? "JSON" is a TEXT that is handled as a JSON string by `json_*` functions, while "JSONB" is a BLOB that is handled as an internal format by `jsonb_*` functions. You generally don't want JSONB in the application side though, so you do need a conversion for that.


Yes, but when you use the BLOB with jsonb functions, your application demands go from:

Read JSON from TEXT column.

Parse JSON into Internal Binary Format.

Run json_*() function on this format, which will Serialize Internal Binary Format to JSON as output.

To:

Read JSONB from BLOB column.

Run json_*() function on Internal Binary Format, which will serialize the Internal Binary Format to JSON as output.

Because:

The json_* and jsonb_* functions all accept _either_ JSON or JSONB as their input. The difference is that jsonb_* functions also produce JSONB as output. So even in the above case, if your function output is just being used to feed back into another table as a BLOB, then you can use the jsonb_* version of the function and skip the serialization step entirely.
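A sketch of that last case, assuming the jsonb_* variants listed in the draft docs (jsonb_set here; the table and path are made up):

    -- col holds JSONB; the value is modified without ever serializing to text JSON
    UPDATE t SET col = jsonb_set(col, '$.counter', 1);

    -- serialize to text only at the edge, when the application reads it back
    SELECT json(col) FROM t;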


Oh, you are right! I later looked at the draft documentation and realized that `json(b)_` only determines the output type. That said, it is still true that you need one more function call `json(...)` to retrieve a textual form of JSON which you do often need instead of JSONB.


> Run json_*() function on Internal Binary Format, which will serialize the Internal Binary Format to JSON as output

I don’t think that’s always correct (it is within this thread, which stated (way up) “if you’re always reading/writing the full blob”, but I think this discussion is more general by now). Firstly, it need not convert the full jsonb blob (example: jsonb_extract(field,'$.a.b.c.d.e'))

Secondly, it need not convert to text at all, for example when calling a jsonb_* function that returns jsonb.

(Technically, whether any implementation is that smart is an implementation detail, but the article claims this is

“a rewrite of the SQLite JSON functions that, depending on usage patterns, could be several times faster than the original JSON functions”

so it can’t be completely dumb.)


There’s no point using json_* functions if you’re always reading/writing the full blob.


Of course (and I never said that), but if you need to store a JSON value in the DB and use a JSONB-encoded BLOB as an optimization, you eventually read it back to a textual JSON. It's just like having a UNIX timestamp in your DB and converting back to a parsed date and time for application uses, except that applications may handle a UNIX timestamp directly and can't handle JSONB at all.


It doesn't validate that it's valid JSON?


No, it doesn't. By default in sqlite3, you can store whatever you like in whatever field you want.

The "built-in validation mechanism" is invoking json_valid(X) [1] call within the CHECK [2] condition on a column.

[1] https://www.sqlite.org/json1.html#jvalid

[2] also assumes you didn't disable CHECKs with PRAGMA ignore_check_constraints https://www.sqlite.org/pragma.html#pragma_ignore_check_const...
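Roughly (table and column names made up; json_valid() and CHECK are from the linked docs):

    CREATE TABLE events(
        id      INTEGER PRIMARY KEY,
        payload TEXT CHECK (json_valid(payload))  -- reject anything that isn't valid JSON
    );

    INSERT INTO events(payload) VALUES ('{"ok": true}');  -- accepted
    INSERT INTO events(payload) VALUES ('not json');      -- fails the CHECK constraint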


If the order of items in the JSON blob matters then JSONB probably wouldn't preserve the order.


JSON is unordered. Nothing in your code should assume otherwise.

"An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array."


That’s exactly the kind of difference between json and jsonb that you gotta keep in mind. Object properties are unordered, but a json string is very much ordered. It’s the same sequence of characters and lines each time, unless you parse it and dump it again. So if you want to preserve an unmodified original json string for some (e.g. cosmetic) reasons, you probably want json.


I would expect you are treading on dangerous ground to assume a type called JSON is going to preserve the data byte for byte. It might currently, but I doubt that is in the API contract. You really want to use TEXT if that is your requirement.


I don't know what SQLite does, but in JS the order is actually defined. JS is not Java, object is not HashMap, if anything, it's closer to LinkedHashMap, but even that is not correct because there are numeric slots which always go first. https://tc39.es/ecma262/#sec-ordinaryownpropertykeys


> but in JS the order is actually defined

Defined yes, but still arbitrary and may not be consistent between JSON values of the same schema, as per the document you linked to:

> in ascending chronological order of property creation

Also, while JSON came from JS sort-of, it is, for better or worse (better than XML!) a standard apart from JS with its own definitions and used in many other contexts. JSON as specified does not have a prescribed order for properties, so it is not safe to assume one. JS may generally impose a particular order, but other things may not when [re]creating a JSON string from their internal format (JSONB in this case), so by assuming a particular order will be preserved you would be relying on a behaviour that is undefined in the context of JSON.


Yes, I know, the point was not to say that it's safe to depend on this universally, but rather why it's safe in JS and why it's not elsewhere -> other languages use hash maps simply because authors were either lazy or unaware of the original behaviour. (which sucks, in my opinion, but nobody can fix it now)


> other languages use hash maps simply because authors were either lazy or unaware

Because it is idiomatic in that language, and you often don't need the higher overhead of tracking insertion order.

Before Python 3.7, json.loads used dict (undefined order) and you needed to explicitly need to override the load calls with the kwarg `object_pairs_hook=collections.OrderedDict` to accept ordered dictionaries.

Since Python 3.7 all `dict`s are effectively `collections.OrderedDict` because people now expect this kind of inefficient default behavior everywhere. ¯\_(ツ)_/¯


The same goes for maps in Go, which now explicitly randomizes map iteration with the range keyword to prevent developers from relying on a particular ordering. Neat trick.


That might get them to rely on the randomization, though :)


You make it sound like it's one of the laws of physics.

The ON part of JSON doesn't know about objects. When serialized, object entries are just an array of key-value pairs with a weird syntax and a well-defined order. That's true for any serialization format, actually.

It's the JS part of JSON that imposes non-duplicate keys with undefined order constraint.

You are the engineer, you can decide how you use your tools depending on your use case. Unless eg. you need interop with the rest of the world, it's your JSON, (mis)treat it to your heart's content.


> It's the JS part of JSON that imposes non-duplicate keys with undefined order constraint.

Actually since ES2015 the iteration order of object properties is fully defined: first integer keys, then string keys in insertion order, finally symbols in insertion order. (Of course, symbols cannot be represented in JSON.)

And duplicate properties in JSON are guaranteed to be treated the same way they are in object literals: a duplicate overwrites the previous value but doesn't change the order of the key.

Concretely that means if you write:

    Object.entries(JSON.parse('{"a":10, "1":20, "b":30, "a":40}'))
This is guaranteed to evaluate to:

    [['1', 20], ['a', 40], ['b', 30]]
(Note that '1' was moved to front, and 'a' comes before 'b' even though the associated value comes from the final entry in the JSON code.)

Python made a similar change in version 3.6 (officially since 3.7), both with regards to insertion order and later values overwriting earlier ones while preserving order. I think the only difference at this point is that Python doesn't move integer-like keys to the front, because unlike JavaScript, Python properly distinguishes between different key types.


All the keys in JSON are strings. I do not understand why a string literal containing a number would be moved first. Maybe that happens for JS objects at runtime.


> You make it sound like it's one of the laws of physics.

The text I quoted is from the RFC. json.org and ECMA-404 both agree. You are welcome to do whatever you want, but then it isn't JSON anymore.


That does not follow.

JSON formatted data remains JSON no matter if you use it incorrectly.

This happens all the time in the real world - applications unknowingly rely on undefined (but generally true) behavior. If you e.g. need to integrate with a legacy application where you're not completely sure how it handles JSON, then it's likely better to use plain text JSON.

You may take the risk as well, but then good luck explaining that those devs 10 years out of the company are responsible for the breakage happening after you've converted JSON to JSONB.

In some cases, insisting on ignoring the key order is too expensive a luxury, since it basically forces you to parse the whole document first and only then process it. If you have huge documents, you have to stream-read, and this often implies relying on a particular key order (which isn't a problem, since those same huge documents will likely be stream-written with a particular order too).


It is very much still JSON, and your code can very much assume keys are ordered if your JSON tools respect it. ECMA-404 agrees:

> The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

If you work in environments that respect JSON key order (like browser and I think also Python) then unordered behavior of JSONB would be the exception not the rule.


Assume you're writing an editor for JSON files. Don't think many users of that editor would be very happy if you change the order of the attributes in their json files, even though technically it's the same...


According to the JSON spec it's also legal to round all numbers to zero.


If you’re writing an editor, you will need to separate in progress text that user is editing from the parsed and validated data, and keep both.


I don't disagree, but people might still assume it. If you serialise a Map in Java, some Map implementations will maintain insertion order for example.


I think this is why JS and Python chose to make key ordering defined - the most popular implementations maintained insertion order anyway, so it was inevitable people would end up writing code relying on it


SQLite JSONB does maintain order: https://news.ycombinator.com/item?id=38547254


PG's JSONB is compressible yet also indexed (objects' key/value pairs are sorted on key and serialized array-like; arrays are indexed by integers; therefore binary searching large objects works). The history of PG's JSONB type is very interesting.


According to the linked announcement the data size is 5-10% smaller. So you are exchanging a small processing cost for smaller storage size. It will depend on your application but in many cases smaller disk reads and writes as well as more efficient cache utilization will make storing as JSONB a better choice even if you never manipulate it in SQL. Although the difference is likely small.


I haven’t delved into it, but you likely lose the ability to round-trip your json strings, just as in postgreSQL’s jsonb type.

  Jsonb_extract(jsonb(foo), '€')
likely

- will remove/change white space in ‘foo’,

- does not guarantee to keep field order

- will fail if a dictionary in the ‘foo’ string has duplicate fields

- will fail if ‘foo’ doesn’t contain a valid json string


Not specifically JSONB, but I do recall that with MongoDB's equivalent, BSON, the sizes of the binary equivalent tend to be larger in practice, I would expect JSONB to have a similar trade off.

There'll also be a conversion cost if you ultimately want it back in JSON form.


From the post:

> JSONB is also slightly smaller than text JSON in most cases (about 5% or 10% smaller) so you might also see a modest reduction in your database size if you use a lot of JSON.


> Your application will never see the raw JSONB content.

That's not exactly right, as the jsonb_* functions return JSONB if you choose to use them.


And you can just read the BLOB column values out of the DB.


> If you're just using SQLite to write and read full JSON blobs, the JSON datatype will be the best pick.

I don't know how the driver is written, but this is misleading: if SQLite provides an API to read the JSONB data with an in-memory copy, an app can surely benefit from skipping the JSON string parsing.


If my application never sees a difference to normal JSON, everything is compatible and all there is are perf improvements, why is there a new set of functions to interact with it (jsonb_*)?

It seems that the JSON type is even able to contain JSONB. So why even use these functions, if the normal ones don't care?


As someone mentioned below, the order of keys is undefined in JSON spec, but applications may rely on it anyway, and thus conversion to JSONB may lead to breakage.

There are some other minor advantages of having exact representation of the original - e.g. hashing, signatures, equality comparison is much simpler on the JSON string (you need a strict key order, which is again undefined by the spec, but happens in the real world anyway).


The difference is the storage format, which is important for a SQL database since you define it in the schema. Also the performance characteristics are different.


`json_` functions return json, `jsonb_` functions return jsonb. Both take either as input.

If you're modifying something "in-place" then `jsonb_` functions would be better since they avoid conversion.


I don't know if it's true of SQLite, but you missed the most important point, at least with PostgreSQL: you can build indexes directly against attributes inside JSONB, which really turns it into a true NoSQL/relational hybrid that can let you have your cake and eat it too for some design problems that don't fit neatly into either pure relational or pure NoSQL approaches.


What would that SQL look like in practice?


Here's an example I have running on a pi with a temperature sensor. All data is received as JSON messages in MQTT, then stored as-is in a Postgres table. The view turns it into relational data.

If I just need the messages between certain temperatures, I can speed this up by adding an index on the 'temperature' field in the JSON.

    create or replace view bme680_v(ts, message, temperature, humidity, pressure, gas, iaq) as
    SELECT mqtt_raw.created_at                                          AS ts
     , mqtt_raw.message
     , (mqtt_raw.message::jsonb ->> 'temperature'::text)::numeric    AS temperature
     , (mqtt_raw.message::jsonb ->> 'humidity'::text)::numeric       AS humidity
     , (mqtt_raw.message::jsonb ->> 'pressure'::text)::numeric       AS pressure
     , (mqtt_raw.message::jsonb ->> 'gas_resistance'::text)::numeric AS gas
     , (mqtt_raw.message::jsonb ->> 'IAQ'::text)::numeric            AS iaq
    FROM mqtt_raw
    WHERE mqtt_raw.topic::text = 'pi/bme680'::text
    ORDER BY mqtt_raw.created_at DESC;
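The expression index mentioned above might look something like this (a sketch; for the planner to use it, the indexed expression has to match the one in the query, cast included):

    CREATE INDEX mqtt_raw_temperature_idx
        ON mqtt_raw (((message::jsonb ->> 'temperature')::numeric))
        WHERE topic::text = 'pi/bme680'::text;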


You can do this with text JSON in SQLite as well, but JSONB could speed it up.
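In SQLite that would be an index on an expression, optionally made partial, something like (names assumed to mirror the Postgres example above):

    CREATE INDEX mqtt_raw_temperature_idx
        ON mqtt_raw (json_extract(message, '$.temperature'))
        WHERE topic = 'pi/bme680';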


Because it seems to not be common knowledge on this thread: JSONB is a format offered by Postgres for a while now, and is recommended over plain JSON primarily for improved read performance.

https://www.postgresql.org/docs/current/datatype-json.html


Postgres's JSONB uses native numbers, which is faster but coerces values. It also removes duplicate keys but you shouldn't use those anyway.

Sqlite's JSONB keeps numbers as strings, which is slower but preserves your weird JSON (since there's no such thing as standard JSON). I'm not sure about duplicate keys.


> since there's no such thing as standard JSON

I-JSON is the most sensible JSON profile I know of: https://datatracker.ietf.org/doc/html/rfc7493. It says: UTF-8 only, prefer not to use numbers beyond IEEE 754-2008 binary64 precision, no duplicate keys, and a couple more things.


SQLite's approach is faster if you are not frequently extracting numerical values out of JSONB. It also makes much easier to convert JSONB back to JSON. I think SQLite JSONB reserved enough space to define a native number type (among others) if this assumption turned out to be false.


> Postgres's JSONB uses native numbers, which is faster but coerces values.

You would probably be string quoting your numbers in your data/model if this matters to you so sounds like the right call from PG implementation.


I don’t know about Sqlite’s implementation but in Postgres JSONB is not 100% transparent to the application. One caveat I’ve encountered while working on an application that stored large JSON objects in Postgres initially as JSONB is that it doesn’t preserve object key order, i.e. the order of keys in an object when you store it will not match the order of keys when you retrieve said object. While for most applications this is not an issue the one I was working on actually did rely on the order of keys (which I am aware is a bad practice but this is how the system was designed) and suddenly we noticed that the app started misbehaving. Changing the column type from JSONB to JSON fixed the problem.


Given that the order of the keys is specified as having no significance in regular JSON[1], this is out-of-spec usage.

If key order has to be preserved then a blob type would be a better fit, then you're guaranteed to get back what you wrote.

For example, SQLite says it stores JSON as regular text but MySQL converts it to an internal representation[2], so if you migrate you might be in trouble.

[1]: https://ecma-international.org/publications-and-standards/st...

[2]: https://dev.mysql.com/doc/refman/8.0/en/json.html


You can also add a property to all objects with an array of keys in the order you want (if you can guarantee it won't conflict with existing properties, or can escape those). Or turn each object into a {k, o} pair, where k is the key array and o the object. If you don't care about looking up keys, you can instead use a {k, v} pair where v is a value array, or an object whose keys are prefixed with their position, or put the position into the value as a [value, position] array.


Indeed! There is at least one other difference, which is that the presence of null characters will cause an error in `jsonb`, but not `json`:

# select '{ "id": "\u0000" }'::json;

-> { "id": "\u0000" }

# select '{ "id": "\u0000" }'::jsonb;

-> ERROR: unsupported Unicode escape sequence


The object key order thing is not a JSON spec compliance issue, but this one is. Either PG should store this escaped or use counted byte strings, but you can see why neither is desirable for a high-performance binary JSON format.


I suppose it is what it is, but if ordering matters, it is only JSON-like in appearance. json.org says:

>An object is an unordered set of name/value pairs.


Never ever count on object key ordering, or even on there being no duplicate keys in objects. If the JSON texts you're dealing with can be processed -or even just stored- by other code then you're in for pain if you make any assumptions about object key ordering / object key dups.

Once you accept this then it stops being a problem.


A warning to anyone who uses JSONB in Postgres. Past a certain size, the engine must go to the TOAST table off disk, even for indexed reads. If you store your JSONB in Postgres, don’t wait for it to grow out of hand or store a bunch of random stuff in it from an API call!


That JSONB is different from SQLite's JSONB, though.


Is it? TFA doesn't say. I suppose we could check the source code.

EDIT: https://sqlite.org/draft/jsonb.html does say that they are different.


I didn't understand the purposes of document stores until the past couple of years, and they are fabulous for building POCs. Enhanced JSON support will help a lot in making SQLite a suitable document store.

I get full type support by serializing and deserializing protobuf messages from a db column, and now making this column JSONB means I can filter this column too, instead of having to flatten the searchable data into other columns.


Yeah, as long as you're reading and writing to the database with the same language, and that language has good type safety, the benefit of your database schema effectively being defined by the same types as the rest of your code is pretty nice for a lot of use cases.

You just have to be vigilant about correctly migrating existing data to the current shape if you ever make breaking changes to types.


This. Would be nice if there was a framework (in Go, or Python Pydantic) which would help me migrate data made with old structs to new structs, and also deal with the transaction.

For now I use SQLite to deal with transactions and only make backward-compatible updates to structs. Brittle, but it is a toy app anyway.

(Normally use django to deal with models and migrations, but wanted to do something different)


Yeah migrations are the biggest issue for me. I really don't like not knowing what the actual shape of the document will be. Missing transactions, and not great relationship performance makes modelling some systems more hassle than it's worth.

I gave it a good go to use mongo and firestore for a few projects, but after a year or two of experimenting I'll be sticking to SQL based DBs unless there are super clear and obvious benefits to using a document based model.


There's a gradual approach there, where you start out with a JSONB column, and then as each piece of the data structure stabilizes* you move it out of json fields and into its own columns/tables.

* meaning, when there's enough code that depends on it that changing it would require some planning
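In SQLite, a low-friction intermediate step for that is a generated column over the JSON, which can be indexed and later promoted to a real column (a sketch; names are made up, and VIRTUAL generated columns need SQLite 3.31+):

    -- start with everything in one JSON blob
    CREATE TABLE items(id INTEGER PRIMARY KEY, data TEXT);

    -- once '$.status' stabilizes, surface it without rewriting existing rows
    ALTER TABLE items ADD COLUMN status TEXT
        GENERATED ALWAYS AS (json_extract(data, '$.status')) VIRTUAL;

    CREATE INDEX items_status_idx ON items(status);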


This is the way I build all my apps now. Eventually the jsonb field stores nothing as it all gets moved to defined fields.


Alternatively, keep it in the JSON/JSONB column until you need to search/filter/query on it, in which case you pull it out into a column.


Even that may not be immediately necessary. I don't think SQLite has it yet, but Postgres can build partial indexes on JSONB fields.

Though most of the time, in that situation, I would pull it out.


Most commonly I see people use Alembic to migrate SQLAlchemy and Pydantic models.

But I tend to just use Django. Every time I try piecing together the parts (ex FastAPI, Pydantic, Alembic, etc) I reach a point where I realize I’m recreating a half baked Django, and kick myself for not starting with Django in the first place.


You can try JSONB in the pre-release snapshot [1] or live in the playground [2].

[1]: https://sqlite.org/download.html

[2]: https://codapi.org/sqlite


Anyone know what their release process is like, will this be in v3.45? (Downloads page has latest release as 3.44, and TFA says this is in pre-release snapshot.)

I don't use SQLite directly much, but I'd be keen to use this in Cloudflare's D1 & Fly.io. Having said that though, I'm not sure they publicise sqlite version (or even that it isn't customised) - double-checking now they currently only talk about importing SQLite3 dumps or compatible .sql files, not actually that it is SQLite.

So API changes like this actually break that promise don't they? Even though you wouldn't normally think of an addition (of `jsonb_*` functions) as being a major change (and I know it isn't semver anyway) they are for Cloudflare's promise of being able to import SQLite-compatible dumps/query files, which wouldn't previously have but henceforth might contain those functions.


I came here to ask the same thing. Btw Cloudflare D1 has a lot of limitations while it has been in a beta. In my tests, almost all D1 read/writes go to a single region. It is highly recommended to try to limit edge function usage of D1 to no more than a single read query or 1read+1write to reduce edge latency.


I'm familiar with MongoDB's BSON, but not JSONB. Here is an article I found that talks about the differences: https://blog.ferretdb.io/pjson-how-to-store-bson-in-jsonb/


AFAIK jsonb is not a specific format, it’s just a generic term for “a json equivalent binary representation”. The sqlite blurb says jsonb is generally smaller than json, but IIRC from when postgres added jsonb postgres’ is generally slightly larger.


PG's JSONB has a specification (well, inside PG, not a standard), and they won't change it except in backwards compatible ways (or, really, at all).


> PG's JSONB has a specification (well, inside PG, not a standard)

So does sqlite's. An implementation detail, even a necessarily stable one, does not a format make.


Next step is to go full Ouroboros and have embedded SQLite DBs as records.


> Next step is to go full Ouroboros

That was the previous step. Binary BLOBs have been supported in SQLite for some time.


Think of the children!


Despite being an internal format, I see immediate external uses in applications. For example, batch insertions in Python. Per-row insert calls have noticeable overhead, and JSONB could bring performance back with a CTE:

    CREATE TABLE data(id, name, age);

    WITH ins AS (
        SELECT c1.value, c2.value, c3.value
        FROM json_each('["some", "uuid", "key"]') c1
        INNER JOIN json_each('["joe", "sam", "phil"]') c2 USING (id)
        INNER JOIN json_each('[10, 20, 30]') c3 USING (id)
    )
    INSERT INTO data (id, name, age)
    SELECT * FROM ins
Each json_each could accept a bind parameter with JSONB BLOB from an app.


You don't need JSONB for this - doing this with plain JSON is simpler and already faster than individual inserts for most bindings in my experience.

I typically do bulk inserts using a single JSON argument like this:

    WITH ins AS (SELECT e.value ->> 'id', e.value ->> 'name', e.value ->> 'age' FROM json_each(?) e)
    INSERT INTO data (id, name, age)
    SELECT * FROM ins
The same approach can be used for bulk updates and deletes as well.
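A bulk update along the same lines might look like this (a sketch against the same hypothetical data table; ?1 is a single JSON parameter such as '[{"id":"some","age":31}, ...]'):

    UPDATE data
    SET age = (
        SELECT e.value ->> 'age'
        FROM json_each(?1) e
        WHERE e.value ->> 'id' = data.id
    )
    WHERE id IN (SELECT e.value ->> 'id' FROM json_each(?1) e);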


I have quite wide records (over 50 fields) and ->> does not perform well with text keys. I did not try it with array indexing though:

    WITH ins AS (
        SELECT value ->> 0, value ->> 1, value ->> 2
        FROM json_each('[["some", "joe", 10], ["uuid", "sam", 20], ["key", "phil", 30]]')
    )
    INSERT INTO data (id, name, age)
    SELECT * FROM ins


I have some cases where sometimes I want to store a vector of floats along with a data item. I usually don't need to match against the column, just store it. I know I could use BLOB but often I just use JSON for this kind of thing so that I don't have to deal with data types and conversion. It's wasteful and imprecise though due to the string conversion. Is JSONB a good middle ground option for these cases?

Edit: Sorry, just saw the comment below by o11c,

> Sqlite's JSONB keeps numbers as strings

which means the answer to my question is basically "no".


I wish there was some way to compress JSON data across multiple rows, I often have very similar blobs in each row. Or some other way to reduce amount of storage for a lot of similar blobs across many rows, for example 10kB JSON where only one or two keys are different.


This does not look at all usable. Your application is exposed to the "internal binary format" that SQLite has chosen:

    select jsonb_extract('{"foo": {"bar": 42}}', '$.foo');

    ┌────────────────────────────────────────────────┐
    │ jsonb_extract('{"foo": {"bar": 42}}', '$.foo') │
    ├────────────────────────────────────────────────┤
    │ |7bar#42                                       │
    └────────────────────────────────────────────────┘
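Per the draft docs, wrapping the result in json() (or just calling json_extract instead) should give back readable text:

    select json(jsonb_extract('{"foo": {"bar": 42}}', '$.foo'));
    -- {"bar":42}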


"The central idea behind this JSONB specification is that each element begins with a header that includes the size and type of that element."

Why not add this size indication to the JSON specification? It would reduce memory requirements for JSON processing.

1997: https://cr.yp.to/proto/netstrings.txt

NB. This header may be the "central idea" behind JSONB but JSONB has other differences from JSON. This comment refers only to the size indication not the other features.


Would be nice to have an even more compact csv-esque version of jsonb that knew how to store arrays of objects with the same keys in compressible column-major format, omitting the repeated keys (and even value types if possible), like a transposed csv file with a first column of headers + type info. And the ability to embed arrays of object encoded that way into arbitrary jsonb structures, as long as all the objects have the same keys and (simple enough) value types. Or is that what the (different?) Postgres jsonb format does?


JSONB has performance implications, many of you here are likely familiar with JSONB in Postgres.

I encourage you to view this talk from PGConf NYC 2021 - Understanding of Jsonb Performance by Oleg Bartunov [1]

Looking forward to a similar talk from the SQLite community on JSONB performance in the future.

https://www.youtube.com/watch?v=v_s-DH4PEVA


I would prefer relational databases just have a compound hierarchal data type that can contain all the types it supports. Json is so anemic on types.

This way it can efficiently store numbers , dates, uuids or raw binary etc. and should really have some sort of key interning to efficiently store repeating key names.

Then just have functions to convert to/from json if thats what you want


> Json is so anemic on types.

So is SQLite3 though.

I do wish that SQLite3 could get a CREATE TYPE command so that one could get a measure of static typing. Under the covers there would still be only the types that SQLite3 has now, so a CREATE TYPE would have to specify which one of those types underlies the user-defined type. I think this should be doable.


Hm I googled and found this draft of the encoding - https://sqlite.org/draft/jsonb.html

It feels like it would be better to use a known binary encoding. I thought the MessagePack data model corresponded pretty much exactly to JSON ?

Edit: someone else mentioned BSON - https://bsonspec.org/

To be honest the wins (in this draft) don't seem that compelling

The advantage of JSONB over ordinary text RFC 8259 JSON is that JSONB is both slightly smaller (by between 5% and 10% in most cases) and can be processed in less than half the number of CPU cycles.

JSON has been optimized to death; it seems like you could get the 2x gain and avoid a new format with normal optimization, or perhaps compile-time options for SIMD JSON techniques

---

And this seems likely to confuse:

The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but they have wildly different internal representations and are not in any way binary compatible.

---

Any time data is serialized, SOMEBODY is going to read it. With something as popular as sqlite, that's true 10x over.

So to me, this seems suboptimal on 2 fronts.


SQLite says you should not attempt to access it: https://sqlite.org/draft/jsonb.html

> JSONB is not intended as an external format to be used by applications. JSONB is designed for internal use by SQLite only. Programmers do not need to understand the JSONB format in order to use it effectively. Applications should access JSONB only through the JSON SQL functions, not by looking at individual bytes of the BLOB.

> However, JSONB is intended to be portable and backwards compatible for all future versions of SQLite. In other words, you should not have to export and reimport your SQLite database files when you upgrade to a newer SQLite version. For that reason, the JSONB format needs to be well-defined.

If SQLite intends to own the format forever, I can believe that their requirements are such that leaning on an existing implementation is not worth the savings to implement.


Users don't read docs, and any visible implementation detail becomes set in stone:

https://www.hyrumslaw.com/

It looks like they're aware of that ... it's probably fine -- not ideal, but fine

If users didn't care about what the blob format was, there wouldn't be JSON support in the first place! You would have started with something like JSONB


Also SQLite3 will have to remain backwards compatible with its JSONB forever, which means it's stable, which means there will be users who will feel comfortable using the raw JSONB from SQLite3.


> JSON has been optimized to death; it seems like you could get the 2x gain and avoid a new format with normal optimization

Either I'm experiencing a reading comprehension mishap or this is self contradictory. Where is a "2x gain" supposed to come from through "normal optimization" from after something has already been optimized "to death?"

> SIMD JSON techniques

Which are infeasible in key SQLite use cases.


> Credit to SQLite developers for adopting an extant binary JSON format, as opposed to inventing yet another one.

The comment you are replying to cites a statement saying explicitly that they are not adopting the Postgres JSONB binary format, only the name and the abstract concept. The API is not compatible with Postgres either.


I noted that and removed that bit prior to your reply.

Points off for a.) inventing yet another binary JSON and/or b.) using the same name as an existing binary JSON.


Those 2 points are exactly what my comment said


I just meant there are a zillion different known ways of optimizing JSON, that maybe could be applied here. But maybe they already optimized it to the degree they're willing and wanted a binary format


SQLite developers aren't in the habit of making poor technical choices. This choice certainly has precedent: Oracle, MySQL, MongoDB, Postgres and others have implemented some form of binary encoding for JSON. Perhaps they know something you don't and deserve the benefit of the doubt.


Re-using standards is a great idea, and should remain people's default, but I don't see the benefit here.

What's the advantage of re-using a format?

Ecosystem? That won't help SQLite here, who don't have dependencies.

Keep in mind that anyone trying to write a parser for this is also writing a parser for the entire SQLite file format (the only way to access bytes). And it's a spec simple enough to fit on one monitor.

Design?

BSON (and others?) seem to have different goals.

SQLite's format seems to minimise conversion/parsing (eg. it has multiple TEXT types depending on how much escaping is needed; BSON has one fully-parsed UTF-8 string type). BSON is more complex: includes many types not supported by json (dates, regexs, uuids...) and has a bunch of deprecated features already.

SQLite's on disk-format is something they intend to support "forever", and as with the rest of SQLite, they enjoy pragmatic simplicity.


> JSON has been optimized to death; [...]

Not the JSON format.

> [..] it seems like you could get the 2x gain and avoid a new format with normal optimization, or perhaps compile-time options for SIMD JSON techniques

SIMD isn't always appropriate, and there are still no SIMD incremental JSON parsers either. There are things that can't easily be done with SIMD and JSON.


It takes a lot of complex code to optimize a JSON parser. You don't need to optimize that hard to make an equally performant parser for binary serialization formats, and given SQLite's JSONB is purely an internal optimization, it doesn't have to be a well-known interchange format (which has much more concerns than SQLite).

> Any time data is serialized, SOMEBODY is going to read it. With something as popular as sqlite, that's true 10x over.

And this statement is equally true for the SQLite format itself. That doesn't mean that the SQLite format should be replaced with something more standard, of course.


I thought the point was that the DB could reach into JSONB for you (potentially indexed as well) instead of having to deserialize outside. So the exact serialization isn't maybe as interesting as the direct queryability.


Does SQLite make any promises on the internal implementation? I assumed that the only guarantee was at the SQL interface.


They do promise it'll be a stable format:

> The JSONB format is not intended as an interchange format. Nevertheless, JSONB is stored in database files which are intended to be readable and writable for many decades into the future. To that end, the JSONB format is well-defined and stable. The separate SQLite JSONB format document provides details of the JSONB format for the curious reader.

And indeed the functions jsonb() and json() will let you convert to and from jsonb.


CBOR should compress even better than BSON.


Unrelated question: which language integrates best with SQLite? I am using it with Go and cgo, but it is often advised to avoid cgo. Perhaps it doesn't matter so much, I can use it anyway, but it would be interesting to hear about other experiences.


You have options that don't use cgo. Disclaimer, I made the second one.

https://modernc.org/sqlite https://github.com/ncruces/go-sqlite3


That's very cool, never considered a WASM option, thanks!


> what languages integrates best with sqlite?

TCL.


Any language with bindings will be fine.

The C integration API to Sqlite is very simple and small, so a) almost all languages will have bindings, and b) any bindings that exist will be reasonable.


Define best


Little overhead, a "native" integration.


I'm surprised that TFA doesn't say that the JSONB it refers to is PostgreSQL's JSONB. I assume it must be because SQLite3 adopts a lot of things from PostgreSQL, so D.R. Hipp and crew must be familiar with PG's JSONB and so they wouldn't create a different JSONB. Plus the PG JSONB is brilliant, so it would be wise to copy it or at least take inspiration from it.

There's.. no need to interop with PG's JSONB encoding, I think. If so then SQLite3's could differ from PG's.

Anyways, it'd be nice if TFA was clearer on this point.

EDIT: https://sqlite.org/draft/jsonb.html does refer to PG's JSONB, and says that they are different.


So what will JSONB look like in a standalone DB browser (DBeaver, etc.) ?


It's just a BLOB.


I guess I'm thinking aloud about which (and when) DB browsers will auto-decode JSONB into JSON.


As the maintainer of sql.js and SQLPage, I am very excited about this:

- json is used a lot by sql.js users to interact with JavaScript.

- to generate more sophisticated web pages in SQLPage, json is crucial

I can't wait for the next release !


Interested to know what the Deno JavaScript/TypeScript project thinks of this addition, given it already has a key-value store backed by the database and JSON is JavaScript-friendly.


While this will be good for SQLite queries that look into JSON data I’m not sure it really changes much for Deno. V8’s JSON parser is very highly optimised so it may be tricky to make a faster SQLite JSONB parser.


Is this just a data type, or did SQLite put in a JSON interpreter?


What do you mean by a JSON interpreter? I think SQLite already has a full implementation of JSON including tree walkers, and this internal format (that is externally just another BLOB) makes them more efficient.


Oh, sorry, I was thinking that someone had put a Javascript interpreter in the database itself. Fortunately, no.


Despite the name, the JSON format doesn't really have much to do with JavaScript.


Pretty sure SQLite comes with a JSON parser. JSON is a very simple format for which it is easy to write a parser for...


You mean a JSON parser? I don't think you can execute anything with json.


SQLite has had pretty decent support for JSON for a while now:

https://sqlite.org/json1.html


It's a BLOB with a binary representation of JSON.


What are the use cases for storing and manipulating json at the db level like this - why not use a relational schema and query it in the normal way?


Imagine you need to attach to some records a bag of data which you always need in full or not at all. This bag may have a tree structure, which is expensive to deal with in a relational model. An example would be the JSON configuration of a component/client/whatever. In such a case, trying to fit this bag of data into a relational model would be inefficient.

Those bags of data are usually called "documents". And a lot of systems need a way to store them along relational data.

If you're just storing and retrieving those documents, without any query, you don't need JSONB, a simple blob is enough.

If you sometimes need to do some queries, especially free queries (you want all component whose configuration has some property), then JSONB is suitable as it lets you do the filtering in the database.


> If you sometimes need to do some queries, especially free queries (you want all component whose configuration has some property), then JSONB is suitable as it lets you do the filtering in the database.

This feels like a slippery slope into a denormalised mess though. Before you know it your whole client record is a document and you’re using Postgres as a NoSQL database


The PostgreSQL's devs are fine with that.

Such use cases may not be quite as optimized as relational usecases, but they should be possible. If you are doing that then perhaps a different database would be nicer or faster for that scenario, but again not PostgreSQL's concern.

If the PostgreSQL devs were relational purists they would never have added special support for querying JSON.

An application may be better served with a properly normalized schema, or it might not. That is a choice for the application developers to make.

In practice competent developers should quickly realize if they went too nosql when a relational approach would provide benefits (Like if multiple "documents" need to reference consistent shared data, or if json query performance is not good enough) and normalize as needed to get those advantages.

So long as the application does not contain random sql queries scattered everywhere (and doesnt treat its database as a sort of API for external access) then database refactoring is not impossible. Indeed the difficulty is often overestimated. It is seldom fun work, and tends to be a bit of a slog, and require more extensive testing before pushing to prod, but that happens.


Align with the rest of the system, when sqlite is used as a component in a system which mainly speaks json.

Performance, when records are complex and usually not accessed.

Ease, avoiding/postponing table design decisions.

Flexibility, for one of many corner-cases (since sqlite is a very broad tool).

Incremental enhancement, e.g. when starting with sqlite as replacement to an ndjson-file and incrementally taking advantage of transactions and indexes on fields [1,2,3].

For example, several of these could apply when doing structured logging to a sqlite database.

    [1]: https://www.sqlite.org/expridx.html
    [2]: https://www.sqlite.org/gencol.html
    [3]: https://antonz.org/json-virtual-columns/
    See also: https://www.sqlite.org/json1.html



There are good use cases, but from my personal experience, I've seen people use JSONB mainly to avoid having too many columns when they inspect their tables, so they split their data into two groups: the supposedly important fields that get dedicated columns, and the rest, which is thrown into a single JSONB column.


Does anyone know if SQLite has a public roadmap? This combined with a GIN or GIN-like index would be a killer combination.


Does SQLite support the SQLite database format in tables?


What’s the difference between JSONB and BSON?


why not decode it as a struct or list of whatever are the native types in the database for arbitrarily nested objects?


sqlite doesn't have any native type for arbitrarily nested objects.


a good reason to move to DuckDB I guess?


Is there any best practices around jsonb?


There's all the rest of open source, and then there's SQLite. A public domain software that doesn't accept contributions from outsiders, and that happens to run much of the world.

And it just keeps getting better and better and better, and faster and faster and faster.

I don't know how these guys manage to succeed where almost all other projects fail, but I hope they keep going.


The patterns I've noticed, watching it evolve for nearly 20 years are:

- They draft behind others, and I mean this in a good way. They seem to eschew trailblazing new features, but keep a close eye on alternatives (esp. Postgres) and how use cases are emerging. When a sufficiently interesting concept has stabilized, they bring it to SQLite.

- Closely related to this, they seem to stay clear of technical and community distractions. They have their plans and execute.

- I don't know if D. Richard Hipp is considered a BDFL, but that's my impression and he seems good at it.


When put this way, it reminds me how Apple generally chooses to add features to the iPhone. They wait for use cases to be proven, usually by Samsung, and then add a very polished version to the phone.


Arguably the entire app ecosystem exists only to farm features. This is arguably the case whenever users are allowed to create things.

Go get a pro subscription with each of your competitors :)


And Java is similar nowadays


Sometimes you gotta keep the riff raff out in order to march forwards with your vision. Too many cooks in the kitchen can be deadly.


It's perhaps the best example of the cathedral model of open source.


Yeah, learned about it while reading the documentation of Fossil (from the same SQLite people). Their approach certainly has its own merits (and drawbacks).

Just wondering how they will transition once the original few people at the top of the hierarchy need to retire, eventually it will happen.

I guess they need to find younger trusted committers with the same dedication and spirit. That's not necessarily easy. But for a piece of software as important as SQLite, I have a feeling they will find those.


>Just wondering how they will transition once the original few people at the top of the hierarchy need to retire, eventually it will happen.

I had always viewed this as a "future worry", and then Bram Moolenaar passed away :(


The problem with “future worry”s is that the Universe doesn't always agree with us on when that future will be, and it often happens to be closer than expected, even more often closer than desired.


> I don't know how these guys manage to succeed where almost all other projects fail, [...].

They have a proprietary (not open source) test suite with 100% branch coverage. That makes it impossible to have credible forks of SQLite3. And it makes the gatekeepers valuable because SQLite3 is the most widely used piece of software ever built. So there's a SQLite Consortium, and all the big tech players that depend on SQLite3 are members, and that's how the SQLite team pays the bills.


Can anyone comment on the characteristics of the source code? Is the sqlite source worth reading for someone who is well-versed in C and x86 assembly, and has a rudimentary knowledge of databases?


"If you're looking into building your own database ...SQLite is amazing. It has very clean and readable code, so I'd suggest using it as a reference" https://news.ycombinator.com/item?id=21041833

"I suggest SQLite because the source code is superb (seriously, some of the most readable, most logically organized and best commented C code you'll ever see)" https://news.ycombinator.com/item?id=12559301

"it never hurts to look at a good open-source codebase written in C, for example the SQLite code is worth looking at (if a bit overwhelming)" https://news.ycombinator.com/item?id=33132772


Is it an excellent and unchecked attack surface like JSON is in MSSQL? Postgres is also known for SQL injection. And if this is based off of that...

https://www.imperva.com/blog/abusing-json-based-sql/


This blog post is about using these operators to bypass WAFs. You don't need JSON operators for that. You can substitute `1==1` with `2==2`, or `1!=1`, or `true`, or a million other approaches. There are an infinite number of such strings.

This is a problem with WAFs, not databases. Postgres and SQL Server both provide prepared statements as an alternative to string concatenation, which addresses SQL injection. (Though some people may be stuck with legacy or vendor-controlled systems that they can't fix, and so WAFs are their only option.)



