PostgreSQL: Jsonb has committed (obartunov.livejournal.com)
416 points by r4um on March 24, 2014 | 131 comments



If, like me a moment ago, you have no idea what jsonb is, see here for a full explanation: http://www.postgresql.org/message-id/E1WRpmB-0002et-MT@gemul...

tl;dr: storing JSON in a way that doesn't require repeatedly parsing it to make updates


Thanks for that link: it explains that insignificant whitespace, duplicate keys, and key order are not preserved.
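
To make that concrete, here's a minimal psql sketch of my own (output per the jsonb semantics described in that link: the last duplicate key wins, and keys come back in jsonb's internal order rather than input order):

    SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
    -- {"a": 3, "b": 1}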

Does someone have a link to the storage format? I'm always curious and want to learn efficient encodings. Thanks!


It's sad, but many developers actually count on JSON key ordering. Why they like to code to the implementation rather than the spec, I don't know, but they do.


I've seen cases where the actual API is XML-based, and there's a simple mechanical translation from JSON to XML.

Standards People.


A prominent example is Solr's JSON update endpoint [1], which is just the XML update endpoint in disguise.

It expects input of the form:

    {
       "add": {document 1 goes here},
       "add": {document 2 goes here},
       ...
       "commit": {}
    }
And of course all the "add" values are different and the "commit" has to come at the end.

[1] https://wiki.apache.org/solr/UpdateJSON


I've dealt with one of these. EAN, the Expedia Affiliate Network[0], has an API with what they call an "XML" mode and a "REST" mode, where "REST" means JSON. The "REST" mode is very clearly translated directly from the XML mode. I'm simplifying and doing this from memory so bear with me, but here is how the translation works:

    <Hotel id="1">
      <Amenity>Bacon</Amenity>
    </Hotel>
turns into

    {'Hotel': {'@id': '1',
               'Amenity': 'Bacon'}}
Straightforward enough. But here's what happens when you have two <Amenity> elements:

    <Hotel id="1">
      <Amenity>Bacon</Amenity>
      <Amenity>Chocolate</Amenity>
    </Hotel>
Now turns into:

    {'Hotel': {'@id': '1',
               'AmenitysList': {'@size': 2,
                                'Amenitys':
                                    [{'Amenity': 'Bacon'},
                                     {'Amenity': 'Chocolate'}]}}}
The translation engine appears to have no knowledge of the schema; it just adds ___List and ___s entries. So the schema of the JSON differs based on the presence of a repeated element.

Also since it doesn't have any knowledge of the schema, all elements are textual except the special @size element.

Because @size is a special element for these lists (which of course you have no need for in JSON), and the engine turns XML attributes into @____ as well, there is no way to get at the actual "size" attribute if one exists.

[0] http://developer.ean.com/docs/hotel-list/


Because in the software development industry "Because it works!" is considered an acceptable answer, that's why.


"Because it works for me" is also prevelent.


http://www.postgresql.org/message-id/E1WRpmB-0002et-MT@gemul... links to the git commit at http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitd.... The obviously interesting file is src/include/utils/jsonb.h (a new header file named jsonb).

It has a reasonably decent (highly decent for a header file) description.

One thing I had to guess at is that, in JEntry, the header is a combination of a bit mask and an offset. I would also guess that offsets for identical keys (which can happen in nested JSON), and maybe offsets for other identical data, will be identical. I didn't check either guess.



PostgreSQL was already able to compete with Oracle's RDBMS and Microsoft's SQL Server but could soon supplant Mongo for most jobs.

It's great to know that the only required storage components nowadays could be PG and ElasticSearch (as PG's full-text search can't compete with ES), and that the former is a no-brainer to set up (on top of AWS, Rackspace, etc.) or cheap to acquire (with Heroku Postgres, for example).

Good job!


Out of interest, what would you say the weaknesses are of the full-text search in PostgreSQL?

NB I've been using PostgreSQL for a few months on a side project and I've been hugely impressed. I wanted to add full text searching at some point and rather than using Lucene or Solr (or similar) I thought I would use PostgreSQL's own search capabilities - which certainly makes some things a lot simpler than using a separate search engine.


I can't speak for the original poster, but I have three issues with the postgres offering:

1) It has suboptimal support for handling compound words (like finding the "wurst" in "bratwurst"). If the body you're searching is in a language that uses compounds (like German), you have to use ispell dictionaries, which have only rudimentary compound support and which in many cases aren't maintained any more, because ispell has been more or less replaced by hunspell. Hunspell has far superior compound support, but it in turn is not supported by Postgres.

2) If you use a dictionary for FTS (which you have to if you need to support compounds), the dictionary has to be loaded once per connection. Loading a 20MB dictionary takes about 0.5 seconds, so if you use Postgres FTS, you practically have to use persistent connections or some kind of proxy (like pgbouncer). Not a huge issue, but more infrastructure to keep in mind.

3) It's really hard to do google-suggest like query suggestions. In the end I had to resort to a bad hack in my case.

Nothing unsolvable, but not-quite-elastic search either.


1) I do not have experience with languages with heavy use of compound words so no suggestion from me.

2) There is a shared dict extension that loads the dict only once. See: http://pgxn.org/dist/shared_ispell/

3) About the suggestions: you should look at the pg_trgm contrib extension.
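
For instance, a minimal pg_trgm-based suggestion query might look like this (table and column names are made up for illustration):

    CREATE EXTENSION pg_trgm;
    CREATE INDEX terms_trgm_idx ON terms USING gin (term gin_trgm_ops);

    -- rank stored terms by trigram similarity to the (misspelled) user input;
    -- % is pg_trgm's similarity operator
    SELECT term, similarity(term, 'brtwurst') AS sml
    FROM terms
    WHERE term % 'brtwurst'
    ORDER BY sml DESC
    LIMIT 5;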


Thank you very much for 2) - this will allow me to considerably lessen the amount of infrastructure I'm having to take care of (pgbouncer can go now).

About 3: That's what I was using before moving to tsearch, but pg_trgm-based suggestions were sometimes really senseless and confusing to users. Also, using pg_trgm for suggestions will cause suggestions to be shown that then don't return results when I later run the real search using tsearch.

I'm happy with my current hack though, so I just wanted to give a heads-up.


Thanks - I'd be delighted to get to the point where I have those kinds of problems!

What I will do is bear those limitations in mind, and if I ever do have those problems there is, I guess, a reasonable chance that postgres development will have addressed them by then, or I will redesign things to use a separate search engine.


This was a really informative post; thanks!

  If you use a dictionary for FTS (which you have to if
  you need to support compounds), the dictionary has to
  be loaded once per connection. Loading a 20MB dictionary
  takes about 0.5 seconds.
Does this happen for every connection, or just those that use FTS?

I'm thinking of a case where only maybe ~0.5% of queries will use FTS. I'm working on discussion software, and I'd like the discussions to be searchable, but realistically only a small percentage of queries will actually involve full text searching.


It only happens for connections that use FTS, and only if you have configured it to use a dictionary (I believe there's no need to in the case of an English body). But keep in mind that this means you'll have to pay at least 0.5 seconds per search over a fresh connection. If you do some Google-suggest-style incremental searching on key press, you'll need to be faster than that.

So either use a persistent connection for full text searching (I'm connecting via pgbouncer for FTS-using connections) or, as karavelov recommended below, use http://pgxn.org/dist/shared_ispell/ which will load the dictionary once into shared memory (much less infrastructure needed for this one).


Thanks so much for the tip, you saved me some hours for sure!


About your first point: you should look at GiST and GIN indexes, which are made for full text search: http://www.postgresql.org/docs/current/static/textsearch-ind...
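
A typical setup is an expression index plus a tsquery match, something like this sketch (table and column names hypothetical):

    CREATE INDEX docs_fts_idx ON docs
        USING gin (to_tsvector('german', body));

    SELECT * FROM docs
    WHERE to_tsvector('german', body) @@ to_tsquery('german', 'wurst');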


This isn't a problem of the index type (I'm using GIN in this case), but of the tsearch lexizer which only has limited support for hunspell.

The step that causes the trouble is ts_lexize(), not the storing and subsequent lookup.

Again, I made it work for my case, but it was some hassle and involved running a home-grown script over an ispell dictionary I had created by converting a hunspell one into an ispell one.


1) Subwords

Why not use: MY_COLUMN ~ '.wurst\y.' ? Here are the docs for LIKE, SIMILAR TO and regexes: http://www.postgresql.org/docs/9.2/static/functions-matching...


Because using a real full text solution goes much further than using regexes and LIKE. It also allows the use of indexes, which many regexes and many LIKE patterns would not allow.

For example, I can now find the "wurst" in "Weisswürste" which, yes, I could do with a regex, but I can also find the "haus" in "Krankenhäuser" and all other special cases in the language I'm working with without having to write special regexes for every special term I might come up with.


Exactly. The only reason I have to bother with Solr at the moment is to get efficient ngram indexing for sequence data, which consists of lists of 7-8 character strings from various sources. What I have works, but feels like overkill for my case.


Thank you.

That's an incredibly helpful answer, which provides solutions as well as pertinent info.


Full text search is expected to benefit considerably from work in Postgres 9.4: https://www.postgresql.eu/events/sessions/pgconfeu2013/sessi...

These advances within the GIN inverted index infrastructure will also greatly benefit jsonb, since it has two GIN operator classes (this is more or less the compelling way to query jsonb).
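
For reference, a rough sketch of the two operator classes per the 9.4 development docs (jsonb_ops is the default and supports more operators; jsonb_path_ops is typically smaller and faster but only supports containment); table and column names are made up:

    CREATE INDEX docs_gin_idx  ON docs USING gin (data);                 -- jsonb_ops (default)
    CREATE INDEX docs_path_idx ON docs USING gin (data jsonb_path_ops);  -- @> (containment) only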


Some differences between Solr and PostgreSQL Text Search:

1. Solr doesn't handle multi-word synonyms (without a hack), PG does. (ex: "Northern Ireland" => "UK")

2. Solr uses TF-IDF out of the box, and PG doesn't.

3. PG is good enough for 90% of cases, but Solr has some advanced stuff that PG doesn't, like integration with OpenNLP and things like the WordDelimiterFilter. (Andre3000 = Andre 3000)

4. PG is kinda annoying in that it will parse "B.C." as a hostname, even though I want it to be a province (see the ts_debug sketch after this list).

5. Solr is faster than PG, but PG has everything in one server.

6. Solr handles character-grams and word-grams better.
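
On point 4, ts_debug is an easy way to see how the default parser tokenizes a given string; a quick sketch (run it to see what token type "B.C." actually gets):

    SELECT alias, token FROM ts_debug('english', 'B.C.');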


> 5. Solr is faster than PG, but PG has everything in one server.

With the GIN optimizations in 9.4 this needs to be re-evaluated. Solr will probably still be faster, but maybe not by enough for it to matter.


Another weakness not mentioned is document similarity search. You have one document and you want to find the N most similar in your collection, to offer suggestions like Amazon does. You can do it with FTS in PG, but it is way too slow to be practical. Custom FTS engines are much better at that task.


If search is your application's main purpose, you'll want something more flexible.

Otherwise, it's fairly easy to implement, and you get full SQL support: joins, transactions, etc. So you can always prototype it and see what limitations you run into.

If you're just getting started with Postgres, I have some examples of using its FTS here: http://monkeyandcrow.com/blog/postgres_railsconf2013/

It's targeted at Rails, but I always show the SQL first, so you should be able to adapt it.


One thing that I don't see solved with PG + ElasticSearch is replication of data subsets to mobile devices. How would you solve that with this stack?


Application code? Send the data down as JSON?


I mean for offline usage. Just using JSON basically means you'll have to reimplement all the database behavior for the offline client. This is where TouchDB comes in handy, which is the reason I'm using CouchDB - you can basically run most of the server on the mobile device.


I still think that, performance aside, MongoDB is more interesting for developers using Node.js, because the development time is shorter and the API more intuitive.


It depends on what your data looks like. If you've got a collection of JSON documents that gets served verbatim through a JSON API, Mongo is a natural fit. If you've got a domain model with different types of entities and links between them, Mongo is awful.

The app I'm working on right now evolved from the former to the latter, prompting me to switch from Mongo to Postgres, and it's made the code base much, much simpler. Mongo gets really painful when you have to fetch several documents (serially, because you have to follow the links between them) and join them at the application level before rendering the output.

For that kind of application, SQL with joins and sub-selects is so much better.


The nice thing here is that it's easy enough to write a Mongo compatibility layer on top of Postgres (hell, it probably already exists...), allowing you the just-get-to-it ease of Mongo combined with the substantially superior engineering of Postgres.



For those interested, the online documentation for the development version of PostgreSQL has been rebuilt and contains documentation about the new feature:

http://www.postgresql.org/docs/devel/static/datatype-json.ht...


Pardon if this is a little ignorant - I haven't done a lot of SQL before, so http://www.postgresql.org/docs/9.3/static/functions-json.htm... isn't that obvious to me - but:

Does this mean I can do Mongo-style queries, retrieving a set of documents which match particular key: value criteria, using PostgreSQL?


Yes, you can extract values from JSON, filter based on those values and even create indices - not sure if that counts as "Mongo-style queries" but I've used it and it works pretty well:

http://clarkdave.net/2013/06/what-can-you-do-with-postgresql...
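
For example (schema hypothetical), the -> and ->> operators reach into a document, and you can filter on the extracted values directly:

    SELECT data ->> 'name' AS name
    FROM docs
    WHERE data ->> 'status' = 'active';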

The only thing to note about current JSON support in PostgreSQL is that, as far as I know, there is no way to update parts of a JSON document - you have to rewrite the entire document each time. I've not found this to be a big headache, YMMV.


One important difference is that you don't need to extract the values to create an index. E.g. in Mongo I believe you still have to decide which keys to index. A GIN index on top of jsonb in PostgreSQL will make it possible to do an indexed search on any key, not just those that are specified at create index time.
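
A sketch of what that looks like, per the 9.4 development docs (table and key names made up):

    CREATE INDEX docs_data_idx ON docs USING gin (data);

    -- indexed even though 'user' was never singled out at index time
    SELECT * FROM docs WHERE data @> '{"user": "test"}';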


> The only thing to note about current JSON support in PostgreSQL is that, as far as I know, there is no way to update parts of a JSON document

That's what jsonb fixes.


No, jsonb will still update the whole value when you edit it. For now at least.


Isn't this the case for PostgreSQL data in general? I.e., on updates, new data rows are written containing all the field values, whether or not they have changed.


It's more the case that when you want to reach into an existing stored document and change a single value, you have to re-write the entire document from your application.

On the reading side there are functions and operators that allow you to reach into stored JSON and extract the parts you want. What would be nice would be to be able to do something similar for updates - although this is clearly more complex than reading, so I can see why it has been done this way.

Edit: I guess the most general solution would be to directly support something like JSON Patch:

https://tools.ietf.org/html/rfc6902
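
For reference, an RFC 6902 patch document is just a JSON array of operations, e.g. (values made up):

    [
      { "op": "replace", "path": "/user", "value": "alice" },
      { "op": "remove",  "path": "/tags/0" }
    ]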


Yes, but you don't have to specify all the other columns that aren't being updated.

With json/jsonb, you have to provide the entire object graph each time you are updating it. You can't update one field in the graph, which could be a pain if you have concurrent updates.

Hopefully we'll be able to update parts of the jsonb columns sometime.


You can do it right now; you just need to write your own updating function. With PL/Python or PL/V8 it would be quite simple.
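
A minimal PL/Python sketch of such a function (the function name and its replace-one-top-level-key semantics are mine, and it assumes the plpythonu extension is available):

    CREATE EXTENSION plpythonu;

    CREATE FUNCTION json_set_key(doc json, key text, val json)
    RETURNS json AS $$
        import json
        d = json.loads(doc)
        d[key] = json.loads(val)  # replace a single top-level key
        return json.dumps(d)
    $$ LANGUAGE plpythonu IMMUTABLE;

    -- e.g. UPDATE docs SET data = json_set_key(data, 'status', '"archived"');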


You're right, because MVCC works on immutable rows. The structure can be broken up to accommodate large values[1], which saves updating parts that don't change, but json data is a single column and updates need to copy the entire document.

[1] http://www.postgresql.org/docs/devel/static/storage-toast.ht...


Didn't the hstore2 patch add this for hstore? Wouldn't it be easy to take the same function and make it work for jsonb?


There were lengthy discussions about this in the [postgres] hackers mailing list and it turned out that, no, it wasn't.


Partial updates of a single JSON document are not supported yet.


And it will not be supported for the foreseeable future, because PostgreSQL does row-level multi-version concurrency control.


I proposed supporting JSON Patch which may or may not be implemented, but it does provide a way forward.

http://www.postgresql.org/message-id/CACu89FSYikxdUj+J01BoAv...


I should have been more clear. What I meant was that in-place partial updates are not going to happen any time soon. This is mostly a performance issue: updating a value inside a JSON document will never be lightning fast, because you need to copy the whole row and clean up the old version after the transaction has committed. JSON Patch would avoid transferring the JSON document back and forth over the network, but the local copy will still need to happen.


True. I think reducing network overhead, removing one SQL call (currently you need SELECT + UPDATE), and simplifying application code is a good start, and probably fast enough for most use cases.


Do you have a reference for that? I had a quick look through the documentation for this new feature and it looks mainly like a better way of storing and accessing JSON data - not sure it helps with updates.

[NB Not being snarky... genuinely curious about this!]


One project I've used in the past is https://github.com/JerrySievert/mongolike

It provides a Mongo-like API directly on Postgres, using normal tables/columns. I've used it on some projects in the past and it's fairly straightforward, and it makes it easy to port an app from Mongo->Postgres.


Yes! In fact, you can have a Mongo interface to Postgres. I've written just that for Node.js: https://github.com/olegp/pg-mongo/


Here is another Mongo-like PostgreSQL addon: http://pgre.st/ It has quite a few other nice features as well.


> Jsonb has several missing useful features (from nested hstore), but that could be added later as an extension.

Which are those missing useful features?


Not all of the equivalent operators exist yet, though those are only slightly interesting. Perhaps most interesting would be the index types: right now GIN works, but GiST didn't quite make it. GIN was definitely showing better performance on jsonb, but there are still valuable cases for GiST, which often comes out smaller on disk.


Another important point in GiST's favor is that GIN has much higher overhead for tables that are frequently updated.


Slightly off-topic: does anyone know if there is progress being made on the proposed HTTP API for PostgreSQL? (http://wiki.postgresql.org/wiki/HTTP_API)


What does this binary format actually look like on disk? Is this documented anywhere other than in the code that reads and writes it?

I do remember a description of it in some slides a few months back, but I can't find them now.



Yes, that's it, thanks!

I suppose the definitive description of the format is the function JsonbToCString, which is what writes the in-memory structures out as bytes:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=...


I only hope it won't be too long before various "drivers" and frameworks start supporting it; otherwise adoption will be too slow. And jsonb deserves to be adopted as fast as possible.


I'd suggest opening bugs on drivers (that's the most important one) right now, since the feature is officially in the development version. If it doesn't already work, that is.

That's how it was done for 9.3's expanded diagnostics[0], and as a result both Python's psycopg2[1] and Ruby's ruby-pg[2] (and maybe others) added extended diagnostics support before 9.3 was even released[3], working from day 1.

[0] http://www.depesz.com/2013/03/07/waiting-for-9-3-provide-dat...

[1] http://psycopg.lighthouseapp.com/projects/62710/tickets/149

[2] https://bitbucket.org/ged/ruby-pg/issue/161/add-support-for-...

[3] http://www.postgresql.org/about/news/1481/


Considering Postgres is the first relational database to support jsonb, portability (which is the main benefit of drivers) doesn't really apply. Any decent driver or framework should let you write raw SQL, so you should be able to start using this as soon as it's released.


Being able to use jsonb through a framework and fully utilizing it through that framework are not the same thing. I will definitely use it soon, but unless it's fully supported in frameworks (...) it won't be widely accepted.


Seems like there is room for a PostgreSQL-only ORM that specifically makes all these nice features available. Anyone heard of such a thing?


Most "frameworks"/ORM can provide database-specific features, and it's especially easy for field types. e.g. http://docs.sqlalchemy.org/en/rel_0_9/dialects/postgresql.ht... (data types local to SQL Alchemy's Postgres dialect) or https://github.com/djangonauts/django-hstore (hstore in Django)


There isn't really a need, though. SQLAlchemy exposes all of SQL and most of the unique things dialects offer, and it makes it very easy to add types too.


Not exactly a PostgreSQL-only ORM, but there's the excellent slick-pg library, which extends the FRM library Slick: https://github.com/tminglei/slick-pg


Not really an ORM but highly integrated with Postgres: https://github.com/ngs-doo/revenj


There are ORMs like SQLAlchemy (Python) and Sequel (Ruby) which try to expose as much as possible of the features of all supported databases. It is common for Sequel to add support for new PostgreSQL features while the new PostgreSQL version is still in beta testing.


Rails already has support for Postgres' json type; I don't doubt that they'll add jsonb support asap.


What's involved in supporting JSONB in Rails? Given that you probably want access to the entire JSON blob at the Rails level, aren't you good to go already?


Abstraction layers (such as drivers) have significant utility beyond portability.


Sure, but it's also nice to have a layer of abstraction over psql. I personally really enjoy working with Django's ORM, for example, over writing native queries.



TeamPostgreSQL, the PostgreSQL web interface, has a JSON builder/editor built in which makes browsing JSON data very nice.


Mm, I agree. As an example, historically, Django has been quite weak at this due to its Field.get_internal_type() system.


What's the difference between jsonb and bson? They look so similar to each other...


BSON actually was considered as the JSON storage format for PostgreSQL, but was discarded once people figured out that BSON stores ["a", "b", "c"] as {0: "a", 1: "b", 2: "c"} which is just silly.


Wow, thanks for that info. I came here to see the difference and am a little shocked by the silliness of BSON (and the team that created it, apparently Mongo) compared to the quality devotion of the Postgres community.

Well done, Postgres!


jsonb is the name of a data type used in Postgres to store, index and update JSON. BSON is the equivalent used by MongoDB, which has also been made a semi-official standard.


Can anyone with some familiarity with PostgreSQL's release schedule comment on how soon this will be in an official release?


This will go into 9.4 (or maybe they'll call it 10.0), which will be released September-ish, barring any bigger issues.


I think they will reserve 10.0 for when logical replication is done.


This is planned for 9.4, which, if history is any guide, will likely land in late August, but could be anywhere from early August to as late as October.


I wouldn't be too quick to swap it out yet. Especially if you are using replication. I usually wait for a few minor releases before upgrading.


Can anyone comment on the storage overhead of JSONB key names? One of the worst things about MongoDB's BSON format is that 8M instances of the following record:

{ "user": "test", "isValid": true }

will at a minimum cost 8M * (4 + 7) = 88 million bytes simply due to field name overhead. MongoDB isn't smart enough to alias common field names, and this has remained one of its biggest problems.

https://jira.mongodb.org/browse/SERVER-863


If only there were a way to store data where each record has the same structure...

The JSON support in postgres allows a hybrid approach. Put the common data in a tabular format, with a JSON column that stores all the extra, variable data.
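
A sketch of that hybrid layout (names made up; the jsonb column is 9.4, but a json column works the same way structurally):

    CREATE TABLE users (
        id       serial  PRIMARY KEY,
        username text    NOT NULL,   -- common fields as real columns: no per-row key overhead
        is_valid boolean NOT NULL,
        extras   jsonb               -- rare, variable fields go here
    );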


Internal pseudo-schematized data storage and JSON support aren't mutually exclusive. It's nice to have a checkbox feature like this, but the implementation isn't particularly exciting for large datasets.


PostgreSQL jsonb has the same problem in the actual document storage, with repeated keys. I believe it does better in the index, when you use the correct operator, but the table storage has the same type of overhead.


This email talks about how the indexes are 50% the size of the data.

http://www.postgresql.org/message-id/5323D45F.1080503@agliod...


Document stores trade off some performance and robustness to increase ease of development. When that becomes a problem you can switch to more structured tables. One possibility is to structure frequently accessed fields and stuff the remainder into a json column.


This just forces complexity onto the caller - it's not unreasonable to expect the DB to "pseudo-schematize" as most keys will be repeated.


With a power law, only a few keys occur frequently. Changing the data representation implicitly would cause unpredictable performance as columns get rewritten. If you want to do more explicit storage optimisations, maybe one of the mongo-like overlays could accommodate that.


It's more than that - virtually all large datasets I've worked with (> 20B documents) had a keyspace of under 128 keys/columns per nested level of a JSON document, meaning an optimizing DB could express most keys as a single byte (or less with packing). I don't think column reordering would be necessary unless you exhausted the first 128 keys and wanted to optimize the base-128 keyspace assignment. Such an operation could be done during compaction or any other operation that rewrites a table.


Is there a trend of making verbs reflexive among developers/managers? I've seen "product X has released", "patch Y has applied", and a lot of similar phrases which I'm still having trouble parsing. When I saw the title I thought "has committed what?"


As others have pointed out, this seems to be due to the writer not learning English as their first language.

I love this, and it reminds me of Steven Pinker's "The Language Instinct", where he mentions some stats showing that the most common dialects of English worldwide are not actually found in "native" English-speaking countries, but in regions of south-east Asia, as well as India and, increasingly, China.

It's highly likely that, in two or three generations, all of the English speaking population of the world will be "reflexifying" their verbs... Please excuse my mangling of my native tongue :-)


Native English is the new Latin. Born French, I'm good in English, but I realized I had something to learn when I noticed I couldn't understand anyone in that country [1], although they could understand each other and were speaking with the same actual rules.

[1] http://en.wikipedia.org/wiki/Singlish#Grammar


I've also noticed how very common it has become to find questions in the format "How to do [task]?" rather than "How do I do [task]?"

I strongly suspect that this has arisen from Google searching, where people naturally write their queries to complete the sentence "I would like some information about ________."


"How to do [task]?"

I've seen that most often from people of Indian origin. Not so much from Pakistanis, though.

Around 5-10 years ago I used to commonly see "I have doubt on how to <XXX>"[1]. I don't see it quite as often nowadays.

[1] https://www.google.com.au/search?q="I+have+doubt+on+how+to+*...


Excused, but note that you missed the obvious "Most English verbs will have reflexified".


Thanks! Much better :-)


I think it should read "Jsonb has been committed."


Better yet, "Jsonb functionality has been committed to PG"


Your spidey senses probably just need to first think "non-native English speaker?" :)


Setting aside speculations of a second-language speaker, it's common to hear things like "landed" from manager types. I think it's a trend of increasingly depersonalized corporate-speak.

Sam added a feature. [Sam did or achieved something and has gained some partial responsibility for the end product]

We added a feature. [Sam is lost in the collective, but at least he is part of us]

A feature was added. [name omitted and agency de-emphasized, but the omission is a bit pointed; one still could wonder who added it]

The feature landed. [not even an implied agent here]

This progression removes more and more of the ownership and achievement of the paid worker-bees until they aren't even there. Then, features are simply landing left and right out of the sky, according to the Jobs-like "vision" of designers, PMs, executives.

Edit: it's like "damnatio memoriae" for the people who are actually laying the brick.


To be fair, that kind of manager-speak is also used to avoid openly assigning blame for production mistakes, which is generally a good thing.

The problem is when it's "the feature landed" but then "Sam broke the feature".


I think in this case the writer is not a native speaker. Maybe the trend is "more non-native English speakers become developers/managers".


more like "more non-native english speakers write about their work on the internet".


I thought it was a reference to "The Eagle has landed" (from Apollo 11)...


This is fantastic news. Hats off to the PostgreSQL team! Hoping this lands on AWS RDS shortly after release.


Does the JSON type still have a set of use cases as well or is it just obsolete now? If I was writing large quantities of json documents to a table, and I didn't need tag indexing or targeted updating, would using the JSON type be faster than JSONB?


Probably the most important difference is that the old json type preserves whitespace and attribute ordering, whereas the new one doesn't. If you don't care about those, the old one is also slightly smaller on disk - but I would say that's not worth it, based on the simple fact that you might want to search it some time in the future...


I think this is largely a new internal storage format that is more performant.


Someone explain like I'm 5?


As of version 9.4, PostgreSQL supports storing data in JSON format, which so far has been nothing but a 'text' datatype with an additional syntax check built in. You can't query or update partial/selected nodes inside the JSON doc. But with this new jsonb (JSON binary) format, you can do simple queries as well as update certain nodes (if they can be reached through the query).

Think of it like a very basic "MongoDB" implementation :)

Take a look at their devel doc http://www.postgresql.org/docs/devel/static/datatype-json.ht... for more details.


Just nitpicking, but the json data type was added with 9.2, where it was not much more than syntax-checked text plus one or two functions to convert records and arrays into json.

In 9.3 we got many more useful functions that allow querying into json documents, though if you wanted to select rows based on the content of a json document, you'd have to resort to functional indexes.

9.4 (which is what the above post is talking about) will provide index support for arbitrary queries and a much more efficient format for storing JSON data, which will not involve reparsing it whenever you use any of the above functions.


You don't need an index to select rows based on the contents of a json document in 9.3. You may want or need them depending on your queries. (I use them sparingly, the most common pattern for me is that an indexed non-json column filters a query down to a smaller set of rows, and then a further index on the attributes referenced in the query isn't necessary.)

And if you do want to create an index now on a JSON attribute in 9.3, a function isn't necessary.


Am I right in thinking this is only good for reads though?

Do writes still require loading the field into memory, parsing, updating and then writing back?


Yes, you are right. At the moment there are no exposed functions for altering that binary data structure.

Because of the way Postgres works, though, the row will always have to be rewritten (all updates to a row cause a new copy to be written - rows are immutable in Postgres). What we might gain in the future is a way to skip the parsing process, but the document will always have to be rewritten.


I'd love to manage HDF5 this way too. Anyone know of an effort already underway?


First comment: > I believe that this is one of the most significant breakthroughs in Postgres history. Congratulations.

Yeah, right.


I don't follow Postgres too much and I saw the tweet by Peter van Hardenberg first. It's great to see the enthusiasm for the project.

It's likely that 9.5 or even 9.6 will actually see the real significance of this new feature. I don't know how solid the implementation is, but once it's in the wild, the missing features have been added to bring it to parity with MongoDB, and the bugs have been fixed, it will be a great NoSQL solution in place of MongoDB and others.

That is significant and worth celebrating. This will also give MongoDB and others another yardstick to stay on top of their game.


Yeah but MemSQL can do this a bajizillion times faster!


I don't know if I got downvoted for sarcasm (which is a valid reason for downvoting). But just in case no one noticed, this was sarcasm.



