JSON will be a core type in PostgreSQL 9.2 (planetpostgresql.org)
358 points by zmanji on Jan 16, 2012 | 41 comments



This is nice, but it's a shame that after going to the trouble of writing a JSON parser in C and incorporating it into the main Postgres codebase, they didn't go one step further and add a json_project_key function that pulls values out of a JSON blob. That, combined with functional indices, allows you to store JSON blobs in your database and index on values inside them, or put another way, get the data model benefits of a NoSQL database [1] without sacrificing the mature Postgres internals.
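Roughly, I'm imagining something like this (json_project_key is made up, as are the users table and its data column):

    -- hypothetical: json_project_key does not exist in 9.2
    CREATE INDEX users_email_idx ON users (json_project_key(data, 'email'));
    SELECT * FROM users WHERE json_project_key(data, 'email') = 'me@example.com';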

The Heroku Postgres guys have been playing with this idea [2] using the PL/V8 plugin, which embeds Javascript as a supported language inside Postgres (and thus makes it trivial to implement the json_project_key function), but if Postgres is going to natively support JSON parsing then it shouldn't take an addon module to achieve this.
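A minimal PL/V8 sketch of that function, assuming the plv8 extension is installed:

    -- illustrative only; plv8 exposes named SQL arguments as JS variables
    CREATE FUNCTION json_project_key(data text, key text)
    RETURNS text AS $$
        return JSON.parse(data)[key];
    $$ LANGUAGE plv8 IMMUTABLE STRICT;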

[1] Attempting to forestall the thread-jacking: I know NoSQL databases have other benefits besides their data model, but for some applications that's certainly one of the benefits.

[2] https://gist.github.com/1150804


Well. The parser is in the code now. The data type exists. So I assume it should be fairly easy to write a Postgres extension module that adds the missing pieces.

Reading the linked article, I'd guess they were simply running out of time before the feature freeze.

I would expect the querying functions to be added in the next release, or sooner in the form of an extension.


I can't help but think this is just step 1 in bringing native support for JSON indexing and querying.


On the other hand, as far as I can see, XML support still does not allow indexing or querying of the document itself, at least in core, even though it's been a Postgres datatype for a long time.


I would donate money toward getting a json_project_key feature done this way.


This is typical of how Postgres adds features and functionality. They tend to add a new core feature in one release, then add adjacent functionality and performance improvements in later releases.

I'm not on their core developer mailing lists, but I presume this is because of a prioritization of stability over most everything else.


It's also because the Postgres developers work to internal release deadlines.

They'll take a stable half-feature over a buggy full-feature at any given deadline. What matters is that over time they patiently and carefully expand those half features.

The most visible example is the slowly increasing coverage of replication. This JSON feature is another example -- I expect it will grow into a fuller feature set in future releases.


>get the data model benefits of a NoSQL database [1] without sacrificing the mature Postgres internals.

They already have that: http://www.postgresql.org/docs/9.1/static/hstore.html


hstore looks great, but as I understand it, it's just one level of key-value; JSON (and any document database) lets you store nested hashes.
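For example (assuming the hstore module and the new json type):

    SELECT 'a => 1, b => 2'::hstore;         -- one flat level of text pairs
    SELECT '{"a": 1, "b": {"c": 2}}'::json;  -- arbitrary nesting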


Interesting. Being able to output the results of a query as JSON will be useful when running something like Node.js. I can't shake the feeling that something like this doesn't belong in the SQL statement, though; choice of output format seems like something connecting libraries ought to handle.

That said, having seen 'core type' I instantly imagined being able to query based on JSON properties, which doesn't appear to be the case. Not surprising, because it would be a huge amount of work... but it's nice to imagine.

(Before anyone says anything: yes, I know NoSQL exists. But a hybrid solution using Postgres would be very interesting.)


You might be interested in postgresql's hstore: http://www.postgresql.org/docs/9.0/static/hstore.html

It's a key-value store that allows querying, which I think is what you were lamenting the lack of. Here's a bit more about how to query and index with hstore: http://lwn.net/Articles/406385/. It's pretty simple and doesn't have anywhere near the number of querying possibilities that MongoDB has, but it can be used for on-the-fly column names (similar to JSON). You can only query on the root node's children, unlike the open-ended possibilities in NoSQL.
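A quick sketch of what that looks like (table and keys invented for illustration):

    CREATE TABLE things (id serial PRIMARY KEY, attrs hstore);
    CREATE INDEX things_attrs_idx ON things USING gin (attrs);
    SELECT * FROM things WHERE attrs @> 'color => blue';  -- contains pair
    SELECT * FROM things WHERE attrs ? 'color';           -- key exists
    SELECT attrs -> 'color' FROM things;                  -- extract a value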


Oh, that is very interesting, thanks!

I'm (quite happily) tied to Postgres because I'm using PostGIS, but the ability to add freeform data to a location would be ideal. Looks like I may already have a solution here.


@scorpioxy -- I can't reply to you directly, but you've been [dead] for a few months now. I see you've posted quality comments, but you posted what could be perceived as a spam post 138 days ago and so you got marked as a spambot. But you're clearly not. Might want to clear that up, but for now, you're dead when you probably ought to not be.


With hstore and psycopg, you'd gain a lot of flexibility that's very similar to a NoSQL store. Psycopg does the mapping to/from Python dicts.

It's not without a learning curve, but I found it helpful when I hit shortcomings in Mongo's design.


Cool, PostGIS! What are you building? I worked as a GIS engineer in 2010. PostGIS is pretty cool technology.


I'm building a 'taxi tracker' app, for want of a better term: it gives you estimated trip times and lets you share the journey with other people so they know where you are.

That part doesn't require PostGIS as such, but it's for an NYC city government app competition, so we have all sorts of city datasets to use. I've used PostGIS to help make custom map tiles (preview at https://twitter.com/#!/taxonomyapp/status/149565007384940545), highlight the outline of the building you're heading to... all sorts. It's been a fantastic learning exercise.


I don't know exactly what your GIS needs are, but it might be worth seeing if MongoDB's geospatial indexing meets them if you are interested in querying free-form data.

http://www.mongodb.org/display/DOCS/Geospatial+Indexing

http://www.mongodb.org/display/DOCS/Geospatial+Haystack+Inde...


As someone who is building a similar app, for a similar competition (but in Boston):

Don't combine your data structures/stores. Use the store/structure that is most appropriate for the given task. Memory is cheap compared to compute time.

Hint: many can be in-memory-only stores that periodically sync to a backing store for updates.


What would be interesting is an extension to the SQL parser that supports JSONPath in SQL queries: something like WHERE json_column matches some JSONPath query, acting like a WHERE IN clause but with a JSON-style query syntax. I see no need to create a whole new construct when a valid one already exists. The two could be blended into a very powerful way to query JSON data types in a relational system.
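Something like this, with the syntax entirely made up:

    -- purely hypothetical syntax; orders/details are invented names
    SELECT * FROM orders
     WHERE details JSONPATH '$.items[*].sku' = 'ABC-123';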


You wouldn't even need an extension to the SQL parser - you could just expose a SQL function to do the JSONPath query. e.g.

    -- note: the tz alias can't be referenced in WHERE, so the expression repeats
    SELECT users.id, jsonpath('$.timezone', users.preferences_json) AS tz
      FROM users
     WHERE jsonpath('$.timezone', users.preferences_json) LIKE 'America/%';


That's not going to help when you need to restrict your query with JSON, though; the select list is processed after the whole FROM clause, including JOINs, so you'd be producing the whole table and then throwing away whatever didn't match.

In general, a few core limitations of the current optimizer - lack of the LATERAL keyword among them - make it impossible to create a PostgreSQL function that blends well with SQL, AFAICT.


> select uid from users where jsonpath('$.country', address) = 'CH'

PostgreSQL does support functional indexes, so you can easily make this use an index and thus be fast enough for real use. (It's a bad example - I'd probably put the address into an addresses table - but it's enough to illustrate the point.)
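e.g. (assuming the hypothetical jsonpath function from upthread, declared IMMUTABLE):

    CREATE INDEX users_country_idx ON users (jsonpath('$.country', address));
    -- the query above can now use the index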


For the record, the title is misleading, and Andrew Dunstan (the guy whose blog this entry links to, a major PostgreSQL contributor and one of the people involved in getting a JSON type into Postgres) clarifies:

http://people.planetpostgresql.org/andrew/index.php?/archive...


My first reaction to this is - "Ugh. Haven't we been down this road with XML?" I don't want the format du jour baked into my datastore. Now, if it had a field that stored some sort of abstract lists-and-hashes structure (sort of redis-in-a-field), that sounds more interesting to me. I realize JSON is isomorphic to just that, but it still seems like flavor of the week.


Look at PostgreSQL's hstore data type - that's pretty much what you're asking for.


hstore is all kinds of awesome, but one limitation is that the values are pure text. You can nest an hstore inside another, but only if you're smart enough to know that it SHOULD be treated as an hstore; as far as Postgres is concerned, it's text that happens to contain '=>'.
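To illustrate:

    -- the nested value comes back as plain text...
    SELECT ('outer => "inner => value"'::hstore) -> 'outer';
    -- ...and is only an hstore again if you cast it back yourself
    SELECT (('outer => "inner => value"'::hstore) -> 'outer')::hstore -> 'inner';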

If JSON support means first-class type support in a nested object, that's a huge leap forward.


Database-generated XML offers an excellent abstraction layer, allowing the business language to vary independently of the data format. (If the XML were built in the application, swapping Java for C would mean rewriting all of the XML construction code.) Coupling it to the database makes sense: the query returns XML content instead of raw rows. For example:

    SELECT xml_user_roster( 1 );
Given the extensive use of JSON in jQuery (Ajax) web applications, integrating JSON with PostgreSQL is smart. It allows the application code to vary independently of the business code that relies on JSON as a data interchange format.

That said, it would be great if JSON and XML were pluggable modules (similar to PL/R) that could be installed when needed.


Pluggable modules would be fine. But surely rewriting marshaling code is a relatively small part of swapping out your application code... And what if you can't or don't want to work with the particular flavor of XML it generates? JSON is better in that regard, I suppose, since there are fewer options to begin with, but you still have to deal with things like date/time formats.


The XML is generated manually through statements such as:

    SELECT
        xmlroot(
          xmlelement( name root,
            xmlelement( name description,
              xmlelement( name title, t.title ),
              xmlelement( name diet, t.diet ) ) ),  -- etc.
          version '1.0' )
      FROM some_table t;  -- "table" is reserved, so an illustrative alias is used
If the format of the XML were out of the developer's control, XSLT is a relatively easy way to convert XML from one format to another. If XSLT won't handle it, then any number of ETL tools would be more than sufficient. Unless you mean something else by "flavor of XML"?

To get the same document using a language external to the database requires the following steps:

  1. Write the SQL statement (a stored procedure, view, or string).
  2. Instantiate an XML document library (e.g., PHP's DOM).
  3. Iterate over the result set(s).
  4. Build the XML document from the results.
Note that the first step is always required in both situations (whether the query is internal or external to the database).

Steps 2 to 4 effectively echo the first step: they tightly couple the XML format to the expected result set(s) from the database query. There is no abstraction for the application, and no visible gain.

Also, with XML you can add meta information to an element: <date format="dd-MMM-yyyy">02-FEB-2012</date>. In PostgreSQL, you would use the xmlattributes function.
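For example:

    SELECT xmlelement( name "date",
                       xmlattributes( 'dd-MMM-yyyy' AS format ),
                       '02-FEB-2012' );
    -- produces: <date format="dd-MMM-yyyy">02-FEB-2012</date>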


Pretty neat. I think this is the important bit though: "Basically, his patch just parses the text to make sure it was valid JSON, and stores it as text."

So, if you already have an environment where you are certain you are inserting valid JSON, it's not much different than just using the text type today.


> So, if you already have an environment where you are certain you are inserting valid JSON, it's not much different than just using the text type today.

Today, yes.

Tomorrow, when you launch an API and suddenly hundreds of different apps are talking to your systems, no.

Putting rich descriptions of data right next to the data is a good thing in the long run.


I think it's worth mentioning that MySQL has a nice collection of UDFs to handle JSON object output, arrays, and so on: http://www.mysqludf.org/lib_mysqludf_json/index.php


It would be super awesome if JSONH (https://github.com/WebReflection/JSONH - JSON for homogenous collections) were standardized and adopted. It would be a great CSV replacement and a perfect fit for recordsets, with minimal overhead.


Erm... JSON is pure text - so not much of an innovation in truth - I can store it in SQLight right now.


I think you're missing the point. With this feature you'll be able to write a query that returns a JSON string of the results, possibly as a field in a larger result set. It also validates that the JSON parses on the way in.
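For instance, with the kind of row-to-JSON helpers being discussed (exact function names hadn't landed yet, so treat these as illustrative, along with the users table):

    -- sketch: array_to_json/row_to_json as proposed for 9.2; users is invented
    SELECT array_to_json(array_agg(row_to_json(u)))
      FROM (SELECT id, name FROM users) AS u;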


Thanks for the help - so it's a 'result type' (not so much a core type). OK, that will save a few lines of code - unless I have to write as many lines to handle result sets that don't parse...

I suspect I am not a big fan of abstractions.


I'm with you, this smells of feature creep. I don't see a big benefit over serializing the result set on the client. This is literally a one-liner if you're using an ORM or something like PEP 249.

The new constraint allowing you to ensure that a stored string is valid JSON sounds useful, though.


> I'm with you, this smells of feature creep.

There are good reasons to have JSON support in the DBMS, such as querying and indexing on the fields within a JSON document.

My only criticism is that Postgres has a great extensions mechanism, and it would be better if effort were spent improving the ecosystem around that. There would be no reason for it to be in core if installing an extension were a trivial and widely accepted practice.

But, it takes time to really develop that ecosystem. And there are users that want JSON support yesterday.

EDIT: from a technological standpoint, there is no reason why it can't exist as an extension. PostGIS is much more sophisticated on all counts, and it's an extension.


I agree with you insofar as it's ideal for an extension, and I argued for its inclusion as a 9.3 feature (looks like Christmas might come early, though). However, I concluded with this:

  Right now the perception of Postgres...actually, databases in general,
  including virtually all of the newcomers -- is that they are
  monolithic systems, and for most people either "9.3" will "have"
  javascript and indexing of JSON documents, or it won't.  In most cases
  I would say "meh, let them eat cake until extensions become so
  apparently dominant that we can wave someone aside to extension-land",
  but in this case I think that would be a strategic mistake.
http://archives.postgresql.org/pgsql-hackers/2011-12/msg0078...

So this is definitely a social problem. Why not move UUID out of core (not uuid-ossp, which implements generation, but UUID parsing and storage)? Because, for now, one cannot assume extensions for really common, useful functionality.

Luckily, I think things are on the right path so that, at some future juncture, even a commonly desired data type can live as an extension.


OT nitpick: SQLite.


I was staring at that comment for a few seconds trying to figure out what was wrong with what I saw.

SQLight. Nice pick.



