JSON will be a core type in PostgreSQL 9.2 (planetpostgresql.org)
358 points by zmanji on Jan 16, 2012 | 41 comments



This is nice, but it's a shame that after going to the trouble of writing a JSON parser in C and incorporating it into the main Postgres codebase, they didn't go one step further and add a json_project_key function that pulls values out of a JSON blob. That, combined with functional indices, allows you to store JSON blobs in your database and index on values inside them, or put another way, get the data model benefits of a NoSQL database [1] without sacrificing the mature Postgres internals.
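Roughly, I'm imagining something like this (json_project_key is made up, as are the users table and its data column):

    -- hypothetical: json_project_key does not exist in 9.2
    CREATE INDEX users_email_idx ON users (json_project_key(data, 'email'));
    SELECT * FROM users WHERE json_project_key(data, 'email') = 'me@example.com';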

The Heroku Postgres guys have been playing with this idea [2] using the PL/V8 plugin, which embeds Javascript as a supported language inside Postgres (and thus makes it trivial to implement the json_project_key function), but if Postgres is going to natively support JSON parsing then it shouldn't take an addon module to achieve this.
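A minimal PL/V8 sketch of that function, assuming the plv8 extension is installed:

    -- illustrative only; plv8 exposes named SQL arguments as JS variables
    CREATE FUNCTION json_project_key(data text, key text)
    RETURNS text AS $$
        return JSON.parse(data)[key];
    $$ LANGUAGE plv8 IMMUTABLE STRICT;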

[1] Attempting to forestall the thread-jacking: I know NoSQL databases have other benefits besides their data model, but for some applications that's certainly one of the benefits.

[2] https://gist.github.com/1150804


Well. The parser is in the code now. The data type exists. So I assume it should be fairly easy to write a Postgres extension module that adds the missing pieces.

Reading the linked article, I'd guess they were simply running out of time before the feature freeze.

I would expect the querying functions to be added in the next release, or sooner in the form of an extension.


I can't help but think this is just step 1 in bringing native support for JSON indexing and querying.


On the other hand, as far as I can see, XML support still does not allow indexing or querying of the document itself, at least in core, even though it's been a Postgres datatype for a long time.


I would donate money toward getting a json_project_key feature done this way.


This is typical of how Postgres adds features and functionality. They tend to add a new core feature in one release, then add adjacent functionality and performance improvements in later releases.

I'm not on their core developer mailing lists, but I presume this is because of a prioritization of stability over most everything else.


It's also because the Postgres developers work to internal release deadlines.

They'll take a stable half-feature over a buggy full-feature at any given deadline. What matters is that over time they patiently and carefully expand those half features.

The most visible example is the slowly increasing coverage of replication. This JSON feature is another example -- I expect it will grow into a fuller feature set in future releases.


>get the data model benefits of a NoSQL database [1] without sacrificing the mature Postgres internals.

They already have that: http://www.postgresql.org/docs/9.1/static/hstore.html


hstore looks great, but as I understand it, it's just one level of key-value; JSON (and any document database) lets you store nested hashes.
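For example (assuming the hstore module and the new json type):

    SELECT 'a => 1, b => 2'::hstore;         -- one flat level of text pairs
    SELECT '{"a": 1, "b": {"c": 2}}'::json;  -- arbitrary nesting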


Interesting. Being able to output the results of a query as JSON will be useful when running something like Node.js. I can't shake the feeling that something like this doesn't belong in the SQL statement, though; choice of output format seems like something connecting libraries ought to handle.

That said, having seen 'core type' I instantly imagined being able to query based on JSON properties, which doesn't appear to be the case. Not surprising, because it would be a huge amount of work... but it's nice to imagine.

(Before anyone says anything: yes, I know NoSQL exists. But a hybrid solution using Postgres would be very interesting.)


You might be interested in postgresql's hstore: http://www.postgresql.org/docs/9.0/static/hstore.html

It's a key-value store that allows querying, which I think is what you were lamenting the lack of. Here's a bit more about how to query and index with hstore: http://lwn.net/Articles/406385/. It's pretty simple and doesn't have anywhere near the number of querying possibilities that MongoDB has, but it can be used for on-the-fly column names (similar to JSON). You can only query on the root node's children, unlike the open-ended possibilities in NoSQL.
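A quick sketch of what that looks like (table and keys invented for illustration):

    CREATE TABLE things (id serial PRIMARY KEY, attrs hstore);
    CREATE INDEX things_attrs_idx ON things USING gin (attrs);
    SELECT * FROM things WHERE attrs @> 'color => blue';  -- contains pair
    SELECT * FROM things WHERE attrs ? 'color';           -- key exists
    SELECT attrs -> 'color' FROM things;                  -- extract a value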


Oh, that is very interesting, thanks!

I'm (quite happily) tied to Postgres because I'm using PostGIS, but the ability to add freeform data to a location would be ideal. Looks like I may already have a solution here.


@scorpioxy -- I can't reply to you directly, but you've been [dead] for a few months now. I see you've posted quality comments, but you posted what could be perceived as a spam post 138 days ago and so you got marked as a spambot. But you're clearly not. Might want to clear that up, but for now, you're dead when you probably ought to not be.


With hstore and psycopg, you'd gain a lot of flexibility that's very similar to a NoSQL store. Psycopg does the mapping to/from Python dicts.

It's not without a learning curve, but I found it helpful when I hit shortcomings in Mongo's design.


Cool, PostGIS! What are you building? I worked as a GIS engineer in 2010. PostGIS is pretty cool technology.


I'm building a 'taxi tracker' app, for want of a better term: it gives you estimated trip times and lets you share the journey with other people so they know where you are.

That part doesn't require PostGIS as such, but it's for an NYC city government app competition, so we have all sorts of city datasets to use. I've used PostGIS to help make custom map tiles (preview at https://twitter.com/#!/taxonomyapp/status/149565007384940545), highlight the outline of the building you're heading to... all sorts. It's been a fantastic learning exercise.


I don't know exactly what your GIS needs are, but it might be worth seeing if MongoDB's geospatial indexing meets them if you are interested in querying free-form data.

http://www.mongodb.org/display/DOCS/Geospatial+Indexing

http://www.mongodb.org/display/DOCS/Geospatial+Haystack+Inde...


As someone who is building a similar app, for a similar competition (but in Boston):

Don't combine your data structures/stores. Use the store/structure that is most appropriate for the given task. Memory is cheap compared to compute time.

Hint: many can be in-memory-only stores that periodically sync to a backing store for updates.


What would be interesting is an extension to the SQL parser that supports JSONPath in SQL queries: something like WHERE json_column matches some JSONPath query, acting like a WHERE IN clause but with a JSON-style query syntax. I see no need to create a whole new construct when a valid one already exists. The two could be blended into a very powerful way to query JSON data types in a relational system.
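Something like this, with the syntax entirely made up:

    -- purely hypothetical syntax; orders/details are invented names
    SELECT * FROM orders
     WHERE details JSONPATH '$.items[*].sku' = 'ABC-123';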


You wouldn't even need an extension to the SQL parser - you could just expose a SQL function to do the JSONPath query. e.g.

    -- note: the tz alias can't be referenced in WHERE, so the expression repeats
    SELECT users.id, jsonpath('$.timezone', users.preferences_json) AS tz
      FROM users
     WHERE jsonpath('$.timezone', users.preferences_json) LIKE 'America/%';


That's not going to help when you need to restrict your query with JSON, though; the select list is processed after the whole FROM clause, including JOINs, so you'd be producing the whole table and then throwing away whatever didn't match.

In general, a few core limitations of the current optimizer - lack of the LATERAL keyword among them - make it impossible to create a PostgreSQL function that blends well with SQL, AFAICT.


> select uid from users where jsonpath('$.country', address) = 'CH'

PostgreSQL does support functional indexes, so you can easily make this use an index and thus be fast enough for real use. (It's a bad example - I'd probably put the address into an addresses table - but it's enough to illustrate the point.)
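e.g. (assuming the hypothetical jsonpath function from upthread, declared IMMUTABLE):

    CREATE INDEX users_country_idx ON users (jsonpath('$.country', address));
    -- the query above can now use the index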


For the record, the title is misleading, and Andrew Dunstan (the guy whose blog this entry links to, a major PostgreSQL contributor and one of the people involved in getting a JSON type into Postgres) clarifies:

http://people.planetpostgresql.org/andrew/index.php?/archive...


My first reaction to this is - "Ugh. Haven't we been down this road with XML?" I don't want the format du jour baked into my datastore. Now, if it had a field that stored some sort of abstract lists-and-hashes structure (sort of redis-in-a-field), that sounds more interesting to me. I realize JSON is isomorphic to just that, but it still seems like flavor of the week.


Look at PostgreSQL's hstore data type - that's pretty much what you're asking for.


hstore is all kinds of awesome, but one limitation is that the values are pure text. You can nest an hstore inside another, but only if you're smart enough to know that it SHOULD be treated as an hstore; as far as Postgres is concerned, it's text that happens to contain '=>'.
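To illustrate:

    -- the nested value comes back as plain text...
    SELECT ('outer => "inner => value"'::hstore) -> 'outer';
    -- ...and is only an hstore again if you cast it back yourself
    SELECT (('outer => "inner => value"'::hstore) -> 'outer')::hstore -> 'inner';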

If JSON support means first-class type support in a nested object, that's a huge leap forward.


Database-generated XML offers an excellent abstraction layer, allowing the business language to vary independently of the data format. (If the XML were built in the application, swapping Java for C would mean rewriting all of the XML construction code.) Coupling it to the database makes sense: the query returns XML content instead of raw rows. For example:

    SELECT xml_user_roster( 1 );
Given the extensive use of JSON in jQuery (Ajax) web applications, integrating JSON with PostgreSQL is smart. It allows the application code to vary independently of the business code that relies on JSON as a data interchange format.

That said, it would be great if JSON and XML were pluggable modules (similar to PL/R) that could be installed when needed.


Pluggable modules would be fine. But surely rewriting marshaling code is a relatively small part of swapping out your application code... And what if you can't or don't want to work with the particular flavor of XML it generates? JSON is better in that regard, I suppose, since there are fewer options to begin with, but you still have to deal with things like date/time formats.


The XML is generated manually through statements such as:

    SELECT
        xmlroot(
          xmlelement( name root,
            xmlelement( name description,
              xmlelement( name title, t.title ),
              xmlelement( name diet, t.diet ) ) ),  -- etc.
          version '1.0' )
      FROM some_table t;  -- "table" is reserved, so an illustrative alias is used
If the format of the XML were out of the developer's control, XSLT is a relatively easy way to convert XML from one format to another. If XSLT won't handle it, then any number of ETL tools would be more than sufficient. Unless you mean something else by "flavor of XML"?

To get the same document using a language external to the database requires the following steps:

  1. Write the SQL statement (a stored procedure, view, or string).
  2. Instantiate an XML document library (e.g., PHP's DOM).
  3. Iterate over the result set(s).
  4. Build the XML document from the results.
Note that the first step is always required in both situations (whether the query is internal or external to the database).

Steps 2 to 4 effectively echo the first step: they tightly couple the XML format to the expected result set(s) from the database query. There is no abstraction for the application, and no visible gain.

Also, with XML you can add meta information to an element: <date format="dd-MMM-yyyy">02-FEB-2012</date>. In PostgreSQL, you would use the xmlattributes function.
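For example:

    SELECT xmlelement( name "date",
                       xmlattributes( 'dd-MMM-yyyy' AS format ),
                       '02-FEB-2012' );
    -- produces: <date format="dd-MMM-yyyy">02-FEB-2012</date>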


Pretty neat. I think this is the important bit though: "Basically, his patch just parses the text to make sure it was valid JSON, and stores it as text."

So, if you already have an environment where you are certain you are inserting valid JSON, it's not much different than just using the text type today.


> So, if you already have an environment where you are certain you are inserting valid JSON, it's not much different than just using the text type today.

Today, yes.

Tomorrow, when you launch an API and suddenly hundreds of different apps are talking to your systems, no.

Putting rich descriptions of data right next to the data is a good thing in the long run.


I think it's worth mentioning that MySQL has a nice collection of UDFs to handle JSON object output, arrays, and so on: http://www.mysqludf.org/lib_mysqludf_json/index.php


It would be super awesome if JSONH (https://github.com/WebReflection/JSONH - JSON for homogenous collections) were standardized and adopted. It would be a great CSV replacement and a perfect fit for recordsets, with minimal overhead.


Erm... JSON is pure text - so not much of an innovation in truth - I can store it in SQLight right now.


I think you're missing the point. With this feature you'll be able to write a query that returns a JSON string of the results, possibly as a field in a larger result set. It also validates that the JSON parses on the way in.
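For instance, with the kind of row-to-JSON helpers being discussed (exact function names hadn't landed yet, so treat these as illustrative, along with the users table):

    -- sketch: array_to_json/row_to_json as proposed for 9.2; users is invented
    SELECT array_to_json(array_agg(row_to_json(u)))
      FROM (SELECT id, name FROM users) AS u;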


Thanks for the help - so it's a 'result type' (not so much a core type). OK, that will save a few lines of code - unless I have to write as many lines to handle result sets that don't parse...

I suspect I am not a big fan of abstractions.


I'm with you, this smells of feature creep. I don't see a big benefit over serializing the result set on the client. This is literally a one-liner if you're using an ORM or something like PEP 249.

The new constraint allowing you to ensure that a stored string is valid JSON sounds useful, though.


> I'm with you, this smells of feature creep.

There are good reasons to have JSON support in the DBMS, such as querying and indexing on the fields within a JSON document.

My only criticism is that Postgres has a great extensions mechanism, and it would be better if effort were spent improving the ecosystem around that. There would be no reason for it to be in core if installing an extension were a trivial and widely accepted practice.

But, it takes time to really develop that ecosystem. And there are users that want JSON support yesterday.

EDIT: from a technological standpoint, there is no reason why it can't exist as an extension. PostGIS is much more sophisticated on all counts, and it's an extension.


I agree with you insofar as it's ideal for an extension, and I argued for its inclusion as a 9.3 feature (looks like Christmas might come early, though). However, I concluded with this:

  Right now the perception of Postgres...actually, databases in general,
  including virtually all of the newcomers -- is that they are
  monolithic systems, and for most people either "9.3" will "have"
  javascript and indexing of JSON documents, or it won't.  In most cases
  I would say "meh, let them eat cake until extensions become so
  apparently dominant that we can wave someone aside to extension-land",
  but in this case I think that would be a strategic mistake.
http://archives.postgresql.org/pgsql-hackers/2011-12/msg0078...

So this is definitely a social problem. Why not move UUID out of core (not uuid-ossp, which implements generation, but UUID parsing and storage)? Because, for now, one cannot assume extensions for really common, useful functionality.

Luckily, I think things are on the right path so that, at some future juncture, even a commonly desired data type can live as an extension.


OT nitpick: SQLite.


I was staring at that comment for a few seconds trying to figure out what was wrong with what I saw.

SQLight. Nice pick.



