The Fixing-JSON Conversation (tbray.org)
49 points by robin_reala on Aug 26, 2016 | 54 comments



SDLang !!!! : https://sdlang.org/

Full example : https://github.com/Abscissa/SDLang-D/wiki/Language-Guide#exa...

Examples:

Creating a Tree

    plants {
        trees {
            deciduous {
                elm
                oak
            }
        }
    }
Creating a Matrix

    myMatrix {
       4  2  5
       2  8  2
       4  2  1
    }
A Tree of Nodes with Values and Attributes

    folder "myFiles" color="yellow" protection=on {
        folder "my images" {
            file "myHouse.jpg" color=true date=2005/11/05
            file "myCar.jpg" color=false date=2002/01/05
        }
        folder "my documents" {
            document "resume.pdf"
        }
    }
Date and Date/Time Literals (and comments!)

    # create a tag called "date" with a date value of Dec 5, 2005
    date 2005/12/05

    # a date time literal without a timezone
    here 2005/12/05 14:12:23.345

    # a date time literal with a timezone
    in_japan 2005/12/05 14:12:23.345-JST


That's beautiful!


For simple data exchange? Way over-engineered is what it is. TFA specifically noted that they want to stay within the overall conceptual complexity of JSON:

> “Just use X” […] Nah, most of them are way, way richer than JSON, often with fully-worked-out type systems and Conceptual Tutorials and so on.


I think now is a good time to re-quote the man himself, Douglas Crockford:

https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...

  I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.  I know that the lack of comments makes some people sad, but it shouldn't. 

  Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
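
In Python terms, that workflow looks roughly like this - a naive stand-in for JSMin (it will also eat comment-like text inside strings, so don't point it at arbitrary input):

    import json
    import re

    def strip_json_comments(text):
        # Remove /* ... */ block comments and // line comments before
        # handing the text to a normal JSON parser.
        text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
        text = re.sub(r"//[^\n]*", "", text)
        return text

    print(json.loads(strip_json_comments('{"port": 8080 /* dev default */}')))
    # -> {'port': 8080}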


>I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.

I call BS. If people want to have custom parsing directives they can send them out of band, encode them in the filename, or whatever. But they don't. And I've not seen this happening with most other serialisation formats either, so why would JSON be a particular target? After all, its value comes from being trivially parsable across languages, and that would be killed by custom parsing directives. Those wanting them would also implement their own parsers etc.

Addition: Besides, reading comments to decide how to parse implies either "comments on top of the file" or "2 stage parsing".

With 2 stage parsing, you could implement comments and whatever else yourself, even in pure JSON anyway.

As for "comments on top of the file", well, just disallow them (only allow comments after the first JSON object starts), and no issue with "parsing directives" anymore...


When I read this statement, I was thinking of Pascal style comment directives, e.g.

    { 
      field1: 'hello', 
      //#if (protocol_version>3)
      field2: 'hello',
      //#endif

      /* #charset utf-8 */ field3: 'world world' /* #charset default */
    }
Which falls into neither "comments on top" nor "2 stage parsing". Unlike the C preprocessor, which "eats" whole lines starting with #, Pascal used a {$directive arguments} inline comment style.


If people really wanted parsing directives, they could just say that keys starting with # and their values are parsing directives - e.g:

    {
        "#if": "parserversion > 1.5",
        "key": "somevalue",
        "#else": "",
        "key": "othervalue"
    }
Though I also don't really see any reason to include parsing directives in JSON.


Alas, there's no guarantee that the order of key/value pairs within an object will be preserved (not even within JS anymore!). But that's okay, we can always reformat the ordered data as an array or write a custom parser ... oh.

I don't see any reason to include parsing directives in JSON either, but it's a wild world out there and people do all sorts of strange things. Seems that a few of those folks were working at Yahoo and made the mistake of letting Doug see their code when JSON was in prototype phase, so no JSON comments for anyone. Whoops!
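
(For what it's worth, some parsers do expose the raw pair order without a fully custom parser. A Python sketch with the standard json module's object_pairs_hook, which sees order and duplicate keys before they collapse into a dict:)

    import json

    text = '{"#if": "v > 1.5", "key": "a", "#else": null, "key": "b"}'
    # The hook receives key/value pairs in document order, duplicates included.
    pairs = json.loads(text, object_pairs_hook=lambda p: p)
    print(pairs)
    # [('#if', 'v > 1.5'), ('key', 'a'), ('#else', None), ('key', 'b')]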


If you're going to design something like that, it's wise to make sure it can be represented as valid JSON.

Since JSON objects don't support order or repeated keys, that syntax can't be represented, edited or processed by the rich ecosystem of JSON tools. Most decent JSON editors will show that text with squiggly red underlines. It's not worth giving up interoperability, and having to make yet another set of tools for an incompatible syntax.

That was the mistake that the Angular 2 template syntax made.

But XML-based templating languages like Genshi [1] show how you can obey the rules of XML, use namespaces correctly, support element, attribute and text based expressions, looping, logic and macros, and it works just fine and interoperates perfectly with existing tools.

Genshi was based on another Python based XML templating system called Kid [2], which itself was influenced by Zope's page templates, TAL template attribute language, TALES expressions [3] and METAL templates [4]. Genshi and Kid templates are simple and easy to use compared to the conglomeration of Zope stuff.

Here is the essential trick, described in the Zope manual, which all those languages share, that makes it possible to sidestep the fact that attributes are not ordered:

When there is only one TAL statement per element, the order in which they are executed is simple. Starting with the root element, each element’s statements are executed, then each of its child elements is visited, in order, to do the same.

Any combination of statements may appear on the same elements, except that the content and replace statements may not appear together.

Due to the fact that TAL sees statements as XML attributes, even in HTML documents, it cannot use the order in which statements are written in the tag to determine the order in which they are executed. TAL must also forbid multiples of the same kind of statement on a single element, so it is sufficient to arrange the kinds of statement in a precedence list.

When an element has multiple statements, they are executed in this order:

    1) define
    2) condition
    3) repeat
    4) content or replace
    5) attributes
    6) omit-tag
It would be great to have a Genshi-like templating language for JSON, tightly integrated with JavaScript the same way Genshi is integrated with Python.

[1] https://genshi.edgewall.org/

[2] http://turbogears.org/1.0/docs/GettingStarted/Kid.html

[3] https://docs.zope.org/zope2/zope2book/AppendixC.html

[4] https://docs.zope.org/zope2/zope2book/AppendixC.html#metal-o...


> It would be great to have a Genshi-like templating language for JSON, tightly integrated with JavaScript the same way Genshi is integrated with Python.

But… that already exists. It's called "Python". Just define your data structure using bog-standard Python and serialise it.
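
A minimal sketch of that approach - plain Python as the "template", JSON as the output (data names are illustrative):

    import json

    users = ["alice", "bob"]  # illustrative input data
    doc = {
        "version": 2,
        # Comprehensions, conditionals and functions are the "template
        # language" -- no restricted expression subset needed.
        "entries": [{"name": u, "admin": u == "alice"} for u in users],
    }
    print(json.dumps(doc, indent=2))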


What I mean is using JSON as the syntax, and JavaScript as the expression and scripting language.

Zope has this supposedly "restricted subset" of Python that you're allowed to use as expressions and scripts, but it's missing important features and isn't meaningfully safe from a paranoid security perspective, and you end up having to drop down to lower level Zope external methods to write real Python code, which is very inconvenient.

If you trust someone enough to give them access to editing templates with "restricted python expressions", then you can probably trust them enough to use real Python expressions. You'd be unwise to give somebody you don't actually trust access to even a "restricted subset" of Python running on your server. That's just asking for trouble.


> What I mean is using JSON as the syntax, and JavaScript as the expression and scripting language.

… it's the exact same process except with Javascript as the language? Generate your JS datastructure then JSON.stringify it? I don't understand what the issue is or why you'd want a templating language when much of JSON's point is that it matches directly to common standard datastructures.


Okay, so the reasoning is we remove a highly useful feature that most people who use JSON regularly want, because some people were abusing it and using terrible practices?

That's terrible reasoning.


http://www.haskellforall.com/2016/04/worst-practices-should-... . And it seems to have worked out well for JSON.


The second part basically says "don't use JSON for config files, use some unspecified JSON superset".


I find that YAML (a JSON superset) for config and JSON/CBOR on the wire works quite well.



Ugh. Comments are also useful for disabling things without actually deleting them.

My hacky in-band workaround is for the persistence layer (and other code) to ignore any dictionaries in an array that have the "//" key, so I can put a "//": "DISABLED" entry at the top of a dict to disable it (and document that it's disabled, and why).
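
A sketch of that filter, with illustrative names:

    def strip_disabled(items):
        # Drop any dict in the list that carries the "//" marker key.
        return [d for d in items
                if not (isinstance(d, dict) and "//" in d)]

    servers = [
        {"host": "a.example.com"},
        {"//": "DISABLED: flaky since 2016-08", "host": "b.example.com"},
    ]
    print(strip_disabled(servers))  # [{'host': 'a.example.com'}]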


Timestamps are so complicated once you factor in timezones and daylight saving time that they don't belong in JSON. Time zones are not static. They can change from country to country, or even between states within a country. Ditto for when daylight saving time starts during the year - that even changes over the years. There is no rhyme or reason to any of this. The data for this has to be stored in tables, and time zone meanings can change retroactively. The only reliable time stamp is UTC without leap seconds. (Speaking of leap seconds, who thought seconds going from 0 - 60 rather than 0 - 59 was a good idea?)

Accurate time is one of the most difficult things to model in computer science.


Time is actually quite simple if you have a good mental model of what you're trying to represent and don't try to mix different concepts into a single value.

This talk explains it VERY nicely: https://www.youtube.com/watch?v=2rnIHsqABfM

Basically just decide whether you're trying to store an absolute time (a timestamp will do) or a civil time (year, month, day, etc.) and treat them as two separate data types.

(If you just use "civil time + offset from UTC" like RFC 3339 does, then you can convert it to an absolute time, but you can convert only that one specific value using that offset, and not any other - i.e. that offset is not a substitute for an actual timezone identifier.)
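
Roughly, in Python terms (an illustration with the standard datetime module, not a prescription):

    from datetime import datetime, timezone

    # Absolute time: one point on the global timeline.
    instant = datetime(2016, 8, 26, 12, 0, tzinfo=timezone.utc)

    # Civil time: a wall-clock reading; deliberately zone-less.
    wall = datetime(2016, 8, 26, 14, 0)

    # Civil time + offset (RFC 3339 style) pins down this one instant...
    stamp = datetime.fromisoformat("2016-08-26T14:00:00+02:00")
    # ...but "+02:00" says nothing about the offset on other dates --
    # it is not a timezone identifier.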


If by timestamp you mean Unix time, you have to keep in mind that it evolves non-linearly. When a leap second happens, Unix time increases linearly through that second, then jumps back one second and increases linearly through the leap second again.

If you have sub-second precision and expect Unix time to always move forward, or if you expect every Unix second to last one second (e.g. when computing Unix time differences), things might break.

https://en.wikipedia.org/wiki/Unix_time#Leap_seconds


>Time is actually quite simple if you have a good mental model of what you're trying to represent and don't try to mix different concepts into a single value.

Not even close.

http://www.creativedeletion.com/2015/01/28/falsehoods-progra...


If you watch the video I linked above, he explicitly mentions that 99% of the time you should not be dealing with any of those things manually - that if you find yourself working with offsets and DST values, it's a sign that you're most likely doing something wrong.


>that if you find yourself working with offsets and DST values, it's a sign that you're most likely doing something wrong.

The video is wrong then.

If you're writing an application that deals with times (e.g. stores and queries events with specific timestamps) and you don't take offsets and DST values into account, you get all kinds of weird edge cases.


No, the point is that you should not be doing those things manually, i.e. you should never be adding integer offsets to something or doing similar operations. Instead, all the rules for conversions between times are already stored in the timezone database, so all you should do is something like

    ToAbsolute(CivilTime, TimeZone)
and the reverse. He also mentions (at 26:50) proper ways of dealing with repeating and non-existent civil times close to DST transitions (and sane default ways if you just don't want to bother).

When talking about JSON or serialization formats specifically, none of the complexities need to ever leak into the representation.
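
A sketch of that ToAbsolute idea, using Python's zoneinfo (3.9+) so the tz database does the work rather than hand-rolled offset math:

    from datetime import datetime
    from zoneinfo import ZoneInfo

    def to_absolute(civil, zone):
        # Attach the zone and let the tz database pick the right offset,
        # DST included. "fold" handles repeated times near transitions.
        return civil.replace(tzinfo=ZoneInfo(zone))

    civil = datetime(2016, 11, 6, 1, 30)   # a repeated US wall time
    absolute = to_absolute(civil, "America/New_York")
    print(absolute.astimezone(ZoneInfo("UTC")))

    # And the reverse: absolute -> civil time in some zone.
    print(absolute.astimezone(ZoneInfo("Asia/Tokyo")))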


> Instead, all the rules for conversions between times are already stored in the timezone database, so all you should do is something like

That only works if you're completely detached from the user and don't care for them. Example:

On January 1st 2011, a Samoan user living in Samoa (timezone Pacific/Apia) records an event for January 1st 2012. You convert January 1st 2012 09:00:00 to UTC, storing 2012-01-01T20:00:00.

On January 2nd 2012 at 10AM, you remind your user that they had an event set.

Because in May 2011 Samoa announced they were going to skip a local day and move across the international date line. So 2011-12-30T09:00:00 UTC was 2011-12-29T23:00:00 Pacific/Apia, but 2011-12-30T10:00:00 UTC was 2011-12-31T00:00:00 Pacific/Apia.

And as far as your user is concerned, they told you to ping them on January 1st at 9AM and you pinged a day late. Just because you store absolute datetimes doesn't mean you won't fuck up, and when the data is user-provided, chances are good that that very decision will be the fuckup.


That still doesn't mean that you should do any of those operations by hand (which is what the previous comment was about).

But sure, if you do scheduling for future times, you do need to be aware of such possibilities and store the future times as civil times (and have some sane way of handling non-existing/repeating times - but again, most of the time the system/library will do this for you).

Then even if the user is flying across the world, they can still get the alert at the right time wherever they are (assuming the device updates local timezone based on location).

I never said that you should only store absolute times - only that they are a separate data type and you shouldn't mix them or try to convert them by hand.


+1 serialise things without the processing complexities.

In general, an event is either scheduled in the future relative to a specific physical location, in which case you want to record it with a symbolic timezone reference (in case the TZ DB changes in the meantime), or it's timed absolutely, in which case you want to record it as UTC.

The trick is knowing which events are which. If you're scheduling something for a machine, probably just use UTC. If you're machine-recording the time of an event, even if it's based on a human pushing a button, use UTC and convert back to local time for display if necessary. If the user told you what time they wanted to store, store it as local time with a symbolic timezone.
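
A sketch of that split in Python (zone and field names illustrative):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    # Machine-recorded event: pin it to the timeline immediately.
    recorded_at = datetime.now(timezone.utc)

    # User-scheduled "9am in Apia": store civil time plus a symbolic zone
    # name, and resolve to UTC only when the reminder actually fires, so
    # tz database updates (like Samoa's 2011 skip) are picked up.
    event = {"civil": "2012-01-01T09:00:00", "zone": "Pacific/Apia"}
    fire_at = datetime.fromisoformat(event["civil"]).replace(
        tzinfo=ZoneInfo(event["zone"]))
    print(fire_at.astimezone(timezone.utc))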

Where it gets really difficult is interoperating with systems that don't do things properly, or working within systems that don't let you do things properly.


> The only reliable time stamp is UTC without leap seconds.

That doesn't make a lot of sense, as UTC does have leap seconds. It is similar to saying that the fastest car has no wheels, when you really mean that the fastest vehicle is a rocket.

TAI is the most reliable and easiest to work with. It relies on atomic clock seconds at sea level. https://en.wikipedia.org/wiki/International_Atomic_Time

However, it is already 36 seconds ahead of UTC (and therefore civil time), and dropping leap seconds in civil time will shift midnight to later in the day.

But as leap seconds become more frequent over the long term, maintaining UTC will become harder. The ITU will consider dropping leap seconds from UTC in 2023. It is unlikely that people care about having the sun rise at midnight in 30,000 years.


> That doesn't make a lot of sense, as UTC does have leap seconds.

Pedantic much? Posting in technical forums is such a bother. People pick at any unimportant detail.

Reworded just for you:

The only reliable time stamp would be something like UTC but without leap seconds.


I am sorry. My intention was not to sound pedantic. I only wanted to point out that we are fortunate enough to already have what you describe.

I too wish we used TAI-based timestamps instead of Unix timestamps in a more widespread manner!

It is a bit silly that we cannot determine any duration involving a UTC time a couple of years in the future…


> (Speaking of leap seconds, who thought seconds going from 0 - 60 rather than 0 - 59 was a good idea?)

Who thought February going from 1-29 instead of 1-28 was a good idea?

I don't understand why everyone seems to believe that the phenomena are so inherently different.


Because other months went to 30 or 31 already, so there was room for it in calendars and people never assumed months are always the same length.

60 seconds in a minute, OTOH, is a very strong assumption in many systems and people's minds.


> other months went to 30 or 31 already, so there was room for it in calendars and people never assumed months are always the same length

I really don't think this has much to do with it. It would predict no problems trying to add a November 31 to some year; I tend to make the opposite prediction.



I think the idea of treating commas as whitespace is kinda hilarious, because that would mean one could write:

    ["a" "b" "c"]
or:

    {"a": "foo" "b": "bar" "c": "baz"}
The advantage over:

    (a b c)
or:

    (a foo b bar c baz)
or:

    ((a foo) (b bar) (c baz))
or:

    ((a . foo) (b . bar) (c . baz))
seems … non-existent.


I don't think # or // for comments is a very good idea as it would also make newline characters significant. I find it useful to be able store a JSON object per-line.
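
(That's the JSON Lines pattern - one document per physical line, newline as the record delimiter:)

    import json

    # One JSON document per line; a newline-significant comment syntax
    # would collide with this framing.
    log = '{"evt": "start"}\n{"evt": "stop"}\n'
    for line in log.splitlines():
        print(json.loads(line))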


Personally, I would really like to see integer object keys (as opposed to only string keys). For simple numeric transformations, strings feel really heavy and require annoying conversion in languages. E.g. {"10": 60, "42": 2}.
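
The usual workaround today is to convert at the boundary, e.g. in Python:

    import json

    wire = '{"10": 60, "42": 2}'
    # Keys arrive as strings; each side re-converts at the boundary.
    table = {int(k): v for k, v in json.loads(wire).items()}
    print(table[42])  # 2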


The flipside is that an integer-keyed map is similar in meaning to an array, which associates, by virtue of placement, an integer with the value sitting at that index.

While it's possible to spec it to forbid this interpretation, Lua has made this interpretation a language feature, and it'd become impossible to construct an unambiguous parser/printer in Lua for this new format.


> it'd become impossible to construct an unambigous parser/printer in Lua for this new format

How so? JSON is just text; you can parse it however you like. For example, I can write a parser that produces the byte sequence "It checks out." for any valid JSON input. A Lua JSON parser that represents objects with integer keys as maps with string keys, and a Lua JSON parser that represents objects with integer keys as sparse arrays, are both unambiguous and both correct (as to a hypothetical JSON which allowed integer keys). JSON is a format, not a Lua structure.

Even if your JSON data is

    { 10: 10, "10": "ten" }
there's no problem being able to write an unambiguous parser. Define what your parser does in that situation, and it's unambiguous.


This guy is an idiot anyway. There's no way to "fix JSON". All you can do is create a new language, it doesn't matter if you call it JSON 2.0, it will still be incompatible with all the JSON parsers of today. I don't get why he is so mad at people suggesting him to use one of the JSON supersets that exist today.


If // and /* are used as comments, then most of this new extended-JSON will still be valid Javascript.

If # is used for comments, then documents stop being valid JavaScript.

The post says "don't eval() JSON ever", but that's like Crockford originally leaving out comments to stop them being abused as processor directives...


Like the post says, JSON is already not guaranteed to be valid JS, so this isn't really a problem. The fact that 99% of the time it works to just eval it is great, and granted, the "feature" that triggers this incompatibility is a bit obscure.

But if you just do the right thing from the start you'll never have a thing to worry about in the first place, # comments or not.


This is a bit of a strawman argument though. JSON is rarely used in a generic form - but instead as a format between endpoints (servers, clients, what have you). When you control the server, you control the API - so random UTF-8 magic crap? Wasn't valid in the first place.

Just because you can express things that aren't JavaScript with it doesn't mean that ever actually turns up in practice -- more importantly, 95% of the time you control both the producer and consumer of the JSON because it's consumed internally anyhow - at which point you're not _going_ to write illegal JSON to yourself.

This is a true statement, but virtually nothing in practice cares.


I'm jealous. I've had it show up in production. I think we were using script tags with sanitized JSON to re-hydrate an isomorphically rendered page.

Of course, a user entered one of the valid-JSON, invalid-JS characters in a field, which made its way into our database. Once there, we started having weird errors show up between the server and the client. That little gem took us days to track down.

JSON should be either a subset of javascript, or obviously not a subset of javascript. 99% compatible systems are dangerous landmines. That you haven't been blown up yet doesn't make it ok.
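
For the record, the usual culprits are U+2028/U+2029 - legal unescaped in JSON strings, but line terminators in (pre-ES2019) JavaScript. A Python-side sketch of the common mitigation when inlining JSON into a script tag:

    import json

    def json_for_script(data):
        # With ensure_ascii=False (common for compact UTF-8 output),
        # U+2028/U+2029 pass through raw; escape them so the payload
        # is also valid inside a <script> block.
        return (json.dumps(data, ensure_ascii=False)
                .replace("\u2028", "\\u2028")
                .replace("\u2029", "\\u2029"))

    print(json_for_script({"note": "line\u2028break"}))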


JSON (for the most part) is a nice format to work with, aside from the loosely defined datetimes as mentioned.

The two areas where I believe the format can be greatly improved: #1, having a standard way to define the structure (sometimes schemas can be handy!); #2, a standard binary format - yes, right now we have UBJSON (which doesn't have a date format; this is worse in binary) and BSON (which contains some MongoDB-specific stuff).

I'm not saying they don't have their place, but... Protocol Buffers are more akin to .NET or Java serialization, in that they're quite fragile if used with different versions and/or with different vendors.


MessagePack is better supported than UBJSON and has far more implementations in almost every language.

http://msgpack.org/index.html
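
A basic round-trip with the third-party msgpack package, for the curious:

    import msgpack  # pip install msgpack

    doc = {"id": 42, "tags": ["a", "b"], "blob": b"\x00\x01"}
    packed = msgpack.packb(doc)      # compact binary encoding
    print(msgpack.unpackb(packed))   # round-trips, binary-safe unlike JSON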

JSON Schema draft 4 is the de facto schema standard for JSON.

http://json-schema.org/latest/json-schema-core.html

Having used both XML schema and this one, I much prefer using JSON schema.
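
Validation with the third-party jsonschema package looks roughly like this:

    from jsonschema import validate, ValidationError  # pip install jsonschema

    schema = {
        "type": "object",
        "properties": {"port": {"type": "integer", "minimum": 1}},
        "required": ["port"],
    }
    validate({"port": 8080}, schema)      # silent on success
    try:
        validate({"port": "8080"}, schema)
    except ValidationError as e:
        print(e.message)                  # '8080' is not of type 'integer'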


    “Just use X” · For values of X including Hjson, Amazon Ion, edn, Transit, YAML, and TOML. ¶
    Nah, most of them are way, way richer than JSON, often with fully-worked-out type systems and Conceptual Tutorials and so on.
What? MOST OF THEM? YAML is not, Hjson is not, TOML is not.


YAML... really? Looking at the examples in the Wikipedia article (https://en.wikipedia.org/wiki/YAML) gives me a headache. Fortunately, most actual YAML files I've seen are not that complicated.



He summarized the most upvoted posts from the last thread [1] really well.

[1] https://news.ycombinator.com/item?id=12328088

Regarding datetimes, it's worth pointing out the conversation that TOML had about it. It's a pretty long read [2][3][4][5] with lots of points raised for and against, but it also shows some of the process of how consensus was eventually forged: through trial-and-error, some enlightening realizations, expert opinions, and a willingness to leave some aspects of the behavior up to the parser, to avoid requiring all other languages to reimplement half of Java 8 Time.

[2] https://github.com/toml-lang/toml/pull/414

[3] https://github.com/toml-lang/toml/pull/362

[4] https://github.com/toml-lang/toml/issues/412

[5] https://github.com/toml-lang/toml/issues/263

The salient point being that RFC 3339 does not in truth describe exactly one datatype, so you can't just reference the spec and hope everyone reads it the same way. EDIT: Specifically, RFC 3339 says:

"Date and time expressions indicate an instant in time. Description of time periods, or intervals, is not covered here.", but then goes on to define [6] a number of different syntaxes in ABNF, to indicate the subsets of ISO 8601 that "SHOULD be used in new protocols on the Internet." It essentially never defines what a 'valid' RFC 3339 object looks like, it doesn't explicitly say which ones are considered complete representations, so it's not clear if, say, '2016' is a valid RFC 3339 object... but the ones towards the bottom contain more than one discrete term, and can be presumed to be 'complete' representations. These are:

[A] partial-time: HH:MM:SS(.SSS)

[B] full-date: YYYY-MM-DD

[C] full-time: 'partial-time' +/- offsetFromUTC(HH:MM)

[D] date-time: 'full-date' "T" 'full-time'

Out of these, [D] is clearly a timestamp of an absolute instant in time, but the rest are debatable.

[6] https://tools.ietf.org/html/rfc3339#section-5.6


> He summarized the most upvoted posts from the last thread [1] really well.

I feel like he glossed right past the objections to the biggest and (to my mind) most destructive proposed change, the commas-to-whitespace thing; in fact he doubles down on it (let's just declare that commas are whitespace! That surely won't confuse anyone!)


I've written a few JSON parsers over the years that treat commas as whitespace. The grammar is simpler and the parser is faster as a result. As long as one always emits standards-compliant JSON there's no problem.

Had the JSON standard supported ECMAScript array holes [1,,,2,,3] this grammar shortcut would not have been possible. But luckily that's not the case.
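
The shortcut itself is tiny - in a hand-rolled lexer it's just one extra character in the skip set (a Python sketch):

    # Treat ',' exactly like whitespace and the grammar loses its
    # "expect comma here" state; trailing and missing commas parse
    # identically. Array holes like [1,,2] would make ',' meaningful
    # again, which is why they'd break this.
    SKIP = set(" \t\r\n,")

    def skip_ws(text, i):
        while i < len(text) and text[i] in SKIP:
            i += 1
        return i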


Well, that's the thing, though: sure, for machine parsing it really doesn't matter what you use as a delimiter.

But the only reason to get rid of commas is to eliminate the trailing comma problem, which only occurs when hand-editing JSON. Replacing that with whitespace, or worse both whitespace and commas, would be a lot more prone to hand-editing errors, I think, than would the much less drastic change of allowing trailing commas. Or better still of just leaving JSON as is and letting people use a more robust protocol, if that fits their needs, or pre-parsing whatever special snowflake variations they want into standards-compliant JSON.

I'm more or less in agreement with the commenter on his site who said

> this entire proposal pretty much comes down to "I like JSON, but need more and am too lazy to write the extra 3 line wrapper to process type 'x'." I'd say no thanks. https://www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JS...



