The Fixing-JSON Conversation (tbray.org)
49 points by robin_reala on Aug 26, 2016 | 54 comments



SDLang !!!! : https://sdlang.org/

Full example : https://github.com/Abscissa/SDLang-D/wiki/Language-Guide#exa...

Examples:

Creating a Tree

    plants {
        trees {
            deciduous {
                elm
                oak
            }
        }
    }
Creating a Matrix

    myMatrix {
       4  2  5
       2  8  2
       4  2  1
    }
A Tree of Nodes with Values and Attributes

    folder "myFiles" color="yellow" protection=on {
        folder "my images" {
            file "myHouse.jpg" color=true date=2005/11/05
            file "myCar.jpg" color=false date=2002/01/05
        }
        folder "my documents" {
            document "resume.pdf"
        }
    }
Date and Date/Time Literals (and comments!)

    # create a tag called "date" with a date value of Dec 5, 2005
    date 2005/12/05

    # a date time literal without a timezone
    here 2005/12/05 14:12:23.345

    # a date time literal with a timezone
    in_japan 2005/12/05 14:12:23.345-JST


That's beautiful!


For simple data exchange? Way over-engineered is what it is. TFA specifically noted that they want to stay within the overall conceptual complexity of JSON:

> “Just use X” […] Nah, most of them are way, way richer than JSON, often with fully-worked-out type systems and Conceptual Tutorials and so on.


I think now is a good time to re-quote the man himself, Douglas Crockford:

https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...

  I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.  I know that the lack of comments makes some people sad, but it shouldn't. 

  Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
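
In Python terms, that workflow looks roughly like this - a naive stand-in for JSMin (it will also eat comment-like text inside strings, so don't point it at arbitrary input):

    import json
    import re

    def strip_json_comments(text):
        # Remove /* ... */ block comments and // line comments before
        # handing the text to a normal JSON parser.
        text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
        text = re.sub(r"//[^\n]*", "", text)
        return text

    print(json.loads(strip_json_comments('{"port": 8080 /* dev default */}')))
    # -> {'port': 8080}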


>I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.

I call BS. If people want to have custom parsing directives they can send them out of band, encode them in the filename, or whatever. But they don't. And I've not seen this happening with most other serialisation formats either, so why would JSON be a particular target? After all, its value comes from being trivially parsable across languages, and that would be killed by custom parsing directives. Those wanting them would also implement their own parsers etc.

Addition: Besides, reading comments to decide how to parse implies either "comments on top of the file" or "2 stage parsing".

With 2 stage parsing, you could implement comments and whatever else yourself, even in pure JSON anyway.

As for "comments on top of the file", well, just disallow them (only allow comments after the first JSON object starts), and no issue with "parsing directives" anymore...


When I read this statement, I was thinking of Pascal style comment directives, e.g.

    { 
      field1: 'hello', 
      //#if (protocol_version>3)
      field2: 'hello',
      //#endif

      /* #charset utf-8 */ field3: 'world world' /* #charset default */
    }
Which falls into neither "comments on top" nor "2 stage parsing". Unlike the C preprocessor, which "eats" whole lines starting with #, Pascal used a {$directive arguments} inline comment style.


If people really wanted parsing directives, they could just say that keys starting with # and their values are parsing directives - e.g:

    {
        "#if": "parserversion > 1.5",
        "key": "somevalue",
        "#else": "",
        "key": "othervalue"
    }
Though I also don't really see any reason to include parsing directives in JSON.


Alas, there's no guarantee that the order of key/value pairs within an object will be preserved (not even within JS anymore!). But that's okay, we can always reformat the ordered data as an array or write a custom parser ... oh.

I don't see any reason to include parsing directives in JSON either, but it's a wild world out there and people do all sorts of strange things. Seems that a few of those folks were working at Yahoo and made the mistake of letting Doug see their code when JSON was in prototype phase, so no JSON comments for anyone. Whoops!
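
(For what it's worth, some parsers do expose the raw pair order without a fully custom parser. A Python sketch with the standard json module's object_pairs_hook, which sees order and duplicate keys before they collapse into a dict:)

    import json

    text = '{"#if": "v > 1.5", "key": "a", "#else": null, "key": "b"}'
    # The hook receives key/value pairs in document order, duplicates included.
    pairs = json.loads(text, object_pairs_hook=lambda p: p)
    print(pairs)
    # [('#if', 'v > 1.5'), ('key', 'a'), ('#else', None), ('key', 'b')]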


If you're going to design something like that, it's wise to make sure it can be represented as valid JSON.

Since JSON objects don't support order or repeated keys, that syntax can't be represented, edited or processed by the rich ecosystem of JSON tools. Most decent JSON editors will show that text with squiggly red underlines. It's not worth giving up interoperability, and having to make yet another set of tools for an incompatible syntax.

That was the mistake that the Angular 2 template syntax made.

But XML-based templating languages like Genshi [1] show how you can obey the rules of XML, use namespaces correctly, support element, attribute and text based expressions, looping, logic and macros, and it works just fine and interoperates perfectly with existing tools.

Genshi was based on another Python based XML templating system called Kid [2], which itself was influenced by Zope's page templates, TAL template attribute language, TALES expressions [3] and METAL templates [4]. Genshi and Kid templates are simple and easy to use compared to the conglomeration of Zope stuff.

Here is the essential trick, described in the Zope manual, which all those languages share, that makes it possible to sidestep the fact that attributes are not ordered:

When there is only one TAL statement per element, the order in which they are executed is simple. Starting with the root element, each element’s statements are executed, then each of its child elements is visited, in order, to do the same.

Any combination of statements may appear on the same elements, except that the content and replace statements may not appear together.

Due to the fact that TAL sees statements as XML attributes, even in HTML documents, it cannot use the order in which statements are written in the tag to determine the order in which they are executed. TAL must also forbid multiples of the same kind of statement on a single element, so it is sufficient to arrange the kinds of statement in a precedence list.

When an element has multiple statements, they are executed in this order:

    1) define
    2) condition
    3) repeat
    4) content or replace
    5) attributes
    6) omit-tag
It would be great to have a Genshi-like templating language for JSON, tightly integrated with JavaScript the same way Genshi is integrated with Python.

[1] https://genshi.edgewall.org/

[2] http://turbogears.org/1.0/docs/GettingStarted/Kid.html

[3] https://docs.zope.org/zope2/zope2book/AppendixC.html

[4] https://docs.zope.org/zope2/zope2book/AppendixC.html#metal-o...


> It would be great to have a Genshi-like templating language for JSON, tightly integrated with JavaScript the same way Genshi is integrated with Python.

But… that already exists. It's called "Python". Just define your data structure using bog-standard Python and serialise it.
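
A minimal sketch of that approach - plain Python as the "template", JSON as the output (data names are illustrative):

    import json

    users = ["alice", "bob"]  # illustrative input data
    doc = {
        "version": 2,
        # Comprehensions, conditionals and functions are the "template
        # language" -- no restricted expression subset needed.
        "entries": [{"name": u, "admin": u == "alice"} for u in users],
    }
    print(json.dumps(doc, indent=2))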


What I mean is using JSON as the syntax, and JavaScript as the expression and scripting language.

Zope has this supposedly "restricted subset" of Python that you're allowed to use as expressions and scripts, but it's missing important features and isn't meaningfully safe from a paranoid security perspective, and you end up having to drop down to lower level Zope external methods to write real Python code, which is very inconvenient.

If you trust someone enough to give them access to editing templates with "restricted python expressions", then you can probably trust them enough to use real Python expressions. You'd be unwise to give somebody you don't actually trust access to even a "restricted subset" of Python running on your server. That's just asking for trouble.


> What I mean is using JSON as the syntax, and JavaScript as the expression and scripting language.

… it's the exact same process except with Javascript as the language? Generate your JS datastructure then JSON.stringify it? I don't understand what the issue is or why you'd want a templating language when much of JSON's point is that it matches directly to common standard datastructures.


Okay, so the reasoning is we remove a highly useful feature that most people who use JSON regularly want, because some people were abusing it and using terrible practices?

That's terrible reasoning.


http://www.haskellforall.com/2016/04/worst-practices-should-... . And it seems to have worked out well for JSON.


The second part basically says "don't use JSON for config files, use some unspecified JSON superset".


I find that YAML (a JSON superset) for config and JSON/CBOR on the wire works quite well.



Ugh. Comments are also useful for disabling things without actually deleting them.

My hacky in-band workaround is for the persistence layer (and other code) to ignore any dictionaries in an array that have the "//" key, so I can put a "//": "DISABLED" entry at the top of a dict to disable it (and document that it's disabled, and why).
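
A sketch of that filter, with illustrative names:

    def strip_disabled(items):
        # Drop any dict in the list that carries the "//" marker key.
        return [d for d in items
                if not (isinstance(d, dict) and "//" in d)]

    servers = [
        {"host": "a.example.com"},
        {"//": "DISABLED: flaky since 2016-08", "host": "b.example.com"},
    ]
    print(strip_disabled(servers))  # [{'host': 'a.example.com'}]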


Timestamps are so complicated once you factor in timezones and daylight saving time that they don't belong in JSON. Time zones are not static. They can change from country to country, or even between states within a country. Ditto for when daylight saving time starts during the year - that even changes over the years. There is no rhyme or reason to any of this. The data for this has to be stored in tables, and time zone meanings can change retroactively. The only reliable time stamp is UTC without leap seconds. (Speaking of leap seconds, who thought seconds going from 0 - 60 rather than 0 - 59 was a good idea?)

Accurate time is one of the most difficult things to model in computer science.


Time is actually quite simple if you have a good mental model of what you're trying to represent and don't try to mix different concepts into a single value.

This talk explains it VERY nicely: https://www.youtube.com/watch?v=2rnIHsqABfM

Basically just decide whether you're trying to store an absolute time (a timestamp will do) or a civil time (year, month, day, etc.) and treat them as two separate data types.

(If you just use "civil time + offset from UTC" like RFC 3339 does, then you can convert it to an absolute time, but you can convert only that one specific value using that offset, and not any other - i.e. that offset is not a substitute for an actual timezone identifier.)
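
Roughly, in Python terms (an illustration with the standard datetime module, not a prescription):

    from datetime import datetime, timezone

    # Absolute time: one point on the global timeline.
    instant = datetime(2016, 8, 26, 12, 0, tzinfo=timezone.utc)

    # Civil time: a wall-clock reading; deliberately zone-less.
    wall = datetime(2016, 8, 26, 14, 0)

    # Civil time + offset (RFC 3339 style) pins down this one instant...
    stamp = datetime.fromisoformat("2016-08-26T14:00:00+02:00")
    # ...but "+02:00" says nothing about the offset on other dates --
    # it is not a timezone identifier.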


If by timestamp you mean Unix time, you have to keep in mind that it evolves non-linearly. When a leap second happens, Unix time increases linearly through that second, then jumps back one second and increases linearly through the leap second again.

If you have sub-second precision and expect Unix time to always move forward, or if you expect every Unix second to last one second (e.g. when computing Unix time differences), things might break.

https://en.wikipedia.org/wiki/Unix_time#Leap_seconds


>Time is actually quite simple if you have a good mental model of what you're trying to represent and don't try to mix different concepts into a single value.

Not even close.

http://www.creativedeletion.com/2015/01/28/falsehoods-progra...


If you watch the video I linked above, he explicitly mentions that 99% of the time you should not be dealing with any of those things manually - that if you find yourself working with offsets and DST values, it's a sign that you're most likely doing something wrong.


>that if you find yourself working with offsets and DST values, it's a sign that you're most likely doing something wrong.

The video is wrong then.

If you're writing an application that deals with times (e.g. stores and queries events with specific timestamps) and you don't take offsets and DST values into account, you get all kinds of weird edge cases.


No, the point is that you should not be doing those things manually, i.e. you should never be adding integer offsets to something or doing similar operations. Instead, all the rules for conversions between times are already stored in the timezone database, so all you should do is something like

    ToAbsolute(CivilTime, TimeZone)
and the reverse. He also mentions (at 26:50) proper ways of dealing with repeating and non-existent civil times close to DST transitions (and sane default ways if you just don't want to bother).

When talking about JSON or serialization formats specifically, none of the complexities need to ever leak into the representation.
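
A sketch of that ToAbsolute idea, using Python's zoneinfo (3.9+) so the tz database does the work rather than hand-rolled offset math:

    from datetime import datetime
    from zoneinfo import ZoneInfo

    def to_absolute(civil, zone):
        # Attach the zone and let the tz database pick the right offset,
        # DST included. "fold" handles repeated times near transitions.
        return civil.replace(tzinfo=ZoneInfo(zone))

    civil = datetime(2016, 11, 6, 1, 30)   # a repeated US wall time
    absolute = to_absolute(civil, "America/New_York")
    print(absolute.astimezone(ZoneInfo("UTC")))

    # And the reverse: absolute -> civil time in some zone.
    print(absolute.astimezone(ZoneInfo("Asia/Tokyo")))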


> Instead, all the rules for conversions between times are already stored in the timezone database, so all you should do is something like

That only works if you're completely detached from the user and don't care for them. Example:

On January 1st 2011, a Samoan user living in Samoa (timezone Pacific/Apia) records an event for January 1st 2012. You convert January 1st 2012 09:00:00 to UTC, storing 2012-01-01T20:00:00.

On January 2nd 2012 at 10AM, you remind your user that they had an event set.

Because in May 2011 Samoa announced they were going to skip a local day and move across the international date line. So 2011-12-30T09:00:00 UTC was 2011-12-29T23:00:00 Pacific/Apia, but 2011-12-30T10:00:00 UTC was 2011-12-31T00:00:00 Pacific/Apia.

And as far as your user is concerned, they told you to ping them on January 1st at 9AM and you pinged a day late. Just because you store absolute datetimes doesn't mean you won't fuck up, and when the data is user-provided, chances are good that that very decision will be the fuckup.


That still doesn't mean that you should do any of those operations by hand (which is what the previous comment was about).

But sure, if you do scheduling for future times, you do need to be aware of such possibilities and store the future times as civil times (and have some sane way of handling non-existing/repeating times - but again, most of the time the system/library will do this for you).

Then even if the user is flying across the world, they can still get the alert at the right time wherever they are (assuming the device updates local timezone based on location).

I never said that you should only store absolute times - only that they are a separate data type and you shouldn't mix them or try to convert them by hand.


+1 serialise things without the processing complexities.

In general, an event is either scheduled in the future relative to a specific physical location, in which case you want to record it with a symbolic timezone reference (in case the TZ DB changes in the meantime), or it's timed absolutely, in which case you want to record it as UTC.

The trick is knowing which events are which. If you're scheduling something for a machine, probably just use UTC. If you're machine-recording the time of an event, even if it's based on a human pushing a button, use UTC and convert back to local time for display if necessary. If the user told you what time they wanted to store, store it as local time with a symbolic timezone.
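
A sketch of that split in Python (zone and field names illustrative):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    # Machine-recorded event: pin it to the timeline immediately.
    recorded_at = datetime.now(timezone.utc)

    # User-scheduled "9am in Apia": store civil time plus a symbolic zone
    # name, and resolve to UTC only when the reminder actually fires, so
    # tz database updates (like Samoa's 2011 skip) are picked up.
    event = {"civil": "2012-01-01T09:00:00", "zone": "Pacific/Apia"}
    fire_at = datetime.fromisoformat(event["civil"]).replace(
        tzinfo=ZoneInfo(event["zone"]))
    print(fire_at.astimezone(timezone.utc))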

Where it gets really difficult is interoperating with systems that don't do things properly, or working within systems that don't let you do things properly.


> The only reliable time stamp is UTC without leap seconds.

That doesn't make a lot of sense, as UTC does have leap seconds. It is similar to saying that the fastest car has no wheels, when you really mean that the fastest vehicle is a rocket.

TAI is the most reliable and easiest to work with. It relies on atomic clock seconds at sea level. https://en.wikipedia.org/wiki/International_Atomic_Time

However, it is already 36 seconds ahead of UTC (and therefore civil time), and dropping leap seconds in civil time will shift midnight to later in the day.

But as leap seconds become more frequent over the long term, maintaining UTC will become harder. The ITU will consider dropping leap seconds from UTC in 2023. It is unlikely that people care about having the sun rise at midnight in 30,000 years.


> That doesn't make a lot of sense, as UTC does have leap seconds.

Pedantic much? Posting in technical forums is such a bother. People pick at any unimportant detail.

Reworded just for you:

The only reliable time stamp would be something like UTC but without leap seconds.


I am sorry. My intention was not to sound pedantic. I only wanted to point out that we are fortunate enough to already have what you describe.

I too wish we used TAI-based timestamps instead of Unix timestamps in a more widespread manner!

It is a bit silly that we cannot determine any duration involving a UTC time a couple of years in the future…


> (Speaking of leap seconds, who thought seconds going from 0 - 60 rather than 0 - 59 was a good idea?)

Who thought February going from 1-29 instead of 1-28 was a good idea?

I don't understand why everyone seems to believe that the phenomena are so inherently different.


Because other months went to 30 or 31 already, so there was room for it in calendars and people never assumed months are always the same length.

60 seconds in a minute, OTOH, is a very strong assumption in many systems and people's minds.


> other months went to 30 or 31 already, so there was room for it in calendars and people never assumed months are always the same length

I really don't think this has much to do with it. It would predict no problems trying to add a November 31 to some year; I tend to make the opposite prediction.



I think the idea of treating commas as whitespace is kinda hilarious, because that would mean one could write:

    ["a" "b" "c"]
or:

    {"a": "foo" "b": "bar" "c": "baz"}
The advantage over:

    (a b c)
or:

    (a foo b bar c baz)
or:

    ((a foo) (b bar) (c baz))
or:

    ((a . foo) (b . bar) (c . baz))
seems … non-existent.


I don't think # or // for comments is a very good idea as it would also make newline characters significant. I find it useful to be able store a JSON object per-line.
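
(That's the JSON Lines pattern - one document per physical line, newline as the record delimiter:)

    import json

    # One JSON document per line; a newline-significant comment syntax
    # would collide with this framing.
    log = '{"evt": "start"}\n{"evt": "stop"}\n'
    for line in log.splitlines():
        print(json.loads(line))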


Personally, I would really like to see integer object keys (as opposed to only string keys). For simple numeric transformations, strings feel really heavy and require annoying conversion in languages. E.g. {"10": 60, "42": 2}.
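
The usual workaround today is to convert at the boundary, e.g. in Python:

    import json

    wire = '{"10": 60, "42": 2}'
    # Keys arrive as strings; each side re-converts at the boundary.
    table = {int(k): v for k, v in json.loads(wire).items()}
    print(table[42])  # 2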


The flipside is that an integer-keyed map is similar in meaning to an array, which associates, by virtue of placement, an integer with the value sitting at that index.

While it's possible to spec it to forbid this interpretation, Lua has made this interpretation a language feature, and it'd become impossible to construct an unambiguous parser/printer in Lua for this new format.


> it'd become impossible to construct an unambigous parser/printer in Lua for this new format

How so? JSON is just text; you can parse it however you like. For example, I can write a parser that produces the byte sequence "It checks out." for any valid JSON input. A Lua JSON parser that represents objects with integer keys as maps with string keys, and a Lua JSON parser that represents objects with integer keys as sparse arrays, are both unambiguous and both correct (as to a hypothetical JSON which allowed integer keys). JSON is a format, not a Lua structure.

Even if your JSON data is

    { 10: 10, "10": "ten" }
there's no problem being able to write an unambiguous parser. Define what your parser does in that situation, and it's unambiguous.


This guy is an idiot anyway. There's no way to "fix JSON". All you can do is create a new language, it doesn't matter if you call it JSON 2.0, it will still be incompatible with all the JSON parsers of today. I don't get why he is so mad at people suggesting him to use one of the JSON supersets that exist today.


If // and /* are used as comments, then most of this new extended-JSON will still be valid Javascript.

If # is used for comments, then documents stop being valid JavaScript.

The post says "don't eval() JSON ever", but that's like Crockford originally leaving out comments to stop them being abused as processor directives...


Like the post says, JSON is already not guaranteed to be valid JS, so this isn't really a problem. The fact that 99% of the time it works to just eval it is great, and granted, the "feature" that triggers this incompatibility is a bit obscure.

But if you just do the right thing from the start you'll never have a thing to worry about in the first place, # comments or not.


This is a bit of a strawman argument though. JSON is rarely used in a generic form - but instead as a format between endpoints (servers, clients, what have you). When you control the server, you control the API - so random UTF-8 magic crap? Wasn't valid in the first place.

Just because you can express things that aren't JavaScript with it doesn't mean that ever actually turns up in practice -- more importantly, 95% of the time you control both the producer and consumer of the JSON because it's consumed internally anyhow - at which point you're not _going_ to write illegal JSON to yourself.

This is a true statement, but virtually nothing in practice cares.


I'm jealous. I've had it show up in production. I think we were using script tags with sanitized JSON to re-hydrate an isomorphically rendered page.

Of course, a user entered one of the valid-JSON, invalid-JS characters in a field, which made its way into our database. Once there, we started having weird errors show up between the server and the client. That little gem took us days to track down.

JSON should be either a subset of javascript, or obviously not a subset of javascript. 99% compatible systems are dangerous landmines. That you haven't been blown up yet doesn't make it ok.
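
For the record, the usual culprits are U+2028/U+2029 - legal unescaped in JSON strings, but line terminators in (pre-ES2019) JavaScript. A Python-side sketch of the common mitigation when inlining JSON into a script tag:

    import json

    def json_for_script(data):
        # With ensure_ascii=False (common for compact UTF-8 output),
        # U+2028/U+2029 pass through raw; escape them so the payload
        # is also valid inside a <script> block.
        return (json.dumps(data, ensure_ascii=False)
                .replace("\u2028", "\\u2028")
                .replace("\u2029", "\\u2029"))

    print(json_for_script({"note": "line\u2028break"}))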


JSON (for the most part) is a nice format to work with, aside from the loosely defined datetimes as mentioned.

The two areas where I believe the format can be greatly improved: #1, having a standard way to define the structure (sometimes schemas can be handy!); #2, a standard binary format - yes, right now we have UBJSON (which doesn't have a date format; this is worse in binary) and BSON (which contains some MongoDB-specific stuff).

I'm not saying they don't have their place, but... Protocol Buffers are more akin to .NET or Java serialization, in that they're quite fragile if used with different versions and/or with different vendors.


MessagePack is better supported than UBJSON and has far more implementations in almost every language.

http://msgpack.org/index.html
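
A basic round-trip with the third-party msgpack package, for the curious:

    import msgpack  # pip install msgpack

    doc = {"id": 42, "tags": ["a", "b"], "blob": b"\x00\x01"}
    packed = msgpack.packb(doc)      # compact binary encoding
    print(msgpack.unpackb(packed))   # round-trips, binary-safe unlike JSON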

JSON Schema draft 4 is the de facto schema standard for JSON.

http://json-schema.org/latest/json-schema-core.html

Having used both XML schema and this one, I much prefer using JSON schema.
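
Validation with the third-party jsonschema package looks roughly like this:

    from jsonschema import validate, ValidationError  # pip install jsonschema

    schema = {
        "type": "object",
        "properties": {"port": {"type": "integer", "minimum": 1}},
        "required": ["port"],
    }
    validate({"port": 8080}, schema)      # silent on success
    try:
        validate({"port": "8080"}, schema)
    except ValidationError as e:
        print(e.message)                  # '8080' is not of type 'integer'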


    “Just use X” · For values of X including Hjson, Amazon Ion, edn, Transit, YAML, and TOML. ¶
    Nah, most of them are way, way richer than JSON, often with fully-worked-out type systems and Conceptual Tutorials and so on.
What? MOST OF THEM? YAML is not, Hjson is not, TOML is not.


YAML... really? Looking at the examples in the Wikipedia article (https://en.wikipedia.org/wiki/YAML) gives me a headache. Fortunately, most actual YAML files I've seen are not that complicated.



He summarized the most upvoted posts from the last thread [1] really well.

[1] https://news.ycombinator.com/item?id=12328088

Regarding datetimes, it's worth pointing out the conversation that TOML had about it. It's a pretty long read [2][3][4][5] with lots of points raised for and against, but it also shows some of the process of how consensus was eventually forged: through trial-and-error, some enlightening realizations, expert opinions, and a willingness to leave some aspects of the behavior up to the parser, to avoid requiring all other languages to reimplement half of Java 8 Time.

[2] https://github.com/toml-lang/toml/pull/414

[3] https://github.com/toml-lang/toml/pull/362

[4] https://github.com/toml-lang/toml/issues/412

[5] https://github.com/toml-lang/toml/issues/263

The salient point being that RFC 3339 does not in truth describe exactly one datatype, so you can't just reference the spec and hope everyone reads it the same way. EDIT: Specifically, RFC 3339 says:

"Date and time expressions indicate an instant in time. Description of time periods, or intervals, is not covered here.", but then goes on to define [6] a number of different syntaxes in ABNF, to indicate the subsets of ISO 8601 that "SHOULD be used in new protocols on the Internet." It essentially never defines what a 'valid' RFC 3339 object looks like, it doesn't explicitly say which ones are considered complete representations, so it's not clear if, say, '2016' is a valid RFC 3339 object... but the ones towards the bottom contain more than one discrete term, and can be presumed to be 'complete' representations. These are:

[A] partial-time: HH:MM:SS(.SSS)

[B] full-date: YYYY-MM-DD

[C] full-time: 'partial-time' +/- offsetFromUTC(HH:MM)

[D] date-time: 'full-date' "T" 'full-time'

Out of these, [D] is clearly a timestamp of an absolute instant in time, but the rest are debatable.

[6] https://tools.ietf.org/html/rfc3339#section-5.6


> He summarized the most upvoted posts from the last thread [1] really well.

I feel like he glossed right past the objections to the biggest and (to my mind) most destructive proposed change, the commas-to-whitespace thing; in fact he doubles down on it (let's just declare that commas are whitespace! That surely won't confuse anyone!)


I've written a few JSON parsers over the years that treat commas as whitespace. The grammar is simpler and the parser is faster as a result. As long as one always emits standards-compliant JSON there's no problem.

Had the JSON standard supported ECMAScript array holes [1,,,2,,3] this grammar shortcut would not have been possible. But luckily that's not the case.
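
The shortcut itself is tiny - in a hand-rolled lexer it's just one extra character in the skip set (a Python sketch):

    # Treat ',' exactly like whitespace and the grammar loses its
    # "expect comma here" state; trailing and missing commas parse
    # identically. Array holes like [1,,2] would make ',' meaningful
    # again, which is why they'd break this.
    SKIP = set(" \t\r\n,")

    def skip_ws(text, i):
        while i < len(text) and text[i] in SKIP:
            i += 1
        return i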


Well, that's the thing, though: sure, for machine parsing it really doesn't matter what you use as a delimiter.

But the only reason to get rid of commas is to eliminate the trailing comma problem, which only occurs when hand-editing JSON. Replacing that with whitespace, or worse both whitespace and commas, would be a lot more prone to hand-editing errors, I think, than would the much less drastic change of allowing trailing commas. Or better still of just leaving JSON as is and letting people use a more robust protocol, if that fits their needs, or pre-parsing whatever special snowflake variations they want into standards-compliant JSON.

I'm more or less in agreement with the commenter on his site who said

> this entire proposal pretty much comes down to "I like JSON, but need more and am too lazy to write the extra 3 line wrapper to process type 'x'." I'd say no thanks. https://www.tbray.org/ongoing/When/201x/2016/08/20/Fixing-JS...



