The Order of the JSON (almaer.com)
252 points by tosh on Aug 15, 2019 | 123 comments



Next installment: how flipping an innocuous switch introduced a subtle bug that was silent for years and cost us millions.

You can’t just flip that switch, run “a client to post the JSON to that instance”, see that the test “worked just fine”, and call it a day.

The POST might just store it, for later processing to wreak havoc (say by ignoring a value that isn’t in the expected place in the JSON), or only rarely seen JSONs might cause problems, or it might ‘only’ break the yearly run, etc.



How do you justify ever changing anything with that attitude?

This change made parsing more lenient which is generally considered fine [1] and made the software actually follow the spec, since it specifies object entries are unordered.

[1] https://en.wikipedia.org/wiki/Robustness_principle


“This change made parsing more lenient”

How do you know? It makes the outermost layer of the system more liberal in what it accepts, but that doesn’t guarantee that the inner layers can handle that more liberal content.

For example, the code may assume that “<user-ID>” is always the first node in the JSON. If you start sending “<car-ID>” first instead, things may go fine until you get two customers who share a car.

“and made the software actually follow the spec”

How do you know? The spec of this piece of software may state it has a JSON-like API that requires the “<user-ID>” to be the first node in every request.

And yes, most of its code may handle that change fine, but it only takes one piece of code to break things.

“How do you justify ever changing anything with that attitude?”

In the case of “a service running IBM DataPower Gateway which sat on top of WebSphere which sat on top of the COBOL”, where “much of the system was so old that it was hard to find anyone who knew how it actually worked, and its maintenance had been outsourced”: very carefully.

This change may have been fine, but you have to check that.


The target culture would be to justify the change by saying "the reason this was implemented is X but X no longer holds". Anything else is an indicator of change without understanding; which is likely to lead to unforeseen results.

The robustness principle is a powerful guideline for interface design; not an excuse to turn off validations that someone else has put in place. If the person who designed the interface wasn't ready for something, principles won't make the software work.


The problem is that you have a complex interconnected system and you just flipped a switch that changed the entire protocol by which data was being transmitted.

OrderedJSON and JSON might seem compatible on the surface but are you sure you caught every little code path that might have been assuming OrderedJSON?

This isn't an issue of spec compliance or not, because the data format wasn't JSON in the first place. OrderedJSON is not a subset of JSON, even though it has the property that all OrderedJSON documents are syntactically valid JSON -- they represent different abstract values, and JSON is the lossy interpretation.
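A minimal sketch of the distinction in Python ("OrderedJSON" being this thread's name for the ordered reading, not a real format):

    import json
    from collections import OrderedDict

    doc = '{"user-id": 1, "car-id": 2}'
    swapped = '{"car-id": 2, "user-id": 1}'

    # Plain JSON reading: an unordered mapping, so the two documents
    # denote the same abstract value.
    print(json.loads(doc) == json.loads(swapped))  # True

    # "OrderedJSON" reading: the same bytes treated as a sequence of
    # key/value pairs, so the two documents differ.
    ordered_a = json.loads(doc, object_pairs_hook=OrderedDict)
    ordered_b = json.loads(swapped, object_pairs_hook=OrderedDict)
    print(list(ordered_a.items()) == list(ordered_b.items()))  # False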


> How do you justify ever changing anything

By rolling out the change over a period of time to a small sample of users and comparing the experimental effect to a control sample.


In other words: why GUI configuration systems should allow comments, just like text config files do.


Underrated idea. Never seen this, but I would kill for the ability to add comments to configuration settings or even log changes with commit messages.


I'm pretty sure it was OS X Server (or might have been NeXT) that had a feature where you could save the state of the server config GUI to a text file that you could check into source control and/or back up. Then you could load that file to get the GUI back to its previous state after you screwed something up.


It’s a NeXT/macOS idiom to save user settings in the program’s defaults domain, which is a key-value store backed by a plist.


There was a Show HN startup within the last month that did this for web-based SaaS applications in Chrome. Don't remember the name, though.


Yeah, that was my big question as well. Was the switch just for that one endpoint, or did he accidentally reconfigure how all the API endpoints in the entire company worked?


I wonder if the author documented his changes to the infrastructure (just like it was documented that the feature was enabled and could or couldn't be switched).


This sounds very familiar. A lot of companies are full of people who have no curiosity and no ability to think for themselves. I have seen it multiple times: someone claims a change is impossible or takes insane effort, then you have someone competent look at it and you have a solution in an hour. I think stuff like this is the real price of not hiring really good people.


I had the same thing happen to me lately, but I was at the stupid end. I am not a programmer by profession, so it matters very little to me, but I had been struggling to produce a correct solution to a seemingly simple problem (writing a macro to allow definitions in expression context in R6RS Scheme). My solution worked, but it had the side effect of rewriting obviously bad syntax into correct syntax, and not in a good way.

I was struggling with how to do it correctly, and then one of the people in the R7RS working group simplified the problem by a very large factor by pointing out that I was trying to solve a non-existent problem: I was adding definitions to expression contexts where it simply made no sense. In fact, instead of supporting all forms, I could get a better result by just focusing on one of them and adding simple wrappers for another five.

It was all quite humbling. Had I taken a step back and actually analyzed the problem I would have come to the same solution, but I immediately tried the, to me, most obvious and also hard-to-get-right solution.


This is what pair programming fixes, among other things. You have someone who can literally take a step back to get a broader overview. It's quite nice from what I can tell (I do want to do more of it, but at work we don't have the kinds of problems that make pair programming the most fun, and for side projects I lack friends who are willing and able to grasp the math behind it (nothing extreme, just around pose reconstruction and some simple (linear core) optimizations intertwined with (photogrammetry) domain-specific data shuffling)).


Such dumbness is very common with me too, and yet I also call myself a great problem solver, for good reason.

It's not a paradox; it's a matter of excess focus. Once I get into a rut I just keep bulldozing[0], but coming to a problem fresh, it's often trivial because the rut hasn't formed.

I'd say it gets better with experience, and you're probably better than you think; my guess is you don't see your successes as clearly as your failures. You spend so much time on the failures, but if you instantly perceive the right approach to a hard problem and solve it in one shot, well, it wasn't a hard problem, right? A kind of perception bias.

[0] I get the impression that's a bit of a man thing, generally.


You had trouble writing a scheme macro? I think most pro software developers would struggle to! I don’t think this is dumb.


I do not have any problems writing macros (in fact, I recently finished my Guile Scheme version of Racket's for loops). It is just that the macro I wrote did a lot more work than it had to and led to weird syntax behaviour. I didn't delimit what I was trying to do. John over in #guile put me on the right track.


I didn’t mean to imply you had trouble writing all macros. But as someone who is trying to learn some Common Lisp on the side, I can see how macros can get a lot trickier than the “regular” coding one might do in, say, Java. So the point is you were probably doing something more advanced there.


A very humane telling of a humbling experience.


Great engineers learn from mistakes, and it seems like you did. Formal training or not, we all make mistakes, but indeed a good approach is to take a step back and analyse _any_ issue or complex problem before implementing it. Sometimes prototyping separate from the main codebase helps.


All problems are easy when you already have the solution or pieces to build the solution. Real world problems vary wildly and you may find someone who can solve one problem trivially while they struggle on something else that seems simple.

This condescending perspective on problem solving is part of the problem with this entire industry and ultimately turns people into tools for disposal by businesses.


I didn’t want to be condescending, although my comment sounds like that. Usually I give people the benefit of the doubt, but I just see a lot of people who are really not interested in the job. This seems to happen a lot when a lot of stuff is outsourced, so no one owns anything. People who want to solve problems usually leave, and you are left with paper pushers who schedule a lot of meetings and offshore developers who don’t know anything about the history of the project.


What I see far more often is people who think they're competent coming along and giving you a solution in an hour. What they don't tell you is that their solution assumes that everything works the way they think it does, and when it inevitably doesn't it takes another 5 people 5 weeks to design and implement a decent solution that actually works. But of course by this time the original rockstar has pissed off to something new and doesn't need to clean up their own mess.

I'm not bitter at all...


I worked with one of these once. Incredibly frustrating. You could propose a change that would have massive impact on the maintainability and readability of the codebase. "But how can you know if it will work?" Like, we tested it. We used our brains to evaluate likely side effects and vetted the relevant portions of the code. We ran it in development environments for a month. I am not some junior dev trying to fix the world; I am arguing for what I believe to be a pragmatic, narrowly targeted, risk-balanced change with a positive expected outcome. But that wouldn't satisfy him; it had to be completely risk-free.

(And if things did go wrong, we had pretty much the ultimate backup plan: revert!)

It gradually dawned on me that his own code (which I thought was generally low quality) did generally reflect his values here: it was write-once. Any subsequent change was just layered on and patched around.

> I think stuff like this is the real price of not hiring really good people.

Yup. And I'm still not sure I'm good enough to tell them apart in hiring without asking questions where the candidate will just tell you what you want to hear.


I think stuff like this is the real price of not hiring really good people.

I think stuff like this is why decent developers don't become "really good" when working in these environments. If you have a culture of letting people figure things out, and forgiving mistakes, then you'll end up with better engineers. On the other hand if you have a culture of protecting "territory", and blame then you'll get 6 months for 9 people to check a box.


I think it's less about forgiving mistakes than it is about having the ability to recognize the nature and scale of problems and distinguish between appropriate and inappropriate potential solutions. That ability gets lost in layers of management, and the people who are tasked with rewarding the folks that actually create the solutions often don't have it.


“I think stuff like this is why decent developers don't become "really good" when working in these environments”

Totally agree. I feel bad for a lot of the young devs who are exposed to daily micromanagement and stand ups and the pressure to constantly deliver something. They don’t get the opportunity to make mistakes and learn from them.


Unfortunately, there simply are not enough "really good" people around, so they'll have to make do with the likes of me ;-)


Oh, don't worry. If I'm capable of improving, so are you ;)


I have been `the competent guy` this week. It took me two years to realize how deep the willingness to do nothing runs among the IT guys here. I arrived as usual in the morning last week. An employee told me the notification system was broken and wouldn't be fixed before next year. I told her I would implement a fix before the end of the day. I got it done. The IT guy still has his public-servant job, and I am told to write a note to justify the implementation of the fix (I am not a public servant and not a contractor; I work for a public-private LLC that is managed by a regional authority).


But this fix only (probably) works by accident. The problem is that flipping the switch makes the error go away in this system but ignores all the work of combing over the rest of the consumers of these documents to make sure they aren't subtly relying on the fact that the system was returning OrderedJSON instead of JSON.


(I've used both DataPower and COBOL - I've got a healthy respect for robust, long-lived legacy systems).

I must admit I was scratching my head on this.

The JSON spec might not specify order, but serialized JSON is ordered by nature. JWT needs it, for example (and I'd assume many signature schemes do). Or you might have a caching layer that needs it. Maybe unchecking this causes the legacy backend to get hammered? There are valid non-spec concerns with ordering.

Replies here seem to assume stupidity. That's a valid explanation, but it's not the only one. Equally, the author doesn't ask "why": why would it take so long, and why was that option enabled?


I can't help but think of the story of Chesterton's Fence [1].

>There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, 'I don't see the use of this; let us clear it away.' To which the more intelligent type of reformer will do well to answer: 'If you don't see the use of it, I certainly won't let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.'

[1]: https://en.m.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fen...


If you require serialized JSON in a specific order, you are not requiring JSON but some dialect of JSON, and that should be clearly communicated. If, for some reason, a client needed a special serialization, I'd ask them for a spec, because the JSON spec no longer applies, and be done with it.


I think my argument is a bit different -- sure, this is an interface (maybe to a COBOL copybook in this case?). And yes, it's not "JSON".

Agree it's a leaky abstraction. Equally agree it should be documented. Often legacy systems have weirdness after seeing decades of edge-cases. Weirdness that makes them robust in all sorts of unlikely ways. Equally makes them poorly documented, leaky, opaque, and frustrating.

But I certainly have a lot of pause with the sentiment of "some idiot checked the JSON order checkbox on DataPower". My first thought is instead "I wonder why someone thought this was necessary."


Yes, that sentiment is highly unprofessional. "9 people, 6 months" translates to "that box is checked for a reason" and "it's not going to change". Even if things are going to change, it's probably going to affect one function on your part, and you are producing valid JSON anyway. Surely you can bill a Fortune 500 company for a function to put things in order.


>I think my argument is a bit different -- sure, this is an interface (maybe to a COBOL copybook in this case?). And yes, it's not "JSON".

If it's not JSON, it's undefined when it uses JSON tools and libraries, and anything goes.

If it's not JSON, document it as your own dialect, and use your own tools, and try to never talk to the outside world based on it with systems that treat it as JSON.

>But I certainly have a lot of pause with the sentiment of "some idiot checked the JSON order checkbox on DataPower". My first thought is instead "I wonder why someone thought this was necessary."

It might not be, it could be the BS default setting...


Money down 3 months later:

1) the gateway, or more likely something behind it, starts crashing with memory errors when someone serializes some stupidly large array into it

2) using ordered vs non-ordered keys on input changes the behavior and fixes it

3) enabling this checkbox to prevent it fixes the issue and guards against it happening again

4) everyone has forgotten about this web weenie's inquiry into the issue

5) due to #4, it takes like 3 weeks of system crashes and hairy debugging to find the cause

6) this guy is long gone, having ridden off into the sunset of smugness

Also:

1) fixing the serialization-dependent memory allocation 4 layers deep in the back-end to allow things to operate safely without this checkbox requires a complex change to several other components and associated system-wide validation testing, which would take (drumroll) "9 people 6 months to complete"

2) What is the point of this javadoc reference and how exactly does it relate to the issue at hand? Here's a random code doc reference too!: https://pymotw.com/3/collections/ordereddict.html and here also a discussion about impacts: https://mail.python.org/pipermail/python-dev/2016-September/...


Money down:

1) Nothing happened, this already went down years ago, as the author writes.

2) The setting was only stopping the BS IBM system layer, wasn't really needed elsewhere.

3) The BS setting was probably even default, e.g. not consciously enabled by someone for the specific system.

>What is the point of this javadoc reference and how exactly does it relate to the issue at hand?

The point is that the IBM system that had this setting was using that BS, brain-dead, not-JSON implementation to handle the order when the setting was enabled.


> JWT needs it for example (and I'd assume many signature models)

You almost never need canonical representations for signing things. I would even say that if you need a canonical representation to sign your things, then that is a design smell of your cryptographic protocol.


You don't _need_ canonical representations for signing, but then you can't let go of the representation used for computing the signature. What is the argument against a requirement to only sign canonical data?


Could you please explain more? What is the "design smell", and what is the alternative to canonicalization?


I didn't mention canonicalization. My point is that serialized JSON is ordered - which I think is exactly the same property you're referring to.


That's an implementation detail. Serialized JSON can also be printed, but your systems shouldn't depend on JSON always being ink on paper.


>The JSON spec might not specify order, but the serialized JSON is ordered by nature. JWT needs it for example (and I'd assume many signature models). Or you might have a caching layer that needs it

Whether the file has an order is an implementation detail. There might be no static file at all, for example, it could be an endpoint returning you the JSON, and it could return different orders for the exact same items every time you call it. That would still be perfectly valid.

If you "need it", don't put your data into an object, put it in a list of objects.

If the underlying layers need "ordered objects" (in other words, ordered hashmaps) they don't really use JSON.


This isn't really an issue of spec because these systems recognize a completely different protocol 'OrderedJSON' which interprets {} as an ordered list of key-value pairs instead of a set.

The mistake is assuming that two syntactically compatible protocols were actually the same protocol. uint != int.


Only there's no such protocol. It's some ad-hoc implementation within a single system (or a single IBM ecosystem).


Just googled DataPower out of curiosity, and I found the XML Accelerator XA35. I did not know there was HARDWARE for XML PROCESSING! Blows my mind.


Yep, back in the SOAP days we used to have a lot of XML flying around that would need to be converted into one format or another via XSLT. You can certainly run XSLT on a piece of XML in your own code, but having a dedicated piece of hardware do it for you with a slick GUI was always an easy sell for IBM to management.


Could you elaborate? I am really curious how this hardware worked.

Was it an HTML GUI with file upload and everything? Or just some server, port, and proprietary protocol?

Maybe these things are on eBay for cheap. Oh, that looks like fun.


Impressed to learn about this hardware for XML processing. I came across an old article that mentioned the price started at $55K!


No idea why someone needs ordered key-value pairs. I had the pleasure of working with a device that relied upon ordered query-string parameters. Why? Because they encrypted the values with RC4.


> JWT needs it

I already had at least four solid reasons not to use JWT. Still, adding another just keeps the dumpster fire burning.

By which I mean, in the most oblique and backhanded fashion, that there’s not going to be any reason for validating JSON order that doesn’t ultimately reveal or confirm a design gaffe somewhere down the rabbit hole.


>I got to learn that we were talking to a service running IBM DataPower Gateway which sat on top of WebSphere which sat on top of the COBOL.

As soon as you read IBM you know you have bigger problems than COBOL in your system.


> How does world not break due to technology more often

My own theory is that most (not all) technology is there to support BS jobs and is irrelevant to the normal functioning of society. It simply doesn't matter when (most) technology breaks.


I agree the system described is crazy. However, I frequently find it useful to use ordered JSON as a data format, and I think it would be handy if more languages supported it. For one, it makes it a lot easier to write integration tests against a "golden file" of ideal output, because your program's JSON output becomes deterministic: there is usually exactly one correct output. For two, it lets you hash a JSON-encoded object deterministically.
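For illustration, in Python this usually falls out of deterministic emission rather than ordered parsing (a sketch):

    import hashlib
    import json

    obj = {"b": 2, "a": 1, "nested": {"y": 0, "x": 9}}

    # sort_keys yields one canonical text per value for this library,
    # which is all a golden-file test or a content hash needs.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    print(canonical)  # {"a":1,"b":2,"nested":{"x":9,"y":0}}
    print(hashlib.sha256(canonical.encode("utf-8")).hexdigest())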


OK. I totally prefer ordered JSON, because it is so much easier to eyeball: visually comparing JSON with differently ordered keys is quite a lot harder (O() complexity?) than when they are in the same order. It also enables diff to show where the differences are (diagnosis, not just binary identical-or-not).

And, in fact, I do use ordered JSON for comparison in testing, as you describe.

However... comparison of JSON as objects (i.e. in memory) is order-independent. Hashcode is also order-independent (the trick is to sum the elements' hashcodes, e.g. https://docs.oracle.com/javase/7/docs/api/java/util/Set.html...).

Diff for ordered JSON is possible using longest common subsequence for trees, but has terrible complexity, and lacks diff's clever optimizations both general and specific to typical input.
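A sketch of that summing trick in Python (note CPython randomizes string hashes per process, so this is only stable within a single run):

    import json

    def json_hash(value):
        if isinstance(value, dict):
            # Sum the per-entry hashes: insensitive to key order,
            # the same trick the Set.hashCode() docs describe.
            return sum(hash(k) ^ json_hash(v) for k, v in value.items()) & 0xFFFFFFFF
        if isinstance(value, list):
            # JSON arrays ARE ordered, so fold positionally.
            h = 0
            for item in value:
                h = (h * 31 + json_hash(item)) & 0xFFFFFFFF
            return h
        return hash(value) & 0xFFFFFFFF

    a = json.loads('{"x": 1, "y": [1, 2]}')
    b = json.loads('{"y": [1, 2], "x": 1}')
    print(json_hash(a) == json_hash(b))  # True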


+1 for all the reasons above for ordered JSON; highly convenient in practice.

And if it doesn't impact performance significantly, these are all pretty good reasons for JSON emitters to sort object keys deterministically by default, or at least to provide a flag to do so. (Even if there's no canonical sort order between JSON libraries, all that matters is that it's deterministic for each library.)

BUT... I can't imagine any scenario where you'd want to validate that JSON content was ordered on input, which is what was enabled in this article. Why does IBM even have that as an option?!

Be strict in what you emit and liberal in what you accept, and all that...


I guess that's because OrderedJSONObject is ordered, not sorted. Like Java's LinkedHashMap, it maintains the order in which keys are added.

If you're going to rely on a specific order for comparisons, it makes sense to alert the user to any JSON in a different order (instead of silently, liberally accepting it), or you'll get false negatives elsewhere. It's easier to check for a sorted order, but it's also possible to define a specific order. IDK what IBM did here.

Fun fact: jq used to sort keys; now it retains input ordering.


Indeed, and remember this JSON message is going to a mainframe. Mainframes don't have much memory and typically process record-by-record, or event-by-event. So the implementation probably streams the JSON in and constructs the COPYBOOK from the payload before continuing to invoke the COBOL.

So the rework time might be for writing a general-purpose re-ordering layer that can re-order any incoming message.


> any scenario where you'd want to validate...

Because it's a precondition for something else down the line (a dependency).


Please XOR your hashes instead of adding them! If you add them, you're losing bits on the low end. EDIT: No you're not. It feels like you should be, but with unsigned overflow, this actually works just fine.

This assumes of course that you're using proper hashes that make use of the full domain of the output type (a proper hash will have a 50% chance of any arbitrary bit being flipped by any change to the input). But if you're not using proper hashes, you're doing something wrong.


Can you please explain this assertion? ;)

If I have a 32 bit current hash value-- for any possible 32 bit value I add, I get a different 32 bit value out.

XORing is effectively adding each bit and throwing away the carry bit. Adding just cascades carries to the left.


You know what, you're right. I made a knee-jerk comment but I didn't think it through all the way. From any arbitrary unsigned 32-bit integer, every other unsigned 32-bit integer is reachable with a single addition. Therefore addition works just fine here.

It still feels wrong to say this; it feels like adding will effectively shove bits off the high end and drop them on the floor, so you're losing information, but I can't actually justify that feeling with reasoning.


Adding is actually considerably better. For high-quality hashes XOR is just as good, but if there are any distributional problems at all in the hash, adding mixes things more.

(XORing is effectively adding with all of the carry information lost/falling off).
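A toy illustration in Python, assuming 32-bit wraparound arithmetic:

    MASK = 0xFFFFFFFF  # emulate unsigned 32-bit overflow

    def combine_xor(h1, h2):
        return h1 ^ h2

    def combine_add(h1, h2):
        return (h1 + h2) & MASK  # overflow just wraps

    # Both are bijective in the second argument (for fixed h1, every h2
    # gives a distinct result), so neither "loses bits". But XOR cancels
    # repeated values outright, which addition does not:
    h = 0xDEADBEEF
    print(combine_xor(h, h) == 0)  # True  -- duplicates vanish
    print(combine_add(h, h) == 0)  # False -- a trace survives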


Please note that using a sum of hashes almost certainly weakens any cryptographic guarantees you may expect from your hashes. Of course, that may be fine depending on your use case.


Ordering is part of the view, then, not the model. This is conflating different abstraction layers.

Similarly, sometimes it might be useful to store some data in a different character encoding, or maybe multi-character glyphs are stored in a different normalization form. (I can't tell "ü" from "ü" just by looking.) Or maybe your sample data has 1.0 but your program generates 1.000. There's a million ways that serialized structures can be functionally identical but quite different. Easy: don't compare raw bytes.

If you want functionally-equivalent data to be accepted, you just need an equality-tester (and hashing algorithm) that's agnostic to such issues. They're not hard to write.
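For example, a minimal order-agnostic comparer in Python (after parsing, dict equality already ignores key order; this sketch just makes the decision points explicit):

    import json

    def json_equal(a, b):
        # Structural equality for parsed JSON: ignore key order in
        # objects, respect element order in arrays.
        if isinstance(a, dict) and isinstance(b, dict):
            return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
        if isinstance(a, list) and isinstance(b, list):
            return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
        # The parser already collapses 1.000 to 1.0, and Python's ==
        # treats 1 and 1.0 as equal, so scalars need no special care.
        return a == b

    print(json_equal(json.loads('{"a": 1.000, "b": [1, 2]}'),
                     json.loads('{"b": [1, 2], "a": 1}')))  # True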


You don't need ordered JSON for this. Just make sure that your equality tests ignore ordering, or that your hash function hashes two JSON objects with the same keys but different order to the same value.


If you need order, shouldn't you be using an array?


Makes sense. As a workaround, I'd read both, convert to a comparable object, and then compare.


> Much of the system was so old that it was hard to find anyone who knew how it actually worked, and its maintenance had been outsourced to some of the typical IT outsourcing companies of the time.

Some years ago, I worked on an antitrust case involving a firm that had outsourced relevant systems. Multiple times, to different IT firms. And there was literally nobody left who knew how they worked.

After considerable negotiation, they agreed to provide documentation. And what that ended up being was a report by IT company 2 about their understanding of what IT company 1 had done with the firm's systems. Because, I gather, IT company 1 had evaporated.

And yes, the core of it was COBOL.


Ha! Stephen Dolan had to add this to jq quite some time back, making it preserve object key input order on output. And yes, it's infuriating, but it's also somewhat convenient, and yes, there really is software out there that cares about object key order (sigh).


Doesn't anything using JWT depend on a specific order?


It shouldn't, unless you base64 decode the header, then parse it with a library that causes the order to change, encode it again, and then use your own re-encoding to calculate the signature:

  HMAC-SHA256(
    b64(reencoded_header) + '.' + b64(payload),
    secret
  )
You should really just verify the signature for the provided header + payload in their base64 encoded form.
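A minimal Python sketch of the safe approach (HS256 only, no header or claims validation; verify_hs256 is an illustrative name, not a library function):

    import base64
    import hashlib
    import hmac

    def verify_hs256(token: str, secret: bytes) -> bool:
        # The signing input is the raw header and payload segments,
        # exactly as received -- never re-parsed or re-encoded, so key
        # order inside the JSON can't matter.
        try:
            header_b64, payload_b64, sig_b64 = token.split(".")
        except ValueError:
            return False
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # Signatures are base64url without padding (RFC 7515); restore it.
        provided = base64.urlsafe_b64decode(sig_b64 + "=" * (-len(sig_b64) % 4))
        return hmac.compare_digest(expected, provided)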


Ha. Creating a layer on top of broken APIs is a thriving business. Every single carrier company has a broken API. That's why companies like AfterShip exist.

One example I struggled with for days recently was USPS. Not only do they use XML in the URL parameter, the order of the elements also matters. Unfortunately, the order in the documentation is incorrect.


Hell, even Amazon's EC2 spec is confusing in some places (the boto Python call parameters don't always match the RESTful call, which is less documented, leading to errors in some libraries).


I wrote a JSON serialization/deserialization library for C++, and if you provide the members in the order they are specified in the type, you get better performance. It can construct the class without having to bounce the parser back to that location, and it is much, much more cache-friendly.


I also would much rather write code that doesn't quite do what it's supposed to if it's easier to do and I can take an extra day off. However, that's not my job.


It's an interesting problem. If all the parser does is put them into a variant-like structure, sure, whatever. But when you go to put them into the reified data types, you would waste a lot of resources parsing the JSON into an intermediary data structure and then requesting the members. I parse them directly into their final classes. So I was left with a choice, and both options have a cost: parse the JSON in the order of the file and store the concrete type to grab later, or store the position/size of that part and move on, constructing the members in the order needed. The latter was cheaper. It will parse the JSON in whatever order; it's just that performance can be affected by the data ordering, as with many parsers.


Is your parser general-purpose? Is it template-based? Is the performance variance due only to the impact on the processor's cache or are there other factors? Is the code open-source, maybe I could just look myself?


It is template-based, so that each type's JSON parser is statically known and to give the compiler more opportunity to optimize. https://github.com/beached/daw_json_link

I haven't had a chance to look at why yet.


I just recently dealt with a JS library that was expecting object properties to be in a certain order ("works" in ES2015 [1]), and that object was loaded via JSON. It was a huge pain getting the object into the required order.

[1] https://stackoverflow.com/questions/5525795/does-javascript-...


How were they going to charge him for 9 FTEs for 6 months to uncheck a checkbox?


Perhaps they didn't know about the checkbox and thought they had to write a custom solution. It could also be that they knew about a really subtle bug caused by unchecking that checkbox, and thus understood why it didn't actually solve the problem long-term even though it looked like a good solution at the time.

Or perhaps they're so used to getting away with quoting 6 months for a 6-hour job that it never crossed their mind not to.


Because the checkbox only removed the validation error. Who knows what other errors will be encountered when the COBOL can't read the incoming messages, which might or might not be in the right order?


Write some software that reordered it, probably.


Yep. Probably IBM Data Transform Services™️ installed in the cloud plus a custom plugin written to do the ordering. That’s a few folks to deploy and configure the server. A few more for the engineering. Oh and there’s probably a support contract to support and maintain this new “solution”.


For what it's worth, I do wish it were trivially easy to customize things like error log output, so that I could say that I _always_ want the timestamp first, followed by severity, followed by whatever else. Our logs are annoying to parse when debugging things locally.


All the modern logging libraries do this now, I think.

And with tools like Fluentd, we can decorate the output with whatever metadata we want (instance name, machine type, container name, etc.).


Writers should produce sorted maps, readers need to accept unsorted maps.

It's a security issue, not just convenience. With unsorted maps the internal hash seed can be exposed, together with timing information.

Another famous omission from the specs.


I’m no security expert. How is “exposing the hash seed” a problem for the vast majority of applications? What timing information would be leaked and why would that be problem?

On the other hand, accepting unsorted maps seems like it could introduce covert channels?


> How is “exposing the hash seed” a problem for the vast majority of applications?

They can be DOS'ed.

> What timing information would be leaked and why would that be problem?

You misunderstood. By (1) leaking the order and (2) checking the timing of getting certain keys of a map, you have two independent pieces of information for getting at the secret seed.

> accepting unsorted maps seems like it could introduce covert channels?

A covert channel might be exposed by the sender if they use some non-random but unsorted map order. The receiver needs to accept any order; there's no risk in accepting unsorted maps. The only risk on the receiver side is another famous omission from the spec: how to deal with duplicate keys. Accept (overwrite or drop) or reject? All three cases can be seen in the wild.
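For instance, Python's json module overwrites by default (last key wins), and its pairs hook makes "reject" a few lines (a sketch):

    import json

    def reject_duplicates(pairs):
        # object_pairs_hook sees every pair, duplicates included.
        obj = {}
        for key, value in pairs:
            if key in obj:
                raise ValueError(f"duplicate key: {key!r}")
            obj[key] = value
        return obj

    doc = '{"a": 1, "a": 2}'
    print(json.loads(doc))  # {'a': 2} -- default: last wins (overwrite)
    json.loads(doc, object_pairs_hook=reject_duplicates)  # raises ValueError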


https://bugzilla.redhat.com/show_bug.cgi?id=750555

This led to hash randomization being enabled by default since Python 3.3.
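A quick way to see it (assuming CPython 3.3 or later):

    # Two fresh interpreters print different values here, because the
    # string hash is seeded randomly per process:
    #   $ python3 -c 'print(hash("key"))'
    #   $ python3 -c 'print(hash("key"))'
    # Pinning the seed makes it deterministic again:
    #   $ PYTHONHASHSEED=0 python3 -c 'print(hash("key"))'
    print(hash("key"))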


Cool. Thanks.


This makes my head hurt. I wonder if you end up with sorting differences depending on the locale?


It's not sorted, but ordered: The order emitted is the order entered.


I built the first popular sports "app" for the original iPhone, back when Jobs claimed that there was no need to write native code.

The backend polled an API that served XML updates on game scores etc. Note that I didn't previously know anything about baseball (and I still don't, not really).

So let's say that a baseball team scores a Double. We'd see some XML like <doubles><double/><double/></doubles>.

Now, suppose a team scores a triple... You know that there's a <triples></triples> entity. What would you expect to see inside the node?

If you're me, you'd expect to parse <triples><triple/><triple/></triples>.

I got the call during dinner. There was an important game happening, and the app suddenly broke. People were uptight. They loved our app and they were complaining.

In the end, it turns out that we would need to process <triples><double/><double/></triples>. Why? "Oh, it's always been that way." (Says the brusque developer at the service charging $50k/month for access to this feed.)


Recent Twitter thread referenced in the article: https://twitter.com/therealfitz/status/1161349619659530242?s...




The tweet at the top of the article says

> This so far out of the spec it makes my ankles hurt.

This is not, in fact, out of spec. The JSON spec does not define semantics here; it quite explicitly leaves what to do about the ordering of objects up to the JSON processor and the data-interchange spec.


> This is not in fact out of spec. The JSON spec does not define semantics [...] for what to do about ordering of objects.

What? It is literally on the first page of json.org and in section 1 of RFC 7159 [1]:

> An object is an unordered set of name/value pairs.

> An object is an unordered collection of zero or more name/value pairs

[1] https://tools.ietf.org/html/rfc7159#section-1


There are multiple standards. https://www.ecma-international.org/publications/standards/Ec... is more relaxed:

>The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

Also, it's pretty clear that the keys must have some order going over the wire. And JavaScript objects (which aren't unrelated to JSON) have a defined property order now: https://www.stefanjudis.com/today-i-learned/property-order-i...


If the objects are not guaranteed to be in order, I would always treat them as unordered. You could rely on order if both the producer and consumer preserve it, but as the OP noted, this can cause problems later on, and for no good reason. I'd use an array of key-value tuples if I wanted ordering.


Uh, "no good reason" in this case means "I unchecked the box and nothing bad happened right away" which amounts to shallow analysis, to put it gently.


That was a twist ending, indeed. I was expecting a resolution, not "I gave it a good whack and guess what, that fixed it."


I've been asked by architects to order JSON properties. It's fine until you depend on it, i.e. accessing a map as you would an array.

https://fasterxml.github.io/jackson-annotations/javadoc/2.3....


I wrote an app for the menu of my university's canteen. The JSON API that I used returned the meals as fields of a document, where the order of the fields was the actual display order. It took me a while to even find a JSON parser implementation that keeps the order of the fields.


I would assume that any decent JSON parser has at least the option to keep the order of the fields. Especially for CLI tools that automatically add fields to a JSON file that's usually edited by a user, e.g. package.json, I feel it's absolutely crucial.
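In CPython, for instance (a sketch; other ecosystems vary):

    import json
    from collections import OrderedDict

    doc = '{"monday": "soup", "tuesday": "pasta"}'

    # Since 3.7, dicts preserve insertion order, so the default parse
    # already keeps the fields in document order...
    print(list(json.loads(doc)))  # ['monday', 'tuesday']

    # ...and the pairs hook makes the intent explicit (and also works
    # on older versions).
    meals = json.loads(doc, object_pairs_hook=OrderedDict)
    print(list(meals))  # ['monday', 'tuesday']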


This is an omission not only from the JSON standard but also from the JavaScript language: JavaScript had no map order, and the rest is history.

The result is that all implementations are broken; f.ex. this is how a tree structure has to be implemented with JSON:

http://root.rupy.se/meta/user/task/eelzter/44953781393584543...

It's inefficient, ugly and just wrong.


Sounds like normal life to me. Please give me $$$$


"Why would you care what order we were sending the name value pairs for this?" The article makes it sound like it's somehow IBM's fault. However, questions about preserving order in hashes/associative arrays/maps/whatchamacallit span at least thirty years. Some applications of the format even demand it (e.g. ansible uses YAML - a superset of JSON - or even straight JSON for it playbooks; emitting them out of order is not an option).

Every time the issue surfaces, it elicits a large amount of eye-rolling among the cognoscenti. However, I'm a firm believer in the idea that recurring demands from the user base need to be addressed rather than mocked. The coding community appears to me notably tone-deaf on this.

'Course anybody can use <insert suitable technique> to send a JSON map in the desired order, except the parser on the other end will blissfully disregard it. Or you can devise an ordered solution on both ends, which will leave you outside the accepted standard and open you up to the situation described in the article (where the requirement may well have been frivolous).

I can remember very few, if any, instances of somebody suggesting that the standard should somehow make room for this kind of scenario. The standard reply was "use a different format" or "change the requirement". (Somehow reminds me of people asking what one can do to get whitespace-preserving XML, and at least one amusing story about that.)


JSON has a perfectly fine order-preserving key-value map:

    [["Key1", "value1"], ["key2", ...
You just need to deserialise it into a specific collection on the receiver. It's within the standard, and there are no weird parser issues.
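E.g., in Python (a sketch):

    import json
    from collections import OrderedDict

    wire = '[["key1", "value1"], ["key2", "value2"]]'

    # Deserialise the array-of-pairs form into an order-preserving map
    # on the receiving side; the wire format stays plain standard JSON.
    ordered = OrderedDict(json.loads(wire))
    print(list(ordered))  # ['key1', 'key2']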


That's a way. A way that, IMHO, vastly diminishes the expressive power on the reader's side, which I do not consider a very good thing. Most solutions to the issue are of this nature.

Consider that serialized JSON already has a natural ordering, which is thrown away for some reason (probably parsing convenience).


> which is thrown away for some reason (probably parsing convenience).

It's got nothing to do with parsing. JSON comes from JavaScript syntax, where objects represent unordered mappings. JSON naturally does the same.

If you want expressive power for reading, JSON is very poor in comparison to pretty much everything else. Just use it for simple serialisation.


Granted, it came from there, but that was back in the days of map=eval(json), and those days are gone. There is nothing in JSON the format (as opposed to JSON the language construct) to impose the unordered behavior. (Or, for that matter, the 'no comments' bit.)


> There is nothing in json the format [..] to impose the unordered behavior

That's not true. The spec at http://www.json.org/ says "An object is an unordered set of name/value pairs".


Yes, that's in the spec. But what I meant is that there is nothing intrinsic to the format demanding unordered behavior.


I hear that. But also, JSON does not have a canonical serialisation. Beyond object key ordering, strings have multiple equivalent forms (direct Unicode characters vs. \u00XX escapes, etc.). Numbers have exponential form (1e2 is the same as 100). And whitespace is ignored: [2] and [ 2 ] and [2\n] are all equivalent.

If your application expects JSON input to conform to one particular set of decision points here, that's fine and useful, but you're not using JSON anymore. You're using a JSON-compatible subset. Document that new thing you've made and stop calling it JSON, because it's not.


Agree. Except that the sheer number of questions about this spells "missed use case", to my eyes at least. Sort of like the "How do I push a local branch to remote?" question in Git-land.



