The Order of the JSON (almaer.com)
252 points by tosh on Aug 15, 2019 | 123 comments



Next installment: how flipping an innocuous switch introduced a subtle bug that was silent for years and cost us millions.

You can’t just flip that switch, run “a client to post the JSON to that instance”, see that the test “worked just fine”, and call it a day.

The POST might just store it, for later processing to wreak havoc (say by ignoring a value that isn’t in the expected place in the JSON), or only rarely seen JSONs might cause problems, or it might ‘only’ break the yearly run, etc.



How do you justify ever changing anything with that attitude?

This change made parsing more lenient which is generally considered fine [1] and made the software actually follow the spec, since it specifies object entries are unordered.

[1] https://en.wikipedia.org/wiki/Robustness_principle


“This change made parsing more lenient”

How do you know? It makes the outermost layer of the system more liberal in what it accepts, but that doesn’t guarantee that the inner layers can handle that more liberal content.

For example, the code may assume that “<user-ID>” is always the first node in the JSON. If you start sending “<car-ID>” first instead, things may go fine until you get two customers who share a car.

“and made the software actually follow the spec”

How do you know? The spec of this piece of software may state it has a JSON-like API that requires the “<user-ID>” to be the first node in every request.

And yes, most of its code may handle that change fine, but it only takes one piece of code to break things.

“How do you justify ever changing anything with that attitude?”

In the case of “a service running IBM DataPower Gateway which sat on top of WebSphere which sat on top of the COBOL”, where “much of the system was so old that it was hard to find anyone who knew how it actually worked, and its maintenance had been outsourced”: very carefully.

This change may have been fine, but you have to check that.


The target culture would be to justify the change by saying "the reason this was implemented is X but X no longer holds". Anything else is an indicator of change without understanding; which is likely to lead to unforeseen results.

The robustness principle is a powerful guideline for interface design; not an excuse to turn off validations that someone else has put in place. If the person who designed the interface wasn't ready for something, principles won't make the software work.


The problem is that you have a complex interconnected system and you just flipped a switch that changed the entire protocol by which data was being transmitted.

OrderedJSON and JSON might seem compatible on the surface but are you sure you caught every little code path that might have been assuming OrderedJSON?

This isn't an issue of spec compliance or not, because the data format wasn't JSON in the first place. OrderedJSON is not a subset of JSON, even though it has the property that all OrderedJSON documents are syntactically valid JSON -- they represent different abstract values, and JSON is the lossy interpretation.
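A minimal sketch of the distinction in Python ("OrderedJSON" being this thread's name for the ordered reading, not a real format):

    import json
    from collections import OrderedDict

    doc = '{"user-id": 1, "car-id": 2}'
    swapped = '{"car-id": 2, "user-id": 1}'

    # Plain JSON reading: an unordered mapping, so the two documents
    # denote the same abstract value.
    print(json.loads(doc) == json.loads(swapped))  # True

    # "OrderedJSON" reading: the same bytes treated as a sequence of
    # key/value pairs, so the two documents differ.
    ordered_a = json.loads(doc, object_pairs_hook=OrderedDict)
    ordered_b = json.loads(swapped, object_pairs_hook=OrderedDict)
    print(list(ordered_a.items()) == list(ordered_b.items()))  # False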


> How do you justify ever changing anything

By rolling out the change over a period of time to a small sample of users and comparing the experimental effect to a control sample.


In other words: why GUI configuration systems should allow comments, just like text config files do.


Underrated idea. Never seen this, but I would kill for the ability to add comments to configuration settings or even log changes with commit messages.


I'm pretty sure it was OS X Server (or might have been NeXT) that had a feature where you could save the state of the server config GUI to a text file that you could check into source control and/or back up. Then you could load that file to get the GUI back to its previous state after you screwed something up.


It’s a NeXT/macOS idiom to save user settings in the program’s defaults domain, which is a key-value store backed by a plist.


There was a Show HN startup within the last month that did this for web-based SaaS applications in Chrome. Don't remember the name, though.


Yeah, that was my big question as well. Was the switch just for that one endpoint, or did he accidentally reconfigure how all the API endpoints in the entire company worked?


I wonder if the author documented his changes to the infrastructure (just like it was documented that the feature was enabled and could or couldn't be switched).


This sounds very familiar. A lot of companies are full of people who have no curiosity and no ability to think for themselves. I have seen it multiple times: someone claims a change is impossible or takes insane effort, then you have someone competent look at it and you have a solution in an hour. I think stuff like this is the real price of not hiring really good people.


I had the same thing happen to me lately, but I was at the stupid end. I am not a programmer by profession, so it matters very little to me, but I had been struggling to produce a correct solution to a seemingly simple problem (writing a macro to allow definitions in expression context in R6RS Scheme). My solution worked, but it had the side effect of rewriting obviously bad syntax into correct syntax, and not in a good way.

I was struggling with how to do it correctly, and then one of the people in the R7RS working group simplified the problem by a very large factor by pointing out that I was trying to solve a non-existent problem: I was adding definitions to expression contexts where it simply made no sense. In fact, instead of supporting all forms, I could get a better result by just focusing on one of them and adding simple wrappers for another five.

It was all quite humbling. Had I taken a step back and actually analyzed the problem I would have come to the same solution, but I immediately tried the, to me, most obvious and also hard-to-get-right solution.


This is what pair programming fixes, among other things. You have someone who can literally take a step back to get a broader overview. It's quite nice from what I can tell (I do want to do more of it, but at work we don't have the kinds of problems that make pair programming the most fun, and for side projects I lack friends who are willing and able to grasp the math behind it (nothing extreme, just around pose reconstruction and some simple (linear core) optimizations intertwined with (photogrammetry) domain-specific data shuffling)).


Such dumbness is very common with me too, and yet I also call myself a great problem solver, for good reason.

It's not a paradox; it's a matter of excess focus. Once I get into a rut I just keep bulldozing[0], but coming to a problem fresh, it's often trivial because the rut hasn't formed.

I'd say it gets better with experience, and you're probably better than you think; my guess is you don't see your successes as clearly as your failures. You spend so much time on the failures, but if you instantly perceive the right approach to a hard problem and solve it in one shot, well, it wasn't a hard problem, right? A kind of perception bias.

[0] I get the impression that's a bit of a man thing, generally.


You had trouble writing a scheme macro? I think most pro software developers would struggle to! I don’t think this is dumb.


I do not have any problems writing macros (in fact, I recently finished my Guile Scheme version of Racket's for loops). It is just that the macro I wrote did a lot more work than it had to and led to weird syntax behaviour. I didn't delimit what I was trying to do. John over in #guile put me on the right track.


I didn’t mean to imply you had trouble writing all macros. But as someone who is trying to learn some Common Lisp on the side, I can see how macros can get a lot trickier than the “regular” coding one might do in, say, Java. So the point is you were probably doing something more advanced there.


A very humane telling of a humbling experience.


Great engineers learn from mistakes, and it seems like you did. Formal training or not, we all make mistakes, but indeed a good approach is to take a step back and analyse _any_ issue or complex problem before implementing it. Sometimes prototyping separate from the main codebase helps.


All problems are easy when you already have the solution or pieces to build the solution. Real world problems vary wildly and you may find someone who can solve one problem trivially while they struggle on something else that seems simple.

This condescending perspective on problem solving is part of the problem with this entire industry and ultimately turns people into tools for disposal by businesses.


I didn’t want to be condescending, although my comment sounds like that. Usually I give people the benefit of the doubt, but I just see a lot of people who are really not interested in the job. This seems to happen a lot when a lot of stuff is outsourced, so no one owns anything. People who want to solve problems usually leave, and you are left with paper pushers who schedule a lot of meetings and offshore developers who don’t know anything about the history of the project.


What I see far more often is people who think they're competent coming along and giving you a solution in an hour. What they don't tell you is that their solution assumes that everything works the way they think it does, and when it inevitably doesn't it takes another 5 people 5 weeks to design and implement a decent solution that actually works. But of course by this time the original rockstar has pissed off to something new and doesn't need to clean up their own mess.

I'm not bitter at all...


I worked with one of these once. Incredibly frustrating. You could propose a change that would have massive impact on the maintainability and readability of the codebase. "But how can you know if it will work?" Like, we tested it. We used our brains to evaluate likely side effects and vetted the relevant portions of the code. We ran it in development environments for a month. I am not some junior dev trying to fix the world; I am arguing for what I believe to be a pragmatic, narrowly targeted, risk-balanced change with a positive expected outcome. But that wouldn't satisfy him; it had to be completely risk-free.

(And if things did go wrong, we had pretty much the ultimate backup plan: revert!)

It gradually dawned on me that his own code (which I thought was generally low quality) did generally reflect his values here: it was write-once. Any subsequent change was just layered on and patched around.

> I think stuff like this is the real price of not hiring really good people.

Yup. And I'm still not sure I'm good enough to tell them apart in hiring without asking questions where the candidate will just tell you what you want to hear.


I think stuff like this is the real price of not hiring really good people.

I think stuff like this is why decent developers don't become "really good" when working in these environments. If you have a culture of letting people figure things out, and forgiving mistakes, then you'll end up with better engineers. On the other hand if you have a culture of protecting "territory", and blame then you'll get 6 months for 9 people to check a box.


I think it's less about forgiving mistakes than it is about having the ability to recognize the nature and scale of problems and distinguish between appropriate and inappropriate potential solutions. That ability gets lost in layers of management, and the people who are tasked with rewarding the folks that actually create the solutions often don't have it.


“I think stuff like this is why decent developers don't become "really good" when working in these environments”

Totally agree. I feel bad for a lot of the young devs who are exposed to daily micromanagement and stand ups and the pressure to constantly deliver something. They don’t get the opportunity to make mistakes and learn from them.


Unfortunately, there simply are not enough "really good" people around, so they'll have to make do with the likes of me ;-)


Oh, don't worry. If I'm capable of improving, so are you ;)


I have been `the competent guy` this week. It took me two years to realize how deep the willingness to do nothing runs among the IT guys here. I arrived as usual in the morning last week. An employee told me the notification system was broken and wouldn't be fixed before next year. I told her I would implement a fix before the end of the day. I got it done. The IT guy still has his public-servant job, and I am told to write a note to justify the implementation of the fix (I am not a public servant and not a contractor; I work for a public-private LLC that is managed by a regional authority).


But this fix only (probably) works by accident. The problem is that flipping the switch makes the error go away in this system but ignores all the work of combing over the rest of the consumers of these documents to make sure they aren't subtly relying on the fact that the system was returning OrderedJSON instead of JSON.


(I've used both DataPower and COBOL - I've got a healthy respect for robust, long-lived legacy systems).

I must admit I was scratching my head on this.

The JSON spec might not specify order, but serialized JSON is ordered by nature. JWT needs it, for example (and I'd assume many signature schemes do). Or you might have a caching layer that needs it. Maybe unchecking this causes the legacy backend to get hammered? There are valid non-spec concerns with ordering.

Replies here seem to assume stupidity. That's a valid explanation, but it's not the only one. Equally, the author doesn't ask "why": why would it take so long, and why was that option enabled?


I can't help but think of the story of Chesterton's Fence [1].

>There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, 'I don't see the use of this; let us clear it away.' To which the more intelligent type of reformer will do well to answer: 'If you don't see the use of it, I certainly won't let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.'

[1]: https://en.m.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fen...


If you require serialized JSON in a specific order, you are not requiring JSON but some dialect of JSON, and that should be clearly communicated. If, for some reason, a client needed a special serialization, I'd ask them for a spec, because the JSON spec no longer applies, and be done with it.


I think my argument is a bit different -- sure, this is an interface (maybe to a COBOL copybook in this case?). And yes, it's not "JSON".

Agree it's a leaky abstraction. Equally agree it should be documented. Often legacy systems have weirdness after seeing decades of edge-cases. Weirdness that makes them robust in all sorts of unlikely ways. Equally makes them poorly documented, leaky, opaque, and frustrating.

But I certainly have a lot of pause with the sentiment of "some idiot checked the JSON order checkbox on DataPower". My first thought is instead "I wonder why someone thought this was necessary."


Yes, that sentiment is highly unprofessional. "9 people, 6 months" translates to "that box is checked for a reason" and "it's not going to change". Even if things are going to change, it's probably going to affect one function on your part, and you are producing valid JSON anyway. Surely you can bill a Fortune 500 company for a function to put things in order.


>I think my argument is a bit different -- sure, this is an interface (maybe to a COBOL copybook in this case?). And yes, it's not "JSON".

If it's not JSON, it's undefined when it uses JSON tools and libraries, and anything goes.

If it's not JSON, document it as your own dialect, and use your own tools, and try to never talk to the outside world based on it with systems that treat it as JSON.

>But I certainly have a lot of pause with the sentiment of "some idiot checked the JSON order checkbox on DataPower". My first thought is instead "I wonder why someone thought this was necessary."

It might not be, it could be the BS default setting...


Money down 3 months later:

1) the gateway, or more likely something behind it, starts crashing with memory errors when someone serializes some stupidly large array into it

2) using ordered vs non-ordered keys on input changes the behavior and fixes it

3) enabling this checkbox to prevent it fixes the issue and guards against it happening again

4) everyone has forgotten about this web weenie's inquiry into the issue

5) due to #4, it takes like 3 weeks of system crashes and hairy debugging to find the cause

6) this guy is long gone, having ridden off into the sunset of smugness

Also:

1) fixing the serialization-dependent memory allocation 4 layers deep in the back-end to allow things to operate safely without this checkbox requires a complex change to several other components and associated system-wide validation testing, which would take (drumroll) "9 people 6 months to complete"

2) What is the point of this javadoc reference and how exactly does it relate to the issue at hand? Here's a random code doc reference too!: https://pymotw.com/3/collections/ordereddict.html and here also a discussion about impacts: https://mail.python.org/pipermail/python-dev/2016-September/...


Money down:

1) Nothing happened, this already went down years ago, as the author writes.

2) The setting was only stopping the BS IBM system layer, wasn't really needed elsewhere.

3) The BS setting was probably even default, e.g. not consciously enabled by someone for the specific system.

>What is the point of this javadoc reference and how exactly does it relate to the issue at hand?

The point is that the IBM system that had this setting was using that BS, brain-dead, not-JSON implementation to handle the order when the setting was enabled.


> JWT needs it for example (and I'd assume many signature models)

You almost never need canonical representations for signing things. I would even say that if you need a canonical representation to sign your things, then that is a design smell of your cryptographic protocol.


You don't _need_ canonical representations for signing, but then you can't let go of the representation used for computing the signature. What is the argument against a requirement to only sign canonical data?


Could you please explain more? What is the "design smell", and what is the alternative to canonicalization?


I didn't mention canonicalization. My point is that serialized JSON is ordered - which I think is exactly the same property you're referring to.


That's an implementation detail. Serialized JSON can also be printed, but your systems shouldn't depend on JSON always being ink on paper.


>The JSON spec might not specify order, but the serialized JSON is ordered by nature. JWT needs it for example (and I'd assume many signature models). Or you might have a caching layer that needs it

Whether the file has an order is an implementation detail. There might be no static file at all, for example, it could be an endpoint returning you the JSON, and it could return different orders for the exact same items every time you call it. That would still be perfectly valid.

If you "need it", don't put your data into an object, put it in a list of objects.

If the underlying layers need "ordered objects" (in other words, ordered hashmaps) they don't really use JSON.


This isn't really an issue of spec because these systems recognize a completely different protocol 'OrderedJSON' which interprets {} as an ordered list of key-value pairs instead of a set.

The mistake is assuming that two syntactically compatible protocols were actually the same protocol. uint != int.


Only there's no such protocol. It's some ad-hoc implementation within a single system (or a single IBM ecosystem).


Just googled DataPower out of curiosity, and I found the XML Accelerator XA35. I did not know there was HARDWARE for XML PROCESSING! Blows my mind.


Yep, back in the SOAP days we used to have a lot of XML flying around that would need to be converted into one format or another via XSLT. You can certainly run XSLT on a piece of XML in your own code, but having a dedicated piece of hardware do it for you with a slick GUI was always an easy sell for IBM to management.


Could you elaborate? I am really curious how this hardware worked.

Was it an HTML GUI with file upload and everything? Or just some server, port, and proprietary protocol?

Maybe these things are on eBay for cheap. Oh, that looks like fun.


Impressed to learn about this hardware for XML processing. I came across an old article that mentioned the price started at $55K!


No idea why someone needs ordered key-value pairs. I had the pleasure of working with a device that relied upon ordered query-string parameters. Why? Because they encrypted the values with RC4.


> JWT needs it

I already had at least four solid reasons not to use JWT. Still, adding another just keeps the dumpster fire burning.

By which I mean, in the most oblique and backhanded fashion, that there’s not going to be any reason for validating JSON order that doesn’t ultimately reveal or confirm a design gaffe somewhere down the rabbit hole.


>I got to learn that we were talking to a service running IBM DataPower Gateway which sat on top of WebSphere which sat on top of the COBOL.

As soon as you read IBM you know you have bigger problems than COBOL in your system.


> How does world not break due to technology more often

My own theory is that most (not all) technology is there to support BS jobs and is irrelevant to the normal functioning of society. It simply doesn't matter when (most) technology breaks.


I agree the system described is crazy. However, I frequently find it useful to use ordered JSON as a data format, and I think it would be handy if more languages supported it. For one, it makes it a lot easier to write integration tests against a "golden file" of ideal output, because your program's JSON output becomes deterministic: there is usually exactly one correct output. For two, it lets you hash a JSON-encoded object deterministically.
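For illustration, in Python this usually falls out of deterministic emission rather than ordered parsing (a sketch):

    import hashlib
    import json

    obj = {"b": 2, "a": 1, "nested": {"y": 0, "x": 9}}

    # sort_keys yields one canonical text per value for this library,
    # which is all a golden-file test or a content hash needs.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    print(canonical)  # {"a":1,"b":2,"nested":{"x":9,"y":0}}
    print(hashlib.sha256(canonical.encode("utf-8")).hexdigest())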


OK. I totally prefer ordered JSON, because it is so much easier to eyeball: visually comparing JSON with differently ordered keys is quite a lot harder (O() complexity?) than when they are in the same order. It also enables diff to show where the differences are (diagnosis, not just binary identical-or-not).

And, in fact, I do use ordered JSON for comparison in testing, as you describe.

However... comparison of JSON as objects (i.e. in memory) is order-independent. Hashcode is also order-independent (the trick is to sum the elements' hashcodes, e.g. https://docs.oracle.com/javase/7/docs/api/java/util/Set.html...).

Diff for ordered JSON is possible using longest common subsequence for trees, but has terrible complexity, and lacks diff's clever optimizations both general and specific to typical input.
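A sketch of that summing trick in Python (note CPython randomizes string hashes per process, so this is only stable within a single run):

    import json

    def json_hash(value):
        if isinstance(value, dict):
            # Sum the per-entry hashes: insensitive to key order,
            # the same trick the Set.hashCode() docs describe.
            return sum(hash(k) ^ json_hash(v) for k, v in value.items()) & 0xFFFFFFFF
        if isinstance(value, list):
            # JSON arrays ARE ordered, so fold positionally.
            h = 0
            for item in value:
                h = (h * 31 + json_hash(item)) & 0xFFFFFFFF
            return h
        return hash(value) & 0xFFFFFFFF

    a = json.loads('{"x": 1, "y": [1, 2]}')
    b = json.loads('{"y": [1, 2], "x": 1}')
    print(json_hash(a) == json_hash(b))  # True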


+1 for all the reasons above for ordered JSON; highly convenient in practice.

And if it doesn't impact performance significantly, these are all pretty good reasons for JSON emitters to sort object keys deterministically by default, or at least to provide a flag to do so. (Even if there's no canonical sort order between JSON libraries, all that matters is that it's deterministic for each library.)

BUT... I can't imagine any scenario where you'd want to validate that JSON content was ordered on input, which is what was enabled in this article. Why does IBM even have that as an option?!

Be strict in what you emit and liberal in what you accept, and all that...


I guess that's because OrderedJSONObject is ordered, not sorted. Like Java's LinkedHashMap, it maintains the order in which keys are added.

If you're going to rely on a specific order for comparisons, it makes sense to alert the user to any JSON in a different order (instead of silently, liberally accepting it), or you'll get false negatives elsewhere. It's easier to check for a sorted order, but it's also possible to define a specific order. IDK what IBM did here.

Fun fact: jq used to sort keys; now it retains input ordering.


Indeed, and remember this JSON message is going to a mainframe. Mainframes don't have much memory and typically process record-by-record, or event-by-event. So the implementation probably streams the JSON in and constructs the COPYBOOK from the payload before continuing to invoke the COBOL.

So the rework time might be for writing a general-purpose re-ordering layer that can re-order any incoming message.


> any scenario where you'd want to validate...

Because it's a precondition for something else down the line (a dependency).


Please XOR your hashes instead of adding them! If you add them, you're losing bits on the low end. EDIT: No you're not. It feels like you should be, but with unsigned overflow, this actually works just fine.

This assumes of course that you're using proper hashes that make use of the full domain of the output type (a proper hash will have a 50% chance of any arbitrary bit being flipped by any change to the input). But if you're not using proper hashes, you're doing something wrong.


Can you please explain this assertion? ;)

If I have a 32 bit current hash value-- for any possible 32 bit value I add, I get a different 32 bit value out.

XORing is effectively adding each bit and throwing away the carry bit. Adding just cascades carries to the left.


You know what, you're right. I made a knee-jerk comment but I didn't think it through all the way. From any arbitrary unsigned 32-bit integer, every other unsigned 32-bit integer is reachable with a single addition. Therefore addition works just fine here.

It still feels wrong to say this; it feels like adding will effectively shove bits off the high end and drop them on the floor, so you're losing information, but I can't actually justify that feeling with reasoning.


Adding is actually considerably better. For high-quality hashes XOR is just as good, but if there are any distributional problems at all in the hash, adding mixes things more.

(XORing is effectively adding with all of the carry information lost/falling off).
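A toy illustration in Python, assuming 32-bit wraparound arithmetic:

    MASK = 0xFFFFFFFF  # emulate unsigned 32-bit overflow

    def combine_xor(h1, h2):
        return h1 ^ h2

    def combine_add(h1, h2):
        return (h1 + h2) & MASK  # overflow just wraps

    # Both are bijective in the second argument (for fixed h1, every h2
    # gives a distinct result), so neither "loses bits". But XOR cancels
    # repeated values outright, which addition does not:
    h = 0xDEADBEEF
    print(combine_xor(h, h) == 0)  # True  -- duplicates vanish
    print(combine_add(h, h) == 0)  # False -- a trace survives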


Please note that using a sum of hashes almost certainly weakens any cryptographic guarantees you may expect from your hashes. Of course, that may be fine depending on your use case.


Ordering is part of the view, then, not the model. This is conflating different abstraction layers.

Similarly, sometimes it might be useful to store some data in a different character encoding, or maybe multi-character glyphs are stored in a different normalization form. (I can't tell "ü" from "ü" just by looking.) Or maybe your sample data has 1.0 but your program generates 1.000. There's a million ways that serialized structures can be functionally identical but quite different. Easy: don't compare raw bytes.

If you want functionally-equivalent data to be accepted, you just need an equality-tester (and hashing algorithm) that's agnostic to such issues. They're not hard to write.
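For example, a minimal order-agnostic comparer in Python (after parsing, dict equality already ignores key order; this sketch just makes the decision points explicit):

    import json

    def json_equal(a, b):
        # Structural equality for parsed JSON: ignore key order in
        # objects, respect element order in arrays.
        if isinstance(a, dict) and isinstance(b, dict):
            return a.keys() == b.keys() and all(json_equal(a[k], b[k]) for k in a)
        if isinstance(a, list) and isinstance(b, list):
            return len(a) == len(b) and all(json_equal(x, y) for x, y in zip(a, b))
        # The parser already collapses 1.000 to 1.0, and Python's ==
        # treats 1 and 1.0 as equal, so scalars need no special care.
        return a == b

    print(json_equal(json.loads('{"a": 1.000, "b": [1, 2]}'),
                     json.loads('{"b": [1, 2], "a": 1}')))  # True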


You don't need ordered JSON for this. Just make sure that your equality tests ignore ordering, or that your hash function hashes two JSON objects with the same keys but different order to the same value.


If you need order, shouldn't you be using an array?


Makes sense. As a workaround, I'd read both, convert to a comparable object, and then compare.


> Much of the system was so old that it was hard to find anyone who knew how it actually worked, and its maintenance had been outsourced to some of the typical IT outsourcing companies of the time.

Some years ago, I worked on an antitrust case involving a firm that had outsourced relevant systems. Multiple times, to different IT firms. And there was literally nobody left who knew how they worked.

After considerable negotiation, they agreed to provide documentation. And what that ended up being was a report by IT company 2 about their understanding of what IT company 1 had done with the firm's systems. Because, I gather, IT company 1 had evaporated.

And yes, the core of it was COBOL.


Ha! Stephen Dolan had to add this to jq quite some time back, making it preserve object key input order on output. And yes, it's infuriating, but it's also somewhat convenient, and yes, there really is software out there that cares about object key order (sigh).


Doesn't anything using JWT depend on a specific order?


It shouldn't, unless you base64 decode the header, then parse it with a library that causes the order to change, encode it again, and then use your own re-encoding to calculate the signature:

  HMAC-SHA256(
    b64(reencoded_header) + '.' + b64(payload),
    secret
  )
You should really just verify the signature for the provided header + payload in their base64 encoded form.
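A minimal Python sketch of the safe approach (HS256 only, no header or claims validation; verify_hs256 is an illustrative name, not a library function):

    import base64
    import hashlib
    import hmac

    def verify_hs256(token: str, secret: bytes) -> bool:
        # The signing input is the raw header and payload segments,
        # exactly as received -- never re-parsed or re-encoded, so key
        # order inside the JSON can't matter.
        try:
            header_b64, payload_b64, sig_b64 = token.split(".")
        except ValueError:
            return False
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # Signatures are base64url without padding (RFC 7515); restore it.
        provided = base64.urlsafe_b64decode(sig_b64 + "=" * (-len(sig_b64) % 4))
        return hmac.compare_digest(expected, provided)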


Ha. Creating a layer on top of broken APIs is a thriving business. Every single carrier company has a broken API. That's why companies like AfterShip exist.

One example I struggled with for days recently was USPS. Not only do they use XML in the URL parameter, the order of the elements also matters. Unfortunately, the order in the documentation is incorrect.


Hell, even Amazon's EC2 spec is confusing in some places (the boto Python call parameters don't always match the RESTful call, which is less documented, leading to errors in some libraries).


I wrote a JSON serialization/deserialization library for C++, and if you provide the members in the order they are specified in the type, you get better performance. It can construct the class without having to bounce the parser back to that location, and it is much, much more cache-friendly.


I also would much rather write code that doesn't quite do what it's supposed to if it's easier to do and I can take an extra day off. However, that's not my job.


It's an interesting problem. If all the parser does is put them into a variant-like structure, sure, whatever. But when you go to put them into the reified data types, you would waste a lot of resources parsing the JSON into an intermediary data structure and then requesting the members. I parse them directly into their final classes. So I was left with a choice, and both options have a cost: parse the JSON in the order of the file and store the concrete type to grab later, or store the position/size of that part and move on, constructing the members in the order needed. The latter was cheaper. It will parse the JSON in whatever order; it's just that performance can be affected by the data ordering, as with many parsers.


Is your parser general-purpose? Is it template-based? Is the performance variance due only to the impact on the processor's cache or are there other factors? Is the code open-source, maybe I could just look myself?


It is template-based, so that each type's JSON parser is statically known and to give the compiler more opportunity to optimize. https://github.com/beached/daw_json_link

I haven't had a chance to look at why yet.


I just recently dealt with a JS library that was expecting object properties to be in a certain order ("works" in ES2015 [1]), and that object was loaded via JSON. It was a huge pain getting the object into the required order.

[1] https://stackoverflow.com/questions/5525795/does-javascript-...


How were they going to charge him for 9 FTEs for 6 months to uncheck a checkbox?


Perhaps they didn't know about the checkbox and thought they had to write a custom solution. It could also be that they knew about a really subtle bug caused by unchecking that checkbox, and thus understood why it didn't actually solve the problem long-term even though it looked like a good solution at the time.

Or perhaps they're so used to getting away with quoting 6 months for a 6-hour job that it never crossed their mind not to.


Because the checkbox only removed the validation error. Who knows what other errors will be encountered when the COBOL can't read the incoming messages, which might or might not be in the right order?


Write some software that reordered it, probably.


Yep. Probably IBM Data Transform Services™️ installed in the cloud plus a custom plugin written to do the ordering. That’s a few folks to deploy and configure the server. A few more for the engineering. Oh and there’s probably a support contract to support and maintain this new “solution”.


For what it's worth, I do wish it were trivially easy to customize things like error log output, so that I could say that I _always_ want the timestamp first, followed by severity, followed by whatever else. Our logs are annoying to parse when debugging things locally.


All the modern logging libraries do this now, I think.

And with tools like Fluentd, we can decorate the output with whatever metadata we want (instance name, machine type, container name, etc.).


Writers should produce sorted maps, readers need to accept unsorted maps.

It's a security issue, not just convenience. With unsorted maps the internal hash seed can be exposed, together with timing information.

Another famous omission from the specs.


I’m no security expert. How is “exposing the hash seed” a problem for the vast majority of applications? What timing information would be leaked and why would that be problem?

On the other hand, accepting unsorted maps seems like it could introduce covert channels?


> How is “exposing the hash seed” a problem for the vast majority of applications?

They can be DOS'ed.

> What timing information would be leaked and why would that be problem?

You misunderstood. By (1) leaking the order and (2) checking the timing of getting certain keys of a map, you have two independent pieces of information for getting at the secret seed.

> accepting unsorted maps seems like it could introduce covert channels?

A covert channel might be exposed by the sender if they use some non-random but unsorted map order. The receiver needs to accept any order; there's no risk in accepting unsorted maps. The only risk on the receiver side is another famous omission from the spec: how to deal with duplicate keys. Accept (overwrite or drop) or reject? All three cases can be seen in the wild.
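For instance, Python's json module overwrites by default (last key wins), and its pairs hook makes "reject" a few lines (a sketch):

    import json

    def reject_duplicates(pairs):
        # object_pairs_hook sees every pair, duplicates included.
        obj = {}
        for key, value in pairs:
            if key in obj:
                raise ValueError(f"duplicate key: {key!r}")
            obj[key] = value
        return obj

    doc = '{"a": 1, "a": 2}'
    print(json.loads(doc))  # {'a': 2} -- default: last wins (overwrite)
    json.loads(doc, object_pairs_hook=reject_duplicates)  # raises ValueError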


https://bugzilla.redhat.com/show_bug.cgi?id=750555

This led to hash randomization being enabled by default since Python 3.3.
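A quick way to see it (assuming CPython 3.3 or later):

    # Two fresh interpreters print different values here, because the
    # string hash is seeded randomly per process:
    #   $ python3 -c 'print(hash("key"))'
    #   $ python3 -c 'print(hash("key"))'
    # Pinning the seed makes it deterministic again:
    #   $ PYTHONHASHSEED=0 python3 -c 'print(hash("key"))'
    print(hash("key"))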


Cool. Thanks.


This makes my head hurt. I wonder if you end up with sorting differences depending on the locale?


It's not sorted, but ordered: The order emitted is the order entered.


I built the first popular sports "app" for the original iPhone, back when Jobs claimed that there was no need to write native code.

The backend polled an API that served XML updates on game scores etc. Note that I didn't previously know anything about baseball (and I still don't, not really).

So let's say that a baseball team scores a Double. We'd see some XML like <doubles><double/><double/></doubles>.

Now, suppose a team scores a triple... You know that there's a <triples></triples> entity. What would you expect to see inside the node?

If you're me, you'd expect to parse <triples><triple/><triple/></triples>.

I got the call during dinner. There was an important game happening, and the app suddenly broke. People were uptight. They loved our app and they were complaining.

In the end, it turns out that we would need to process <triples><double/><double/></triples>. Why? "Oh, it's always been that way." (Says the brusque developer at the service charging $50k/month for access to this feed.)


Recent Twitter thread referenced in the article: https://twitter.com/therealfitz/status/1161349619659530242?s...




The tweet at the top of the article says

> This so far out of the spec it makes my ankles hurt.

This is not, in fact, out of spec. The JSON spec does not define semantics here; it quite explicitly leaves what to do about the ordering of objects up to the JSON processor and the data-interchange spec.


> This is not in fact out of spec. The JSON spec does not define semantics [...] for what to do about ordering of objects.

What? It is literally on the first page of json.org and in section 1 of RFC 7159 [1]:

> An object is an unordered set of name/value pairs.

> An object is an unordered collection of zero or more name/value pairs

[1] https://tools.ietf.org/html/rfc7159#section-1


There are multiple standards. https://www.ecma-international.org/publications/standards/Ec... is more relaxed:

>The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

Also, it's pretty clear that the keys must have some order going over the wire. And JavaScript objects (which aren't unrelated to JSON) have a defined property order now: https://www.stefanjudis.com/today-i-learned/property-order-i...


If the objects are not guaranteed to be in order, I would always treat them as unordered. You could rely on order if both the producer and consumer preserve it, but as the OP noted, this can cause problems later on, and for no good reason. I'd use an array of key-value tuples if I wanted ordering.


Uh, "no good reason" in this case means "I unchecked the box and nothing bad happened right away" which amounts to shallow analysis, to put it gently.


That was a twist ending, indeed. I was expecting a resolution, not "I gave it a good whack and guess what, that fixed it."


I've been asked by architects to order JSON properties. It's fine until you depend on it, i.e. accessing a map as you would an array.

https://fasterxml.github.io/jackson-annotations/javadoc/2.3....


I wrote an app for the menu of my university's canteen. The JSON API that I used returned the meals as fields of a document, where the order of the fields was the actual display order. It took me a while to even find a JSON parser implementation that keeps the order of the fields.


I would assume that any decent JSON parser has at least the option to keep the order of the fields. Especially for CLI tools that automatically add fields to a JSON file that's usually edited by a user, e.g. package.json, I feel it's absolutely crucial.
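In CPython, for instance (a sketch; other ecosystems vary):

    import json
    from collections import OrderedDict

    doc = '{"monday": "soup", "tuesday": "pasta"}'

    # Since 3.7, dicts preserve insertion order, so the default parse
    # already keeps the fields in document order...
    print(list(json.loads(doc)))  # ['monday', 'tuesday']

    # ...and the pairs hook makes the intent explicit (and also works
    # on older versions).
    meals = json.loads(doc, object_pairs_hook=OrderedDict)
    print(list(meals))  # ['monday', 'tuesday']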


This is an omission not only from the JSON standard but also from the JavaScript language: JavaScript had no map order, and the rest is history.

The result is that all implementations are broken; f.ex. this is how a tree structure has to be implemented with JSON:

http://root.rupy.se/meta/user/task/eelzter/44953781393584543...

It's inefficient, ugly and just wrong.


Sounds like normal life to me. Please give me $$$$


"Why would you care what order we were sending the name value pairs for this?" The article makes it sound like it's somehow IBM's fault. However, questions about preserving order in hashes/associative arrays/maps/whatchamacallit span at least thirty years. Some applications of the format even demand it (e.g. ansible uses YAML - a superset of JSON - or even straight JSON for it playbooks; emitting them out of order is not an option).

Every time the issue surfaces, it elicits a large amount of eye-rolling among the cognoscenti. However, I'm a firm believer in the idea that recurring demands from the user base need to be addressed rather than mocked. The coding community appears to me notably tone-deaf on this.

'Course anybody can use <insert suitable technique> to send a JSON map in the desired order, except the parser on the other end will blissfully disregard it. Or you can devise an ordered solution on both ends, which will leave you outside the accepted standard and open you up to the situation described in the article (where the requirement may well have been frivolous).

I can remember very few, if any, instances of somebody suggesting that the standard should somehow make room for this kind of scenario. The standard reply was "use a different format" or "change the requirement". (Somehow reminds me of people asking what one can do to get whitespace-preserving XML, and at least one amusing story about that.)


JSON has a perfectly fine order-preserving key-value map:

    [["Key1", "value1"], ["key2", ...
You just need to deserialise it into a specific collection on the receiver. It's within the standard, and there are no weird parser issues.
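E.g., in Python (a sketch):

    import json
    from collections import OrderedDict

    wire = '[["key1", "value1"], ["key2", "value2"]]'

    # Deserialise the array-of-pairs form into an order-preserving map
    # on the receiving side; the wire format stays plain standard JSON.
    ordered = OrderedDict(json.loads(wire))
    print(list(ordered))  # ['key1', 'key2']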


That's a way. A way that, IMHO, vastly diminishes the expressive power on the reader's side, which I do not consider a very good thing. Most solutions to the issue are of this nature.

Consider that serialized JSON already has a natural ordering, which is thrown away for some reason (probably parsing convenience).


> which is thrown away for some reason (probably parsing convenience).

It's got nothing to do with parsing. JSON comes from JavaScript syntax, where objects represent unordered mappings. JSON naturally does the same.

If you want expressive power for reading, JSON is very poor in comparison to pretty much everything else. Just use it for simple serialisation.


Granted, it came from there, but that was back in the days of map=eval(json), and those days are gone. There is nothing in JSON the format (as opposed to JSON the language construct) to impose the unordered behavior. (Or, for that matter, the 'no comments' bit.)


> There is nothing in json the format [..] to impose the unordered behavior

That's not true. The spec at http://www.json.org/ says "An object is an unordered set of name/value pairs".


Yes, that's in the spec. But what I meant is that there is nothing intrinsic to the format demanding unordered behavior.


I hear that. But also, JSON does not have a canonical serialisation. Beyond object key ordering, strings have multiple equivalent forms (direct Unicode characters vs. \u00XX escapes, etc.). Numbers have exponential form (1e2 is the same as 100). And whitespace is ignored: [2] and [ 2 ] and [2\n] are all equivalent.

If your application expects JSON input to conform to one particular set of decision points here, that's fine and useful, but you're not using JSON anymore. You're using a JSON-compatible subset. Document that new thing you've made and stop calling it JSON, because it's not.


Agree. Except that the sheer number of questions about this spells "missed use case", to my eyes at least. Sort of like the "How do I push a local branch to remote?" question in Git-land.



