Ask HN: Do you use JSON Schema? Help us shape its future stability guarantees
263 points by jviotti on Jan 30, 2023 | 172 comments
The JSON Schema organization is requesting feedback from users out there about the stability guarantees it should be enforcing on the next release of the specification.

If you feel you can contribute, please read https://github.com/orgs/json-schema-org/discussions/295 and share your thoughts.

The current options are:

- The specifications that have been published to date provide no stability guarantees, but we will be adding explicit guarantees with the next publication.

- Apply the stability guarantees starting with the most recent publication (draft 2020-12) so that the next publication contains no breaking changes.




JSON Schema is awesome. I wish Typescript had better support for it though; having to do stuff in both Zod and JSON Schema sucks.

I have a system I built that compiles TS types to JSON Schema, which then validates data coming into my endpoints. This way I am type-safe at compile time (Typescript API), but if someone hits my REST endpoint w/o using my library, I still get runtime goodness.

The number of different ways that JSON Schema can be programmatically generated, and therefore expressed, is a bit high; different tools generate very different JSON Schemas.

Also, the error messages JSON Schema gives back are kind of trash; then again, the JSON Schema on one of our endpoints is over 200KB in size.
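
For anyone unfamiliar with the pattern, here's a rough sketch of that kind of pipeline (names are illustrative; I'm assuming Ajv on the runtime side and a prebuild step like ts-json-schema-generator emitting the schema file):

    import Ajv from "ajv";
    // Hypothetical artifact emitted by the prebuild step from the interface below.
    import userSchema from "./user.schema.json";

    interface User {
      id: number;
      name: string;
    }

    // Ajv's compiled validator doubles as a TypeScript type guard.
    const validate = new Ajv().compile<User>(userSchema);

    export function parseUser(input: unknown): User {
      if (!validate(input)) {
        throw new Error(`invalid payload: ${JSON.stringify(validate.errors)}`);
      }
      return input; // narrowed to User: compile-time types, runtime checks
    }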


FWIW, I've had good success with Typebox over Zod for exactly this stuff. You don't need to compile out types, you just...have them.


Ditto. It took me a while (coming from C++) to get that types are not the ideal source of truth in TypeScript, due to their disappearance at runtime.

Having a runtime parser (which can consequently express things impossible with TypeScript types alone, like "a positive integer" rather than being satisfied with any instance of `number`) from which types are inferred, is a needed mindset shift made easy with Zod.
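
A tiny sketch of that shift (standard Zod API; the parser is the source of truth and the type is derived from it):

    import { z } from "zod";

    // The runtime parser can check "positive integer",
    // which a plain TypeScript type cannot express.
    const Quantity = z.number().int().positive();

    // The static type is inferred from the runtime parser.
    type Quantity = z.infer<typeof Quantity>; // number

    Quantity.parse(3);  // ok
    Quantity.parse(-1); // throws ZodError at runtime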

I use JSON Schemas for request validation and response serialisation (eg: [1]) in Fastify, derived from Zod parsers via zod-to-json-schema [2]. Some of Zod's runtime validation does not translate to JSON Schema (typically transforms), so YMMV. But this gives a good runtime glue with the static typing of request handlers.

[1] https://github.com/SocialGouv/e2esdk/blob/beta/packages/serv...

[2] https://github.com/StefanTerdell/zod-to-json-schema


Maybe I was unclear - my post was a criticism of Zod because it involves a bunch of duct tape that I'm not sure makes sense.

Typebox just creates JSON Schema objects at runtime and projects them into the type system with `Static<T>`. In so doing, you simultaneously create schema and types and the process of doing so is pleasant--you can just hand a TObject to Fastify as a validator object and you're done. Plus, with a Typebox type provider, it infers down to your handler.

JSON Schema is the lingua franca; to me, working in it, rather than converting to it, is a much easier proposition.
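
For the curious, the workflow looks roughly like this (a sketch using Typebox's Type/Static API; the Fastify wiring is elided):

    import { Static, Type } from "@sinclair/typebox";

    // This value *is* a plain JSON Schema object at runtime...
    const User = Type.Object({
      id: Type.Integer({ minimum: 1 }),
      email: Type.String({ format: "email" }),
    });

    // ...and Static<T> projects it into the type system.
    type User = Static<typeof User>; // { id: number; email: string }

    // `User` can be handed to Fastify as a route schema as-is.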


Zod has the same "simultaneously create schema and types and the process of doing so is pleasant" feature, it's just not using JSON Schema for it (and has more power than JSON schema does).

In the Zod world, if you need JSON Schema to e.g. communicate to the outside world, you can extract it from the Zod schema with https://github.com/StefanTerdell/zod-to-json-schema -- but if you don't, you don't.

JSON Schema is kind of underpowered for real validation, so if you limit yourself to it, then you'll just have another round of validation immediately after. TypeBox's CreateType seems to be the same idea; it cannot be expressed in just JSON Schema.


I think Colin's a rad programmer, he works on EdgeDB which is absolutely my favorite datastore I've ever used and I think Zod's fine if you want to use it. But I don't agree with your premise. I don't, in practice, find the way Zod asks you to think about data compelling. It's probably because I think specifically in communication between systems; making the particulars of the interchange format central to the act of writing the thing, for me, keeps top-of-mind the necessity of both ends having an identical understanding of the allowable semantics. (Similarly, I've never used any Typebox features that don't map to JSON Schema, and I've never felt the need to.)


Interesting, I'll have a look, thanks!


If you're going from Zod parsers to JSON schema, are you duplicating that in Typescript? Or can you just go ts-to-zod[1] then zod-to-json-schema?

[1]: https://github.com/fabien0102/ts-to-zod


Zod schemas are the source of truth, from which TS types are inferred and JSON schemas are generated.


Interesting... I'd rather write Typescript types than Zod schemas. I haven't used JSON Schema, but going TS to Zod was straightforward and really pleasant.


Typescript can't understand types as objects, but it can understand objects as types (using typeof). So starting with objects and generating types is more natural.

Also if you start with objects you can express runtime conditions that are not possible with types, for example maximum sizes.
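
A quick sketch of the asymmetry (plain TypeScript):

    // An object can be projected into a type...
    const point = { x: 0, y: 0 };
    type Point = typeof point; // { x: number; y: number }

    // ...but a type alone leaves nothing behind at runtime, and it can't
    // carry a runtime-only rule such as a maximum size:
    type Name = string; // no way to say "at most 32 characters" here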

Zod is quite a bit more general than JSON schema because it can express transformations not just validations.


That way does not give you runtime validation of input.

(Or rather, it requires an extra build step to generate that. It's beyond Typescript itself.)


Maybe a better solution (not necessarily for your exact use case) would be to generate the Typescript types from the JSON Schema? The schema feels more like it should be the real source of truth. That's how protobuf (for example) works - there's a language-independent schema and you can generate types for any languages you wish (but you don't have to depend on any particular language if you don't need it).


This is pretty doable with mapped types in TS. Making it work without a compilation step, though, is pending on https://github.com/microsoft/TypeScript/issues/32063

Although in practice, the mapped types to do this slow down the typing service non-trivially.
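
As an illustration of the mapped-type approach, here's a sketch with the json-schema-to-ts package (one of several libraries doing this; the schema is made up):

    import type { FromSchema } from "json-schema-to-ts";

    const userSchema = {
      type: "object",
      properties: {
        id: { type: "integer" },
        name: { type: "string" },
      },
      required: ["id"],
      additionalProperties: false,
    } as const;

    // The type is computed from the schema literal by mapped types alone,
    // with no codegen step.
    type User = FromSchema<typeof userSchema>;
    // => { id: number; name?: string }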


Our original build path was:

1. Fancy combo of build and compile time generics generated all of our libraries

2. Tooling ran over return values from the generic definitions and created the schema

I rewrote it, and now everything is defined in JSON files; those JSON files are run through a code generator that creates our exported libraries, and we still generate the schema based on the TS exports.

Having everything defined in JSON then allowed us to write tooling on top of the JSON to make changes to our libraries.

Protobuf v3 is horrible; I had to start using it recently. The type system is so anemic, it is a joke how hard it is to model things in it.

JSON schema is more powerful than TS, and TS is orders of magnitude more powerful than what can be expressed in PB.

The original generics code was super cool, and obscenely succinct, but it wasn't amenable to being auto generated.


I had something like this and turned it into a library: https://www.npmjs.com/package/rototill


What tool do you use to generate json schema from typescript?


Not OP, but we have a very similar setup. We use https://github.com/vega/ts-json-schema-generator in a prebuild hook to generate a schema from Typescript. The schema is then fed into Fastify, and referenced in route configurations for incoming (validation) and outgoing (fast serialization) data, as well as auto-generated Swagger. Took some custom code to simplify attaching definition references to our route configs, but it works quite well.

As an extra, our frontend uses the same shared TS code for data transfer objects; this way there's an extra level of type safety at the API.
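
For the curious, the Fastify side of such a setup looks roughly like this (a sketch; the schema import path and route are made up):

    import Fastify from "fastify";
    // Generated by the prebuild hook from the shared TS types.
    import userSchema from "./schemas/user.json";

    const app = Fastify();

    app.post("/users", {
      schema: {
        body: userSchema,              // incoming validation
        response: { 200: userSchema }, // fast outgoing serialization
      },
    }, async (request) => request.body);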


Switched to typescript-json-schema; ts-json-schema-generator worked, but its schemas were much larger.


This gives you a pretty good feel for where it’s at: https://json-schema.org/blog/posts/future-of-json-schema

I’m invested in it. I’m using it to provide implementation-specific validation of requests to/from a third party API.

I wish there was a good macOS editor or IDEA plugin for it with autocomplete etc. The static generators from examples are obscure, ugly, minimal, and can't account for variations. It isn't pleasant to write; it's tedious and slow.

Nevertheless I’d rather write API validation this way, in a document, than in code.


Have you looked at vscode? It supposedly supports json schema (https://code.visualstudio.com/docs/languages/json) and I’ve been thinking about trying it out.


The one out of the box with IntelliJ is pretty decent for checking YAMLs with JSON Schemas. Is there something you're missing?


Yeah, I'm writing the schema by hand to describe someone else's API by cross-referencing their API docs and the actual (integration-specific) requests/responses. So there's nothing automatic in my case beyond the native JSON support. Storm doesn't offer autocomplete based on the JSONSchema spec.


+1, it's great. It just finds the schemas all on its own. Love it. When I've written my own, I had to change a setting to point it to my schema, but that worked well too.


In which flavour is this? I have a bunch of JSON and YAML files. In VS Code I can put a $schema field in the file and I get highlighting and completion. In IntelliJ I need to make a file association per project to get this, and it only works for JSON, not YAML.


What are you trying to do in regard to your API? I just developed an extensive API definition using OpenAPI, which (as of version 3.1) is compliant with JSON Schema. I used Stoplight Studio on Mac OS, although it still doesn't fully support OAS 3.1 two years after it became the current OpenAPI standard. The support from tools and generators is terrible; there's still no apparent toolchain that supports version 3.1 from definition through code generation.

Ass-dragging on 3.1 support aside, the code-generation tools out there are, in my experience, trash anyway. It's a massive hodgepodge of Java-based tools with incomplete and redundant documentation repositories all over the place, and widely varying output quality... when the tools work at all. I appreciate people giving their time to open source, but on the other hand I wasted so much time trying to make the broken tools work that I could have written my own code generator from scratch (which is what I ended up deciding to do after weeks of dicking around with OpenAPI Generator).

Anyway, I was just wondering how you're using JSON Schema for API work (since you didn't mention OpenAPI).


I spent the last year and a half working on a project where we planned to use OpenApi/AsyncApi to generate typed API clients and validate API requests between all our services and applications (written in Typescript & C#).

Going into it, I thought we'd be on a well traveled path, but that was far from the case...

Typescript has some decent code generation packages (although fragmented), and request validation can be hacked via AJV... but tooling around C# was half-baked at best.

In the end, I didn't really feel like OpenApi provided the API "contract" I had originally hoped for.

If I could start over, I would probably use something like gRPC for internal service communication (would maybe still use OpenAPI for any public API)


This is exactly what I expected: a well-traveled path. I assumed I was doing something wrong or just not "getting it," and spent weeks digging through the scattered doc and poring over Java code and templates... and the generated code, which wouldn't even compile. In our case, we needed C++ for client and server... a hugely under-served scenario.

Using OpenAPI and Stoplight was very useful for thinking the API through and how the whole thing could work, but the code-generation aspect was a total bust. And talking to some new colleagues later who had some knowledge of it, I found that I wasn't alone in my opinion that the tooling is trash.

Even worse, the prevailing opinion on the ecosystem is so poor that it might even be a professional liability to propose (or admit to) using it. The implication was that it (or at least the code-generation facilities for it) is by and for people who don't know what they're doing. Oof.


It's kind of insane how stoplight and swagger/smartbear still don't support OpenAPI 3.1

Half tempted to just start building out OA3.1 stuff to replace those services


This is what I was going to do too. I was just going to write a code generator in Python.

And I don't even know Python. I figured it'd be easier to learn it and use it for text processing than wasting any more time trying to make the sprawling Java "tools" work.


I am planning to autogenerate a set of Java libs from OpenAPI 3.1 definitions in an upcoming project, and so far have assumed it would Just Work. Am I going to have a bad time?


It's hard to say. It may be that the Java ecosystem for it is in good shape. The generator tools are almost all Java, so you might have the best-case scenario.

The OpenAPI Generator uses Mustache templates and a plug-in-style design to handle all the different languages and outputs targeting different packages. I was never able to find a succinct document explaining exactly how the processing steps worked, or even a catalog of data elements that the generator extracted from your OpenAPI spec document (YAML or JSON). So while there's a lot of talk about creating a custom template, or, if that's not enough, a custom generator... the documentation to do so is very incomplete. The only examples I found all discussed making a generator for documentation and not code. Again, not to sound unappreciative, but... that's lame.


Soon we'll have JSLT. JSON Query, JSON Pointer, JSON signatures etc are already there.

And I say that as a conscious XML user. We are going in circles. At least XML has comments.


At this rate, we'll be back to S-expressions in 50 years or so.


Some of us never left :)


I legit saw a team asking for a tool that does JSON -> JSON transformations at work. The XSLT for JSON vibes were almost too strong.


That's jq. At least useful for a lot of transformations.


A little different, but yeah. (If I'm not mistaken, xslt had a bit more of a defined language and way more formalism.)

That said, there are plenty of examples. JSONPath, JQ, Jolt...

I am not claiming that this is not useful. Just amused to see the wheel go around.


Oh, it absolutely goes around in this industry.

For a simple, most frequently used subset of jq, a more modest, non-Turing-complete language for transforming strings could be created, so that for many useful cases one just needs to provide such a string. Strictly for JSON -> JSON.


yes, jq is XSLT (and more) for JSON


and more?


The problem with XSLT wasn't that it was a fundamentally bad design, the problem was the horrible syntax. Well, and namespaces.


what was the problem? there's templates and queries and it seemed pretty clear once you read a quick intro.


Namespaces in XML were a bit of a bolt-on, and really showed the difference between the crowd who wanted to create XML to simplify defining new markup languages like HTML (the originators) and the XML-as-data crowd who wanted generalized tooling to support all XML data.

XPath really started to show the warts here - it leveraged namespaces as an axis, where your XPath could utilize namespaces which were defined at a particular element in the XSLT. However, this went against generalized tooling, as you had to understand which attributes or text were paths in order to know if namespaces had semantic value beyond the XML syntax itself.

IMHO, Canonical XML and XML Infosets were an effort to ret-con the new behavior back in - that namespaces and prefixes were not just used to serialize XML arbitrarily, but were semantically important parts of the XML itself. But at this point XML was too large to evolve - even XML 1.1 had relatively little market uptake.

On top of that, you had issues with how e.g. Microsoft Internet Explorer had divergent behavior for XSLT from other browsers/the spec.


They really should have made namespaces just be {url} prefixes on element and attribute names.

<{http://my.example.com/ns/fruit}citrus> ... </{http://my.example.com/ns/fruit}citrus>

Sure, it's textual bloat, but XML was already verbose and already compressed well with gzip, and it's a lot simpler to process.

The problem with the xmlns:alias="url" rewriting thing was that 1) most things didn't bother implementing it in full, and ended up e.g. relying on the name of the chosen alias 2) it had a special case of changing the current default namespace, which caused even more bad implementations.


syntax < semantics


I miss working with xslt - I was once handed an API specified in xsd and we wrote an xslt script that turned it into an SQL script that built all the tables and insert/update/delete stored procedures so we could log all the calls.

I guess you could do something like that with redis but at the time it was magic.


Trying to figure out how to get from XSD to anything else in order to work with a huge dataset. Wish there was an easier way to go from XSD -> JSONSchema -> Language Model so I could go from XML straight to my language of choice (Typescript in this instance). There don’t seem to be a ton of choices from XSD -> Language either.


You know what the funniest part is?

JSON Schema is something that addresses stability guarantees for JSON - where JSON was invented to be easy to change, skipping the heavy XML machinery that provided stability guarantees.

Now the people involved in JSON Schema have a problem deciding on stability guarantees - whether to skip them or keep things backwards compatible...


next is using a (subset of) typescript for human consumption and converting that to json for machine consumption

because one thing is for sure, json schema and openapi are not really nicely readable, even if written as yaml


The jq CLI tool/language is awesome already. Interestingly, the person who wrote it was also very competitive in Advent of Code with it.


It's not "at least"; it's arguably a different approach to a similar, but not equal, problem of having data structure.


JSONschema uses jsonpointer.


I use JSON schema to generate JSON-editing forms via json-editor: https://github.com/json-editor/json-editor

Then I can use the same schema in the backend to validate the data, both sent in via the form and directly with the application/json content-type. It's a pretty smooth flow, and reduces a lot of redundancy.


I use JSONschema; it's honestly great the way it is [0]. So I guess the guarantee I want is "don't change anything"? Ha.

[0] my biggest gripe is it's not well defined what to do with multipleOf when the number isn't an exact integer.


Your [0] might be a change that can be made in a backwards-compatible manner... as long as current implementations don't diverge on what they do when the number is not an integer (whether they error or do something).


I think you would have to build in an "integerMultipleOf" filter and slow-deprecate "multipleOf".

To be more specific, since I wasn't in GP... it's not clear what you should do when, for example, you want to see if 0.3f is a "multiple of" 0.1f (due to the whole 0.30000000000000004 thing).
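
Concretely, with a validator that implements multipleOf via plain double division (Ajv shown as an example):

    import Ajv from "ajv";

    const validate = new Ajv().compile({ type: "number", multipleOf: 0.1 });

    validate(0.5); // true:  0.5 / 0.1 happens to round to exactly 5
    validate(0.3); // false: 0.3 / 0.1 === 2.9999999999999996 in IEEE 754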


That sounds like a limitation of the implementation. Decimal types exist.

If the implementation wants to account for some floating point errors, I think that's fine too. If they want to codify it, though, maybe add an epsilon param so the user can specify how close they want it to be.


Decimal types don't exist in JSON. If you need decimals, you definitely should encode them as strings. As JSONschema exists to document JSON, it should be agnostic to that. You can, if you wish, provide format information in the format field, which is not prescriptive.


Neither binary nor decimal floating point types exist in the JSON spec. The JSON spec does not give an explicit internal representation for numbers; implementations are free to use decimal types internally if they wish to do so. The JSON spec merely specifies what a "number" is at the grammar level.

Having said that, most implementations don't use decimal floating point to represent these.


That is correct, but JSONschema does, and there is an implicit behavioural difference introduced by the existence of the multipleOf filter when you're dealing with floats versus integers - one that can't simply be elided away or excused by the lack of a distinction in the underlying type system.

The point is JSONschema could take a stand and say, for example, "this filter will always fail when the operand is not the number representation of an exact integer".


It is a scandal that JSON does not support the BigInt type.


It does. In fact, not only does JSON support big ints, but it supports arbitrary-precision base-10 decimal numbers.

Whether your JSON parser will preserve the precision correctly is another story. For JavaScript/ECMAScript, you'll need to use a library.
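
For example, in JavaScript the JSON text can carry more precision than the default parser keeps:

    // The JSON text encodes 2^53 + 1 exactly; the stock parser loses it:
    JSON.parse('{"n": 9007199254740993}').n; // 9007199254740992

    // JSON itself was fine with the value; the IEEE 754 double wasn't.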


It does. Just string encode it.


   1 == "1"; // true
   1n == "1"; // true
   1n == "1n"; // false


You have to describe what your encoding is in the "format" field of your JSONschema and implement the correct semantic meaning.


This is wrong. All numbers in JSON are arbitrary-precision base-10 decimal numbers.


No they aren't. The JSON spec says the implementation is up to the engine, and de facto JSON numbers are parsed as IEEE 754 on just about every platform I can think of.


I think what you're getting hung up on is that JSON does not use IEEE floating point math. If your implementation or environment stores parsed numbers as an IEEE floating point, that's a limitation the implementation has to disclose or work around in some fashion, but it's not a limitation of JSON or JSON Schema as such.


The behavior is perfectly well defined... there are no special cases for integers vs. rational numbers; they're all the same.

For example, given multipleOf: 0.3 and an input of 0.9, 0.9/0.3 is 3, which is an integer, so it would be accepted. A value of 0.8 would be rejected.


0.3 is not the same thing as 3/10 in IEEE 754, and neither is 0.1 the same as 1/10.


Would love a standardized ability to specify some human-readable annotation for a validation rule, and then an ability to easily see this in the DETAILED or VERBOSE validation format. I.e. rather than a cryptic note that something doesn't match `pattern: "\\S"` (the best your typical library would do), I want to say "must contain at least one non-whitespace character". And I want this standardized, so any conformant library I pick would be able to do this for me, providing a consistent experience whether it's UI-side "on-the-fly" pre-validation or processing a submission. Unless I missed it, I think while some libraries can do this, it's not exactly a standardized thing.
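
Some validators already bolt this on; here's a sketch of the experience I'd like standardized, using Ajv's ajv-errors extension (the errorMessage keyword is Ajv-specific, not standard JSON Schema):

    import Ajv from "ajv";
    import ajvErrors from "ajv-errors";

    // ajv-errors requires allErrors mode and registers the extension keyword.
    const ajv = ajvErrors(new Ajv({ allErrors: true }));

    const validate = ajv.compile({
      type: "string",
      pattern: "\\S",
      errorMessage: {
        pattern: "must contain at least one non-whitespace character",
      },
    });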

And maybe schema URIs weren't exactly the best idea. Not a fan of how many libraries make me register a schema with some fake URL instead of just feeding them the schema document and simply not caring where it came from at all. But since it's already there - OK, fine, it's a minor nuisance.

Otherwise, it just works.

P.S. I have no idea about the relation to OpenAPI/Swagger/RESTful APIs/whatever. I use vanilla JSON Schema as a convenient "cross-platform" DSL for JSON-serializable data structure validation, and I think it does an excellent job at this. Would love to see it staying in that scope.


I feel like this is missing an important bit of context - at least, it's not obviously part of this discussion, though I expect it's known somewhere.

Keeping things compatible is essentially always preferable. Making breaking changes is in isolation a bad thing but can be worth it for what you get out of it.

Is there a wishlist of "if only we could break compatibility, we'd do X/Y/Z"?

> The specifications that have been published to date provide no stability guarantees, but we will be adding explicit guarantees with the next publication. However, it seems disingenuous to promise "no breaking changes" while including breaking changes.

Not really; this seems pretty par for the course for any project hitting a 1.0 release - announcing that things are now stable. Perhaps that's a better framing: you wouldn't release something saying "no breaking changes" while it contains breaking changes, but releasing something that has breaking changes as a newly stable thing makes a lot of sense.


I have been trying to get anyone to use it since I discovered FasterXML/Jackson could generate it.

I have not worked for an organization with public APIs. Not a single one of the APIs I work on is completely stable. Figuring out what our APIs do requires techniques from archeology, anthropology, and sociology.


I'm not sure, but I suspect JSON Schema is better used at the problem-formulation stage, not at the reverse-engineering stage. It's still a rather simple tool; it can easily be confused and hide the intent. At the same time, if you record your API syntax in the form of JSON Schema, you at least have reasons for each JSON Schema feature you put into the spec.


> Do you use JSON Schema?

At one point I did, but then discovered RAML[0] and it subsumed the value of what JSON Schema provides as well as being easier to work with than OpenAPI[1]. Also, generating JSON Schema from RAML definitions has proven to be a fairly straightforward process.

The usual caveats apply... Your mileage may vary, my experiences do not speak for any others, my opinion does not detract from the value of JSON Schema, etc.

0 - https://github.com/raml-org/raml-spec/blob/master/versions/r...

1 - https://swagger.io/specification/


I took a look at RAML, and it also seems OK but not great, like Swagger and JSON Schema. It isn't that I think I can come up with a better one; it's that I can see why such tools often aren't adopted and API docs are often published in non-machine-readable formats.

What I didn't like was that it doesn't infer that something is an object, so instead you have "type: object" repeated a lot. Also "properties" rather than "props".


I'm not currently using it, but I'm strongly considering validating json in postgres with https://github.com/supabase/pg_jsonschema - which uses the https://docs.rs/jsonschema/latest/jsonschema/ Rust crate

So I'm not sure if my feedback is valid, but I sure hope that the jsonschema crate follows the spec! Otherwise I'll never use jsonschema but instead something-not-exactly-jsonschema. In other words... you'd better not break anything.


The jsonschema crate is spec compliant and has full implementations for the required parts of drafts 4, 6, and 7.

pg_jsonschema matches that support, with the caveat that it doesn't support loading documents over HTTP, since that would be risky behavior inside a database.

Disclaimer: I'm the author of pg_jsonschema.


Started using it for my site. I have a hard time navigating the docs and understanding the best types of schema for different purposes. Is there a great guide that anyone knows of?



Watch out for that second link. Not everything there is accurate. If you or anyone has suggestions to improve the docs (first link), issues or PRs are more than welcome!


> Watch out for that second link. Not everything there is accurate.

Can you elaborate please? I particularly like their well formedness concept.


Part of the problem is they wrote that site and its content for a specific version of JSON Schema. The `items` keyword no longer takes an array form value in the latest version of the specification, while their documentation shows that it does.

I seem to recall other issues, but can't find them now. We may have raised issues which were resolved, but I don't recall the specifics.


We at OpenMetadata (https://open-metadata.org) use JSON Schema extensively to define the metadata standards. JSON Schema is one of the reasons we were able to ship and get the project to where it is today so quickly. More about it here: https://www.youtube.com/watch?v=ZrVTZwmTR3k


I considered using it some time back for API payload validations. I found it a bit too verbose for my use case then. I ended up writing my own lighter version, and it worked pretty well. I later open sourced it here: https://github.com/sleeksky-dev/alt-schema


I use OpenAPI and JSON Schema at work, but for personal projects I ended up with my own alternative too. It's too verbose to be fun. https://github.com/alkun-org/open-kun/tree/main/bayan


Is this the right place to complain about lack of a native Date datatype in JSON?

Also lack of inheritance support. (For example I want a way to specify that my json object should be deserialized as Dog not as Animal.)


Let's invent JAYSON :D

   (dog){
      "ears": [(ear){"side": "right"}, (ear){"side": "left"}]
      {% this is a comment %}
   }

Now we only have to invent attributes, and we can sell this XML skin to the enterprise JSON users.

(This is only half sarcasm)

EDIT: Even better, make comments a subtree by having a special type tag, like #. This way, (#,dog){...} can stay in the tree but is ignored.


> Also lack of inheritance support. (For example I want a way to specify that my json object should be deserialized as Dog not as Animal.)

Use Typescript, then so long as it barks, you can treat it like a dog!

And if it knows how to bark and meow, you can treat it as a dog or a cat at your leisure. :-D


The issue is when JSON has to be parsed on the server using Java/C#. How do you know whether to parse it as a Dog or as a Cat (both of which inherit from Animal)?

Here's Microsoft's workaround: https://devblogs.microsoft.com/dotnet/system-text-json-in-do...


I must admit I'm a noob in this world so maybe I'm going about it the wrong way, but I used JSON Schema's oneOf[1] along with required[2] for this:

    "properties": {
      "cat": {
        "$ref": "#/$defs/catType"
      }
      "dog": {
        "$ref": "#/$defs/dogType"
      }
    },
    "oneOf": [
      {
        "required": ["cat"]
      },
      {
        "required": ["dog"]
      },
    ]
Usage would then look like:

    {
      "dog": {
        "name": "Charlie"
      }
    }
At least it's explicit and can be easily checked for validity.

[1]: https://json-schema.org/understanding-json-schema/reference/...

[2]: https://json-schema.org/understanding-json-schema/reference/...



> The issue is when JSON has to be parsed on the server using Java/C#. How do you know whether to parse it as a Dog or as a Cat (both of which inherits from Animal)?

I mean how MS does it, yeah, you have a tagged union, look at the tag to determine what it is.

Heck even networking packets work this way, UDP or TCP, look at the Protocol Field on the IP packet to see which it is.


This doesn't deal with what type of object you create on deserialization, but you can have some form of inheritance in JSON Schema. You create a definition for the parent class and then reference that definition in the subclass and add whatever other attributes you need. JSON Schema doesn't have any understanding of classes, but this will allow you to do something similar.
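
A sketch of that pattern (illustrative schema; "animal" is the parent definition, and the second subschema adds a dog-specific field):

    {
      "$defs": {
        "animal": {
          "type": "object",
          "properties": { "name": { "type": "string" } },
          "required": ["name"]
        }
      },
      "allOf": [
        { "$ref": "#/$defs/animal" },
        { "properties": { "barkVolume": { "type": "number" } } }
      ]
    }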


I think JSON allows both of your use cases, you'll just have to define your own deserialization.

We've used { "__type": "DOG" } as metadata flags in GraphQL and JSON API's for a long time.

I rarely see an API return anything other than an ISO 8601 UTC datetime string anymore, apart from Xero's horrific .NET(?) string "/Date(1326480130277+1300)/".


JSON schema doesn't deal with how your language treats the underlying data - and shouldn't do it.

Use custom deserializers for that.


No, it's not the right place.


Correct, so let's maintain the tradition. After all, we don't usually have precise enough topics, so we can deviate a little and use the opportunity.

The Date type reminds me of some network-specific types which used to be in older versions of JSON Schema (like IPv4 address?). Today we can use a string with a regexp as an approximation. Do we have a mechanism in JSON Schema to define, mmm, dialects?..

Deserialization of a number as an integer, not a double, could be a similar problem. We can agree that we (try to) deserialize to the most abstract class (?) or use external explicit indicators of the class.


I use it extensively in prod - well, the superset that is OpenAPI. It enables contract-first development, where any change to the API is done in the schema first and then implemented by the clients/server.

Since we have tooling [0] that validates requests and responses at runtime, the clients can be absolutely sure of what they receive (we throw a 500 if the server attempts to respond with an undocumented response), and the server is also sure about the shape of the requests. This allows us to validate everything at compile time too, generating Typescript types for both client and server.

And since we have similar tooling regarding our data stores (Typescript types for SQL queries), most of the time if there is a bug, the code would simply not compile - pretty nifty!

[0] - https://github.com/ovotech/laminar


How do you keep the OpenAPI spec file updated over time? Our problem is usually making sure changes to the API get reflected and updated in the OpenAPI spec file.


Here's my gripe with jsonschema...

In most cases where I want to do some validation on JSON, I find that I usually have a class/struct/object that represents the payload, and I want to unpack JSON into it (or dump that class to JSON). Ultimately, there are already nice tools to do this (e.g. marshmallow on the Python side). So unless I'm crossing language boundaries, writing a separate jsonschema and using that is more work, and I have to keep it up to date.

And in the cases where I want to cross language boundaries, at this point there's less and less of a compelling case to go with JSON and not e.g. flatbuffers/protobuf/thrift/capnproto, since you're writing a schema anyways.


Here's my gripe with random HN comments: this has nothing to do with the stability guarantees of the specification, unless I'm missing something in your comment. You basically just saw "JSON Schema" and thought "Aha, JSON Schema, here is why I don't like it at all", regardless of why you started this conversation.


"Ask HN: Do you use JSON Schema?" was complete thought too


To say something concrete about this: it’s vital that JSONSchema have the ability to provide arbitrary namespace-friendly metadata at a field level. This would allow the schema itself to be extended to specify serialization details - much like Kubernetes YAML can go far beyond specifying a deployment and be tagged with various kinds of behavior modifiers.

In theory this is already supported - ish - because unknown keys at any level of the spec are ignored… but how does one know that those keys will always be unknown? The specification should do something like say that certain keys (say, any with a period in the key name) will never be used by the core specification as it evolves. It's a bit unclear from the spec at https://json-schema.org/draft/2020-12/json-schema-core.html#... IMO. This is why this thread is important!
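
For instance, something like this is accepted by validators today, but nothing currently guarantees the extension key stays free forever (the key name is made up):

    {
      "type": "string",
      "x-acme.serialize": { "as": "uuid" }
    }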


And this is exactly why we are asking this sort of question. This sort of feedback is useful. Thanks. Some of the people working on the spec have had a similar idea and it may end up being part of the spec.


Since you use python you might be interested in knowing that Pydantic outputs JSON Schema as an option. This is part of how FastAPI is able to generate OpenAPI specs (OpenAPI uses JSON Schema for a lot of validation). The only time I've needed to use JSON Schema I just used Pydantic models as my source of truth.


I use pydantic, and then use the json schema output to generate typescript interface definitions. This gives me pretty good velocity by having a single ground truth.

The application I'm working on predates FastAPI, so I'm using CherryPy and modified their JSON tool to call the parse_obj() and json() on the model where needed.


I wonder in which language class members can have such JSON Schema requirements as "an integer which is a multiple of integer X" or "a string which matches regexp R". Also not sure which other schema system is better than JSON Schema across many different features.


They're saying that the tool doesn't fit their priorities for their use case. I suspect you might be trying to say that their use case isn't what the tool is designed for, but you've phrased it in a way that comes across as condescending for not knowing what the tool even does. I'm not sure if you're trying to convince them that the tool would in fact be superior for their use case or if you're unhappy with them for giving this feedback in a context that you think isn't appropriate, but in either case I think it would be more effective to just state your point directly instead of passive-aggressively.


Ok, direct statement. I've tried some alternatives to JSON Schema, e.g. Protobuf, and was left with some disappointment towards those alternatives. JSON Schema so far looks pretty good to me in comparison - overall, not for some specific cases, which are frankly rare.


> Wonder in which language class members can have such JSON Schema requirements as "an integer which is a multiple of integer X" or "string which matches regexp R".

Any language with a half-decent type system can represent that at the type level. And certainly any language worth knowing can trivially check whether an integer is a multiple of integer X or a string matches regexp R, which is what grandparent was getting at.


We seem to be talking about different things. When doing deserialization, many wouldn't expect to add custom code which would translate data and validate it - otherwise we would write the whole deserialization ourselves, I guess.


Most validation has to be done in custom code after deserialization (e.g. maybe a schema can enforce that the "collection ID" is in the right format, but it can't enforce that that collection actually exists in the datastore). There's definitely value in having a library/code generator/etc. to mechanically do the bytes -> structured value deserialization, but the cost/benefit of doing more complex validations at that stage is very questionable IME.


For simple enough APIs it's possible to have all the validation in the form of JSON Schema. Surely the id may be missing, but that might not be a "client error" - the input is syntactically valid, and for some APIs that's enough - it will just cause the server to return an empty result set.


You can't accept a write if you're going to break a foreign key constraint when you insert the row. And I struggle to imagine a case where you want to do complex validation like "multiple of x" or "matches this regex" but don't want to actually check if the thing exists.


I used matching regexes to check if a particular string actually holds a time moment, in the form YYYY-MM-DD hh:mm:ss.xxx.

If you add events, they may not have FKs, you just add them to the table while generating PKs on the fly. This is useful in many cases. There could be other approaches with different cases, I suspect.


This will be possible in my programming language https://letlang.dev It's going all in on dynamic typing.

NB: It's actually already possible, since the type system is already implemented, it's just that many features of the language are not yet available.


You can generate JSON schemas from other things, you know. There are Typescript-interface-to-JSON-Schema compilers. You can probably find one that generates a schema from Python too.


Flatbuffers can generate the JSON Schema for you out of the .fbs files.


> Do you use JSON Schema?

No, but I don’t do much backend stuff, anymore.

I'm totally anal about Quality, so I want to do things like jsonSchema. I used to deliberately publish APIs in both JSON and XML, so that I could use XML Schema to validate the data, but JSON to actually use it.

XML Schema is a huge PItA. I don’t like auto-generated Schema (see “anal,” above), so I tend to hand-tune (or write code to dynamically generate) my Schema.

The main reason I don’t use jsonSchema, is that I don’t have a real use for it, these days.

I mostly have internal APIs (proprietary backends), so there’s no need. I do have one backend that I recently wrote, that is public, and I may consider adding a published schema to it.


> XML Schema is a huge PItA

Yes. In the past I used to use Schematron; it's quite fun and extremely straightforward, although it's better described as a set of rules / tests than a tool to describe the structure of a document.

https://en.wikipedia.org/wiki/Schematron

A Schematron equivalent in JS would be cool. Maybe it already exists?


In some sense, JSON Schema does lean in that direction. In JSON Schema anything is valid except what you constrain. For example, the schema {} will accept any valid JSON. Anything you add to the schema is kind of like a test that further constrains the schema.

If I say {"type": "object"}, now my schema will accept any valid object. If I add {"type": "object", "properties": {"foo": {"type": "string"}}}, now if there is a property "foo", it must have type string. I can still have whatever other properties I want of whatever type.


I do use it, and it's a nice additional layer in forcing APIs to a) remain consistent and b) increase security. I think the only public program I'm using it in is my dynamic DNS widget:

https://github.com/chapmajs/dynamic_dns

My main interest in using JSON Schema in the above project was security related: this service sits on the public Internet, and by nature I cannot restrict the sources that connect to it (road-warrior-type systems couldn't send DNS updates!). Having a strict schema is another layer of sanitization on what one nowadays must assume is a malicious source.


We use JSON Schema, and we have had nary a problem with storing a bunch of JSON documents in a variety of NoSQL databases for many years now (knock on wood, because yikes, that's a bold sentence to write).

Making sure there aren't major breakages from one version to the next sure would be nice, yes. We hit some snags as we attempted to upgrade from I believe 4 to 7, especially because we'd have to deploy native apps into the wild to use the newer version, and getting native mobile apps deployed everywhere quickly is an exercise in futility.

So being able to be confident that draft X will work with draft X+1 would be pretty excellent.


The OP actually created a tool that allows you to upgrade your schemas between versions of JSON Schema. It's pretty neat, and has had the eyes of those who write the spec. https://github.com/sourcemeta/alterschema


I haven't used JSON Schema (knowingly, at least) in my professional software career. I wonder why that is. There are things that seem to supersede it, namely type systems. GraphQL types with compiled clients wouldn't have a need for this (I don't think?). Neither would Golang clients generated from GraphQL specs, nor gRPC/protobufs. JSON storage / data passing within applications is covered by Typescript.

For libraries that build/offer specific JSON outputs (maybe like an online editor that dumps out a JSON file on export), I would also expect that product would come with a client library with Typescript types that would give type safe access to the JSON data.

Maybe JSON schema is useful for RESTful resources to provide a payload definition of responses? And I guess consumers could then generate client definitions from the JSON schema? It seems weird to do that at runtime, so I'm guessing it would also be a compile step to generate clients from JSON schema? Or is there an intentional runtime use case?

Are there popular APIs/libraries/etc that use JSON Schema? I don't see a "used by" section on this site which could help folks understand where this sits in the modern software development industry.


Used by:

- OpenAPI/Swagger (1), which is the industry standard for REST APIs. Used by Kubernetes among others for resource descriptions

- OpenRPC (the json-rpc standard)

- Most YAML-based configuration files

GraphQL doesn't quite supersede JSONSchema because it doesn't deal with validation. (Something like Cue (https://cuelang.org/) might)

Type systems are not typically cross-language, which is where JSONSchema tends to be used a lot.

There are indeed a lot of openapi-based client generators. This one, for example, has been around for quite some time: https://github.com/OpenAPITools/openapi-generator

(1) Kinda. OpenAPI schema and JSONSchema are largely intersecting sets but there are bits in both that aren't present in the other.


I had a similar experience evaluating the state of the ecosystem and finding it not ideal. The schema format is great and schemas are great ways to validate your input / make your programs easier to reason about — but with the state of the ecosystem, it’s hard to actually make use of that validation in any but the most trivial ways.

The problem with using it directly was that the validators did not produce very usable output. At one point I found myself using it to validate big tables of AirTable-hosted data that were sent into a system as JSON blobs. Identifying the error in a field, and displaying that error to a user so they could correct it, was quite possible, but was a hack in reverse engineering a specific validator system and parsing and mapping its descriptions about where the error was. This was at its worst when you had to have a subschema that was oneOf a few different things: communicating back what was wrong was incredibly painful, as there was basically no metadata coming back that could point you at the problem, just “this object has to be one of a few things, but wasn’t.” In my case I was able to add a second pass of validation if that error came back, validating a specific subschema depending on what type of thingy had been submitted exactly, then mapping that back to the original input fields, before translating those into hyperlinks where the operator could take action.

It was further compounded by a library that referenced any shared sub-schemas by loading the URI. I had to write my own system to load them up at application boot instead, and lock down the load-over-network capability for sanity’s and safety’s sake.

I didn’t see things that were much better when I attempted to work with it in Ruby, either.

The best way to use JSON Schema in a Python API project, in the end, seemed to be to use FastAPI to output the OpenAPI schema and check that in as an artifact (and test it being up to date in the unit tests) so that diffs in the API were obvious at code review.


Postscript — a surprise good part of schemas was validating all the data our code produced before sending it out. The code would return 500 rather than going out of spec. It made things easier to find and fix.


Aside from your passing mention of Go, it sounds like your working life is almost exclusively in Typescript. "A client library with Typescript types" is not a substitute for a schema that can be used in multiple languages.

(I have the opposite working life to you: I use lots of languages but none are based on JS. So I don't feel qualified to say this, but I don't see how a mish mash of hand written client libraries is a substitute even if you do only work in Typescript. What if I write my own JSON data at the application level? Do I have to write my own mini Typescript library to encode and decode it? It's going to be less concise than specifying the format in a schema.)


glTF is a 3D model format. It's a JSON file with optional binary data. JSON Schema is used to define the JSON part. The properties reference in the spec is autogenerated from the schemas: https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#pr...


I've used JSON Schema for specifying YAML config schemas.


My struggle with JSON Schema is that not everyone uses it! I do a lot of work with various APIs and ingestion systems that accept JSON, but they don't provide a JSON Schema for me to validate my payloads against. Instead, I often have a human being on the other side emailing me and saying, "Your data structure isn't correct."

It's mildly infuriating. I've taken to using Cuelang[0] to write my own validators based on their specifications. (Using Cue because JSON isn't the only data format I have to support.)

I wish there were an easier way to take some documentation and generate JSON Schema from it. I can take the sample JSON in the docs and generate one, but those samples don't usually contain all the edge cases that the systems complain about, so it's not super useful.

[0] https://cuelang.org/


I reviewed JSON Schema libraries for Haskell:

https://github.com/sshine/library-recommendations/blob/main/...

My current impression is that JSON Schema is nicer in theory than in practice.


Thanks for sharing. This is useful commentary to see. We try to be helpful to maintainers. If any have questions, the JSON Schema server has an implementers channel! We would like to be able to support implementers financially, but we need other companies to add their support in order to do that. And for that, we need to better communicate our value proposition. We're working on it!


It really needs comments. Unfortunately that'll never be backwards compatible. They're very helpful for debugging.


We already use comments in JSON without JSONC in lots of projects. Adding a { "//": "My Comment" } line works, and most APIs ignore excess fields.


A good json schema validator will usually reject excess fields.


Actually, currently, this is not the case. The specification explicitly allows for additional fields for the purposes of extensions. We are considering changing that, and only allowing pre-defined fields, or having a way to mark specific fields as limited extensions.


Huh, you're right. There is additionalProperties, but I mistakenly thought `"additionalProperties": false` was the default.


It is not =]

Additionally, the value is a "schema" as opposed to a boolean, meaning you can apply a subschema to any additional fields. Useful if you for instance want to make sure any additional fields start with a specific prefix.

Booleans are valid schemas. It's subschemas all the way down! I wrote an article around this topic: https://json-schema.org/blog/posts/applicability-json-schema...



I guess I mean like commenting out a whole line like in JavaScript


> It really needs comments. Unfortunately it'll never be backwards compatible. It's very helpful for debugging

If you want to have comments, consider using YAML for definitions where possible as it is a superset of JSON[0]:

  YAML is also a superset of JSON, so JSON files are valid in YAML.
0 - https://www.redhat.com/en/topics/automation/what-is-yaml


Just for fun, I wondered if YAML '#' comments could exist within a multi-line formatted JSON file.

I cannot say the below holds for all YAML libraries, but can say the below works.

Assume there is a file named "foo.yaml" which has the following content:

  {
    # this is a comment
    "hello" : "world",
    array : [
      # first number
      1,
  
      # second number
      2
      ]
  }
Running:

  ruby -ryaml -rjson -e 'puts JSON.pretty_generate(YAML.load(ARGF))' < foo.yaml
Yields the below without error:

  {
    "hello": "world",
    "array": [
      1,
      2
    ]
  }
HTH


You can reduce this further by running:

    cue export foo.yaml --out json
If you have an existing JSON file, add // comments wherever and run:

    cue export cue: - --out json < file.json
The `cue: -` tells cue how to interpret stdin and has many options


To me, saying that JSON - not JSON Schema - needs comments is like saying an array - the good old "int foo[3] = { 5, 6, 7 };" - needs comments. I'm sure there are ways to 1) use it profitably without comments and 2) add comments in a non-invasive way if really needed.


Isn't that a great example of something that would benefit from comments: a generically named int array with no context?

Regarding the ability to add comments to JSON, it can be done in a cumbersome manner, such as adding a field to an object called "comment", but it is a hack, since this won't work for arrays anyway.


The point is that JSON is a data structure; a comment is rather unnatural there. It can be added to JSON, though, just like an array could be wrapped in an object with a "comment" field. Usually the array name serves as a crude comment. Literate programming isn't very popular.


> The point is that JSON is a data structure

JSON is not a data structure. JSON is a data language. A computer program is a data structure and doesn't need comments; a programming language is a format for a human to read, write, and interact with automated tooling, whose content is a program, but for the human uses, comments are important.

Similarly for data languages.


This does not help if you are integrating against someone else's API.


I agree. Yes, you can support comments using a non-standard method, but the standard needs to include them.


I put my JSONSchema in .json5 files. I then convert the json5->json with a tool before it’s interpreted. One extra step, not a huge deal.


Regarding explicit guarantees, I'd consider a "forward-compatible" way such that when - and if - a breaking change needs to be introduced, it's possible to automatically convert old JSON Schemas into the new form. A JSON Schema version field which would select which processing to choose could be an option, but I'm not sure there couldn't be something better.

As is often the case, good domain knowledge can help to choose which features are "real" and can stand and which are more doubtful and have a higher chance to fall out of favor later.


A project exists which does this, and the team even support its development: https://github.com/sourcemeta/alterschema

Personally, I'd like to see it become officially supported. We have work to do!


Just last week, I added the ability to export Umami[1] recipes as Recipe JSON Schema[2]. Writing the code for it was quite pleasant thanks to schema-dts[3].

[1] https://www.umami.recipes [2] https://schema.org/Recipe [3] https://github.com/google/schema-dts


I've been storing some document data in my app in a protobuf in part because I thought protobuf's support for an evolving schema would help with this data aging gracefully over time. But protobufs in js are worse than I thought they would be so I'm considering trying the same strategy out with JSON Schema instead. Any reason why that wouldn't work? JSON is so much easier to work with in js than protos.


I built a tool which uses JSONschema for validating custom policy documents as part of a GitHub Action pipeline. The documents get processed for updating HashiCorp Vault, and being able to validate/restrict fields like "email address domain" keeps InfoSec happy.

Part of my motivation to build the tool was to learn more about JSONschema.


It would be nice if you provided a guide to the things tool implementers commonly get wrong.

Like, you would expect that people realize "additionalProperties" are additional to "properties", but pretty much any doc or codegen tooling I have used has some problems getting that right (various problems, not always the same).


I use json schema to communicate the potential shape of JSON input and output to co-workers who have to produce or consume said JSON, and use it to validate incoming requests via valijson. It's somewhat clunky. Stability guarantees? Well, it's all namespaced anyways, isn't it.


Could you elaborate where you see the namespacing?


Are there any good libraries for translating data from one json schema to another (i.e. data upgrade)? I'm not worried about a particular language, I'd just like to see what is available.


Many of the orgs I've encountered through the years focus on mechanisms to implement and share APIs by way of method signatures. Is it possible to build an ecosystem around data schemas?


Just commentary on JSON Schema itself:

I mainly seem to have now leapfrogged the need for this.

There was a time when the JSON parsers across different platforms were so finicky that schemas seemed like a solution, but the parsers got better at not breaking on unexpected data types or JSON structures

I'm also usually able to do system design to return a smaller subset of data, with consistent data types

Or someone else's JSON or JSON output has its own documentation that already tells what something is and how it should be parsed


Validating JSON parsers is not really what JSON schema solves.


We agree on that; one of the examples of using JSON Schema concerns the range of values of a key.


We just use protobufs JSON serialisation. It's a lot easier.


I'm working on JSON BinPack (https://www.jsonbinpack.org) for precisely these use cases. It is a WIP, but hopefully will be ready soon.


Allow for single quotes when they are the only quotes used to define a key or value, similar to Python strings. This would be huge for me.


Just used it last week for some extensions to existing schema.


Schema.org



