Fundamentally, it suffers from the same problems as the semantic web itself; top of my list of concerns is link rot, and any scenario in which at least one link in those examples becomes inaccessible or unrecoverable.
For the Semantic Web / JSON-LD to work, the infrastructure behind it needs to permanently keep all of the sources a file relies on (i.e. a permaweb). Otherwise, the entire knowledge graph suffers from a left-pad style situation, wherein the removal of a critical piece of definitions makes every document that relies on it for those definitions unparsable.
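To make the failure mode concrete: a typical JSON-LD document delegates the meaning of all of its keys to a remote context. The sketch below (made-up values, modeled on the familiar json-ld.org "person" example) stays machine-interpretable only for as long as that context URL keeps resolving; if it rots, a conforming processor can no longer expand the document.

    {
      "@context": "https://json-ld.org/contexts/person.jsonld",
      "name": "John Lennon",
      "born": "1940-10-09",
      "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
    }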
Yes, JSON-LD is just a different serialization format for RDF.
Anyone looking for a solution to the link rot problem (as well as versioning and dependency management) might be interested in Plow[0].
We built it out of frustration with exactly those drawbacks of the semantic web/data space. At its core it builds on a model very similar to Cargo's, with a public index and checksummed artifacts, which allows easy mirroring for use cases where all transitive links always need to be resolvable.
We are also looking to build out tooling in the future that makes upholding those properties easier, such as a linter that ensures that e.g. all referenced concepts in the dependency tree have a definition. (We already have a primitive version of that, but only at the single-package level.)
IPFS alone only gives you content addressing, so you'll need at minimum some higher-level IPLD structure that allows you to express structures such as an ontology. That can be based on the Plow model, if you want.
If you want a workable ecosystem that enables many decentralized parties to collaborate, the model of not-too-big-not-too-small packages interconnected via abstract dependencies (embodied by SemVer ranges) is still the gold standard in my opinion. And breaking the mold of traditional monolithic, slow-moving ontologies with high technical barriers to publishing is one of the main motivating factors for building Plow (together with building a stable ontological layer that you can build other software on).
I actually have worked on bridging the gap between semantic data tech and IPFS in the past[0] (both on a fine-granular per-concept level and a more coarse-grained ontology level), and I can just say that there are a ton of additional challenges if you want to do it right (and semtech is already challenging enough as it stands today).
Conceptually, the infrastructure parts that make up Plow (the index and artifact store) are also flexible enough that you could distribute them via IPFS.
It also comes from the exact same line of development at W3C[1]. LD, or “linked data”, is W3C’s attempt at rebranding the “semantic web” (and, despite my general dislike for rebranding attempts, it seems more descriptive). This is the first time I’ve seen LD expanded as “linking data”, though.
[1] Manu Sporny [JSON-LD 1.0 editor and CG chair], “JSON-LD and Why I Hate the Semantic Web” (2014), http://manu.sporny.org/2014/json-ld-origins-2/ (down right now but that looks temporary?). Unfortunately, he did not participate in JSON-LD 1.1 (as much?), and the spec once again returned to its RDF-jargon-filled equilibrium.
Yep it’s just hypermedia. And as we know, cool URLs don’t change. Of course there are hypermedia solutions to even changing URLs, but that would require investing in using the actual platform rather than constantly rediscovering parts of it.
Is "left-pad style situation" (aka the Azer/Kik/NPM drama) the formal name for this class/type/definition of system fragility? What would this be an example of, formally, in math or systems design or whatever?
The reason I ask is that I see it literally all the time, sometimes as a design goal, which drives me absolutely up the wall. "Let's make a centralized repository (CIR) of all acronyms, and then everyone can reference the CIR". Ah yes, sure, great idea, and the first time someone changes Air Conditioning to Alternating Current, ALL YOUR DOCUMENTS CHANGE[1].
If "left-pad style" has a formal name, I might be able to offer a counter-argument in the form of math.
[1] Now, obviously, the CIR problem does have a solution, I'm aware of that, but it's not a solution that lies in the domain of the document markup - it's a change-process problem, which is in another domain, which means extra eyeballs and red tape. So I guess the broader question I'm looking for is: what is wrong with these sorts of functionalities that break out of their domain like this? What do you call that?
Agreed, there's an overly-optimistic assumption that other people will go out of their way to maintain referential integrity for applications that they don't know about.
I think it would be better if it were done in a content-addressed way, so that you can trivially host the bits you're relying on without having to also be an authoritative source for them.
Once we have that figured out we can socialize it onto the users. Like either pay $1 for access or pin whichever nodes in the app's dependent knowledge graph hash to your user ID mod 1024. Having helped with your 1/1024th share of the hosting, you get access for free.
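A rough Rust sketch of that sharding rule (should_pin and SHARDS are made-up names, purely for illustration of the idea above):

    // Each user pins the graph nodes whose hash lands in their shard.
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    const SHARDS: u64 = 1024;

    fn shard_of(node_id: &str) -> u64 {
        let mut h = DefaultHasher::new();
        node_id.hash(&mut h);
        h.finish() % SHARDS
    }

    // True if this user is responsible for hosting this node (by IRI or CID).
    fn should_pin(node_id: &str, user_id: u64) -> bool {
        shard_of(node_id) == user_id % SHARDS
    }

    fn main() {
        println!("{}", should_pin("https://schema.org/spouse", 42));
    }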
Maybe it's not "link rot," it's just information that's no longer relevant. Design your systems to be able to handle that. And you'll be able to interop and reason with all kinds of data.
Designing a system to operate with partial data (or backing it up everywhere) has pretty hard tradeoffs when you are dealing with a system the size of the internet, where any piece of content can range from a few bytes to terabytes.
So we add a DHT and use magnet links. The permanence problem is then at least distributed and the semantic graph is no longer a collection of single points of failure.
Why isn’t semantic web more popular inside companies?
This expandable graph seems like the purest representation of all data in a company.
Instead we have several different databases each with their own schemas. All with entities related to one another, but without these relations defined anywhere, and impossible to query.
RDBMSes are implementation details of this graph that involve a heap of manual work. Wouldn't it be great if the RDBMS were generated from this graph?
There are so many interconnections between the data involved in the design and implementation of a product and the product database itself.
Imagine hovering over a UI element and seeing who implemented it and when, what project it was part of, why the project was initiated, and what kpis and goals it contributes to.
At the point you create the data, you don't have link rot. At the point you create the link rot, you don't notice it. At the point you notice it, you don't have time to fix it. At the point you try to fix it, you don't have a full list of rotten links. And those who keep starting the cycle by creating links don't care that ends up this way because there's no immediate impact on their silo.
Thus to fix it, you need an organisation-level project, but it's hard to identify what business value you get from it.
I have had this question for years myself, and I still think it has a lot of potential for companies.
For new tech to take off, multiple things need to be true and often the social factors are the most important. The semantic web tech seems mostly driven by scientists and highly specialist companies. The barrier for developers is still quite high and a lot of concerns have been poorly addressed. Probably the most important merit of json-ld is improving the usability for developers a bit, making it less difficult to take advantage of the semantic web.
Furthermore, most organizations subdivide the work into silos (teams) and have a clear subdivision of goals. Anything you are doing that isn't directly contributing to the goals assigned to you or your team is potentially damaging your standing and rewards, whether those are promotional, monetary or simply recognition and praise. After all, this effort cannot be spent on your main focus.
So this comes together with my previous point: it takes a lot of 'extra' time adapting the data in your silo to work with the semantic web of your company, and there are usually zero incentives for doing so. This is true even if it would be quite valuable to the company as a whole, because individuals and teams don't act from that perspective. Thus, it simply doesn't get done, since it only really works if (nearly) everybody is onboard.
It's like the famous Jeff Bezos decree demanding that everything be available via an API: it takes a single-minded visionary (or dictator) to push everyone to do this 'extra' work and get to the point where the investment pays off.
I think the only hope comes from going graph-first and using a graph DB as the source of truth. I think programming as a whole would be in a much better place if graph DBs had beaten out RDBMSes and attracted more R&D for perf optimization.
> Imagine hovering over a UI element and seeing who implemented it and when, what project it was part of, why the project was initiated, and what kpis and goals it contributes to.
That's exactly what we are building at Field 33[0] with a package manager for ontologies (Plow[1]) as an underpinning to get a good level of flexibility/reusability/collaboration on all the concepts that go into your graph.
------
> Why isn’t semantic web more popular inside companies?
As part of building Field 33 we obviously also asked ourselves that question.
My rough hypothesis would be that ~10 years ago semantic tech didn't provide tangible enough benefits, and has since been left in the dust by non-semantic tech.
That created a tech chasm that widened and widened: the non-semantic side became a lot more accessible with quasi-standards (REST) and new methods of querying data for frontend usage (GraphQL), while the status quo of the semantic web space is still SPARQL (a query language full of footguns). The same goes for triple stores (the prevalent databases in the space), which roughly go through the same advancements as RDBMSes, just at a much slower pace.
It also doesn't help that most work being done in the space comes from academia rather than companies that utilize it in production scenarios.
There is quite a nice curated list of problems/papercuts about the semantic web/RDF space[2].
Overall, despite the current status quo, I'm quite optimistic that the space can have a revival.
I think a big reason for the lack of popularity is unfamiliarity with graphs in general. Most people wouldn't know of a good graph editor, I would guess. Everyone knows documents and spreadsheets/tables/lists/folders, but it's rare to see a graph anywhere, online or offline. If I have to think of a real-life example of a graph, what comes to mind is a corkboard with strings running between pinned-up photos to track a criminal in a TV show, or a subway network map. In VSCode/IntelliJ there is not one graph I look at frequently, even though pretty much all code is a data-flow and dependency graph.
I think part of this is that graphs always appear like a complicated mess, and we prefer hierarchies and categories.
I would really like a tool like Airtable for graphs. You start with spreadsheets with columns relating to other columns, and then you view the graph next to it as you go. I don't know of a popular tool that does this. It's funny because behind the scenes of a spreadsheet there is always a big dependency graph that updates cells as changes come through.
All the specs feel overly-complex too. Like a relic from the XML/SOAP days. For such a simple base concept (subject-predicate-object / entity-attribute-value) it feels like overkill. It's interesting though thinking about how JSON won, while being extremely inferior to XML. Although I think this ability to move fast has left us with a ton of untyped data lying around, and plenty of ad-hoc data transformations.
I'm interested to read into the EasierRDF doc you sent - looks very interesting.
> You start with spreadsheets with columns relating to other columns, and then you view the graph next to it as you go.
We are actually starting to crystalize something like that in our app. It's currently more read- than write-oriented but I think we are getting there. :)
> It's funny because behind-the-scenes of spreadsheets there is always big dependency graph that updates cells as changes come through.
Yup! Actually, one dream I have for our platform is that we'll build an Excel importer that will import a fully fledged spreadsheet representation, including formulas that continue working (VBA macros excluded). Our platform already supports the core pieces required for this to work; there are just a ton of nitty-gritty details to work out about how this would nicely integrate into the product in a way that isn't too cumbersome for our end users.
> All the specs feel overly-complex too. Like a relic from the XML/SOAP days.
Oh, I could rant about that for days... RDF itself is already a bit weird in that respect: a literal can carry either a datatype or a language tag, but not both at the same time, so you can't express e.g. "German Markdown" (in Turtle, "Hallo"@de or "Hallo"^^ex:markdown with some hypothetical markdown datatype, but never both). I don't think any comparable standard today would bake in localization at such a fundamental layer.
> Why isn’t semantic web more popular inside companies?
Because SemWeb doesn't solve any real problem while creating a ton of new ones, and it uses a complex data model. Any company that would benefit from using graphs would be better served by a simpler and more general graph model and database.
SemWeb is nothing more than stringly typed data with a URI fetish.
>Why isn’t semantic web more popular inside companies?
Because it offers no protection against some team inside the company breaking the whole web by moving to a different URI or refactoring their domain model in incompatible ways. A department pays for some subgraph tailored to their needs and they are not interested in financing this for the whole org.
Industrial companies use master data management systems, they are centralized and are considered the single source of truth, everyone else builds on them.
Because it's not what you think and doesn't solve the problems most companies have.
Firstly, JSON-LD is only a format for serializing semantic metadata, nothing more, so it only specifies how you can attach that metadata, but not what exactly can be attached, in what way, and with what subtle meanings.
Secondly, it's one of those very generic tools that "can solve everything" but in practice often only makes things more complicated, unless you have enough very sophisticated (and mature!) tooling around it.
And that's where the problem lies: the availability and maturity of this tooling is limited, and awareness of (or experience with) which tooling is good and mature and which isn't is often missing too.
So it's easy to end up adding a lot of complexity for very little gain, which is why most avoid it. Though some companies have had success with it, they are often the size of SAP or similar.
> This expandable graph seems like the purest representation of all data in a company.
Yes, but that requires you to have all the data in the right format, correctly annotated, correctly maintained as it changes, and available. Often none of this is true.
> Imagine hovering over a UI element and seeing who implemented it and when
This would need proper integration of JSON-LD into the version control system and development flow, the project manager, and probably more. There is a good chance that whatever tooling you use has no integration for any of these points, not even considering that you then still need to bind the data together (query it) in a usable, performant way, which might mean having a non-graph DB for caching common queries etc. Each of these points is likely a non-trivial sub-project, one which most companies wouldn't want to afford for a minor benefit like having such a tooltip in a UI builder.
Now if everyone always supported the semantic web, agreed on common annotations for all kinds of metadata (far beyond the scope of the JSON-LD spec), added accessible APIs based on this to their products, etc., _then yes, it would be great_.
Yeah, I think it's way too complex to shoehorn in later for minimal benefit.
But new databases are being built at new companies every day. A lot of new companies I see build out their first MVPs, CRMs, etc. ad hoc in Airtable. Then some mockups in Figma. Then they bring in the devs to build an RDBMS.
Now if all these low-code tools worked on manipulating a single graph instead of building a bunch of disparate relational databases...that would be cool. And then you just need a good graph database to build web apps with.
I think a bunch of tools need to be re-invented with this in mind.
> would agree on common annotations for all kind of metadata
Thinking about a project/task manager, for example: they all pretty much have similar schemas at this point. There is also a huge industry in connecting tools together. Zapier/IFTTT/Unito/etc. Everything is ad hoc or proprietary, though. Standardization is slow and boring.
I think the best thing would be if someone made a schema for this that gained wide adoption, and then the transformers from these existing applications fed into this graph. Basically using a graph db instead of relational or key-value.
Same reason wikis aren't: it takes too much effort to maintain and keep it semantic instead of a copy-pasted pile of text.
You'd need the CMS, CRM, knowledge base, documentation, source control, chat/forum, inventory, point of sale, customer support channels, issue ticketing system, and every other thing interconnected with each other.
do you know of anyone offering this as a complex turn-key integrated solution at a reasonable price?
In every category you mentioned there are new products released and widely adopted all the time. Look at the rise of ClickUp for example in such a crowded space as project management.
I don't think it's too far-fetched to one day see a new entrant offering, as a key selling point, that their API is just one big interoperable graph you can easily plug your company into... and that's not just GraphQL.
I know of several ERP vendors offering such products at a price that is reasonable for such complexity. "Reasonable" is not the same as "cheap" though.
JSON-LD is the reason ActivityPub hasn't taken off even more. It's just such a pain to deal with values that could be URLs or entire object trees. Especially for strongly typed languages, it's a nightmare.
It isn't hard: I've built several Serde impls for Rust that handle JSON:API and JSON-LD setups (for ActivityPub, among others). It's just an Enum and a type with some logic to parse one or the other. An if/else isn't hard.
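For illustration, a minimal sketch of that enum approach using serde's untagged representation (ObjectOrLink and ApObject are made-up names, not from any real AP library; assumes serde with the derive feature plus serde_json):

    use serde::Deserialize;

    // A JSON-LD value that may be a bare IRI string or a full object tree.
    #[derive(Debug, Deserialize)]
    #[serde(untagged)]
    enum ObjectOrLink {
        Link(String),
        Object(ApObject),
    }

    #[derive(Debug, Deserialize)]
    struct ApObject {
        #[serde(rename = "type")]
        kind: String,
        id: Option<String>,
    }

    fn main() {
        let a: ObjectOrLink =
            serde_json::from_str(r#""https://example.org/notes/1""#).unwrap();
        let b: ObjectOrLink =
            serde_json::from_str(r#"{"type":"Note","id":"https://example.org/notes/1"}"#).unwrap();
        println!("{:?}\n{:?}", a, b);
    }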
And it certainly is not the reason AP hasn't taken off. For one, because AP has taken off, and secondly because of all the reasons it would potentially not have taken off, the underlying dialect of JSON is the least of all concerns.
You're right about serializing/deserializing the data not being difficult, but the problem lies more in how you handle that data down the pipeline after you've deserialized it.
Maybe Rust has a better type system that can handle the union of MaybeSliceOfStrings|MaybeSliceOfObjects|MaybeString|MaybeObject|MaybeNull[1] types better than Go does, but I can tell you - after about 4-5 years of trying to implement a clean Go API for processing ActivityPub payloads - that the complexity is quite high, at least if you want to build something that can work with more than one client/service.
[1] I believe that a good implementation should handle even mixed cases, where you have a slice of both objects and strings.
It sounds like the issue is not Json-* formats but Go's type system.
And I'm not familiar enough with large go setups, but generally -in any language- anything that must handle data "down the pipeline" should ideally not be bothered about whether this link (or meta, or paginator, or whatever) is there, is of Type-T and so on: that's for middleware/controllers etc to ensure, and to transform. In other words: your business-logic should ideally never have to bother about a link being either a string or a struct or a null etc: it should be able to assume it's a MyLink.
I'm not sure I understand correctly what you mean by assuming everything to be a "MyLink", but that's not always possible. It depends, first, on how deep down the rabbit hole you want to go with dereferencing URLs - what logic will you employ to decide at what level of dereferencing to full objects you stop - and, more importantly, on being authorized to access the objects pointed at by said URLs. Sometimes ActivityPub activities come with properties that are IRIs which, when dereferenced, give you a 404 or 401. What does your middleware do then: retry, abort, fail?
Also, I'm not sure I agree with your thesis that business logic needs to worry only about a single type of object, because the ActivityPub specification recognizes all of the types I mentioned in my first reply as being "valid", and if you consider their superset to be the "canonical type", you'll be wasting quite a lot of storage (memory- and disk-wise) when all you have access to is the plain URL.
That's one of my root annoyances with Go's type system: it lacks sum types. Rust has sum types in the form of enums (tagged unions). People don't appreciate how useful sum types are until they need them, and then they discover how much easier they make things.
Couldn’t you just have a RDF database do the heavy lifting and use go for the application logic?
I did some deep diving into RDF stuff not too long ago and all the FLOSS databases I found didn’t do json-ld but it also didn’t look all that hard to add. Just adapt the existing xml code methinks.
Erm, I don't want an rdf database in my application, thank you. :D
The main purpose for which I chose Go for the development of this project was simplicity of the build/deploy pipelines. My targets are small communities and single enthusiasts which shouldn't need a dedicated SRE to deploy a service in the Fediverse.
JSON-LD is an open standard for expressing RDF data as JSON. RDF is the most fundamental part of the W3C's Semantic Web and Linked Data projects, which began at the end of the 1990s to make the Web more machine-readable and continue to make steady progress.
If you're not familiar with RDF, I would suggest starting by reading the RDF 1.1 Primer[1], using the RDF 1.1 Concepts and Abstract Syntax as a reference[2] if something is confusing. I don't think you'll regret spending the time; RDF is a fascinating field!
> I don't think you'll regret spending the time; RDF is a fascinating field!
I love RDF. I'm using it for a project of my own, which has nothing to do with semantic web. I'm not using any of the standard ontologies, or OWL, or anything like that. I just wanted an extensible, schemaless data model, because I have no idea how folks will want to correlate data with each other. Being able to hang anything off of anything is important.
I'm also not processing trillions of triples for my work (if you ever look at triple stores, they like to tout how fast they can import vast troves of data).
Mind, I'm no doubt reinventing some stuff that I could probably be using a standard vocabulary for, but since global interoperability is not a goal, I'm not that concerned about it. I also have some concept of structure that I'm capturing and encoding, which I perhaps could be using something like OWL for, but this is ad hoc, doing things as I go, and while there may well be gems within those spaces I can leverage, I'd rather make progress on my own path at the moment than fall into those large maws of time trying to suss them out to see if I could adapt them to my project.
SPARQL is an odd duck I'm still wrapping my head around (30 years of SQL warps one's point of view), but at least it exists, and I can use it.
I want to live in an alternative timeline where RDF was never adopted by Wikidata and it instead created something that solved its specific problems in a human-friendly manner. People always point to Wikidata as a successful semantic web project but fail to imagine how much more awesome it could have been. First off, Wikidata has little use for ontologies outside of its own domain because all the types are modeled as dynamic second-order concepts. Meaning, people organise knowledge using web pages, and those web pages are used to structure other knowledge.
A simple SPARQL query for Wikidata would look something like this:

    SELECT * WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
This incomprehensible nonsense is the result of two choices:
1) It had to be language independent because you know the world is bigger than just the anglosphere.
2) It had to be "machine readable", because then the computer can work together in harmony.
Putting the cart before the horse, or in this case the machine before the man.
Wikidata was consciously designed without consideration for the semantic web/rdf. I remember being dismayed by this, but they added some facilities later. It is designed around a purpose built data model. https://www.mediawiki.org/wiki/Wikibase/DataModel
It's used to provide structured data to Google, e.g. to describe the fake reviews you host on your site, which Google parses and rewards by showing a star rating next to your result. It won't die as long as Google continues to use it.
Because of Google, json-ld is very common on recipe web sites on the web. (Recipes for cooking.) It's typically found in a script tag and usually contains the full recipe, even if the page itself is paywalled.
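The embedded data typically looks something like this (a trimmed, made-up sketch of schema.org Recipe markup, not taken from any particular site):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Recipe",
      "name": "Simple Pancakes",
      "recipeIngredient": ["2 cups flour", "2 eggs", "1.5 cups milk"],
      "recipeInstructions": [{ "@type": "HowToStep", "text": "Mix and fry." }]
    }
    </script>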
This can neatly solve the problem of extracting the recipe to save, and the complaints about having to dig through a page of text to read a recipe. I wrote a little browser plugin that pops up the recipe in a window and offers to save it to Obsidian (using a custom url and hammerspoon).
I don't know if the schema.org stuff is in wide use outside of that. It would perhaps be useful to enable a browser to pull things like appointments and contacts out of a web page, but I suspect the world will just go the way of data detection instead.
I was just thinking this looks similar to a DDI (Data Documentation Initiative) citation element, but the advantage of the DDI XML schema is that one can annotate any existing element in an XML document by importing the DDI namespace, which shouldn't change how the document is otherwise interpreted by applications. This JSON schema, as far as I can tell, is a stand-alone document?
JSON-LD is definitely designed, at least partially, to do the "annotate a preexisting document" thing. You can add the "context" and have it specify the RDF semantics that should apply to each preexisting key.
This is a 'context map', which is what links the short name "spouse" used as the JSON key to a particular term defined in an RDF (Resource Description Framework) ontology. Here, that's <https://schema.org/spouse>. The term definitions in the context are what make it unambiguous to the parser that the value is an IRI/URL rather than a literal string.
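For example, a context-mapped document might look roughly like this (illustrative, made-up identifiers):

    {
      "@context": {
        "spouse": { "@id": "https://schema.org/spouse", "@type": "@id" }
      },
      "@id": "https://example.org/people/john",
      "spouse": "https://example.org/people/jane"
    }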
Alternatively, one can write JSON-LD without a context map by using the full IRI of the term as the key:
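Roughly like this (same made-up identifiers, in expanded form):

    {
      "@id": "https://example.org/people/john",
      "https://schema.org/spouse": { "@id": "https://example.org/people/jane" }
    }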
Bit of a tangent, but it'll be interesting to see how standards designed to make the web more machine-readable are viewed/treated as more and more content on the web is ultimately consumed and provided to users higher up in the funnel via LLMs
Anyone know if JSON-LD is being used more in very recent years for AI? I saw a couple of comments elsewhere that many large companies are using it now. If so, how is it (and knowledge graphs / linked data in general) used in these situations?
Haven't come across this in the wild. My impression is that most of the semantic web people that were obsessing about ontologies twenty years ago have moved on.
As for AI, we now have deep learning approaches that don't require an investment in a lot of machine-readable, disambiguated data but are as good as or better than we are at making sense of unstructured data. That probably explains the low interest in semantic web type stuff at this point.
The ones that have moved on are being replaced by newcomers though - as evidenced by the multiplicity of my comments on this post, I'm one of them ;)
I would agree with your appraisal of AI - at the moment, there is a much greater return-on-investment for processing vast quantities of unstructured data than there is for meticulously curating knowledge graphs. However, I think that, in time, the balance will shift to vindicate the semantic web as people desire more trustworthiness from automated systems.
Why? What's changed to switch the balance from machine learned back in favor of painstakingly manually curated? Sounds a bit like wishful thinking to me. I don't think that cat will jump back into the bag and zip the bag up behind itself. Machines are only going to continue to outpace humans when it comes to pattern recognition, classifying things, or making sense of unstructured data. I don't see that turning around.
Maybe there will be a market for "artisanal ontologies". But I wouldn't get my hopes up for that one.
I was pondering the other day if someone could convince one of the proprietary LLMs to spit out something like json-ld of its internal knowledge base to train other AIs.
It's just an example, but it does highlight another problem. If your model has a single spouse, it's going to be wrong. It needs a list, with time periods. But then, why stop at spouses? Surely you want to include relationships that were not sanctified. But why stop at single partnership? Some people have multiple partners. And people have more than romantic/partner-type relationships: parents, step-parents, children, friends, bosses, colleagues.
People are going to extend the model, and not update the records, because that's a hassle. And there will be multiple versions of a record. What you'd really want is an entity that takes care of all of this, in one place. And not machine readable, because who is going to program a machine to find out the marriages of John Lennon? Something like an encyclopedia, perhaps.
It's actually a problem with the semantic web generally: it tends to assume that the world is compartmentalizable into neat categories. Instead, we live in a world where
(a) membership in categories is partial
(b) membership in categories is probabilistic
(c) membership in categories is contested (see Pluto for an innocuous example)
(d) definition, legitimacy, etc. of categories is contested
(e) category boundaries are vague, e.g. Sorites paradox:
Base step: A one day old human being is a child.
Induction step: If an n day old human being is a child, then that human being is also a child when it is n+1 days old.
Conclusion: Therefore, a 36,500 day old human being is a child.
(f) membership in categories changes with time
etc...
There are hot things, and there are cold things, and then there are things that are neither hot nor cold.
I used the JSON-LD spec because it is preferred by Google, but it requires duplicating a lot of the actual page content. It's almost as if it is designed for sites that use heavy JavaScript frameworks and not for plain HTML/CSS sites.
Usually the dynamic data would be generated by the server. E.g. if you have an eCommerce site, it might update prices, stock remaining, applicable regions, etc. That would then be pulled into Google Shopping or whatever other service might consume it.
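For example, the kind of schema.org Product/Offer block a server might emit (a made-up sketch):

    {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "Example Widget",
      "offers": {
        "@type": "Offer",
        "price": "19.99",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock"
      }
    }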
https://en.wikipedia.org/wiki/Semantic_Web#Example