Imagine if, instead of entering the web browser space and kneecapping XHTML and the semantic web properties it had baked in, Google had stayed out. We might be able to mark articles up with `<article>`, and set their subject tags to the URNs of the people, places, and things involved. We could give things a published and revised date with change logs. Mark up questions, solutions, code and language metadata. All of that is extremely computer friendly for ingestion and remixing. It would not only have turned search into a problem we could all solve, but given us rails to start linking disparate content into a graph of meaningful relationships.
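To make "computer friendly for ingestion" concrete: with markup like that, a consumer needs only stock XML parsing to pull out subjects and dates, no ML or scraping heuristics. A minimal sketch, where the tag and attribute names (`subject`, `urn`, `published`, `revised`) are invented for illustration rather than taken from any real standard:

```python
# Sketch: ingesting a hypothetical semantic article document.
# Tag and attribute names are invented for illustration only.
import xml.etree.ElementTree as ET

doc = """
<article>
  <subject urn="urn:example:person:ada-lovelace"/>
  <subject urn="urn:example:place:london"/>
  <published>2009-04-01</published>
  <revised>2010-06-15</revised>
  <title>Notes on the Analytical Engine</title>
</article>
"""

root = ET.fromstring(doc)
subjects = [s.get("urn") for s in root.findall("subject")]
published = root.findtext("published")
revised = root.findtext("revised")

print(subjects)             # ['urn:example:person:ada-lovelace', 'urn:example:place:london']
print(published, revised)   # 2009-04-01 2010-06-15
```

Linking documents into a graph then becomes a matter of joining on shared URNs, which is exactly the kind of remixing the markup would enable.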
But instead Google wanted to make things less strict, less semantic, harder to search, and easier to author whatever the hell you wanted. I'm sure it has nothing to do with making it difficult for other entrants to find their way into search space or take away ad-viewing eyeballs. It was all about making HTML easy and forgiving.
It's a good thing they like other machine-friendly semantic formats like RSS and Atom...
"Human friendly authorship" was on the other end of the axis from "easy for machines to consume". I can't believe we trusted the search monopoly to choose the winner of that race.
I think in this case the semantic web would not work, unless there was some way to weed out spam. There are currently multiple competing microdata formats out there that enable you to specify any kind of metadata, but they still won't help if spammers fill those in too.
Maybe some sort of webring of trust where trusted people can endorse other sites and the chain breaks if somebody is found endorsing crap? (as in, you lose trust, and so does everybody under you)
> I think in this case semantic web would not work, unless there was some way to weed out spam.
That's not so hard. It's one of the first problems Google solved.
PageRank, web of trust, pubkey signing articles... I'd much rather tackle this problem in isolation than the search problem we have now.
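For reference, the core of PageRank really is tractable in isolation: it's a power iteration over the link graph. A minimal sketch on a made-up three-page graph, using the 0.85 damping factor from the original paper:

```python
# Minimal PageRank power iteration. The link graph is invented for illustration.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start uniform

for _ in range(50):
    # Base teleport probability, then redistribute each page's rank
    # across its outlinks, damped.
    new = {p: (1 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = rank[page] / len(outlinks)
        for target in outlinks:
            new[target] += damping * share
    rank = new

# Ranks sum to ~1; "c" comes out on top since both "a" and "b" link to it.
print(rank)
```

The same machinery generalizes to endorsement edges instead of hyperlinks, which is why tackling trust as its own graph problem isn't far-fetched.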
The trust graph is different from the core problem of extracting meaning from documents. Semantic tags make it easy to derive this from structure, which is a hard problem we're currently trying to use ML and NLP to solve.
>Semantic tags make it easy to derive this from structure
HTML has a lot of structure already (for example, all levels of heading are easy to pick out, as are lists), and Google does encourage use of semantic tags (for example for review scores, author details, or hotel details). For most searches I don't think the problem lies with being able to read meaning - the problem is you can't trust the page author to tell you what the page is about, or to link to the right pages, because spammers lie. Semantic tags don't help with that at all, and differentiating spam from good content for a given reader is a hard problem - the reader might not even know the difference.
In the interests of not causing a crisis when Top Level Trust Domain endorses the wrong site and the algorithm goes, "Uh uh," (or the endorsement is falsely labeled spam by malicious actors, or whatever), maybe the effect decreases the closer you are to that top level.
But that's hierarchical in a very un-web-y way... Hm.
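The decay-plus-revocation scheme sketched above fits in a few lines: trust flows outward along endorsement edges, attenuating with each hop from the root, and revoking an endorser cuts off everything reachable only through them. The graph, names, and decay factor below are all invented for illustration:

```python
# Sketch of the "webring of trust" idea: trust decays per hop, and revoking
# an endorser drops everyone under them. All names/values are made up.
from collections import deque

endorsements = {
    "root": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["spammy-site"],
}

def trust_scores(graph, root="root", decay=0.5, revoked=frozenset()):
    """BFS from the root; each hop multiplies trust by `decay`,
    and revoked nodes break the chain below them."""
    scores = {root: 1.0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child in revoked or child in scores:
                continue
            scores[child] = scores[node] * decay
            queue.append(child)
    return scores

# Normally bob's endorsement lends spammy-site some trust...
print(trust_scores(endorsements))
# ...but revoking bob also drops everyone reachable only through him.
print(trust_scores(endorsements, revoked={"bob"}))
```

The decay means a mistake far from the root only dents scores slightly, while a bad endorsement near the top is costly - which is the crisis-avoidance property described above, at the price of the hierarchy being objected to.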
The internet is still kind of a hierarchy though, having "changed" "ownership" from the US government (DARPA) to the non-profit ICANN.
And that has worked... quite fine. I have no objections (maybe they're a bit too liberal with the new TLDs).
Most of the stuff that makes the hierarchies seem bad are actually faults of for-profit organizations (or other unsuited people/entities) being at the top, and not just that someone is at the top per se. In fact, in my experience, and contrary to popular expectation, when a hierarchy works well, an outsider shouldn't actually be able to immediately recognize it as such.
> Imagine if instead of kneecapping XHTML and the semantic web properties it had baked in, Google had not entered into the web browser space. We might be able to mark articles up with `<article>`, and set their subject tags to the URN of the people, places, and things involved. We could give things a published and revised date with change logs. Mark up questions, solutions, code and language metadata.
Can you explain in technical details what you think was lost by Google launching a browser or what properties were unique to XHTML?
Everything you listed above is possible with HTML5 (see e.g. schema.org) and has been for many years so I think it would be better to look at the failure to have market incentives which support that outcome.
Good machine-readable ("semantic") information will only be provided if incentives aren't misaligned against it, as they are on much of the commercial (as opposed to academic, hobbyist, etc.) Web. Given misaligned incentives, these features will be subverted and abused, as we saw back in the 1990s with `<meta name="keywords">` and `<meta name="description">` tags and the like.
I don't think there's any reason to think google was responsible for the semantic web not taking off. People just didn't care that much. It may have been a generally useful idea, but it didn't solve anyone's problem directly enough to matter.
If WordPress output semantic markup, that would instantly give you a lot more than 0.0001%. The rest would follow as soon as it improved the discoverability of their content.
WordPress can't magically infer semantic meaning from user input any better than Google can. The whole point of the semantic web is to have humans specifically mark their intention. A better UI for semantic tagging would help with that, but it would still be reliant on the user clicking the right buttons rather than just using whichever thing results in the correct visual appearance.