Hacker News
The Semantic Web Is Dead. Long Live the Semantic Web (blikk.co)
57 points by dennybritz on Nov 3, 2014 | 43 comments



> The Semantic Web means different things to different people.

There is no better way of expressing why "the semantic web" as it was originally conceived failed, and will always fail. This is not to say that automated processing to extract higher-level information of the kind the article argues for might not be both helpful and possible, but it will always be awkward, difficult and imperfect. The awkwardness and difficulty will go down as AI gets better, but the imperfection has a fairly high minimum level because the kind of meaning humans extract from language varies enormously between individuals.

As a friend who works in geology put it: "If I send a bunch of geologists out to survey an area and combine their data naively, I can tell who mapped where but not what anybody mapped." The illusion of shared meaning is powerful, and with enormous effort we can get communities of shared meaning that are large and powerful enough to be extremely useful, but we do that either through top-down control (politics, corporations, militaries) or intense bottom-up interaction (the sciences), not the kind of loose, informal, disconnected mechanisms that the Web enables.


The grand idea of the Semantic Web, as conceived at the very beginning, was mostly visionary and practically impossible to build. This doesn't mean that the whole concept has failed. It advanced many fields of AI and created many initiatives that are quite vibrant to this day (e.g. Linked Data). The illusion that the Semantic Web failed is mostly due to the fact that fewer Semantic Web projects are funded today. The money goes to other things that are now on the rise, just as it went to the Semantic Web 10 years ago. However, the Semantic Web is still here, with many tools mature enough to be applied in industry. I don't expect it to disappear completely.


I completely agree with what you are saying, but I think what you are getting at is a related but slightly different problem: interpreting/extracting information usually leads to uncertain results, even for humans.

I think the Semantic Web doesn't try to solve this problem and doesn't need to. There are other approaches, such as probabilistic programming, that are meant to reason with uncertain data. But to do that someone must make the data available to you and tell you how uncertain it is in the first place. I think that's what the Semantic Web, or whatever you may call it, can do. Make data available, but leave open the interpretation.


When I was first exposed to the Semantic Web, probably 4 years ago, I realized what a huge mistake it was/is.

My take is that it is a bunch of ideas to make it easier for search engines to crawl and use your data about all kinds of things. On one hand, that is great.

However, the huge problem with using it is that you end up creating your own search engine to crawl data sources to do anything useful. That is to say, you have to crawl the data, store it somewhere, and then build up your own systems for querying or doing anything useful with the data.

The use case many people have is that they want to use an API to get some particular bit of data out, and that's it. Say you want to search for a list of tweets on Twitter with the hashtag #HackerNews. The sane thing is to be able to hit a Twitter API endpoint and have it send you back a list of tweets with that hashtag.

Now, imagine if instead you had to index all of twitter and filter yourself for tweets with #HackerNews in it. Is that better for that particular use case? No, it sucks.

There are certainly cases where you DO want to crawl data and do your own data analysis on it. But, that is a much more limited use case for many developers and there isn't as much value in that as people seem to believe.

A better solution would be something like REST with HATEOAS on a much larger scale. You'd be able to index things nicely but still have the benefits of smarter API calls.
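To make the HATEOAS idea a bit more concrete, here is a minimal sketch, assuming a hypothetical JSON API whose responses carry a HAL-style `_links` map. The client hard-codes only the root URL and the link relation names, and discovers every other endpoint by following links. The in-memory `FAKE_API` dict stands in for HTTP, and all paths, relation names, and tweets are made up for illustration.

```python
from typing import Any, Dict, Tuple

# Hypothetical in-memory "API": path -> JSON-like response.
# Every response advertises where the client can go next under "_links".
FAKE_API: Dict[str, Dict[str, Any]] = {
    "/": {"_links": {"tweets": {"href": "/tweets"}}},
    "/tweets": {
        "_links": {"search": {"href": "/tweets/search?tag={tag}", "templated": True}}
    },
    "/tweets/search?tag=HackerNews": {
        "items": ["tweet 1 #HackerNews", "tweet 2 #HackerNews"],
        "_links": {},
    },
}


def get(path: str) -> Dict[str, Any]:
    """Stand-in for an HTTP GET; raises KeyError for unknown paths."""
    return FAKE_API[path]


def follow(path: str, rel: str, **params: str) -> Tuple[str, Dict[str, Any]]:
    """Fetch `path`, look up link relation `rel`, fill its template if needed, fetch the target."""
    link = get(path)["_links"][rel]
    href = link["href"]
    if link.get("templated"):
        # Simplified: str.format stands in for real URI-template (RFC 6570) expansion.
        href = href.format(**params)
    return href, get(href)


# Only the root and the relation names are known up front; URLs are discovered.
href, _ = follow("/", "tweets")
href, result = follow(href, "search", tag="HackerNews")
print(href)  # /tweets/search?tag=HackerNews
print(result["items"])
```

The point of the design is that the server can move `/tweets/search` elsewhere without breaking the client, as long as the `search` relation keeps pointing at it.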

Unfortunately, I don't see this happening anytime soon, despite the interesting things you could build with it.


I agree, the crawling / indexing problems you describe are definitely true for Linked Data. But there are still other use cases for semantic web technology.

> The use case many people have is that they want to use an API to get some particular bit of data out and that's it.

In theory, that's what SPARQL and SPARQL endpoints are for. Plus you get things like federated queries and an open data model (RDF) that allows you to combine multiple data sources without "wrapping" schemas.
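A rough illustration of why an open graph model makes combining sources cheap: RDF data is a set of (subject, predicate, object) triples, so merging two sources is essentially a set union, and a SPARQL-style query is a pattern of triples containing variables. The sketch below is a toy basic-graph-pattern matcher in plain Python, not SPARQL and not a real triple store; the `ex:` names and both data sets are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical sources, published separately but sharing identifiers.
source_a: Set[Triple] = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:initech"),
}
source_b: Set[Triple] = {
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
    ("ex:initech", "ex:locatedIn", "ex:austin"),
}

# Merging the graphs is a set union; no wrapper schema is needed.
graph = source_a | source_b


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: List[Triple], graph: Set[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(p_term, t_term) != t_term:
                    ok = False
                    break
            elif p_term != t_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)


# Roughly: SELECT ?person ?city WHERE { ?person ex:worksFor ?org . ?org ex:locatedIn ?city }
query = [("?person", "ex:worksFor", "?org"), ("?org", "ex:locatedIn", "?city")]
results = sorted((b["?person"], b["?city"]) for b in match(query, graph))
print(results)  # [('ex:alice', 'ex:berlin'), ('ex:bob', 'ex:austin')]
```

The join across the two sources falls out of the shared `?org` variable; a federated SPARQL query does the same thing conceptually, except the triples live behind different endpoints.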

But, well, this is kind of a utopia, and yes, I doubt it will ever be "a thing".


I have attended ISWC numerous times, and this conference is long dead. It is the perfect example of eminence-based vs. evidence-based science. The technical level of the papers is low; most of them collect bad and useless description logic theorems from the very same people each year. The consanguinity rate is higher than at any other CS conference. The quality of the conference is in fact widely disputed among academics. The CORE[1] ranking puts it as a B conference (later updated to A following a request based on Scholar metrics[2]). Arnetminer[3] and Academic Search[4] put it in the A+/A tier. The latter two are drawn from bibliographic metrics, while the first ranking is the result of a poll among Australian researchers. Although we disagree on rugby, I'm with the Aussies on this one.

The industry track exists for the record. Many semweb companies were funded with EU money and went bankrupt when the money ran out. I can't believe the EU continues to inject plenty of money into the semweb in H2020 despite having wasted more than 1B in the previous decade (publicly admitted by the EU).

At ISWC 2012, in a workshop that took place the day before the conference, someone (I don't remember his name) asked the speaker, "If this building exploded, do you think the semweb would still exist?" The speaker tried to find examples of people doing semweb in industry but did not manage to convince anyone, and I'm not sure he even convinced himself. This was (and is) a nice summary of the situation of the Semantic Web.

[1] http://core.edu.au/

[2] http://103.1.187.206/core/1338/

[3] http://arnetminer.org/page/conference-rank/html/All-in-one.h...

[4] http://academic.research.microsoft.com/RankList?entitytype=3...


Having attended ISWC this year, where I also published in the proceedings of a workshop, I believe you don't understand exactly how academia works. Academic fields such as the semantic web have nothing to give to industry.

It's like asking Einstein back in 1905 how his work was going to be used in industry.

Semantic web is not an industry or engineering field. It's theoretical and academic and it has its purpose because it lays strong theoretical foundations.


The Semantic Web is developing a technology, just as the Web is a technology. It aims to be used by people, and therefore to be implemented in real systems, so it aims to give something to industry. The theoretical argument for the semweb is absolutely meaningless.

Physicists study the laws of nature; there is no such thing on the Web, which is a deeply human field. It's quite an insult to Einstein to compare him to the Semantic Web ...


Will the KDE people get the message? </troll> Sorry I had to get that out of the way.

In all seriousness, 'Semantic Web' has always felt like a sci-fi-inspired version of AI: a concept that sounds cool but in reality can't ever work.

Take movie ratings and "suggested viewings", for example. Jim, Bob, Steve, and Dave all watch the same movie. Jim thinks it's funny because of the physical gags. Bob thinks it's funny because of the dialog and jokes. Steve likes it because the hot new actress is naked. Dave likes the director and cinematography. All four guys give it a rating of 4/5.

With this one data point, Streaming-movie-place.com can never 'guess' what other movies to suggest to these guys. The hope is that once they start watching and rating other films, a pattern will emerge. That pattern can then be used for marketing and to offer valid suggestions.
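The single-data-point problem can be made concrete with user-based collaborative filtering, one common recommender approach (assumed here; the comment doesn't say what any real site uses). With only one shared 4/5 rating, all four viewers have identical rating vectors, so a similarity measure cannot tell them apart; only additional ratings separate them. The movie titles and extra ratings below are invented.

```python
import math
from typing import Dict

Ratings = Dict[str, float]  # movie title -> rating out of 5


def cosine(u: Ratings, v: Ratings) -> float:
    """Cosine similarity computed over the movies both users have rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


# One shared data point: everyone rated the same (hypothetical) movie 4/5.
users = {name: {"Movie X": 4.0} for name in ("Jim", "Bob", "Steve", "Dave")}
print(cosine(users["Jim"], users["Bob"]))  # 1.0: indistinguishable

# After more (invented) ratings, Jim's and Bob's tastes start to diverge.
users["Jim"].update({"Slapstick Comedy": 5.0, "Talky Drama": 1.0})
users["Bob"].update({"Slapstick Comedy": 1.0, "Talky Drama": 5.0})
print(round(cosine(users["Jim"], users["Bob"]), 3))  # 0.619
```

Raw cosine is used only to keep the sketch short; practical recommenders usually mean-center ratings or use matrix factorization, but the underlying limitation is the same: a 4/5 carries no information about *why* someone liked the film.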

BUT reality is too varied. We like different things for different reasons, and no algorithm can ever get it 100% right. How many of you have a Netflix queue of things you want to see, while the suggested movies are full of crappy suggestions? Most or all of us, I bet.

Which brings me back to my point: it's a sci-fi illusion. It can never exist in real life. Humans are too damn fickle. (Which brings me back to KDE: I wish they would give up on the Nepomuk/semantic desktop crap. It's bloated, slows the system down, and offers nothing in return. Or maybe I'm just using it wrong.) </rant>


All I want from the Semantic Web is to know that, if I see an API follows the standard, things with the same name will be the same kind of thing. For all I care, that API's underlying model could output to HTML, rendered PNGs, ASCII art, or 3D plastic.
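One way the RDF standards deliver the "same name, same kind of thing" guarantee is that names are full IRIs, and short prefixed names are only local abbreviations. A minimal sketch, assuming two hypothetical APIs that chose different prefixes for the schema.org vocabulary: once expanded, their type names compare equal. It handles only prefixed names, not full IRIs.

```python
from typing import Dict


def expand(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand a prefixed name like 'schema:Person' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


# Two hypothetical APIs using different prefixes for the same vocabulary.
api_one = {"schema": "http://schema.org/"}
api_two = {"s": "http://schema.org/"}

type_one = expand("schema:Person", api_one)
type_two = expand("s:Person", api_two)
print(type_one)              # http://schema.org/Person
print(type_one == type_two)  # True
```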

In particular, scraping the View is pretty much a last resort. If you have to, it's likely that either the source doesn't have the manpower to become semantic anytime soon, or it's hostile to providing easy-to-parse data anyway.


I have some personal insights from following the SW world for some time:

1. Most of the implementations and formats and everything really boil down to a great way to publish graph data on the web, and query it using a pretty nice graph query language, and for most general cases everything just works.

2. A lot of the web is implementing incomplete, select parts of SW technologies even if they don't realize it, and then promptly putting it all behind an API key and shared secret. When SW takes hold, that will all have to change: there will need to be a common mechanism for authentication/authorization and some way to describe what a service supports. All of that exists, but right now every service seems to do it differently, and they all fear unfettered access.

3. You can use all of it today if you want, and the library ecosystem is very rich, IMHO. Plop it all into Neo4j / Jena / rdflib (etc.) and have at it.
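Point 1 above is easy to try even without a library: N-Triples, one of the standard RDF serializations, puts one triple per line. The sketch below parses only a restricted subset (IRIs in angle brackets and plain double-quoted literals without escapes, datatypes, language tags, or blank nodes); that restriction is an assumption to keep it short, and real data needs a proper parser such as the ones in the libraries listed above. The `example.org` resources are hypothetical; the FOAF terms are real.

```python
import re
from typing import List, Tuple

# Restricted subset: <iri> <iri> (<iri> | "plain literal") .
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')


def parse_ntriples(text: str) -> List[Tuple[str, str, str]]:
    """Parse the restricted N-Triples subset; literal objects keep their quotes."""
    triples = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE_RE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: unsupported or malformed triple")
        subj, pred, obj_iri, obj_lit = m.groups()
        obj = obj_iri if obj_iri is not None else f'"{obj_lit}"'
        triples.append((subj, pred, obj))
    return triples


data = """
# a tiny hypothetical dataset
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
"""
for t in parse_ntriples(data):
    print(t)
```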


Humans are all unique, so there will be differing views of semantics. It does exist in pockets, but not according to the schema or spec as imagined: for instance, hashtags and geolocation on tweets, categories/tags within WordPress blogs, and the social graph within Facebook and others. These are small pockets of semantic culture within larger platforms.

When implementation is left to a large population, there will be differentiation, as that is our nature. The semantic web wanted it easy; it wanted the implementers to organize. Instead, organizing the data with all its differences is how it will have to be done. Unless a system implements the standard for you and for each author at creation time, there will be differences, deltas, and no standard.

Probably the best semantic web / metadata system ever built does exactly this, and that one is at the NSA.


The approach currently used is in its death throes, to be sure.

But a working Semantic Web is a huge deal (think: bigger than Google) and will happen, make no mistake. But when it happens, it will be via a different approach.

Edit: spelling (thx Joshua)


We are on the cusp of solving natural language understanding, which will enable natural language for humans and relational databases for computers. There will be no "Semantic Web"; there will be the "Web" and machines that read it just as well as humans do.


I think we are still very far away from NLU. There are a couple of tiny subproblems, such as entity extraction, relation extraction, and dependency parsing, that we have become pretty good at (for English, that is; for many languages even that is still difficult). However, once you get into pragmatics and contextual understanding, we still have no clue. And in order to enable real NLU you'd also need the ability to reason on top of other knowledge (and "common sense"), which is a whole different problem.

Based on products like Siri or Cortana it may seem like we are coming closer to NLU, but the reason those are working at all is because they are very topic-specific and tiny subproblems of what NLU is really about.


Except that sadly, the only machines that will read it will have to be centralised in the hands of cloud providers in order to provide the funding and power needed to run them.

One of the great things about the semantic web, to me, was always that you could process it on a cheap VPS or your own desktop, creating new systems with little or no knowledge of math or machine learning. (I know projects that do this today.) That's not going to happen for a long time with natural language processing.


>>We are on the cusp of solving natural language understanding

Why do you think this? Is it just a feeling, or is there some company that is really close? As another user mentioned in this thread, natural language understanding seems to be the same thing as solving strong AI. Solving strong AI is a huge deal, so big that the people in control of it will probably become the most powerful people in the world.


Whether we are on the cusp really depends on what we mean by "natural language understanding" specifically. What about Watson? It seems to "understand" quite a bit of natural language just fine and can answer interesting questions posed in it. If that doesn't count as "understanding", I'd like to know what does. It probably can't do extended reasoning based on its "understanding", but requiring that would seem to me to be moving the goal posts.

I doubt that a sufficiently well-defined notion of natural language understanding that does not specifically include strong artificial intelligence in its definition would require strong AI. Constructing such a definition is left as an exercise to the reader.

Thinking that strong AI is required for natural language understanding may end up being similar to how it was once thought that beating humans at chess would require advanced AI. Brute force can do wonderful things, as can weak AI.


Natural language understanding is typically regarded as AI-complete, and once we have that, semantic web will be the least interesting bit of new technology.


> We are on the cusp of solving natural language understanding

I'd love to find out more about what you have in mind here. Can you elaborate?


The Semantic Web is about the semantics of schema, not the actual semantics of the data.

THIS is why it failed: you guessed what the thing was from the name instead of actually learning about it.

Also: "death throes"


Nicely put, but I think there's more to its failure than just that.

You might find this interesting: https://vimeo.com/92351230

When anyone can query anything, the SW will build itself -- a sort of new web, but of data, not text.


Text is data, just structured in a human-readable way.


Text is data, just unstructured. You can't ask meaningful queries of it. That's why we have databases.


     ps x | grep "[n]ginx" | wc -l


Meaningful


Every time I begin to explore using RDF and the associated technologies for a knowledge-rich problem, I always find the tool support a major disappointment. I understand there are exceptions (4store is a good example of a mature triple store), but generally speaking I always leave feeling that many of the core tools never made it out of academia. And in a world where development time is a precious commodity, I always end up going with a lower-friction but ultimately less powerful technology.


Semantic web is mainly an academic field. The theoretical research produced by this community is a foundation valuable for a variety of different fields. Stop thinking about the semantic web as the next big thing or the future of the web.

Just like many other research fields it lays strong theoretical (and at times also practical) foundations.



The Semantic Web is more about the semantics of schemas than about the semantics of data.

This is very poorly explained; people tend to think it means something about computers understanding the data.


Any linking provides context, and thus semantics. Linking to a schema makes this explicit and enhances latent semantics. So it's both, right?


But that doesn't provide meaning to the data itself.


The site is down. Does anyone have a mirror?


hmm, must be a semantic site. How appropriate.


:)


Sorry about that, restarted it :) I should probably upgrade that small DigitalOcean instance.


I reposted it here while I'm upgrading the instance: https://medium.com/@dennybritz/the-semantic-web-is-dead-long...


Alright, should be back up again. With a bit more memory and swap space this time :D


Welcome to the party pal.

Love, RSS


I wrote my counter response to this article. https://news.ycombinator.com/item?id=8552989


The semantic web as I understood it was just using the right HTML tags. Most importantly, no tables for layout and using semantic names for CSS classes rather than names like "blue". This has completely failed, imo. Tables are no longer constructed with <table>, <tr> and <td> but with a combination of <div> and <span> tags with classes that are not semantic at all :(

Edit: Not sure why this got downvoted, just trying to illustrate what I thought the term meant.


> The semantic web as I understood it was just using the right HTML tags.

Using semantic HTML (rather than presentational HTML) for content is somewhat related to the Semantic Web, but not equivalent to it -- the Semantic Web is/was about open, composable ways of extracting and processing meaning from content on the web. HTML itself (even HTML5 or the current state of the HTML Living Standard) -- even when used in a cleanly semantic way -- isn't expressive enough to do much with that without building additional ontological structure on top of it, hence things like RDF and Microformats.
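To show the gap between semantic HTML and machine-extractable meaning: a `<span>` or `<p>` says nothing about *what* its text is, while a vocabulary layered on top, such as the microformats2 `h-card` with its `p-name` and `u-url` class names, does. A minimal sketch using only Python's `html.parser`; it handles just those two properties, attaches them to the most recent `h-card`, and ignores nesting and the rest of the microformats2 parsing rules, so it is an illustration, not a conforming parser. The markup is hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class HCardParser(HTMLParser):
    """Collects p-name text and u-url hrefs, attaching them to the latest h-card."""

    def __init__(self) -> None:
        super().__init__()
        self.cards: List[Dict[str, str]] = []
        self._capturing_name = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "h-card" in classes:
            self.cards.append({})  # start a new card
        if not self.cards:
            return  # properties outside any h-card are ignored in this sketch
        if "p-name" in classes:
            self._capturing_name = True
            self.cards[-1].setdefault("name", "")
        if "u-url" in classes and attr_map.get("href"):
            self.cards[-1]["url"] = attr_map["href"]

    def handle_endtag(self, tag: str) -> None:
        self._capturing_name = False  # simplification: any end tag stops capture

    def handle_data(self, data: str) -> None:
        if self._capturing_name:
            self.cards[-1]["name"] += data


html = """
<div class="h-card">
  <span class="p-name">Alice Example</span>
  <a class="u-url" href="https://example.org/alice">homepage</a>
</div>
<p>Alice Example wrote this paragraph.</p>
"""
parser = HCardParser()
parser.feed(html)
parser.close()
print(parser.cards)  # [{'name': 'Alice Example', 'url': 'https://example.org/alice'}]
```

The `<p>` contains the same name but carries no machine-readable meaning, so it contributes nothing; that extra layer of agreed-upon vocabulary is what the Semantic Web adds on top of well-structured HTML.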


That sense of the term was actually co-opted from what people are talking about in this discussion, i.e., rdf, microdata, etc. They are related though, as using the correct HTML tags was meant to be a first step into adding a layer of meaning to web content that wasn't there before. But "the semantic web" includes a whole, whole lot more than just semantic markup. Hope that clarifies a bit!



