To my ears, "knowledge graph" sounds a bit grandiloquent. I do not have a definition, but I know that when talking about knowledge as it is embodied in people, it's quite a subtle thing, hard to formalize and, to be honest, something relatively rare.
Why can't we just call these things fact databases?
Addendum: knowledge evokes a lot of other associations as well, for example that what we are able to know changes over time. That time has a certain underlying grid into which certain factual stories appear and later disappear.
> Why can't we just call these things fact databases?
Because (in theory) they are much much more than that.
In practice the semantic web/data space has a problem of building complicated standards on top of complicated standards (as well as having a Java implementation monoculture, which doesn't help). That also makes it hard to formalize all the non-trivial statements that are part of our knowledge.
And yes, there are subtle aspects to knowledge that are usually not easy to capture in manually formalized knowledge graphs, but that's where pairing knowledge graphs with ML-based methods (e.g. vector search) can really shine.
A Knowledge Graph is the data in the database, not the tech.
You can absolutely implement this in an RDBMS. There are some advantages to a proper graph database though.
But SPARQL is a dead end - I don't think anyone is really using it in practice outside a few public demonstration apps. To a large extent this is true of RDF too: triples are useful, RDF gets in the way.
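To ground what we're arguing about: a triple is just a (subject, predicate, object) statement, and SPARQL is essentially pattern matching over a set of them. A minimal sketch using the Python rdflib package; the http://example.org/ names are made up:

```python
from rdflib import Graph, Literal, Namespace

# A tiny in-memory graph: two triples about a made-up domain.
EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.alice, EX.knows, EX.bob))
g.add((EX.bob, EX.name, Literal("Bob")))

# SPARQL query: "who does alice know?"
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?who WHERE { ex:alice ex:knows ?who . }
""")
for row in results:
    print(row.who)  # -> http://example.org/bob
```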
We participated in a huge RfP for a pharma company which planned its RDF KG infrastructure for the next couple of years, with capacity for 500 billion triples.
Biomedical, finance, defence, automotive -- all of those industries are using RDF/SPARQL. Just because your problems are not big or complex enough doesn't mean this tech is not used. It takes a certain organization size for Knowledge Graphs to make sense and pay off, that's why most industry users are Fortune 500-level companies.
Those are easy to implement on top of an RDBMS. Query performance is a different matter, which can only be evaluated case by case, but you can go a long way with good indexes.
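For the curious, a minimal sketch of what "triples on top of an RDBMS" can look like (SQLite via Python here; the table layout, index choice and ex: identifiers are illustrative assumptions, not any particular product's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL
    );
    -- Covering indexes for the common access patterns.
    CREATE INDEX idx_spo ON triples (subject, predicate, object);
    CREATE INDEX idx_pos ON triples (predicate, object, subject);
    CREATE INDEX idx_osp ON triples (object, subject, predicate);
""")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:aspirin", "ex:treats", "ex:headache"),
        ("ex:aspirin", "rdfs:label", "Aspirin"),
    ],
)

# "What does ex:aspirin treat?" -- a one-hop lookup served by idx_spo.
for (obj,) in conn.execute(
    "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
    ("ex:aspirin", "ex:treats"),
):
    print(obj)
```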
A few companies need real time analytics on really big graphs. Most don't and shouldn't waste their time with fancy Google-scale databases.
What is the use-case for this software? From the README:
> We are building LinkedDataHub primarily for:
> researchers who need an RDF-native notebook that can consume and collect Linked Data and SPARQL documents and follows the FAIR principles
I would be interested in reading a user story of a few paragraphs about how this works. I don't know anyone working with RDF or SPARQL documents, but I'm curious about these technologies. Graphs are cool, and SPARQL has a certain appeal. Who is using these things already day-to-day?
> developers who are looking for a declarative full stack framework for Knowledge Graph application development, with out-of-the-box UI and API
I work on an application (https://notion.so) that would be better with more Knowledge Graph, but I don't need a framework. I'm curious what application developers approach the knowledge graph space looking for a "full stack framework". I presume most commercial developers would prefer to use their existing application tooling. Maybe academic researchers writing software for their lab?
> What makes LinkedDataHub unique is its completely data-driven architecture: applications and documents are defined as data, managed using a single generic HTTP API and presented using declarative technologies. The default application structure and user interface are provided, but they can be completely overridden and customized. Unless a custom server-side processing is required, no imperative code such as Java or JavaScript needs to be involved at all.
This kind of flexibility is intrinsically appealing to programmers, but the resulting user experience leaves a lot to be desired. Usually it's better to build a good product first, and then to extract the framework bits once they've proved productive. Otherwise you may end up with a framework that can do anything, but in a way nobody wants.
So I don’t personally have many use cases for RDF-type data, but I plan on implementing RDF data endpoints in a music library app I’m building.
I suppose RDF thrives in the academic space, whereas userspace suffers from a chicken and egg type problem. There aren’t many common services available that have public RDF endpoints, so few applications using them get built.
Edit: I suppose that's what LinkedDataHub provides then, a way for researchers to build API transformers into their graph, so they can then use it with SPARQL.
LinkedDataHub was extracted from the common code from a number of Linked Data projects that we have done in different domains.
It can be used as a framework but it's a standalone application as well, because it provides default built-in ontologies as well as a UI for Linked Data and SPARQL consumption.
The list of dependencies is amazingly long for a product which seems to be a harder-to-use TiddlyWiki, or a Neo4j UI for the graph-viz part. It's crazy the SemWeb community still hasn't given up, given how much effort has been poured into it for so few results.
You do realize this is an open-source project? And you are comparing it with a product by Neo4j, which got $300M in VC investment?
The enterprise Knowledge Graphs (yes, it's the same SemWeb tech stack in principle) in Fortune 500 sized companies have in-house platforms that present the graph to the end-users, with entity browsers, analytics, dashboards, structured content etc. LinkedDataHub is an attempt to bootstrap an open-source, standards-driven version of that.
Large graphs (just about anything larger than a karate club social network [1]) can't usually be visualized in a useful manner. There are exceptions, but in real world applications they are more useful as pretty art than helping with understanding.
Statistical summary plots are more useful.
Maybe one day someone will figure something out, but much like scatter plots fall over when you plot vast amounts of raw data, so does plotting graphs.
My answer to it is that graphs need to be manually curated. For example, a UML diagram for all the database tables on the system I am working on now would have to be printed out on a wall to make any sense, but if I picked out the tables involved in a new user registration that would be useful.
I once saw a series of drafts someone had made where they had drawn many different versions of a conspiracy social network and gradually went from a hairball to something that looked meaningful.
In terms of turning this into a tool there's the interesting problem that there is a graph that comes in from the outside world (and could be regenerated) and also data that represents the curation of the graph (Do I show this? What color is this line? What position does this node get displayed at?) You've got to be able to edit one independently of the other and deal with things sometimes getting out of sync to have a tool that advances over the state of the art.
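To make the curated-subgraph idea concrete, a small sketch with Python's networkx (the table names are hypothetical): the full schema graph is the regenerable "outside world" input, and the view you actually look at is derived from it:

```python
import networkx as nx

# The "outside world" graph: tables as nodes, foreign keys as edges.
# In practice this would be regenerated from the schema, while the
# curation (what to show, colors, positions) lives elsewhere.
schema = nx.Graph()
schema.add_edges_from([
    ("users", "user_emails"),
    ("users", "user_roles"),
    ("users", "audit_log"),
    ("orders", "users"),
    ("orders", "order_items"),
    ("order_items", "products"),
    # ...imagine a few hundred more tables here
])

# Curation step: only the tables within one hop of "users" -- roughly
# the "new user registration" slice -- instead of the whole hairball.
registration_view = nx.ego_graph(schema, "users", radius=1)
print(sorted(registration_view.nodes()))
```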
Mh... I'm an org-roam (org-mode/Emacs) user, which has a similar feature, and... I honestly find such visualization eye candy and useless.
Network analysis of note links is fascinating, but it must be actionable in some way; just having a UI means nothing. Also, most note-taking tools miserably fail to really offer "easy atomic notes that can be combined (transcluded) and split as the user wishes"; some try structured ways (SPARQL/fixed formats and the like), others try to offer a loose feature set to make anything possible, but a real solution is still decades of development away IMO.
So far the best (which means least bad) way I have found to really analyze my notes is using org-mode drawers, with relevant templates to help with consistency, queried via org-ql. That essentially means key-value structured tagging of notes, so I can see them in a timeline, or see all notes about a URL, an author, a subject, a topic, ... Unfortunately it's a tedious manual process, and at runtime it's not that fast or flexible.
Long story short: broad approaches like Wikidata, classic library cataloguing techniques & tools, and modern/old note-taking tools all work to a certain extent and fail thereafter.
My first intuition of a knowledge graph would be an IDE. If that's not right, how am I wrong? If it is a typical use, what IDE(like) examples are there? Org-mode is a tree instead of a general graph, but general graphs can be traversed as (sets of) trees. Is the tree discipline somehow important to understanding code?
LinkedDataHub, a "RDF-native notebook", is not to be confused with LinkedIn DataHub, which is a metadata store/crawler/ui for your data systems: https://datahubproject.io/.
Graphs are great for querying: drawing a query can really help explain what you actually want. But for results visualization, as soon as you reach a hundred or so nodes it becomes unbearable. There are tricks used by crime analysis software, for example, where results are grouped into different nodes, which can make it easier, but that's only good when you don't have too many node types.
What's good about XSLT? Is its ecosystem substantially better than alternative options like simple string templating a la https://pkg.go.dev/html/template?
XSLT is wildly more than a templating engine. It can be (and has been) used to, e.g., specify a protocol and generate software based on it. See XCB for an example. With a sufficiently large corpus you can run queries on XML and generate arbitrary media.
As with most overbearingly flexible technology, it's an incredible pain in the ass to use efficiently, and XSLT processors tend to be plagued with complexity and concomitant performance problems.
No complaints about the Saxon processors here (we're using Saxon-HE server-side and Saxon-JS client-side). The XSLT standards are excellent, as is the quality of Saxon implementations.
Can you describe a task that XSLT makes substantially easier to build, more correct, or faster to execute? Saying "yep it's good" gives an opinion, but after looking at the XSLT docs I am not "getting it". Why do I want this? I transform data all the time with a bash script; is XSLT like bash?
I use it in ETL whenever I have an XML source, then I use XSLT to lift it to RDF (either RDF/XML or TriX). I use it for the UI where I'm transforming RDF/XML to HTML.
I'm also using it for the interactive parts instead of JavaScript (or React or Svelte etc.), but that's the interactive XSLT extension, which goes beyond the standard.
XML to XML, XML to RDF, JSON to XML and XML to JSON, XML to text -- XSLT can be used for all kinds of transformations. 3.0 also supports streaming transforms, which is very useful for large input files.
XSLT is a declarative DSL made specifically for the XML data model. It does limited things such as navigating the XML tree but does them really well. It lifts the abstraction level so you can focus on the transformation.
You can transform XML with bash or a general purpose language like Java, but it will never be so concise or effective.
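As a flavour of what such a transformation looks like, here is a minimal sketch using Python's lxml binding (note that lxml only implements XSLT 1.0; the Saxon processors mentioned above cover the newer standards). The <tracks> document and the stylesheet are made up for illustration:

```python
from lxml import etree  # libxslt-based; XSLT 1.0 only

# A toy stylesheet: turn a <tracks> document into an HTML list.
xslt_root = etree.XML("""
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/tracks">
    <ul>
      <xsl:apply-templates select="track"/>
    </ul>
  </xsl:template>
  <xsl:template match="track">
    <li><xsl:value-of select="@title"/></li>
  </xsl:template>
</xsl:stylesheet>
""")
transform = etree.XSLT(xslt_root)

doc = etree.XML('<tracks><track title="So What"/><track title="Freddie Freeloader"/></tracks>')
print(str(transform(doc)))  # prints an HTML <ul> with one <li> per track
```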