LinkedDataHub: The Knowledge Graph Notebook (github.com/atomgraph)
110 points by bryanrasmussen on June 23, 2022 | 45 comments



To my ears, knowledge graph sounds a bit grandiloquent. I do not have a definition, but I know that when talking about knowledge as it is embodied in people, it's quite a subtle thing, hard to formalize and, to be honest, relatively rare.

Why can't we just call these things fact databases?

Addendum: Knowledge evokes a lot of other associations as well, for example that what we are able to know changes over time, and that a time has a certain underlying grid in which certain factual stories appear and later disappear.


> Why can't we just call these things fact databases?

Because (in theory) they are much much more than that.

In practice the semantic web/data space has a problem of building complicated standard on top of complicated standard (as well as having a Java implementation monoculture, which doesn't help that). That also makes it hard to formalize all the non-trivial statements that are part of our knowledge.

And yes, there are subtle aspects to knowledge that are usually not easy to capture in manually formalized knowledge graphs, but that's where pairing knowledge graphs with ML-based methods (e.g. vector search) can really shine.


> Why can't we just call these things fact databases?

Companies that want to reinvent/repackage and sell boring RDBMS tech


A Knowledge Graph is the data in the database, not the tech.

You can absolutely implement this in an RDBMS. There are some advantages to a proper graph database though.

But SPARQL is a dead end. I don't think anyone is really using it in practice outside a few public demonstration apps. To a large extent this is true of RDF too: triples are useful, RDF gets in the way.


That's just bullshit. Stop spreading FUD.

We participated in a huge RfP for a pharma company which planned RDF KG infrastructure for the next couple of years with 500 billion triple capabilities.

Biomedical, finance, defence, automotive -- all of those industries are using RDF/SPARQL. Just because your problems are not big or complex enough doesn't mean this tech is not used. It takes a certain organization size for Knowledge Graphs to make sense and pay off, that's why most industry users are Fortune 500-level companies.


Wikidata is a good example of something that works. But I agree there isn't much else.


Except the software it’s powered by, Blazegraph, is deprecated (afaict the devs were poached by AWS to work on Neptune).

https://phabricator.wikimedia.org/T206560


Unfortunately yes. There were discussions about switching to another one, but I'm not sure where they stand. According to your link, it looks like they have not gotten that far...


It's not an RDBMS though. It's RDF, triple stores, graph-like data models.


Those are easy to implement on top of RDBMS. Query performance is a different thing, which can only be evaluated on a case-by-case basis, but you can go a long way with good indexes.
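
To make that concrete, here is a minimal sketch of a triple table on a relational engine, using SQLite from the Python standard library. The table layout, the index names and the `knows` predicate are assumptions picked for illustration, not anything a particular triple store does. The primary key serves subject-first lookups, and the two extra indexes cover predicate-first and object-first patterns.

```python
import sqlite3


def open_triple_store(path=":memory:"):
    """Create a one-table triple store with indexes for common access patterns."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        " s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
        " PRIMARY KEY (s, p, o))"
    )
    # The primary key already indexes (s, p, o); add predicate-first and object-first orders.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_pos ON triples (p, o, s)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_osp ON triples (o, s, p)")
    return conn


def add_triples(conn, triples):
    """Insert (s, p, o) tuples, silently skipping duplicates."""
    conn.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)


def match(conn, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT s, p, o FROM triples" + where + " ORDER BY s, p, o"
    return conn.execute(sql, params).fetchall()


def friends_of_friends(conn, person):
    """Two-hop traversal over the hypothetical 'knows' predicate, as a self-join."""
    rows = conn.execute(
        "SELECT DISTINCT b.o FROM triples a JOIN triples b ON a.o = b.s "
        "WHERE a.s = ? AND a.p = 'knows' AND b.p = 'knows' AND b.o <> ? "
        "ORDER BY b.o",
        (person, person),
    )
    return [row[0] for row in rows]
```

Each extra hop in a traversal costs another self-join, which is where the case-by-case performance question comes in.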

A few companies need real time analytics on really big graphs. Most don't and shouldn't waste their time with fancy Google-scale databases.


Google started calling it Knowledge Graph 10 years ago: https://en.wikipedia.org/wiki/Google_Knowledge_Graph Then everyone else followed.


What is the use-case for this software? From the README:

> We are building LinkedDataHub primarily for:

> researchers who need an RDF-native notebook that can consume and collect Linked Data and SPARQL documents and follows the FAIR principles

I would be interested in reading a user story of a few paragraphs about how this works. I don't know anyone working with RDF or SPARQL documents, but I'm curious about these technologies. Graphs are cool, and SPARQL has a certain appeal. Who is using these things already day-to-day?

> developers who are looking for a declarative full stack framework for Knowledge Graph application development, with out-of-the-box UI and API

I work on an application (https://notion.so) that would be better with more Knowledge Graph, but I don't need a framework. I'm curious what application developers approach the knowledge graph space looking for a "full stack framework". I presume most commercial developers would prefer to use their existing application tooling. Maybe academic researchers writing software for their lab?

>What makes LinkedDataHub unique is its completely data-driven architecture: applications and documents are defined as data, managed using a single generic HTTP API and presented using declarative technologies. The default application structure and user interface are provided, but they can be completely overridden and customized. Unless a custom server-side processing is required, no imperative code such as Java or JavaScript needs to be involved at all.

This kind of flexibility is intrinsically appealing to programmers, but the resulting user experience leaves a lot to be desired. Usually it's better to build a good product first, and then to extract the framework bits once they've proved productive. Otherwise you may end up with a framework that can do anything, but in a way nobody wants.


So I don’t personally have many use cases for RDF-type data, but I plan on implementing RDF data endpoints in a music library app I’m building.

I suppose RDF thrives in the academic space, whereas userspace suffers from a chicken and egg type problem. There aren’t many common services available that have public RDF endpoints, so few applications using them get built.

Edit: I suppose that’s what LinkedDataHub provides then, a way for researchers to build API-transformers into their graph, so they can then use it with SPARQL.


It doesn't have many ETL features, but it does support CSV import.

What kind of data are you looking to transform?


Basically I’m combining your local music library with data from streaming services so you have unified playlists across them.

So basically I’m having to match a bunch of resources like songs together from different APIs.


LinkedDataHub was extracted from the common code from a number of Linked Data projects that we have done in different domains.

It can be used as a framework but it's a standalone application as well, because it provides the default built-in ontologies as well as a UI for Linked Data and SPARQL consumption.

Disclaimer: I'm the main developer.


What were the projects you did? I think concrete examples would help me understand the software more.



Thanks!


The list of dependencies is amazingly long for a product which seems to be a harder-to-use TiddlyWiki, or Neo4j UI for the graph viz part. It's crazy the SemWeb community still hasn't given up, given how much effort has been poured into it for so few results.


I access SPARQL endpoints from inside programs written (usually) in Common Lisp, Python, and Clojure.
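
For anyone wondering what that looks like from Python, here is a rough sketch using only the standard library. It assumes the endpoint accepts SPARQL 1.1 Protocol GET requests and can return the standard JSON results format. The function names, the User-Agent string and the `example.org` endpoint in the note below are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "sparql-sketch/0.1 (example)",  # hypothetical client name
        },
    )


def parse_bindings(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query over the network and return the flattened bindings."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Calling `run_query("https://example.org/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 5")` from a REPL would return a list of plain dicts, provided a real endpoint is substituted for the placeholder URL.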

LinkedDataHub looks cool enough for non-tech users, but I prefer working inside a repl/Slime/etc. interactive programming environment.

Also, Google, Facebook, most banks, etc., etc., use Knowledge Graphs - pretty solid technology.


You do realize this is an open-source project? And you are comparing with a product by Neo4J who got $300M VC investment?

The enterprise Knowledge Graphs (yes, it's the same SemWeb tech stack in principle) in Fortune 500 sized companies have in-house platforms that present the graph to the end-users, with entity browsers, analytics, dashboards, structured content etc. LinkedDataHub is an attempt to bootstrap an open-source, standards-driven version of that.


This package was designed to solve more problems than it creates

https://github.com/paulhoule/gastrodon

Overall I think of graph visualization as a problem; in particular, there are some people who just don't see that hairballs are incomprehensible

https://cambridge-intelligence.com/how-to-fix-hairballs/


Large graphs (just about anything larger than a karate club social network [1]) can't usually be visualized in a useful manner. There are exceptions, but in real world applications they are more useful as pretty art than helping with understanding.

Statistical summary plots are more useful.

Maybe one day someone will figure something out, but much like scatter plots fall over when you plot vast amounts of raw data, so does plotting graphs.

[1] https://en.m.wikipedia.org/wiki/Zachary%27s_karate_club


My answer to it is that graphs need to be manually curated. For example, a UML diagram for all the database tables on the system I am working on now would have to be printed out on a wall to make any sense, but if I picked out the tables involved in a new user registration that would be useful.

I went to an exhibit of this guy's works

https://en.wikipedia.org/wiki/Mark_Lombardi

and saw a series of drafts he'd made where he had drawn many different versions of a conspiracy social network and gradually went from a hairball to something that looked meaningful.

In terms of turning this into a tool there's the interesting problem that there is a graph that comes in from the outside world (and could be regenerated) and also data that represents the curation of the graph (Do I show this? What color is this line? What position does this node get displayed at?) You've got to be able to edit one independently of the other and deal with things sometimes getting out of sync to have a tool that advances over the state of the art.
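
A minimal sketch of that separation, with all names invented for illustration: the graph is just a set of edges that can be regenerated at any time, the curation lives in its own structure, and a small check reports curation entries that no longer match the graph.

```python
from dataclasses import dataclass, field


@dataclass
class Curation:
    """Presentation choices, stored apart from the source graph."""
    visible: set = field(default_factory=set)        # node ids to show
    positions: dict = field(default_factory=dict)    # node id -> (x, y)
    edge_colors: dict = field(default_factory=dict)  # (src, dst) -> color name


def curated_view(edges, curation):
    """Return only the edges whose endpoints are both marked visible."""
    return sorted(
        (a, b) for a, b in edges
        if a in curation.visible and b in curation.visible
    )


def stale_entries(edges, curation):
    """Report curation entries that refer to nodes or edges missing from the graph."""
    nodes = {node for edge in edges for node in edge}
    curated_nodes = curation.visible | set(curation.positions)
    return {
        "nodes": sorted(curated_nodes - nodes),
        "edges": sorted(set(curation.edge_colors) - set(edges)),
    }
```

When the graph is regenerated, `stale_entries` shows what the curation still mentions that has disappeared, so the two can drift apart and be reconciled later instead of being edited in lockstep.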


Mh... I'm an org-roam (org-mode/Emacs) user, which has a similar feature, and... honestly, I find such visualization to be eye candy and useless.

Network analysis of note links is fascinating, but it must be actionable in some way; just having a UI means nothing. Also, most note-taking tools fail miserably to really offer "easy atomic notes that can be combined (transcluded) and split as the user wishes". Some try structured approaches (SPARQL, fixed formats and the like), others try to offer a loose feature set to make anything possible, but a real solution is still decades of development away IMO.

So far the best, which means least bad, way I have found to really analyze my notes is using org-mode drawers, with relevant templates to help with consistency, queried via org-ql. That essentially means key-value structured tagging of notes, so I can see them in a timeline, or see all notes about a URL, an author, a subject, a topic, ... Unfortunately it is a tedious manual process, and at runtime it is neither that fast nor that flexible.

Long story short, vast approaches like Wikidata, classic library cataloguing techniques and tools, and note-taking tools old and new all work to a certain extent and fail thereafter.


Graph layout is just one of multiple layout modes. See here for more screenshots: https://atomgraph.github.io/LinkedDataHub/


My first intuition of a knowledge graph would be an IDE. If that's not right, how am I wrong? If it is a typical use, what IDE(like) examples are there? Org-mode is a tree instead of a general graph, but general graphs can be traversed as (sets of) trees. Is the tree discipline somehow important to understanding code?


LinkedDataHub, a "RDF-native notebook", is not to be confused with LinkedIn DataHub, which is a metadata store/crawler/ui for your data systems: https://datahubproject.io/.


I wish the installation process could be easier.

For now, I use either Obsidian or graf[1] to manage my own knowledge graph.

[1] https://github.com/altilunium/graf


Graphs are great for querying: drawing a query can really help explain what you want. But for visualizing results, as soon as you reach a hundred or so nodes it becomes unbearable. There are tricks, used by crime analysis software for example, where results are grouped into different nodes, which can make it easier, but that's only good when you don't have too many node types.


At least in LinkedDataHub, the graph layout is only one of the layout modes, together with lists, tables, charts, maps etc.

Check the GH page for more screenshots: https://atomgraph.github.io/LinkedDataHub/


It's honestly fantastic to see web pages that are using XSLT. Is this the most advanced app out there using it these days?


What's good about XSLT? Is its ecosystem substantially better than alternative options like simple string templating a la https://pkg.go.dev/html/template?


XSLT is wildly more than a templating engine. It can be (and has been) used to e.g. specify a protocol and generate software based on it. See XCB for an example. With a sufficiently large corpus you can run queries on XML and generate arbitrary media.

As with most overbearingly flexible technology, it's an incredible pain in the ass to use efficiently, and XSLT processors tend to be plagued with complexity and concomitant performance problems.


No complaints about the Saxon processors here (we're using Saxon-HE server-side and Saxon-JS client-side). The XSLT standards are excellent, as is the quality of Saxon implementations.


> What's good about XSLT?

Nothing. It was a bad choice in its heyday (I worked on some projects way back then).

> Is its ecosystem substantially better than alternative options

No.


Absolutely. XSLT is a data transformation technology, not a template language.


Can you describe a task that XSLT makes substantially easier to build/more correct/faster to execute? Saying “yep it's good” gives an opinion, but after looking at XSLT docs I am not “getting it”. Why do I want this? I transform data all the time with a bash script; is XSLT like bash?


I use it in ETL whenever I have an XML source, then I use XSLT to lift it to RDF (either RDF/XML or TriX). I use it for the UI where I'm transforming RDF/XML to HTML. I'm also using it for the interactive parts instead of JavaScript (or React or Svelte etc.), but that's the interactive XSLT extension that goes beyond the standard.

XML to XML, XML to RDF, JSON to XML and XML to JSON, XML to text -- XSLT can be used for all kinds of transformations. 3.0 also supports streaming transforms, which is very useful for large input files.

XSLT is a declarative DSL made specifically for the XML data model. It does limited things such as navigating the XML tree but does them really well. It lifts the abstraction level so you can focus on the transformation. You can transform XML with bash or a general purpose language like Java, but it will never be so concise or effective.


It certainly has one of the largest interactive XSLT codebases that use Saxon-JS: https://www.saxonica.com/saxon-js/


Just FYI, there are more screenshots on this GH page: https://atomgraph.github.io/LinkedDataHub/


Bummer, the demo app at https://kg.opendatahub.bz.it/ seems to be broken. The concept sounds like something I could use.


The endpoint served by our partners went down.

Can you ping me at martynas@atomgraph.com? Would be appreciated.




