Hypergraph is like a graph but one where vertexes can connect more than two node...

lmeyerov · on July 30, 2022

Yep, close -- it's where an edge can connect more than 2 nodes

Two things to make this way more intuitive & practical:

* Modeling: We use them for helping folks correlate across events, or correlating across wide database rows. Ex: Finding bots & fraudsters in your signup events or accounts table . A signup event might have the same weird User Agent, same weird IP, same weird domain name, etc: its good to see events connected across multiple such things! So any time you have an event or a wide row, you can think of it as a hyperedge connecting (correlating) the multiple entities involved.

* Visualizing: It's typically weird visually to draw an edge connecting more than 2 nodes, so most people don't. Instead, sometimes, our users want to see a correlation event as an explicit node in a graph. SO there's a natural bipartite graph from events to entities, like "(signup)-> (user agent)" + "(signup)->(ip)" + ... . But other times its annoying, so they rather just see "(ip:node)<-[signup:edge]-->(useragent:node)". Interestingly, for really wide data, having an explicit "hypernode" (event node) does increase the number of nodes... but significantly decreases the number of edges b/c avoids drawing really big connected components. Formally, add |events| nodes, vs multiply edges by |number of columns|^2 .

If you want to play with it, try "graphistry.hypergraph(pd.read_csv('http://blah.csv'))['graph'].plot()" in http://github.com/graphistry/pygraphistry (or just to get the ._nodes and ._edges dataframes out)

Nowdays, we also do a lot with embedding arbitrary data and exploring similarity graphs that way (use k-nn for inferring similarity edges), so not just across entity links but also say timestamps, $'s, and byte counts. Any dataset embedding can thus be thought of as a projection of the hypergraph, and supporting more interesting columns than just entity columns (categoricals). So in the above example, `graphistry.nodes(pd.read_csv('http://blah.csv')).umap().plot()` and we'll infer the `._edges`. All the nodes in an embedding view are essentially hypernodes, and instead of connecting them to entity/attribute nodes... project those out, and just connect the hypernodes to one another. This gives a very different way to think about stuff like explainability of AI :)

nighthawk454 · on July 30, 2022

Yes, a hyperedge is like a set of nodes. Not limited to just two nodes.

I hadn't thought of relational tables in that way. I guess, you could look at each row of a table as a hyperedge grouping together all the columns for a given record. Not sure how useful that is though. The cross-table join relations are still relatively point-to-point I think.

namibj · on July 30, 2022

Funnily enough, this hypergraph view is used to describe symptomatic query complexity for cyclical joins, the simplest if which is "count triangles in adjacency list" (naive RDBMS yields O(n²) on pathological data, despite the wrist case optimal tactic being something like O(n + #Triangles), with #Triangles limited inherently to O(n^(1.5))).

See https://arxiv.org/abs/1404.0703 for some reading material and one particular way at beyond worst case (i.e., sharing worst case asympotics with the best constant time algorithm, but better performance on non-pathological data (a more precise notion is complicated; there are at least 3 different useful definitions)) runtime performance.

AFAIK the worst cyclical query on a 2-column table is "k-clique counting", from a POV of performance between WCOJ and what Postgres can do (asympotics).

galaxyLogic · on July 30, 2022

It might be just a good way of thinking about it. In practice it might provide some ideas about how to translate between an Object-view and Relational-View.

What I'd like to find is a relational database where field-values can be not just elementary values like numbers and strings but also "object-instances".

I wonder is there such a database? It would seem to offer the best of both worlds, objects, and relations between objects.

ynniv · on July 30, 2022

GemStone is a common one, but object databases are not very common due to the maturity, efficiency, and predictability of relational databases. The deep pipelining and caching models of CPUs are a better match for RDBMs than object ones.

https://en.wikipedia.org/wiki/Gemstone_(database)

galaxyLogic · on July 31, 2022

I understand that Gemstone is an Object-Oriented Database that stores objects and is great at that. But can you query those objects with SQL?

What I'd like to understand is, is there some basic reason why RDBMS and ODBMs must be different databases. Or could their conceptual models of data perhaps be generalized into a single model. Using hyper-graphs perhaps. :-)

nighthawk454 · on July 30, 2022

This might not be quite it, but perhaps you’re looking for graph databases?

galaxyLogic · on July 31, 2022

Can you use SQL with graph databases?

Related to the current topic, do graph databases support hyper-graphs?

nighthawk454 · on July 31, 2022

I don't believe they support SQL. They're somewhat more like pattern-matching on graphs. Most of their query language pages have compare/contrast examples with an equivalent SQL query. e.g. Neo4j's Cypher query language [1], or Apache Tinkerpop Gremlin (seriously) [2] used by Amazon Neptune.

I don't think the most popular ones support hypergraphs, although that sounds cool. I did see HypergraphDB [3] though! Need to read more about that.

[1] https://neo4j.com/developer/cypher/guide-sql-to-cypher/#cyph... [2] https://tinkerpop.apache.org/gremlin.html#:~:text=Host%20Lan... [3] http://hypergraphdb.org/

mnd999 · on July 30, 2022

This can be modelled as a regular graph simply by representing the hyper edge as another node with a set of edges. If those are labelled appropriately to differentiate them from regular nodes, it’s straightforward enough to represent it as a regular graph.

nine_k · on July 30, 2022

Many things can be represented as more complex structures of simpler things.

It's interesting whether a new formalism allows to express things better, unearth interesting structures not apparent using a more busy formalism, etc.

Compare the Maxwell equations in the wordy original form, and in a highly compact Clifford algebra form.

galaxyLogic · on July 31, 2022

Right but doesn't that mean that in essence you will have two types of nodes in your conceptual model. It is no longer that you have just 'nodes' and 'vertices', you will need to have two different types of nodes, and one type of vertex. The new type of node could be called 'hyper-node' perhaps. Else you lose information.