Dimensionality reduction with UMAP combined with HDBSCAN is a popular topic modeling method found in a number of libraries. txtai takes a different approach with a semantic graph.
When enabled, txtai builds a semantic graph at index time as it's vectorizing data. These vector embeddings are then used to create relationships in the graph. Finally, community detection algorithms build topic clusters.
This approach has the advantage of only having to vectorize data once. It also has the advantage of better topic precision given there isn't a dimensionality reduction operation (UMAP).
When enabled, txtai builds a semantic graph at index time as it's vectorizing data. These vector embeddings are then used to create relationships in the graph. Finally, community detection algorithms build topic clusters.
This approach has the advantage of only having to vectorize data once. It also has the advantage of better topic precision given there isn't a dimensionality reduction operation (UMAP).
Read more here: https://neuml.hashnode.dev/introducing-the-semantic-graph