Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Dimensionality reduction with UMAP combined with HDBSCAN is a popular topic modeling method found in a number of libraries. txtai takes a different approach with a semantic graph.

When enabled, txtai builds a semantic graph at index time as it's vectorizing data. These vector embeddings are then used to create relationships in the graph. Finally, community detection algorithms build topic clusters.

This approach has the advantage of only having to vectorize data once. It also has the advantage of better topic precision given there isn't a dimensionality reduction operation (UMAP).

Read more here: https://neuml.hashnode.dev/introducing-the-semantic-graph




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: