
For full text search, the only global frequency information Postgres uses is the stop-word list. It does not do TF-IDF [1] ranking. [2]

For example, if you search for "Bob Peterson", Postgres will rank these two documents the same:

"I saw Bob."

"I saw Peterson."

In contrast, an IDF-aware search would notice that "Peterson" occurs in fewer documents than "Bob" and score "I saw Peterson" higher for that reason.
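To make the difference concrete, here is a rough Python sketch. The four-document corpus is made up for illustration, `match_count_score` is only a stand-in for frequency-blind ranking (it is not Postgres's actual ranking formula), and the smoothed IDF is just one of several common variants.

```python
import math
from collections import Counter

# Hypothetical corpus: "bob" appears in three documents, "peterson" in one.
corpus = [
    "I saw Bob.",
    "I saw Peterson.",
    "Bob went home.",
    "Bob called Alice.",
]

def tokenize(text):
    # Lowercase and strip trailing punctuation; no stemming or stop words here.
    return [w.strip(".,!?").lower() for w in text.split()]

def document_frequencies(docs):
    # Count how many documents contain each term at least once.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df

def idf(term, df, n_docs):
    # Smoothed inverse document frequency (an assumed variant, not the only one).
    return math.log((1 + n_docs) / (1 + df[term])) + 1.0

def match_count_score(query, doc):
    # Frequency-blind stand-in: every matching query term counts the same.
    terms = set(tokenize(doc))
    return sum(1 for q in tokenize(query) if q in terms)

def idf_score(query, doc, df, n_docs):
    # IDF-aware: rare matching terms contribute more than common ones.
    terms = set(tokenize(doc))
    return sum(idf(q, df, n_docs) for q in tokenize(query) if q in terms)

df = document_frequencies(corpus)
query = "Bob Peterson"

flat_bob = match_count_score(query, corpus[0])
flat_peterson = match_count_score(query, corpus[1])
weighted_bob = idf_score(query, corpus[0], df, len(corpus))
weighted_peterson = idf_score(query, corpus[1], df, len(corpus))

print("frequency-blind:", flat_bob, flat_peterson)
print("IDF-weighted:   ", weighted_bob, weighted_peterson)
```

The frequency-blind scores tie, while the IDF-weighted score puts "I saw Peterson." ahead because "peterson" occurs in fewer documents of this toy corpus.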

[1] http://en.wikipedia.org/wiki/Tf%E2%80%93idf

[2] http://stackoverflow.com/questions/18296444/does-postgresql-...




TF–IDF ranking doesn't seem to be too complex a thing to implement. Maybe this is an opportunity for someone here to contribute to the open source project.


The concurrency aspects of this seem a bit tricky. How do we ensure that a bloated index does not screw our results too much?



