Even if a GPT-X can take my description of a system and, because it understands the industry lingo and practices, create extremely optimized Rust code, why would we need systems like apps and web apps (like internet banking) in a world where we all have access to GPT-X?
It's like programming a KUKA robot to manufacture typewriters in 2023.
Good link, thanks. I have to say we're moving to pgSearch after implementing Solr alongside a PG/Rails backend. It makes the stack simpler, with fewer components to worry about (fewer headaches with gem version dependencies for Solr). There's a lot to be said for it after reading through what's available now...
An alternative is to just layer the FTS on top of vanilla SQL to get the extra stuff (this is what I do for my eCommerce backend), so it's pretty simple to have something like:
SELECT ..
-- Get the fts
IN ( FTS QUERY)
ORDER BY
-- The relevance is hardcoded? Maybe in another table that stores the rankings?
(
Products,
Inventory,
Invoices,..
)
I found it's much easier and more predictable if I code the "rankings" based on the business logic instead of letting the FTS engine guess them. You can store that stuff as normal columns or use the "sources" (i.e. products, inventory) as a way to know what could be more important to pull first.
This has the nice property that our search results are very good and, even better, never return nonsensical stuff (like searching for an apple in the store and getting a blog post)!
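To make the idea concrete, here is a minimal sketch of that layering, not the actual eCommerce code. Everything in it is an assumption for illustration: the `docs` and `source_rank` tables, the rank values, and the sample rows are hypothetical, and a plain `LIKE` subquery stands in for the real FTS query so it runs on any SQLite build.

```python
import sqlite3

# Hypothetical business ranking: a lower number is pulled first.
SOURCE_RANK = {"products": 0, "inventory": 1, "invoices": 2}

def build_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, source TEXT, body TEXT)")
    conn.execute("CREATE TABLE source_rank (source TEXT PRIMARY KEY, rank INTEGER)")
    conn.executemany("INSERT INTO source_rank VALUES (?, ?)", list(SOURCE_RANK.items()))
    conn.executemany(
        "INSERT INTO docs (source, body) VALUES (?, ?)",
        [
            ("invoices", "Invoice for ten apple crates"),
            ("blog", "Why I love apple pie"),
            ("products", "Fresh apple, sold per kilo"),
            ("inventory", "apple stock: 180 units"),
        ],
    )
    return conn

def search(conn, term):
    """Match on text, then order by the hardcoded source ranking."""
    return conn.execute(
        """
        SELECT d.source, d.body
        FROM docs d
        -- The inner join drops sources with no business rank (the blog post).
        JOIN source_rank r ON r.source = d.source
        -- Stand-in for the FTS query.
        WHERE d.id IN (SELECT id FROM docs WHERE body LIKE '%' || ? || '%')
        ORDER BY r.rank, d.id
        """,
        (term,),
    ).fetchall()
```

Calling `search(build_db(), "apple")` returns the product row first, then inventory, then invoices, and the blog post never shows up because it has no entry in the ranking table.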
I think we're talking about slightly different things. It doesn't matter whether the variables are your business logic or something else.
Given a certain variable, tf-idf/BM25 will order by relevance and not by match. So it answers the question: what if the name match was off by two characters and the inventory number is less than 200?
tf-idf does not tell you what to order by... but it takes care of all the edge cases of ordering.
Now, if you're not using text match anywhere and only using business variables... then this entire thread is not for you. But FTS and Lucene attack full-text search primarily, and that's where the relevance vs. ordering discussion comes from.
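For anyone who wants to see what "order by relevance, not by match" looks like, here is a rough BM25 sketch. It is not Lucene's implementation: the whitespace tokenizer is a simplification, the function name `bm25_scores` is made up, and `k1=1.2`, `b=0.75` are just commonly used defaults that I'm assuming here.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Every document containing the term gets a graded score instead of a yes/no match, and documents without it score zero. A business variable (like the inventory count) would then be a separate filter or tie-breaker on top of that score.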
This has been my understanding of the state of Postgres full-text search. It's great if your search requirements are fairly vanilla, but I haven't seen any solutions for more advanced search needs, such as boosting, relevance, scoring, etc.
The ts_rank functions use the term frequency within that document, not a global term frequency (which is why you need a separate index, like what Elasticsearch does).
This is important because if a word is too common, it's considered less significant for a document match. When we calculate IDF, it will be very low for the most frequently occurring words such as stop words (“is” is present in almost all of the documents, so tf-idf will give a very low value to that word).
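A tiny sketch of the difference, using one common textbook variant of tf-idf (`idf = log(N / df)`). The three-sentence corpus and the function names are made up for illustration, and this says nothing about how ts_rank or Elasticsearch compute their scores internally.

```python
import math
from collections import Counter

def term_frequency(term, tokens):
    """Share of the document's tokens that are `term` (per-document only)."""
    return Counter(tokens)[term] / len(tokens)

def inverse_document_frequency(term, corpus):
    """log(N / df): needs statistics across the whole corpus."""
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / df) if df else 0.0

def tf_idf(term, tokens, corpus):
    return term_frequency(term, tokens) * inverse_document_frequency(term, corpus)

# Hypothetical corpus, already tokenized on whitespace.
corpus = [doc.lower().split() for doc in [
    "the apple is red",
    "the banana is yellow",
    "the sky is blue",
]]
```

Within the first document, "is" and "apple" have the same term frequency, so a purely per-document measure cannot tell them apart. With the global IDF factor, "is" drops to zero because it appears in every document, while "apple" keeps a positive weight.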
Is this the start of a future where we can write high-level code (Idris, Agda, Coq) and the resulting code will run as fast (and as safely) as Rust? Interesting.
Great list. Since I can add anything here, I will say:
- Bitemporal support OOTB (storage would be more expensive, as temporal data needs more disk space)
- CoW capabilities OOTB, so it would be super easy (fast and cheap) to create ephemeral databases for development purposes.
- Charge per request (ms of reads, ms of writes) - for the sake of being more specific about serverless.
- AI capabilities that detect how the database is used and suggest indexes or other tweaks to make the database as fast (and cheap) as possible, even if the schema changes, the database size increases, or query patterns change
- PostgreSQL support (and all its extensions... I know that's a hard one as PS is based on MySQL)
- OOTB capabilities for masking and/or anonymizing data (PCI, PII, etc.)