File formats will be documented when I publish the data files in a few weeks.
What do you mean by postings?
The main index is split into 32 shards (there is also an additional news index, which is updated about every 5-10 minutes). Each shard is updated and queried separately. The query actually runs 2/3 on a Windows server and 1/3 on a Linux server, the latter in Docker containers. I want to move everything to Linux over time.
The query has two phases. In the first, each shard does only a rough but fast ranking. In the second, the top results of all shards are combined and completely re-ranked. This is basically a meta search engine hidden within.
The first query phase is in src/searchservernew.dpr, and the second phase is in src/cgi/PostProcess.pas.
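To illustrate the two-phase idea, here is a minimal Python sketch. It is not taken from searchservernew.dpr or PostProcess.pas; the names `rough_score`, `full_score`, and `search`, the toy scoring formulas, and the in-memory shard layout are all assumptions made up for this example.

```python
import heapq
from typing import List, Tuple

Doc = Tuple[str, str]  # (doc_id, text); a hypothetical in-memory stand-in for a shard entry


def rough_score(terms: List[str], text: str) -> int:
    """Phase 1 (assumed): cheap score, counts how many query terms appear at all."""
    words = set(text.lower().split())
    return sum(1 for t in terms if t in words)


def full_score(terms: List[str], text: str) -> float:
    """Phase 2 (assumed): costlier score, query-term frequency relative to length."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(words.count(t) for t in terms) / len(words)


def search(shards: List[List[Doc]], query: str, per_shard: int = 2, final: int = 3) -> List[str]:
    terms = query.lower().split()
    candidates: List[Doc] = []
    # Phase 1: every shard is queried separately and returns only its top few hits.
    for shard in shards:
        top = heapq.nlargest(per_shard, shard, key=lambda d: rough_score(terms, d[1]))
        candidates.extend(d for d in top if rough_score(terms, d[1]) > 0)
    # Phase 2: the combined candidates are re-ranked with the more expensive score.
    reranked = sorted(candidates, key=lambda d: full_score(terms, d[1]), reverse=True)
    return [doc_id for doc_id, _ in reranked[:final]]
```

The point of the split is that the expensive scoring only ever sees a few candidates per shard rather than every matching document.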
Thank you. "Postings" is another word for the format of the doc ids and related information in the inverted file. A google for "inverted index postings" will turn up a bunch of references.
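For anyone else wondering, a small Python sketch of what postings look like in a toy inverted index. The layout here (a list of `(doc_id, positions)` pairs per term) is just one common choice picked for illustration, and `build_index` is a hypothetical name; real on-disk formats are usually compressed and may store different per-document information.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Posting = Tuple[int, List[int]]  # (doc_id, word positions within that document)


def build_index(docs: Dict[int, str]) -> Dict[str, List[Posting]]:
    """Build term -> postings list, with postings ordered by ascending doc id."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for doc_id in sorted(docs):
        positions: Dict[str, List[int]] = defaultdict(list)
        for pos, word in enumerate(docs[doc_id].lower().split()):
            positions[word].append(pos)
        # One posting per (term, document) pair.
        for word, pos_list in positions.items():
            index[word].append((doc_id, pos_list))
    return dict(index)


docs = {1: "fast search engine", 2: "search the search index"}
index = build_index(docs)
print(index["search"])
```

Here the postings list for "search" has one entry per document containing the word, each carrying the doc id and where the word occurs.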