I worked previously for a very high traffic ecommerce company (Alexa top 300 site).
As part of the search team, I worked on a project where we deliberately rewrote the whole product search engine in Python and Cython, including our own algorithms manipulating documents for deletion, low latency reindexing after edits, and more.
We did this because Solr was too slow, and the process of defining custom sort orders (for example, new sort orders produced by machine learning ranking algorithms that needed to be A/B tested regularly) was awful.
It was a really fun project. One of the slogans for our group at the time was “rewriting search in Python for speed.”
The ultimate system we deployed was insanely fast. I doubt you could have made it faster even writing the entire thing directly in C or C++. It became a reference project in the company to help avoid various flavors of arrogant dismissal of Python as a backend service language for performance critical systems.
Defining custom sort orders in Solr is as simple as uploading a text file with the values you intend to use for ranking.
This is a great feature that is in fact missing from Elasticsearch and saves you so much reindexing time.
There certainly are use cases where Lucene-based solutions aren't the best fit. But I think the claim that you couldn't make something faster by moving away from Python is outlandish.
> There certainly are use cases where Lucene-based solutions aren't the best fit. But I think the claim that you couldn't make something faster by moving away from Python is outlandish.
I read that as a statement that they implemented a proper and bespoke algorithm, not that the speed of Python is greater than C. I am surprised that you read it that way. Who in their right mind would say Python speed is faster than C speed?
Yes. If the implementation language isn't the determining factor for speed, then what is the cause? There is a branch of Computer Science called Algorithmics[0], wherein one expression of measurement is called Big-O notation[1].
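To make the Big-O point concrete, here is a minimal sketch (my own illustration, not code from the system discussed above) of two ways to get the top k scores out of n candidates. The function names are hypothetical. Both return the same answer, but the full sort does O(n log n) work while the heap-based version does O(n log k), and that gap is independent of the implementation language.

```python
import heapq
import random


def top_k_full_sort(scores, k):
    """Sort everything, then slice: O(n log n) regardless of k."""
    return sorted(scores, reverse=True)[:k]


def top_k_heap(scores, k):
    """Keep only a k-sized heap while scanning: O(n log k)."""
    return heapq.nlargest(k, scores)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    scores = [rng.random() for _ in range(100_000)]
    # Same result, different amount of work.
    assert top_k_full_sort(scores, 10) == top_k_heap(scores, 10)
```

For small k and large n the second version does far less comparison work, which is the kind of win that comes from the algorithm rather than from the language.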
You seem fundamentally confused, for example about tools like Cython.
Many extension module implementations in Python are literally as fast as pure C (not just nearly as fast with minor extra CPython overhead, but literally as fast as pure C, because they deliberately bypass the CPython VM loop and data models).
Because you're writing regular Python for a production service, though, rather than artificially optimized examples, you will occasionally have to pay extra costs.
Are you perhaps a bit too invested in your own narrative?
You are incorrect. This is only true if you can precompute all sort orders. Many types of sort orders cannot be precomputed, because they depend on additional context data only available at query time, especially for personalization or trending solutions. Additionally, with Solr you must precommit to very poor sharding properties. With our Python approach, we could easily hold all results in memory (billions of content items, with trimmed-down data structures representing only the thinnest container needed for each sort order representation), and dynamically re-sort on the fly or double sort by multiple sort orders that each required contextual data only available at query time.
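As a rough illustration of what I mean by sorting on context that only exists at query time, here is a toy sketch. It is not our production code: `Item`, `rank`, `affinity`, and `trending_ids` are hypothetical names, and the real containers were far more compact than a `NamedTuple`. The point is only that the sort key is built per request, so nothing about it can be precomputed into a file ahead of time.

```python
from typing import NamedTuple


class Item(NamedTuple):
    """Thin in-memory record: only the fields the sort orders need."""
    item_id: int
    category: str
    popularity: float


def rank(items, affinity, trending_ids, limit=3):
    """Double sort using query-time context.

    affinity: per-user category weights (hypothetical personalization signal).
    trending_ids: ids trending right now (hypothetical trending signal).
    Trending items come first; within each group, higher personalized
    score comes first.
    """
    def key(item):
        personalized = item.popularity * affinity.get(item.category, 1.0)
        is_trending = item.item_id in trending_ids
        # False sorts before True, so negate the flag to put trending first.
        return (not is_trending, -personalized)

    return [item.item_id for item in sorted(items, key=key)[:limit]]


CATALOG = [
    Item(1, "shoes", 0.5),
    Item(2, "hats", 0.9),
    Item(3, "shoes", 0.7),
    Item(4, "bags", 0.8),
]

if __name__ == "__main__":
    # A user who likes shoes, while item 4 is trending.
    print(rank(CATALOG, {"shoes": 2.0}, {4}))
    # No context at all: falls back to plain popularity.
    print(rank(CATALOG, {}, set()))
```

The same catalog yields a different order for every combination of user and moment, which is exactly the case a precomputed ranking file cannot cover.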