Last time I calculated, the lowest-cost way to run spaCy in the cloud was on Google Compute Engine n1-standard preemptible instances. It should be over 100x cheaper per document than using Google's, Amazon's or Microsoft's cloud NLP APIs. Accuracy will depend on your problem, but if you have your own training data, performance should be similar.
I'm referring to the best price per word when the service is continually active. For example, if you want to parse a web dump, what type of instance do you provision a bunch of?
GCE's preemptible instances are so much easier to use and manage than AWS's spot instances. I made a rule to build only stateless services, just so I could leverage these.
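To make that concrete: the worker itself is just spaCy streaming documents through the pipeline in batches, which is much cheaper per document than processing texts one at a time. A minimal sketch, assuming the small English model is installed, with a placeholder generator standing in for the web dump:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the model package is installed

def texts_from_dump():
    # Placeholder: yield raw document texts from your web dump here.
    yield "Apple is looking at buying a U.K. startup for $1 billion."

# nlp.pipe() streams texts through the pipeline in batches, which is far
# faster than calling nlp() on each text individually.
for doc in nlp.pipe(texts_from_dump(), batch_size=1000):
    print([(ent.text, ent.label_) for ent in doc.ents])
```

Because the worker holds no state, losing a preemptible instance mid-run just means re-queuing the chunk of the dump it was working on.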
So spaCy has support for these languages [1] and WordNet has support for these [2], but neuralcoref (the pronoun resolution endpoint) is available only for English.
The current Docker image doesn't expose those other languages, but I can expose them in an update if that would help a lot of people.
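For anyone wondering what exposing another language amounts to on the spaCy side, it's essentially just loading a different model package. A minimal sketch, assuming the German model has been installed into the image (the sentence is only an example):

```python
import spacy

# Assumes the package was installed first, e.g.:
#   python -m spacy download de_core_news_sm
nlp = spacy.load("de_core_news_sm")

doc = nlp("Berlin ist die Hauptstadt von Deutschland.")
print([(ent.text, ent.label_) for ent in doc.ents])
```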
That example is a tweet, which the syntax and NER models haven't been trained on. You can make calls to `nlp.update()` to improve the model on your own data. We also have an annotation tool, https://prodi.gy, to create training data more quickly.
(I'm the author of spaCy, not this Docker container.)
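For anyone who hasn't used the training API before, here's a minimal sketch of the `nlp.update()` loop in the spaCy v2-style format; the texts, the entity label and the output path are made up for illustration:

```python
import random
import spacy

# Hypothetical examples in spaCy v2's training format:
# (text, {"entities": [(start_char, end_char, label)]})
TRAIN_DATA = [
    ("I loved the new Acme Phone", {"entities": [(16, 26, "PRODUCT")]}),
    ("Acme Phone battery dies fast", {"entities": [(0, 10, "PRODUCT")]}),
]

nlp = spacy.load("en_core_web_sm")
optimizer = nlp.resume_training()  # keep the pretrained weights (v2.1+)

for epoch in range(10):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        # Dropout makes the model less likely to memorise the new examples.
        nlp.update([text], [annotations], sgd=optimizer, drop=0.35,
                   losses=losses)
    print(epoch, losses)

nlp.to_disk("/tmp/updated_model")  # save the fine-tuned pipeline
```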
Relevant links for anyone interested:
* spaCy on GitHub: https://github.com/explosion/spacy
* NER demo: https://demos.explosion.ai/displacy-ent/
* Neural coref by HuggingFace: https://huggingface.co/coref/
* Accuracy of built-in spaCy models: https://spacy.io/usage/facts-figures