How do you guys get R predictive models into production? Last time, I used Plumber to put a REST API in front of a model and then discovered R is a single-threaded runtime, so effectively you can only serve one request at a time. I guess the only option is to containerize it and run many instances behind a load balancer? I develop on a Mac, so I can’t go the Microsoft R Server route, and I don’t want to lock myself into some commercial solution, e.g. Rsuite. You can do this trivially in the Python ecosystem.
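A minimal sketch of the scale-out idea above, in Python: several single-threaded model containers behind a round-robin dispatcher. The backend URLs and the `RoundRobinBalancer` class are hypothetical. In practice the balancing is usually done by nginx, a cloud load balancer or a Kubernetes Service rather than by application code.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Hand each incoming request to the next backend in turn.

    Each backend is assumed to be one single-threaded R/Plumber container,
    so it can only serve one request at a time.
    """

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)
        self._lock = threading.Lock()  # guard the shared iterator for concurrent callers

    def pick(self):
        with self._lock:
            return next(self._cycle)


# Hypothetical container endpoints, one R process each.
backends = [
    "http://r-model-1:8000/predict",
    "http://r-model-2:8000/predict",
    "http://r-model-3:8000/predict",
]
balancer = RoundRobinBalancer(backends)
for _ in range(4):
    print(balancer.pick())  # r-model-1, r-model-2, r-model-3, then r-model-1 again
```

Throughput then grows roughly with the number of containers, since each one still handles a single request at a time.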
My feeling is that R is great for anything that doesn’t need to be operationalized into production (monitoring, security, logging, scaling, performance, etc). There are so many good ML/stats libraries in R and most books seem to use R (when written by academics) but it feels like these people have never had to put anything into production.
It depends on what you mean by 'production'. I've had great success setting up my data collection, engineering and predictions as batch processes. I agree, though: I would never try to use R behind a REST API, but I don't think it was ever designed for that.
As a general rule of thumb, if something needs real time predictions or I need deep learning libraries, I use Python. R is for anything else.
Exactly; production and deployment processes vary a lot. In the enterprise it is very rigid: production often has no internet connection, and ideally you do not install packages there at all (which rsuite supports).
But I had a customer who treated dev as prod. :)
R is like any other language; we have a few REST APIs in production for live prediction. We use a rocker Docker image with xgboost and plumber, plus data.table for pre-prediction data wrangling. Hosted on GCP Kubernetes with 0.25 CPU and 250 memory, the API handles around 40 requests per second per pod. There are multiple models, and both have more than 1000 trees.
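The figures in the comment above support a back-of-the-envelope capacity estimate. This sketch assumes throughput scales linearly with pod count and that roughly 40 requests per second per 0.25-CPU pod holds under load. The 500 requests per second target and the 1.5 headroom factor are hypothetical.

```python
import math

RPS_PER_POD = 40    # throughput per pod reported in the comment above
CPU_PER_POD = 0.25  # CPU allotted to each pod, also from the comment


def pods_needed(target_rps, headroom=1.0):
    """Smallest pod count whose combined throughput covers target_rps * headroom.

    Assumes throughput scales linearly with the number of pods.
    """
    return math.ceil(target_rps * headroom / RPS_PER_POD)


def cpu_needed(target_rps, headroom=1.0):
    """Total CPU to request across all pods for the given load."""
    return pods_needed(target_rps, headroom) * CPU_PER_POD


# Hypothetical target load of 500 requests per second.
print(pods_needed(500), cpu_needed(500))  # 13 3.25
print(pods_needed(500, headroom=1.5))     # 19
```

Under those assumptions, 500 requests per second works out to 13 pods (3.25 CPU), or 19 pods with 50% headroom.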
I can highly recommend RestRserve [0] for bringing R models into production (it forks a process for every request, so scaling up is easier than with Plumber). I use it regularly for various projects and have had minimal issues with it.
Maybe I’m missing it, but does this example work for online predictions? My use case is that I have a trained model and I want to put a REST API in front of it that clients can call.
No, it is not an example for a REST API. Sorry, I misunderstood you. I will add an example for plumber with rsuite.
Nevertheless, the example presents a workflow where only the scoring step would need to change from batch to online.
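To illustrate the point that only the scoring entry point changes, here is a minimal sketch in Python where batch and online modes share the same preparation and scoring functions. The field names, the linear "model" and all function names are hypothetical stand-ins for a trained R model and its wrangling code.

```python
# Hypothetical feature preparation shared by both modes.
def prepare(record):
    return [float(record["age"]), float(record["income"]) / 1000.0]


# Hypothetical linear model standing in for the trained model.
WEIGHTS = [0.5, 0.25]
BIAS = 1.0


def score(features):
    return BIAS + sum(w * f for w, f in zip(WEIGHTS, features))


def run_batch(records):
    """Batch mode: score a whole extract at once, e.g. in a scheduled job."""
    return [score(prepare(r)) for r in records]


def handle_request(payload):
    """Online mode: the same prepare/score steps, one record per API call."""
    return {"prediction": score(prepare(payload))}


records = [{"age": 30, "income": 50000}, {"age": 45, "income": 80000}]
print(run_batch(records))
print(handle_request(records[0]))
```

Because both paths call the same `prepare` and `score`, moving from batch to online only means wrapping `handle_request` in whatever HTTP layer you use.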
R is single-threaded. The same is true of Python. We use Kubernetes for scaling.
But it is not for all applications, of course.
R can be put into production. Rsuite is one of the solutions that helps with that.
YMMV, but many of the libraries R uses are available in multiple languages, so you can take the models built in R and run them in another language (usually Java).
Python is single-threaded as well. Like Python, R can be made multi-threaded, and like Python, R can be productionized without having to convert it into another language.
One possible implementation is a pool of R workers, where each request is handed to an R worker. So if your pool is 100 and you get 20 requests from 20 different users at once, all 20 will run simultaneously. In addition, many tasks can and should be cached; consider Memcached or similar.
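A minimal sketch of the worker-pool-plus-cache idea, in Python. `FakeRWorker` is a hypothetical stand-in for a long-lived R process with the model already loaded (in a real setup each slot might be an Rserve or similar session), and `functools.lru_cache` is an in-process substitute for Memcached, so unlike Memcached the cache is not shared across machines.

```python
import functools
import queue
from concurrent.futures import ThreadPoolExecutor


class FakeRWorker:
    """Hypothetical stand-in for one long-lived R process.

    A real worker would hold a loaded model and score in R; here it just doubles x.
    """

    def __init__(self, worker_id):
        self.worker_id = worker_id

    def predict(self, x):
        return x * 2.0


class WorkerPool:
    """Fixed pool: each request checks out one idle worker and returns it when done."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(FakeRWorker(i))

    def predict(self, x):
        worker = self._idle.get()   # blocks only if every worker is busy
        try:
            return worker.predict(x)
        finally:
            self._idle.put(worker)  # hand the worker back to the pool


pool = WorkerPool(size=100)


# In-process cache standing in for Memcached: repeated inputs never reach a worker.
@functools.lru_cache(maxsize=10_000)
def cached_predict(x):
    return pool.predict(x)


# 20 simultaneous requests; with 100 idle workers none of them has to wait.
with ThreadPoolExecutor(max_workers=20) as executor:
    results = list(executor.map(cached_predict, range(20)))
print(results[:3])  # [0.0, 2.0, 4.0]
```

The pool size caps how many requests are scored concurrently, and the cache keeps repeated identical requests from consuming a worker at all.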