I'm sorry, but I'm not sure that you're correct that a data scientist is a statistician who can write distributed systems. Problems have been solved for decades by people with quantitative training, from well before distributed systems (or computers) existed. Very roughly, a data scientist is a person with a combination of quantitative training and some programming skills, who is able to effectively analyze (and explain) different kinds of data, be they audio, text, census information, or whatever it might be. What we call data scientists today are the engineers, computer scientists, and mathematicians of yesterday.
I have no experience in Data Science. How much need is there for a distributed system? How many data points would one need to necessitate it? What if we had a billion data points. Would it be sufficient to run on a fast system overnight to crunch data?
According to me (note zintinio's reply - he disagrees), the etymology of the term is the following.
Back in the day, if you had small data you didn't need a data scientist. You needed a statistician. He'd do some shit in SAS/Python/etc, reading one CSV and writing another. Then the developers could run that on a beefy server with cron and push the CSV output someplace else.
At some point during the 2000's this stopped working - things became complicated and tangled enough that you couldn't just munge CSVs like this. You needed folks who understood the math well enough to come up with algorithms, and who also understood the computer science well enough to scale out. These folks were termed "data scientists". Banks call them "quant developers".
Very often you don't need a distributed system, or anything that isn't trivially parallelizable. Most of the time when I make money it's based on a CSV < 100GB - often it fits in ram. Nowadays I don't think it's misleading to call yourself a "data scientist" if you can only handle such data sets.
I recently learned that Facebook calls their business analysts data scientists. I guess it's just sexier.
Now that the term is diluted, I propose we use adopt the banking industry term "quant developer" to refer to that.