Google Prediction API

lja · on April 11, 2015

This has been around for several years. I tried to get a large company I worked at to use it to complement our internal machine learning algos.

We decided not to send data to Google because we weren't sure how it would be used or stored. I wonder whether that was paranoia from management, or whether anyone else on HN feels the same way about using a third party for intellectual property initiatives.

higherpurpose · on April 11, 2015

This is exactly why Google should be researching fully homomorphic encryption like crazy, yet I haven't heard anything about them ever considering researching it. I think at the very least FHE is a slightly more practical research area than say researching quantum computers, and could have a bigger positive impact on Google's cloud business in the near future.

fhoffa · on April 14, 2015

Homomorphic encryption is available on Google BigQuery, if you need it: https://code.google.com/p/encrypted-bigquery-client/

For others, strong certifications of privacy are enough, HIPAA among others: https://support.google.com/work/answer/6056694

arebop · on April 11, 2015

Probably paranoia, unless that large company was a competitive cloud computing business. The TOS [https://cloud.google.com/terms/] has two sentences that pretty well eliminate any risks about the use of data and it also has a few paragraphs about storage.

altano · on April 11, 2015

You should take a look at Azure Machine Learning, a product I work on at Microsoft: http://azure.microsoft.com/en-us/services/machine-learning/

The Azure organization, just like all of MS, takes privacy and data management super seriously (http://azure.microsoft.com/en-us/support/trust-center/privac...). There's tons of documentation on our policies which are clearly laid out.

koalaman · on April 11, 2015

For those of us too lazy to read TOS, why is Azure's product less of a privacy or security concern than Google's?

M2Ys4U · on April 12, 2015

>The Azure organization, just like all of MS, takes privacy and data management super seriously

Hah! So is that why Microsoft fired its chief privacy adviser in 2011 for telling a group of MS' National Technology Officers that “If you sell Microsoft cloud computing to your own governments then the FISA law means that the NSA can conduct unlimited mass surveillance on that data”, because they take privacy seriously?

mibbitirc · on April 11, 2015

The other worry, of course, is that Google suddenly decides to shut it down, as they did with quite a few APIs such as the translation API (after some back and forth they actually kept the translation API, but they charge quite a lot for it).

latj · on April 11, 2015

Google didn't shut down the translation API. They just started charging for it, and people who were used to getting it for free considered it dead.

Personally, I would prefer that Google charge for its services. It's certainly better than advertising or trying to monetize users' data.

losvedir · on April 11, 2015

Hmm... if I recall correctly, originally they were going to shut it down completely. (I certainly see some old articles about that, from a brief search.)

I think there was enough pushback from people willing to pay that they decided to keep it open as a paid service instead.

saurik · on April 11, 2015

Correct.

> UPDATE June 3: In the days since we announced the deprecation of the Translate API, we’ve seen the passion and interest expressed by so many of you, through comments here (believe me, we read every one of them) and elsewhere. I’m happy to share that we’re working hard to address your concerns, and will be releasing an updated plan to offer a paid version of the Translate API. Please stay tuned; we’ll post a full update as soon as possible.

http://googlecode.blogspot.com/2011/05/spring-cleaning-for-s...

pearjuice · on April 11, 2015

There is no reason to believe they don't monetize the data coming in from paid services. In fact, that data is even more valuable than data from random freebie users.

sukilot · on April 11, 2015

The contract/EULA is a reason.

pearjuice · on April 11, 2015

Is it in there, then? That they won't use your data? Even if it is, it is proprietary, freedom-restricting software, so you wouldn't have a clue whether they are keeping up their part of the agreement.

alwaysdoit · on April 11, 2015

If you obfuscated the labels and stored those elsewhere how could Google make use of it, really?
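
A minimal sketch of the label-obfuscation idea, assuming training data arrives as (label, features) pairs. The function name and the token format are hypothetical, not part of any Google API. Note that the features themselves can still leak information; this only hides what the classes mean.

```python
import secrets


def obfuscate_labels(rows):
    """Replace each distinct label with an opaque random token.

    rows: list of (label, features) pairs.
    Returns (obfuscated_rows, token_to_label). Only obfuscated_rows
    would be sent to the third party; token_to_label stays local.
    """
    label_to_token = {}
    used_tokens = set()
    obfuscated = []
    for label, features in rows:
        if label not in label_to_token:
            # Draw random tokens until we get one not used yet.
            token = "c_" + secrets.token_hex(4)
            while token in used_tokens:
                token = "c_" + secrets.token_hex(4)
            used_tokens.add(token)
            label_to_token[label] = token
        obfuscated.append((label_to_token[label], features))
    token_to_label = {tok: lab for lab, tok in label_to_token.items()}
    return obfuscated, token_to_label
```

When a prediction comes back as a token, look it up in token_to_label locally to recover the real class name.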

papercruncher · on April 11, 2015

I replaced most of our custom models (built on scikit-learn) with this.

Pros:

- Super simple API, you can be up and running in a few hours

- I don't need to maintain machines/code to do all the training and querying (huge win for us)

- Relatively fast

Cons:

- Like other Google APIs, it returns 500s from time to time (~0.5%) for no good reason

- It's a big black box; it doesn't expose much information about the models it builds other than a confusion matrix.

- Very limited ways to influence the models it builds

If ML is critical to your business and you know what you are doing, you can get better results with a custom stack.
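
For the occasional 500s mentioned above, one common workaround is to retry with exponential backoff. This is only a sketch: TransientServerError and call_with_retries are hypothetical names, and request_fn stands in for whatever client call you actually make, which would need to raise that error on a 5xx response.

```python
import random
import time


class TransientServerError(Exception):
    """Hypothetical error standing in for an HTTP 5xx response."""


def call_with_retries(request_fn, max_attempts=5, base_delay=0.5,
                      sleep=time.sleep):
    """Call request_fn, retrying on TransientServerError.

    Waits base_delay * 2**attempt seconds plus random jitter between
    attempts, and re-raises the error after the final attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except TransientServerError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

At a failure rate around 0.5% per call, a couple of retries should make failures rare, provided the errors really are transient and independent.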

WhitneyLand · on April 11, 2015

Can someone give a couple of archetypal examples of what a business would use this for?

Jonovono · on April 11, 2015

Can anyone comment on using this vs http://prediction.io/ ?

mrgordon · on April 11, 2015

I ran data through both services a few weeks ago. Both claimed the same accuracy on cross-validation but the prediction.io interface showed the classifier was performing very poorly on the sample predictions that they display. Sure enough, it continued to do poorly when I tried it on new examples. The Google classifier generalized well and continued to get the same accuracy. This is obviously just one anecdote but it did turn me off a bit.

That said, some of the constraints on the Google API are pretty lame (e.g. the same word with different capitalization is treated as two separate words so you have to downcase everything). We ended up writing our own grid search across classifiers and set up our own web service for using it that has a much nicer API and more tolerance for real world input formats.
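
The downcasing step can be sketched like this, assuming the training rows are (label, text) pairs. The function names are made up for illustration, and whitespace collapsing is an extra normalization beyond what is described above.

```python
def normalize_text(text):
    """Lowercase and collapse whitespace so 'Spam' and 'spam' match."""
    return " ".join(text.lower().split())


def normalize_rows(rows):
    """Normalize the text of each (label, text) pair; labels are kept as-is."""
    return [(label, normalize_text(text)) for label, text in rows]
```

The same normalization has to be applied to text at query time, otherwise the model sees capitalized words it never encountered in training.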

alexirobbins · on April 11, 2015

Prediction.io is just an interface on top of Apache Mahout. The results appear to be the same: mediocre.

thawab · on April 11, 2015

They switched from Mahout to Spark a while ago. Two days ago Apache Mahout announced that the new version is using an Apache Spark back-end.

http://www.reddit.com/r/MachineLearning/comments/31yj71/apac...

mrgordon · on April 14, 2015

Oh wow, they integrated H2O as well. Really interesting developments!

nautical · on April 11, 2015

The unreliability of the API's lifespan and of the data has kept me away from this, no matter how good or bad it is now.

Radle · on April 11, 2015

Can someone give an example of possible input and output? I have no idea what I can do with this API, beyond it being some kind of data analysis.

yowmamasita · on April 12, 2015

https://cloud.google.com/prediction/docs/hello_world

gesman · on April 11, 2015

I want Google to put a live demo of this for weather prediction and stock market prediction to demonstrate how well it works :)

revelation · on April 11, 2015

Is it bad that five out of six features are completely generic and would be expected of any product, and the only remaining one tells us nothing at all?

Pretty sure you could just replace Prediction on this page with your product.