
> If AI is electricity, then data is the coal/oil that drives it. The big technology players have a massive advantage in their ability to build and deploy tools that collect the data, and then bring it back to be turned into "electricity."

Data is important now, but once we have solved vision, speech, text and robotics to a decent degree, data won't matter as much. The great thing about AI is that we can cheaply copy already trained models or already labeled datasets. Not that many datasets are needed to solve the few most interesting and financially profitable problems. Of course, there will always be fringe projects where more data is needed, but the main applications will be in the commons. You can copy an AI model if you can talk to it (use it to produce sample outputs). Any model can be captured as a dataset of its inputs and outputs and transferred into another model. Because machine learning learns directly from data, it's cheap to copy other public AIs by tracing their inputs and outputs, just as current AIs are taught by tracing the inputs and outputs of people (supervision).
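To make the "copy by tracing inputs and outputs" idea concrete, here is a toy sketch, not a claim about how any real system would be cloned. Everything in it is an assumption for illustration: `teacher` is a hypothetical stand-in for a public model we can only query, its decision rule is a simple linear threshold, and the student is a plain perceptron. Real models are far larger and usually rate-limited, so this only shows the shape of the argument.

```python
import random

def teacher(x):
    # Hypothetical black box: the student never sees these weights,
    # only the 0/1 answers returned for the inputs it sends.
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.5 > 0 else 0

def random_point(rng):
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))

def query_dataset(model, n, rng):
    # "Trace" the model: send random inputs and record its outputs as labels.
    return [(x, model(x)) for x in (random_point(rng) for _ in range(n))]

def predict(params, x):
    w, b = params
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_student(data, epochs=50, lr=0.1):
    # Perceptron updates driven only by the teacher's answers.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict((w, b), x)
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def agreement(params, model, n, rng):
    # Fraction of fresh inputs on which student and teacher give the same answer.
    points = [random_point(rng) for _ in range(n)]
    return sum(predict(params, x) == model(x) for x in points) / n

rng = random.Random(0)
params = train_student(query_dataset(teacher, 2000, rng))
score = agreement(params, teacher, 1000, rng)
print(f"student agrees with teacher on {score:.1%} of fresh inputs")
```

The student ends up approximating the teacher without ever touching the data the teacher was originally trained on, which is the point being made above. How well this scales to large models is a separate question.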




The distinction should be made between training data and [actionable] data.

In the former case we are taking data, labeling it, then using it to build our nets and models. You are correct to an extent that it's a usable model once trained and that data is less important.

However, equally if not more important is the data that is being put into the net to come out as a result/action. Arguably this data comes through the same pipe as training data, and the pipes are similarly limited. So it's ALWAYS important, because you can't take an action, classify, or do anything else without it.

When you add in the reinforcement mechanism, or later unsupervised techniques, the line between training data and action data blurs, so the point is moot. It's not a one-run process in the long run; it's iterative and always evolving based on the user.


If you don't have data to "put into the net" then you don't have a problem that you want to solve.


The key point is that it's about data friction. If you have already uploaded all of your photos to Google Cloud, then any new tool or capability Google comes up with using a CNN will be immediately applicable without you having to do anything.


Was about to say the same thing. The incumbents are probably even at a disadvantage because as a young upstart you can always get in on the action by building a more efficient and simpler model and then querying the incumbents to bootstrap yourself.


> you can always get in on the action by building a more efficient and simpler model

Except the big 5 have hoovered up a good portion of the ML talent from the universities and are themselves leading the pack in iterative improvement on ML capabilities.

If you are trying to bootstrap an ML company with the trickle of raw data that the big 5 put out, you're never going to get to their size.

This is not about being able to create a 300k/yr company off the back of some table scraps. It's about major companies having too much influence over one market.



