
Yes, there is hope for a high-level heuristic understanding. Here's my attempt to explain in more familiar terms.

They train a new neural network from scratch for each problem, using only the data for that problem. The loss function does two things: it pushes the network to map the example inputs to the example outputs, and it keeps the weights small so the network stays as simple as possible. The hope is that a simple function which fits the sample input/output pairs will also do the right thing on the test input. It works 20-30% of the time.
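To make that concrete, here is a minimal sketch of the idea (mine, not the authors' actual code). It assumes the problem is given as a few (input, output) example pairs encoded as flat tensors, uses a tiny MLP, and implements "keep the weights small" as an explicit L2 penalty; the real work almost certainly uses a different architecture and objective, so treat the names and hyperparameters as illustrative.

    # Sketch: train a fresh, small network per problem, fitting the demo
    # pairs while penalizing large weights (a crude "simplicity" prior).
    import torch
    import torch.nn as nn

    def solve_one_problem(demo_pairs, test_input, l2_weight=1e-3, steps=2000):
        in_dim = demo_pairs[0][0].numel()
        out_dim = demo_pairs[0][1].numel()

        # A brand-new network for this one problem only.
        net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))
        opt = torch.optim.Adam(net.parameters(), lr=1e-2)

        xs = torch.stack([x.flatten().float() for x, _ in demo_pairs])
        ys = torch.stack([y.flatten().float() for _, y in demo_pairs])

        for _ in range(steps):
            opt.zero_grad()
            pred = net(xs)
            fit_loss = nn.functional.mse_loss(pred, ys)                    # map inputs to outputs
            simplicity = sum((p ** 2).sum() for p in net.parameters())     # keep weights small
            (fit_loss + l2_weight * simplicity).backward()
            opt.step()

        # Apply the learned (hopefully simple) function to the test input.
        with torch.no_grad():
            return net(test_input.flatten().float())

The weight penalty is what plays the role of "compression" here: among all networks that fit the few examples, it biases training toward the one with the smallest weights, which is a rough stand-in for the simplest hypothesis.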




Great explanation, thanks. I have some followups if you have the time!

a.) Why does this work as well as it does? Why does compression/fewer-parameters encourage better answers in this instance?

b.) Will it naturally transfer to other benchmarks that evaluate different domains? If so does that imply an approach similarly robust to pre-training that can be used for different domains/modalities?

c.) It works 20-30% of the time - do the researchers find any reason to believe that this could "scale" up in some fashion so that, say, a single larger network could handle any of the problems, rather than needing a new network for each problem? If so, would it improve accuracy as well as robustness?


Boo, go read the other comments that explain all of this instead of wasting people's time.


> I have some followups *if you have the time*

Emphasis mine. No one should feel obligated to answer my questions. I had hoped that was obvious.



