Darpa Goes “Meta” with Machine Learning for Machine Learning (2016) (darpa.mil)
254 points by sethbannon on Jan 10, 2017 | 109 comments



I'm curious to hear from current researchers:

Back when I did some work in neural networks (10-15 years ago), the field had become saturated with 'meta' papers on topology generation, and also on optimization techniques (slightly better gradient descent, pruning techniques, etc.). Neural networks had become more art than science. By the end of that period, I got the distinct sense that these were signs the field had stalled, with no significant breakthroughs in sight. Soon after, neural networks fell out of favor for well-known reasons.

Setting the current hype aside, I'm curious whether today's practitioners sense anything similar, given the saturation of papers.


First off, I'm not an expert - right now, I'm an amateur student of the field. I don't have any credentials or anything published. In other words, what I'm going to say probably has little to no merit to it.

What I have seen in recent weeks, as I've read news and articles about machine learning, neural networks, self-driving vehicles, etc., is that, at least for a certain class of problems, there seems to be a convergence on a generalized pattern, if not a solution, for implementing the network.

This pattern or solution seems built off of the Lecun LeNet MNIST convolutional neural network; specifically, the pattern seems to be roughly:

(input) -> (1:n - conv layers) -> (flatten) -> (1:n - fully connected layers) -> (output)

where:

(1:n - conv layers) = (conv layer) -> (loss layer) -> (activation layer)

and:

(1:n - fully connected layers) = (fully connected layer) -> (loss layer) -> (activation layer)

The (loss layer) is optional, but given that the (activation layer) seems to have converged (in most cases I've seen, which is probably not representative) to RELU, the (loss layer) is needed (sometimes) to prevent overfitting (simple dropout can work well, for instance).

I don't want to say that this pattern is the "be-all-end-all" of deep learning, but I have found it curious just how many different problems it can be successfully applied to. The RELU operation, while not being differentiable at zero - seems to work regardless (and there are other activation functions similar to RELU that are differentiable - like softplus - if needed).

Anyhow - these "building blocks" seem like the basics - the "lego" for deep learning; taking it from art to engineering. As a student, I've been playing with TensorFlow - and recently Keras (a framework for TensorFlow). These tools make it quick and easy to build deep learning neural networks like I described.
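
For what it's worth, here's roughly what that pattern looks like as a minimal Keras sketch (layer names as in recent Keras versions; the sizes are made up for illustration, assuming MNIST-shaped 28x28x1 input and 10 classes):

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

    model = Sequential([
        # conv "blocks": convolution -> ReLU, with dropout as the optional regularizer
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Dropout(0.25),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        # fully connected "blocks": dense -> ReLU, again with optional dropout
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(10, activation='softmax'),  # output layer for 10 classes
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])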

My gut feeling is that I'm talking from my rear; my level of knowledge isn't that great in this field. So - grain 'o salt and all that.

/disclaimer: I am currently enrolled in the Udacity Self-Driving Car Engineer Nanodegree; in the past I've taken the Udacity CS373 course (2012) and the Stanford-sponsored ML Class (2011).


Nice summary!

I would like to augment just a couple of your observations with a few explanations --- I think the reason the architectures that you mention work so well is because they exploit hierarchical structure inherent in the data. Lots of things in the universe have hierarchical structure, e.g. vision has spatial hierarchy, video has spatial and temporal hierarchy, audio has temporal and spatial hierarchy.

(side note, that's how a baby learns vision "unsupervised" -- the spatial patterns on the retina have temporal proximity, so the supervisor is "time")

RELU is good, but you have to remember how you're vectorizing your data - if your vectorizer normalizes to 0-1 then RELU fits nicely, but if you're scaling to mean 0, std dev 1, then half your samples are negative and you'll get information loss when RELU does this: http://7pn4yt.com1.z0.glb.clouddn.com/blog-relu-perf.png - so remember to keep your vectorizer in mind.
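
To make that concrete, a toy numpy sketch (illustrative only - it applies ReLU straight to the inputs) showing how much of a feature gets zeroed under each vectorization scheme:

    import numpy as np

    np.random.seed(0)
    x = np.random.normal(loc=5.0, scale=2.0, size=10000)   # some raw feature

    minmax = (x - x.min()) / (x.max() - x.min())            # scaled to [0, 1]
    standard = (x - x.mean()) / x.std()                     # mean 0, std dev 1

    relu = lambda v: np.maximum(v, 0.0)

    # fraction of values ReLU clips to zero under each scheme
    print((relu(minmax) == 0).mean())     # ~0: almost nothing is clipped
    print((relu(standard) == 0).mean())   # ~0.5: about half the inputs are zeroed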

> My gut feeling is that I'm talking from my rear;

Nope not at all, you have good insight - my email address is in my profile, come join the slack group too (email me, let's go from there)


Would you care to expand "(side note, thats how a baby learns vision "unsupervised" -- the spatial patterns on the retina have temporal proximity, so the supervisor is "time")" to a couple of paragraphs? It seems like a very deep and profound thought and would be very interesting to read.


Regarding the information loss: how about taking the dumb approach and adding a bias? (Turning 'rectified scaling' into an affine transform, I guess.)


When you say "loss layer" I think you mean output? The notation is a bit mixed up.

Those building blocks are precisely the art (it's both art and engineering) that siavosh is referring to. Everyone is mostly using derivatives of the same basic MNIST architecture of LeCun from 20 years ago ('97), because that's what works (no other justification). With tweaks of course (ReLU, batch norm, etc), also because "they just work."

It's funny you should mention dropout, because that's falling out of favor (fully-connected layers are also falling out of favor). Why? Because in many cases it doesn't seem to be necessary if you do batch norm. That's art and engineering (basically you try with dropout and batch norm, or with just batch norm, and just go with batch norm if no difference).
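
For anyone following along, that ablation amounts to something like this in Keras (a sketch, assuming recent Keras layer names):

    from keras.models import Sequential
    from keras.layers import Conv2D, BatchNormalization, Activation, Dropout

    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding='same', input_shape=(28, 28, 1)))
    model.add(BatchNormalization())   # batch norm after the conv, before the nonlinearity
    model.add(Activation('relu'))
    # run once with and once without the next line; if validation accuracy
    # doesn't change, keep the simpler batch-norm-only variant
    model.add(Dropout(0.25))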


DISCLAIMER: I am the noobiest noob that ever noobed a noob.

I'm not sure there's "no other justification". A couple reasons I find the structures given above should be expected to work:

1. It roughly parallels the progression of signals entering our brains through our senses, going through more-or-less convolutional sensory areas of cortex before getting mashed around in the more ad-hoc anterior regions.

2. Consider that convolution layers take advantage of structure in their inputs to radically reduce the number of parameters to train. Would the output of a fully-connected layer have the sort of structure that a convolution layer could take advantage of? (Maybe you could train the fully-connected layer to be exploitable by later conv layers, but doesn't that imply both more training time to get it to do that and more redundant output than you'd want?) It makes more sense to put your convolutions up front so they can take advantage of the correlations in your inputs (perhaps indirectly through the outputs of other conv layers).


1. Is not a justification, it's a motivation / intuition. It proves little since only the first visual layer (there are 6 visual cortex layers) does convolution-esque things.

2. Justifies convolution over fully-connected. It does not justify convolution in the first place.

None of these justify ReLU, max pooling, batch norm.

For example, Geoffrey Hinton (probably out of all deep learning researchers the most neuroscience motivated) actually believes max-pooling is terrible, the brain doesn't do it, and it's a shame that it actually works.


Regarding your first point, I agree that it doesn't constitute a justification for "this is the best way" so much as justification for "this is worth trying".

When you say

> It proves little since only the first visual layer (there are 6 visual cortex layers) does convolution-esque things.

I admit I'm not clear on the roles of the various layers of visual cortex. But I should note that there are several convolution-like steps which in an artificial network would be implemented with separate layers: center-surround detection (happens to occur before V1, but that's not really a problem for what I'm saying- it's a distraction that I skipped past for concision's sake) feeds into the convolution-like operations of oriented-edge detection, corner detection, etc. It's not like our visual system in toto just does one convolution-like step.

My second point must have been expressed poorly, because I have apparently completely failed to convey its meaning. It is meant to justify the particular choice of ordering convolutions before fully-connected layers. If you have already chosen to include both types, my point is that it makes sense to put those convolutions before the fully-connected layers.


The things you described are all part of V1, the first visual area of the visual cortex (not actually the first "layer" as I said); there are 5 other visual areas that aren't convolution-esque.


Okay - to someone just looking to get started in deep learning / ML, this post reads like hieroglyphics. Can you suggest resources I can read to better understand it?


TL;DR - read my post's "tag" and take those courses!

---

As you can see in my "tag" on my post - most of what I have learned came from these courses:

1. AI Class / ML Class (Stanford-sponsored, Fall 2011)

2. Udacity CS373 (2012) - https://www.udacity.com/course/artificial-intelligence-for-r...

3. Udacity Self-Driving Car Engineer Nanodegree (currently taking) - https://www.udacity.com/course/self-driving-car-engineer-nan...

For the first two (AI and ML Class) - these two MOOCs kicked off the founding of Udacity and Coursera (respectively). The classes are also available from each:

Udacity: Intro to AI (What was "AI Class"):

https://www.udacity.com/course/intro-to-artificial-intellige...

Coursera: Machine Learning (What was "ML Class"):

https://www.coursera.org/learn/machine-learning

Now - a few notes: For any of these, you'll want a good understanding of linear algebra (mainly matrices/vectors and the math to manipulate them), stats and probabilities, and to a lesser extent, calculus (basic info on derivatives). Khan Academy or other sources can get you there (I think Coursera and Udacity have courses for these, too - plus there are a ton of other MOOCs, plus MIT's OpenCourseWare).

Also - and this is something I haven't noted before - but the terms "Artificial Intelligence" and "Machine Learning" don't necessarily mean the same thing. Based on what I have learned, it seems like artificial intelligence mainly revolves around modern understandings of artificial neural networks and deep learning - and is a subset of machine learning. Machine learning, though, also encompasses standard "algorithmic" learning techniques, like logistic and linear regression.

The reason neural networks are a subset of ML is that a trained neural network ultimately implements a form of logistic regression (categorization, true/false, etc.) or linear regression (a range) - depending on how the network is set up and trained. The power of a neural network comes from not having to find all of the dependencies (iow, the "function"); instead, the network learns them from the data. It ends up being a "black box" algorithm, but it allows you to work with datasets that are much larger and more complex than what the algorithmic approaches allow for (that said, the algorithmic approaches are useful, in that they use much less processing power and are easier to understand - no use attempting to drive a tack with a sledgehammer).
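
As a toy illustration of that last point (the degenerate case, nothing more): a "network" consisting of a single sigmoid unit trained with gradient descent is literally logistic regression:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # toy data: 2 features, binary labels (the OR function)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([0., 1., 1., 1.])

    w, b, lr = np.zeros(2), 0.0, 0.5

    # gradient descent on cross-entropy: identical to fitting logistic regression
    for _ in range(5000):
        p = sigmoid(X.dot(w) + b)          # forward pass of the one-neuron "network"
        w -= lr * X.T.dot(p - y) / len(y)
        b -= lr * (p - y).mean()

    print(sigmoid(X.dot(w) + b).round(2))  # predictions approach [0, 1, 1, 1]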

With that in mind, the sequence to learn this stuff would probably be:

1. Make sure you understand your basics: Linear Algebra, stats and probabilities, and derivatives

2. Take a course or read a book on basic machine learning techniques (linear regression, logistic regression, gradient descent, etc).

3. Delve into simple artificial neural networks (which may be a part of the machine learning curriculum): understand what feed-forward and back-prop are, how a simple network can learn logic (XOR, AND, etc - see the sketch after this list), how a simple network can answer "yes/no" and/or categorical questions (basic MNIST dataset). Understand how they "learn" the various regression algorithms.

4. Jump into artificial intelligence and deep learning - implement a simple neural network library, learn tensorflow and keras, convolutional networks, and so forth...
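
As promised in step 3, a bare-bones feed-forward + back-prop example learning XOR (pure numpy, sigmoid everywhere, squared error - nothing fancy):

    import numpy as np

    np.random.seed(1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR labels

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    W1, b1 = np.random.randn(2, 4), np.zeros((1, 4))  # input -> hidden (4 units)
    W2, b2 = np.random.randn(4, 1), np.zeros((1, 1))  # hidden -> output

    for _ in range(20000):
        # feed-forward
        h = sigmoid(X.dot(W1) + b1)
        out = sigmoid(h.dot(W2) + b2)
        # back-propagation of the squared error
        d_out = (out - y) * out * (1 - out)
        d_h = d_out.dot(W2.T) * h * (1 - h)
        W2 -= 0.5 * h.T.dot(d_out); b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
        W1 -= 0.5 * X.T.dot(d_h);   b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

    print(out.round(2))   # approaches [[0], [1], [1], [0]] for most seeds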

Now - regarding self-driving vehicles - they necessarily use all of the above, and more - including more than a bit of "mechanical" techniques: Use OpenCV or another machine vision library to pick out details of the road and other objects - which might then be able to be processed by a deep learning CNN - ex: Have a system that picks out "road sign" object from a camera, then categorizes them to "read" them and use the information to make decisions on how to drive the car (come to a stop, or keep at a set speed). In essence, you've just made a portion of Tesla's vehicle assist system (first project we did in the course I am taking now was to "follow lane lines" - the main ingredient behind "lane assist" technology - used nothing but OpenCV and Python). You'll also likely learn stuff about Kalman filters, pathfinding algos, sensor fusion, SLAM, PID controllers, etc.
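
To give a flavour of that lane-line style of work, here's a generic Canny + Hough sketch with OpenCV (not the course solution - the file name and thresholds are made up, and a real pipeline would also mask a region of interest):

    import cv2
    import numpy as np

    img = cv2.imread('road.jpg')                     # hypothetical input frame
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)              # edge map

    # probabilistic Hough transform returns candidate line segments
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=30,
                            minLineLength=40, maxLineGap=20)
    if lines is not None:
        for x1, y1, x2, y2 in lines.reshape(-1, 4):
            cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 3)   # draw in red

    cv2.imwrite('road_with_lanes.jpg', img)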

I can't really recommend any books to you, given my level of knowledge. I've read more than a few, but most of them would be considered "out of date". One that is still being used in university level courses is this:

http://aima.cs.berkeley.edu/

https://www.amazon.com/Artificial-Intelligence-Modern-Approa...

Note that it is a textbook, with textbook pricing...

Another one that I have heard is good for learning neural networks with is:

https://www.amazon.com/Make-Your-Own-Neural-Network/dp/15308...

There are tons of other resources online - the problem is separating the wheat from the chaff, because some of the stuff is outdated or even considered non-useful. There are many research papers out there that can be bewildering. I would say if you read them, until you know which is what, take them all with a grain of salt - research papers and web-sites alike. There's also the problem of finding diamonds in the rough (for instance, LeNet was created in the 1990s - but that was also in the middle of an AI winter, and some of the stuff written at the time isn't considered as useful today - but LeNet is a foundational work of today's ML/AI practices).

Now - history: You would do yourself good to understand the history of AI and ML, the debates, the arguments, etc. The foundational work comes from McCulloch and Pitts' concept of an artificial neuron, and where that led:

https://en.wikipedia.org/wiki/Artificial_neuron

Also - Alan Turing anticipated neural networks of a kind that wasn't seen until much later:

http://www.alanturing.net/turing_archive/pages/reference%20a...

...I don't know if he was aware of McCulloch and Pitts work which came prior, as they were coming at the problem from the physiological side of things; a classic case where inter-disciplinary work might have benefitted all (?).

You might want to also look into the philosophical side of things - theory of mind stuff, and some of the "greats" there (Minsky, Searle, etc); also look into the books written and edited by Douglas Hofstadter:

https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach

There's also the "lesser known" or "controversial" historical people:

* Hugo De Garis (CAM-Brain Machine)

* Igor Aleksander

* Donald Michie (MENACE)

...among others. It's interesting - De Garis was a very controversial figure, and most of his work, for whatever it is worth - has kinda been swept under the rug. He built a few computers that were FPGA based hardware neural network machines that used cellular automata a-life to "evolve" neural networks. There were only a handful of these machines made; aesthetically, their designs were as "sexy" as the old Cray computers (seriously).

Donald Michie's MENACE - interestingly enough - was a "learning computer" made of matchboxes and beads. It essentially implemented a simple trial-and-error learner that learned how to play (and win at) noughts and crosses (TIC-TAC-TOE). All in a physically (by hand) manipulated "machine".

Then there is one guy, who is "reviled" in the old-school AI community on the internet (take a look at some of the old comp.ai newsgroup archives, among others). His nom-de-plume is "Mentifex" and he wrote something called "MIND.Forth" (and translated it to a ton of other languages), that he claimed was a real learning system/program/whatever. His real name is "Arthur T. Murray" - and he is widely considered to be one of the earliest "cranks" on the internet:

http://www.nothingisreal.com/mentifex_faq.html

Heck - just by posting this I might be summoning him here! Seriously - this guy gets around.

Even so - I'm of the opinion that it might be useful for people to know about him, so they don't go too far down his rabbit-hole; at the same time, I have a small feeling that there might be a gem or two hidden inside his system or elsewhere. Maybe not, but I like to keep a somewhat open mind about these kinds of things, and not just dismiss them out of hand (but I still keep in mind the opinions of those more learned and experienced than me).

EDIT: formatting


Oh man, thank you! Thank you!


Topology generation requires the evaluation of many models - for non-trivial datasets/model sizes, evaluating a single model can take hours, days or weeks, even on current GPUs.

There's much better tooling in place, and better hardware (modern GPUs), to distribute the evaluation over a cluster of GPU machines, so it's becoming practical to push the boundaries further.

edit: I also forgot to mention - older methods often required the manual specification of heuristics for mutation and crossover rules. This is no longer the case, because the machines can be trained with objective functions to prevent overfitting or to optimize execution speed or memory usage, and they can use techniques like dropout, skip connections, and layer/node hyperparameters to direct the topology search.
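
At its crudest, "directing the search with an objective" can look like the sketch below - random search over layer widths, scored on accuracy minus a size penalty. The evaluate() here is a made-up toy surrogate standing in for "train the candidate and measure it"; the methods discussed here are of course far more sophisticated (mutation/crossover, learned controllers, etc.):

    import random

    def evaluate(topology):
        # stand-in for "train the model described by `topology` and measure it";
        # this toy surrogate just rewards moderate depth/width so the loop runs
        acc = 0.9 - 0.02 * abs(len(topology) - 2) - 0.0001 * abs(sum(topology) - 128)
        n_params = sum(topology) * 100
        return acc, n_params

    def random_topology():
        return [random.choice([16, 32, 64, 128]) for _ in range(random.randint(1, 4))]

    best, best_score = None, float('-inf')
    for _ in range(20):
        candidate = random_topology()
        acc, n_params = evaluate(candidate)
        score = acc - 1e-7 * n_params      # objective trades accuracy vs. model size
        if score > best_score:
            best, best_score = candidate, score

    print(best, best_score)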


Is NEAT among the methods you refer to?


Yes, but take it way further. i.e. How can you use a Deep Network to direct NEAT? Can you model the topology generation as a latent space, and what would some of the properties of that space be? Can we integrate the best parts of transfer learning to speed up the generate -> evaluate cycle. That sort of thing


Sure sounds reasonable. This feels like Markov chain Monte Carlo.


GANs (and InfoGAN[0] in particular) have shown there are still pretty huge breakthroughs out there.

Some other things you may be interested in:

Learning to learn by gradient descent by gradient descent[1]

Neural Architecture Search with Reinforcement Learning[2]

[0] https://arxiv.org/abs/1606.03657, https://arxiv.org/pdf/1701.00160.pdf

[1] https://arxiv.org/abs/1606.04474

[2] https://openreview.net/pdf?id=r1Ue8Hcxg


The improvements to gradient descent techniques over vanilla gradient descent as well as ideas like batch normalization and residual nets have made even moderately large networks that were previously impossible to train now possible, let alone computationally tractable. Most of this work was done in the last 5-6 years. Furthermore, training can now be made drastically more robust, with far fewer empirically set parameters. This has made deep neural nets far more useful in practical application.
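
For concreteness, the kind of building blocks being referred to - a residual connection plus batch norm, trained with an improved optimizer like Adam - looks roughly like this in Keras (a sketch with made-up sizes, layer names as in recent Keras):

    from keras.layers import Input, Conv2D, BatchNormalization, Activation, add
    from keras.models import Model

    inputs = Input(shape=(32, 32, 16))

    # one residual block: two conv/batch-norm stages, then add the shortcut back in
    x = Conv2D(16, (3, 3), padding='same')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(16, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    x = add([x, inputs])         # the skip connection that keeps deep nets trainable
    x = Activation('relu')(x)

    model = Model(inputs, x)
    model.compile(optimizer='adam', loss='mse')   # Adam: one of the improved GD variants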


I don't doubt that significant progress has been made in the last five or so years. Meta papers just make me curious, given my past experience: the field had ended up with so many tunable parameters that no one understood that it gave rise to whole new papers that 'learned' some arbitrary set of those parameters by a different method. The dirty secret was that you were simply adding more parameters. The real issue was that the research field had stalled and hit a temporary dead end.


There is no sign of a slowdown as of yet. See Andrew Ng's take on this question:

https://www.technologyreview.com/s/603062/ai-winter-isnt-com...

Training neural networks is still an art to a degree, but it is becoming better understood, not less.


Disclaimer: Interested in NN, but no real knowledge yet.

An ML learning group at my university recently discussed Learning to Learn by Gradient Descent by Gradient Descent [1]; the topic seems related. This indicates to me that at least some progress was made.

[1] https://arxiv.org/abs/1606.04474


I am an outsider but looking at the last year I think significant improvements are going on right now in both deep networks and neural nets.

If you were going into CS grad school for a PhD it is very possible that the bubble might pop before you graduate.


Well, study AI for the sake of AI, not because it is currently sexy. When I studied AI it was all about planning systems. The next wave was Bayesian networks, and now it is deep learning and reinforcement learning. There will be something else after that... just the other day we had an article on DL combined with Prolog-style learning to generalize from the numeric to the symbolic. So the field is at a precipice, so to speak. DL is in the application phase, and then there's a bunch of theoretical work to do. The theory is probably the domain of mathematicians at this point, so melding DL with other AI techniques seems the most promising. That also has multiple applications. You can imagine self-learning rule-based systems as the next big thing.


To me what is exciting now are the approaches that combine traditional parsing techniques, such as finite state machines, with learning.

Inductive logic programming is also a field that would benefit from a breakthrough.


FYI, I was fortunate to see NEAT during a guest lecture in 2006/2007 at UCLA. I read the DL paper by Hinton in Science when it came out and still didn't know enough to capitalize on it at the time.


There is disagreement on whether this is currently a bubble or the next big thing. I believe evidence so far points at the latter.


Those are not contradictory!

The story of most academic developments is that somebody develops a new technique, then they solve the problems that are easy to solve with the technique. At some point the problems that can be solved with the technique are solved and nobody has an idea of how to break the technique's limitations. Maybe 20 years after that, somebody gets a new idea.

Some examples: perceptrons, symbolic A.I., multilayer perceptrons, renormalization group theory in physics, etc.

It can take 10 years to get a PhD so it is not impossible that the field can burn out before then. I don't believe it is burning out in 2 or 3 years.


My sense is that the major issue is not the algorithms. The algorithms are plenty fine today. The major issue is data. Not "more" data. Just data. There are vast swaths of the economy where people simply haven't figured out how to digitize the data yet. They have no idea the gold mines they are sitting on.

Imagine everyone who never took multi-dimensional calculus in college (most of humanity). Where are they today? Those folks have data they're not using.


Not a theory person, so I didn't quite follow whether or to what extent deep learning has made breakthrough on that front.

But at this moment, we are far from done playing with what we have already discovered. Even as a black box, deep learning has already cracked many traditional tasks to near/above human performance, and it enables many applications that were simply not possible before.

So, I guess we are going to see more papers in the coming years until there is no low-hanging fruit left.


Meta-DL is totally possible. Check out neuroevolution and neural network architecture search through RL.


First off: isn't this from June of 2016?

Second: I don't get it. The primary example they use to illustrate the need has almost nothing to do with model building or selection, and everything to do with selecting and painstakingly cleaning data. This mirrors my experience with data science so far.

"A recent exercise conducted by researchers from New York University illustrated the problem. The goal was to model traffic flows as a function of time, weather and location for each block in downtown Manhattan, and then use that model to conduct “what-if” simulations of various ride-sharing scenarios and project the likely effects of those ride-sharing variants on congestion. The team managed to make the model, but it required about 30 person-months of NYU data scientists’ time and more than 60 person-months of preparatory effort to explore, clean and regularize several urban data sets, including statistics about local crime, schools, subway systems, parks, noise, taxis, and restaurants."

So - the meta part isn't such a big deal. But if DARPA has found a way to properly automate the painstaking process of selecting, cleaning, validating, and normalizing data, well THEN we'll really have something to be impressed about.


So after truck drivers, data scientists are next on the list of people who will lose their jobs to ML.

It seems that this new technology impacts the poorly and highly educated alike.


Similarly, I think it's important to think of jobs in terms of the behaviors they require rather than income levels.

For instance, it's well documented that radiology is going to be one of the first jobs to suffer as a result of ML, since so much of it is image recognition.


He didn't reference income levels, but education level. Education level isn't always correlated with income - just look at a lot of PhDs who don't make a ton compared to the less "formally" educated.


I was surprised to see that ECG test results are still analyzed/summarized manually[0]. I have seen ECG machines at hospitals try to summarize the test automatically, but doctors don't seem to trust them and still manually analyze the results using a measuring compass and specially graduated rulers.

It seems to me that the companies manufacturing these machines used pure signal processing techniques which don't give the results doctors are expecting.

0: http://www.ecglibrary.com/ecgs/norm.png


Yes, but it will be fought... radiology is also outsourced.


I disagree - the last humans to be jobless will be those who specialize in massaging data to feed it to computers.


You sure about that? I assumed it would be celebrities, fine dining chefs, sommeliers, brew masters, etc... Basically, people who are recognized as much for their notoriety as for the products they produce.

My humidor after all, has not a single machine made cigar despite one every so often being improperly rolled and unsmokable.


On a general level I agree: the less quantitative a job is, the better the chance it won't be done by machines.

However, let's not forget that we have had algorithmic composition[1] for some time now, and more recently we have seen style transfer through neural networks[2]. If artificial intelligence is able to produce visual art and music, that makes me think there are very few human activities that may not be automated.

What will "save" some human activities is either technological shortcomings, which are probably temporary (e.g. the hand made cigars in your humidor: the difficulty is that machines are currently not very good at rolling cigars with a whole leaf filler, but I think eventually we will get there) or consumer's preference, which is arbitrary (e.g. you may think "OK so now, in 2025, machines are able to roll as well as the best Cuban rollers, but I choose to buy only human made cigars, just because I can").

[1] https://en.wikipedia.org/wiki/Algorithmic_composition

[2] https://research.googleblog.com/2016/10/supercharging-style-...


I don't think it really matters; humans can still innovate radical new artworks anyway. That's all this stuff is based on - existing master works - it's just an augmentation.

Besides, a LOT of the music people listen to is already heavily computer generated.


Yes, it is true that style transfer just applies a known style to an existing picture, so it could be considered as not producing original work (I should have chosen a better example); however, there are examples of completely algorithmically generated art, e.g. https://en.wikipedia.org/wiki/Algorithmic_art


> a LOT of the music people listen to is already heavily computer generated

source?


Is this something you need a source for?

People have been "producing" music using computer-assisted software for decades. I know people who have produced whole albums without ever touching an instrument and who know nothing about music theory. The product? Pretty good.

Music production software can correct melodies, autotune voices, set up chord progressions, lay down rhythms, etc. You could probably argue that knowing advanced music theory hasn't really been important for a long time, because so many people love "simple" music - pop, dance, etc.

This, combined with the practically endless amount of good music already available, makes machine-generated music largely redundant; the automation of musicians, photographers and painters happened a long time ago.

I think music is largely about "discovery" now, there is just such an abundance of great stuff already available, you could never hear it all in a lifetime.

But hey, people still manage to make great stuff and live music is still the life blood of it all ;)


I did not downvote you but I think you are being downvoted because computer-aided music production and algorithmic music composition are two very different things; when using a DAW program, musicians still need to enter notes using some sort of GUI (nowadays the most common interface is called "piano roll")[1].

Algorithmic composition, on the other hand, is less common, and mostly relegated to avant-garde or art music, although there are examples in popular music such as Brian Eno[2].

[1] http://www.image-line.com/support/FLHelp/html/pianoroll.htm

[2] https://en.wikipedia.org/wiki/Generative_music


Cheers, this is a good point!


We'll end up with "human made!" labels on some products...


Perhaps. Maybe we'll even see an anti tech trend of sorts. Who knows. I just think when everything is the same as generic stuff you buy at Walmart, people will value the things that have a human touch more.


Reminds me of "The Diamond Age" by Neal Stephenson


...or may contain human inputs :)


I assumed it would be customer service and technical support - the human interface to computers, the ones who tell others how to make the machine do what they want.


> technical support

Isn't that already very formatted and automation-friendly? I feel a dedicated machine could very well be the one that explains to a human how to make another machine do what he wants. The simple cases have been replaced with automated helplines already.


Data sommelier? Catchy.


May not be true as unsupervised learning improves.


Not a single truck driver has yet lost his job to ML. Just a side note.


Your statement is true with regards to long-haul trucks, but not in other cases e.g. the port of Rotterdam has had driverless container trucks for some years now https://www.youtube.com/watch?v=xhZBalESrXU

Also confirmed by this article https://www.naiop.org/en/Magazine/2015/Summer-2015/Business-...


Mining, shipping, and human transport driver jobs have all been lost to robots. There are giant trucks in WA, Australia, the ports (again in Aus), and trains in airports. The big new train line in Sydney also has no drivers. (Not just these, and not just in Australia -- these are just examples you can look up.)


Driver jobs have been lost to autonomous container movers at shipping ports. Not expecting ML to have been integral to that, but driver jobs were indeed lost due to automation.


You must be thinking of longshoremen, not drivers.


I don't know if it's "ML" but ore trucks at all the new Western Australian iron ore mines are autonomous. Instead of a driver per truck, you have a couple of operators in the control room giving orders to the fleet.


Literally no one hires data scientists/ML engineers only to make a basic feedforward ConvNet.


Should have a 2016 in the title. I know that Mathematica tries to do this with its machine learning functions, and it's available in Mathematica 10: https://www.wolfram.com/mathematica/new-in-10/highly-automat...


This isn't about automating model generation, which is an old goal of many, but of DARPA's involvement in it and the start of a new program to encourage work in this area.


Dah, that's nothing. We have models that we built specifically to determine the best way to train other models. After 2 years of tweaking these models, we learned that the best way to optimize our learning was to build new models that train these models for determining how to train our other models.


The obvious citation is the wonderfully titled Learning to learn by gradient descent by gradient descent https://arxiv.org/abs/1606.04474


So it's models all the way down?


Of course. Even Plato knew that. But since most models are wrong... there's life yet in us 'bags of mostly water'.


Models are not wrong. They are just models. Ie, != reality.


Until we realize that such a matrix of models analyzing each other is what we tend to call "consciousness"


That is exactly what they realized when they created me.


Dun dun dun


I pitched something similar a few years back to a client, and the client's architect agreed there was value in it, but we couldn't make it fit in our sprint schedule, so we went with some other user stories.


Topology generation is the new feature engineering, so it's not surprising that it's the first thing to go. I have been working on advancing HyperNEAT for this very purpose - similar, but not quite the same as: http://blog.otoro.net/2016/05/07/backprop-neat/

Non-sequential and non-hierarchical topologies are the future, and it makes sense that machines should generate them


Can you expand on what you mean by non-sequential and non-hierarchical?


We're about to really get to know and feel the exponential property of technology.


With recent successes in specific games (Go, arcade games), I'd love to see a resurgence of interest in "General Game Playing" research, i.e., the idea that an AI is given a description of a game and learns how to play it.

Doing a search shows that it's a subject that is still being taught, but I rarely see recent articles about it.


I'm also very interested in this area of AI. It does seem like it is due for a resurgence.

I wonder a bit if there is a bias against the gaming aspect of it.

The jumps in poker have been kind of neat lately. Would like to see more variations played.


This is completely OT but I just googled about the topic of "solving" Texas Hold'em Poker and other imperfect information games and discovered that one of the contributors to the research on Counterfactual Regret Minimization algorithms, developed at the University of Alberta[0], is Oskari Tammelin[1] who also created, in the late 90s, my favourite piece of music software, Jeskola Buzz[2]. His personal website[3] contains pages on both topics.

[0] http://poker.srv.ualberta.ca/about

[1] https://arxiv.org/abs/1407.5042

[2] https://en.wikipedia.org/wiki/Jeskola_Buzz

[3] http://jeskola.net/


That's pretty awesome! I used to use Buzz quite a bit too! In fact, his hard drive mishap is what pushed me over the edge to become an open source advocate ;) I wish I could find the details about that accident, but if I remember correctly he lost the source code for Buzz, and people spent quite a bit of time reverse engineering the .exe just to be able to continue writing "gear" for Buzz.


Yes, he eventually rewrote the whole application in .NET; that is the version I'm using now, and 99% of my old tracks and plugins from around 1999-2003 still work OK in the current version!

I wish Buzz were open source too. I don't know if you've ever tried any, but there are a couple of similar open source applications; my favourite used to be Psycle, even though the project doesn't seem to be very active nowadays (http://psycle.pastnotecut.org/portal.php), and there is also Buzztrax, which is directly inspired by Buzz (http://buzztrax.org/).


BTW, if anyone is looking for buzzmachines.com, you're out of luck; it seems that the website was hacked and DB user info leaked - more back story here: http://forums.jeskola.net/viewtopic.php?f=2&t=2040

The new reference site for Buzz gear is this one: http://buzz.robotplanet.dk/


Well, I'm also interested in this area, so I guess I have something else to research!


Despite having thought about the dangers of AI, this literally never occurred to me: humans don't create AI, machines do.


Imagine if we had the unambiguous source code for our own bodies. AI will be able to have that without any work, and can modify and recompile with little to no effort and all benefit in a matter of seconds, whereas humans have to wait years.

This is referred to as the intelligence explosion[0] or AI Foom[1].

[0] https://en.wikipedia.org/wiki/Intelligence_explosion

[1] https://wiki.lesswrong.com/wiki/The_Hanson-Yudkowsky_AI-Foom...


> Imagine if we had the unambiguous source code for our own bodies?

We sequenced human DNA, so I think we already have something kinda like it. I'm not a scientist but my basic understanding is that while we have the whole sequence (in your source code analogy, we can do git clone or git pull) we are not really good at tweaking it ("oops this codebase uses architectural solutions we're not familiar with, we will have to study it extensively before we can deploy those proposed changes to the production environment")


DNA is not really source code. At best, it's kind of like... an algorithm textbook. Only a very tiny portion of DNA (like, ~1-2% in Homo sapiens) codes for proteins in the manner that you would have learned in biology class. A much larger fraction is used somehow, but how much and what its functions are is still a matter of major debate.

The effort to sequence the human genome has also led us to discover that there are other ways in which traits can be heritable that don't involve changes to DNA. The focus now is largely on epigenetics, although explanations invoking microbiomes of bacteria inside our organs was definitely popular for a while.

In short, we really don't know a whole lot about our own molecular biology, and a lot of the research in the past 60 years since the discovery of DNA has tended to show "there's more going on than we thought." Where things involve just DNA, we have very good tools for reading (sequencing) and writing (CRISPR/Cas9) it. What we don't have good tools for is modifying our epigenetics, and we don't have a good handle on what comes from DNA and what comes from epigenetics.


Thanks, I'm interested in this topic, would you be able to suggest sources that are understandable by a layperson?


Unfortunately, I'm no expert in molecular biology. Some of the more recent stuff I've picked up by following Derek Lowe's blog (http://blogs.sciencemag.org/pipeline/), which occasionally discusses some new results in molecular biology.


The standard way to learn this stuff is a course/textbook followed by reading the important papers. I recommend https://ocw.mit.edu/courses/biology/7-28-molecular-biology-s...: it is well-organized and a great use of time, even though it doesn't get into the latest most exciting topics.

If you just want an overview of what's going on, I curated these links for you. For each, focus on the main idea and why it is significant for accomplishing biological goals.

https://en.wikibooks.org/wiki/Human_Physiology/Homeostasis#O...

https://en.wikipedia.org/wiki/Molecular_biology section 1-2

Skim https://en.wikipedia.org/wiki/Central_dogma_of_molecular_bio...

https://www.khanacademy.org/science/biology/classical-geneti... watch 2x speed

https://en.wikipedia.org/wiki/Genetics#Research_methods

https://en.wikipedia.org/wiki/Cellular_communication_(biolog...

https://www.khanacademy.org/science/biology/cell-signaling

https://mcb.berkeley.edu/courses/mcb110spring/nogales/mcb110...

https://en.wikipedia.org/wiki/Epigenetics intro and diagram only

http://www.zymoresearch.com/learning-center/epigenetics/what... all pages

https://www.jove.com/science-education-database/2/basic-meth... understand basic research methods

https://www.khanacademy.org/science/biology/human-biology#im...

skim https://en.wikipedia.org/wiki/Induced_pluripotent_stem_cell

https://www.cancerquest.org/cancer-biology/ is awesome. I recommend angiogenesis, metastasis, and tumor-host interactions


Thank you so much, this is awesome! I really appreciate the time you must have spent to put it together!


What other dangers did you think of? It seems the only dangers all stem exactly from this: the point when the technology becomes exponential (machines making machines), at which point they become better at it than humans, and humans can no longer predict the outcome. The technological singularity. The black hole and the big bang of conscious technology, with results that we can't begin to fathom how to fathom. It's all based on the assumption that AI will make other, better AI. It's not an option; it's an inevitable and logical progression for ML tech.


From my limited experience of just working on a couple of DARPA projects (decades ago), DARPA is interested in new technology focused on specific problems.

The idea of automating aspects of data science like data cleansing and model building makes sense. A good goal.

There is obviously a lot of excitement in deep learning success stories but I would like to see more effort also put into both fusing more traditional AI with deep learning, and also better interactive UIs for data science and machine learning work flows.

A little off topic, but I started using Pharo Smalltalk again this week after not touching it for a long time. I was thinking about using Pharo-like environments instead of tools like IPython for organizing workflows. I admit this is likely not such a good idea, because most of the great libraries for data science and AI are in C++, Java, Python, and Scala.


I'm pretty sure this is exactly how Skynet starts. ;)

More seriously, as someone who's interested in machine learning but doesn't really know where to start in terms of really understanding it beyond "it takes input, does some weird statistical magic, and gives you an output": would one of these D3M tools (should they become readily available and less theoretical) be a decent starting point? Or would it still be better from a learning perspective to start with something more fundamental or "basic building block"? In other words: does it help to know how to build these models from scratch even if you're using some tool to do it automatically?


> doesn't really know where to start

- Take Andrew Ng's ML class on Coursera.

- Install TensorFlow and TFLearn.

- git clone something based on TF from GitHub and hack it.

- Do some pet project from scratch.

- profit


The idea is nothing new... everything in machine learning is being automated. The hard problem is to find the next step that can be automated easily... just randomly training new models and adjusting doesn't work, because training time goes up too fast.


I've been trying to find an answer to this for some time. Is it possible that ML could be implemented much better on analog computers, as they have "native support" for differential calculus?


"native support" for differential calculus is not a problem for ML. If you have a sufficiently powerful programming language, you can trivially do forward differentiation/automatic differentiation instead of backprop which is just about as "native support" as it gets (AD can also be done in, say, C, or python, it's just trickier). Granularity is also not really that much of a problem, you can reduce your bit precision to ~12 bits and still get good results with ML.


This isn't about language support or library support, this is about hardware support. How many instructions does it take to differentiate/integrate a function on x86? On an analog platform this is very close to a "single instruction". What if you could solve calculus, idk, 1000x faster? It could be even more, though.


Yeah, and on analog platforms you have serious problems with accumulating. Good luck doing a 100x100 matrix times a 100x500 matrix on an analog system, which is the sort of thing you need for machine learning.

Automatic differentiation is usually very little overhead. Think "putting a second set of values through a compiled slightly differently function".

Although there are calculus-related issues like saturation, smoothness of transfer functions, etc. "solving calculus" is not the major problem with machine learning efficiency and throughput. Linear algebra is the main bottleneck.


Actually, matrix-matrix multiplication is where analog computing really shines.

A multiplication operation can be done with a single transistor: one operand is gate voltage, another one is source to drain voltage. The resulting current through the transistor is proportional to the product of the two. An addition operation is even simpler: it's just connecting two wires carrying two different currents - the resulting current on the output wire will be the sum (due to Kirchhoff laws).

Of course, you pay for this efficiency with precision, which for some applications (e.g. neural networks) can be a reasonable trade off.
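
One way to get a feel for that precision trade-off (purely a toy numpy simulation, not real device physics): perturb each elementwise product with ~2% relative noise before summing, and compare against the exact matrix product:

    import numpy as np

    np.random.seed(0)
    A = np.random.randn(100, 100)
    B = np.random.randn(100, 500)

    exact = A.dot(B)

    # crude stand-in for an analog multiply-accumulate array: each product
    # picks up ~2% relative noise before the currents are summed
    noise = 1.0 + 0.02 * np.random.randn(100, 100, 500)
    noisy = np.einsum('ik,kj,ikj->ij', A, B, noise)

    rel_err = np.abs(noisy - exact) / (np.abs(exact) + 1e-12)
    print(np.median(rel_err))   # typically a few percent - tolerable for many NN uses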


OK, so how do you store your model? In analog or digital format? If it's digital you might have to convert the analog signals into digital, with all the constraints involved with such a conversion. Also when you replay your model to process some data, you need to convert back from digital to analog.

Please correct me if I misunderstood your point.

Otherwise you might need to clarify your idea a bit more.


Note that my arguments are pretty likely wrong in some fundamental ways but here it goes.

> OK, so how do you store your model? In analog or digital format?

This is a good question. Do you think that conversion back and forth would diminish the gains from analog computing?


Conversion from analog to digital will necessarily involve the loss of information. Now the question is, is the data lost crucial to the operation of your system or not? If losing some data is not that big a deal, you might as well build it digitally from the ground up? On the other hand if you can't afford to lose any original information from your analog system, you might have quite a few obstacles to overcome and design an entire system which would be analog based (kinda like the human brain on some level). But then again, if you can build a perfect functioning human brain a nobel prize awaits you.


> Conversion from analog to digital will necessarily involve the loss of information. Now the question is, is the data lost crucial to the operation of your system or not? If losing some data is not that big a deal, you might as well build it digitally from the ground up?

I think of it as the output being the solution of an equation/dynamic system. It's ok you lose the equations, you care about the solution. I've been doing some light internet research into this not too long ago and came across this title https://www.amazon.com/Neuromorphic-Photonics-Paul-R-Prucnal... that seems to suggest that analog photonics might be the way forward.

But I can't really comment on the state of this. Photonics is definitely becoming more and more viable it appears.


>"...The goal of D3M is to help overcome the data-science expertise gap by enabling non-experts to construct complex empirical models through automation of large parts of the model-creation process..."

This makes me think of the premise behind Excel and Access: provide (mostly) non-programmers with a nice wizard-like tool to analyze data. Of course, the (painful) shortcomings of such tools are known all too well. I hope this DARPA project fares better. On the plus side, it democratizes access to a technology - the good and the bad.


This is epic and nice to see more people thinking this way. We're actually tackling something similar at Asteria right now and we're about to open source our device client software in a much needed RFC. Gitter here for anyone interested: https://gitter.im/getasteria/General


I suspect the goal here is to automate the task of intelligence analysts. As the quantity of signal data has gone up exponentially, the DoD needs ever more subject matter experts who are also schooled in data science techniques. But that's a rare combo of skills, and given the high demand for these folks in industry and the better pay there, I suspect Uncle Sam is suffering a dire unmet need. Ergo the proposed solution: automate them.

But the example in the article of traffic modeling with 30 person-months of analysis and 60-plus person-months of cleansing illustrates how hard such analysis actually is. Ain't no way you can automate it.

Useful intel signal is hard to find in noise, especially in what by now must be zettabytes. Enlisting anyone to do analysis without deep domain expertise, especially someone as dumb as a computer, is not a promising strategy for success. But it sounds like a great way for beltway bandits to get funding for long-term blue-sky R&D contracts...


Is DARPA just a really good program or something? Across the years I have yet to see any governmental ratchet toward decay; they have been seemingly involved and doing it well for quite some time.

Any reason for this?


Upvote because title



