How should we evaluate progress in AI?

YeGoblynQueenne · on July 16, 2018

>> GOFAI had several defects, but… the main thing is, nearly all of it was false.

It would be nice to see some kind of substantial examples of how "(nearly all of) GOFAI was false", accompanying statements like the one above. The problem of course is - those are very hard to come by.

That is so because logic-based AI was abandoned. And it was abandoned because funding was cut repeatedly, not because of its failure to prove this theory or achieve that aim, but because the ones holding the purse strings were administrators and military pencil-pushers, who had no way to know a successful, or failed, program if one came up and bit them in the boogies.

And just to substantiate my comment- what, exactly, was "false" about logic programming, one of the major research subjects in GOFAI? It worked just fine back then, it works just fine right now. In very practical, down to earth terms, you can prove a proposition, or a predicate, true or false by automatic means, sure as you can answer "2 + 2 = ?".

So, really- more substance, less assertiveness, would do a world of good to those for whom "AI" means everything that they read online after 2012 and who may end up missing a hell of a lot of the history of the field if they take that sort of "GOFAI failed" statements at face value.

zshrdlu · on July 17, 2018

I've always thought GOFAI was the most progress we made in the field. If some extraterrestrial beings were to come upon earth, and issue the challenge: Show us what you've got

We'd scramble to find the source code of Logic Theorist (Simon, Newell, Shaw) and SHRDLU(Winograd), and we'd hold them aloft, and cry: "Behold! Thinking machines!"

colorint · on July 16, 2018

As a general matter, you can't prove a predicate true or false by automatic means. Logic programming is just as "artful" as imperative or functional programming, because they all run into the same problem: it's impossible to tell the difference between long-running and infinite computation. The question of algorithms for general logic was explicitly addressed as the Entscheidungsproblem, meaning decision problem, which was independently proven undecidable by both Turing and Church:

https://en.wikipedia.org/wiki/Entscheidungsproblem

YeGoblynQueenne · on July 16, 2018

>> As a general matter, you can't prove a predicate true or false by automatic means.

Not in the general case, sure, yet in practice I'm sure we've all written plenty of code that terminates just fine. [Edit: I'm talking about imperative as well as logic or functional programming code].

The question then is- what does it mean when a program terminates? In the case of principled approaches like logic or functional programming, you have a pretty good idea what that means (e.g. a logic program proves a theory true or false). When an imperative program terminates, it's a very hairy affair to say what, exactly, termination means.

[Edit 2: Actually, if you think about it, there's nothing we can really achieve in the general case (including machine learning; see language learning in the limit). In practice, on the other hand, we're doing things, alright - by continuously relaxing principles and fudging limits as necessary (see PAC learning)].

taeric · on July 16, 2018

I confess I was tempted to take a pass on this one. In large, because I've fatigued a lot on reading about AI and Machine Learning. Didn't help that this is a large article.

That said, I encourage everyone to give this more than a single pass. There is irony that we are, to this day, still quoting and agreeing with Feynman's Cargo Cult Science piece. It is almost disheartening to see that we still have a hard time listening to advice from the early 1970s. By and large, though, this piece does a great job really laying out what makes it so hard to level most criticisms at AI related studies. The cross disciplinary look is one I wouldn't have thought to do, but really does explain a lot.

I'm torn, because I'm sympathetic to most of the defenses. It is hard to really believe we are making meaningful progress, though. Even if I am enjoying many of the small practical improvements we have managed to get out of things.

Erlich_Bachman · on July 16, 2018

What would "meaningful progress" entail for you? How would your life change, how would the life on earth in general change? Great achievements in science (aided by AI)?

taeric · on July 16, 2018

That is a different question, though. The question if if there is meaningful progress in ai. The amount of progress explained solely by percentage improvements against a benchmark is pretty high.

I don't think this is particularly damning. Nor do I think it should be halted. However, I agree it is hard to call progress.

Erlich_Bachman · on July 16, 2018

What specifically would be easy for you to call progress?

taeric · on July 16, 2018

An equation that could predict a better ML model. Honestly, I think claiming I want an equation is a touch too much. However, a falsifiable prediction would be nice.

Imagine if the only way we got accurate ballistics was by requiring faster more powerful guns all of the time. "We can hit the target, but only if we upgrade our guns to railguns and limit ourselves to large targets."

andyidsinga · on July 16, 2018

As a person that works in software development and trying to make sense of data - I found the article pretty interesting after the paragraph below. The discussion around science, engineering and adjacencies are good to keep in mind.

> AI researchers often say they are doing engineering. This can sound defensive, when you point out that they aren’t doing science: “Yeah, well, I’m just doing engineering, making this widget work better.” It can also sound derisive, when you suggest that philosophical considerations are relevant: “I’m doing real work, so that airy-fairy stuff is irrelevant. As an engineer, I think metaphysics is b.s.”

joe_the_user · on July 16, 2018

Well, one interesting thing about the article is the way it describes GOFAI as based on false ideas, which is harsh but fair. But in a lot of ways, GOFAI was "applied philosophy", at least something of a direct map of one form of rationalist philosophy to computation.

GOFAI was abandoned but the various perspectives remain and current AI goes forward without necessarily having any larger perspective at all, just "make it work". One might guess we're waiting for a different coherent philosophy to organize the process.

throwawayjava · on July 16, 2018

> GOFAI was abandoned

GOFAI, as a 10ish year era of easy money for CS research agendas, was a relatively cheap investment and ultimately a tremendous financial, economic, and scientific success. If you're using functional languages, sql databases, or networked computers then you're benefitting from GOFAI. ATM you're using all three.

arethuza · on July 16, 2018

GOFAI was around for a lot longer than 10 years - it arguably started at the Dartmouth workshop and was still going strong in the '90s - so at least 40 years.

throwawayjava · on July 16, 2018

but the easy money didn't persist that entire time. Maybe 20 in total, max. more like 10 most places.

point is, most people funded by gofai money weren't doing agi. they were building pls, dbs, networks, etc.

arethuza · on July 16, 2018

That wasn't my experience - few people were really doing AGI (though some were, even up to the early 90s) but most effort in any area that was explicitly identified as AI was generally in applications (e.g. fault diagnosis and qualitative modelling - the areas I worked in).

aisofteng · on July 16, 2018

Any engineer that thinks metaphysics is “b.s.” is uneducated in the schools of thought that made engineering of any sort possible in the first place.

j88439h84 · on July 16, 2018

Can you explain what you mean by this?

andyidsinga · on July 16, 2018

yep -- sadly, I've heard that sentiment over and over.

edit: I should clarify - I've heard that sentiment a lot, but now that I think of it, the "it's BS" part usually comes from a place of frustration due to time constraint.

..which leads to this article about why software is late: https://www.computer.org/cms/Computer.org/ComputingNow/homep...

stared · on July 16, 2018

Please, no. While he provides some food for thought, asks a few important questions (and provokes even more), this text is full of pseudointellectualism. As in: fancy words but utter lack of understanding of the core subjects one writes about. Most importantly: what is science (no, it is not a trivial question; I recommend going Ludwik Fleck's "Genesis and development of a scientific fact" route, http://www.evolocus.com/Textbooks/Fleck1979.pdf) and what is AI (it is a vast field; some of it IS math/CS (as in: proving things), but it is a small part).

For example, many of things he says would fit other practical disciplines, e.g. medicine. Yes, experiments are on a group of people. Yes, criterion that "drug X works better than drug Y, but we don't know why" is sufficient.

> I don’t know data science folks well, but my impression is that they find the inexplicability and unreliability of AI methods frustrating.

Well, it is not the main issue (speaking as a data scientist working with AI). At least he acknowledges his lack of expertise.

> These failures of scientific practice seem as common in AI research now as they were in social psychology a decade ago. From psychology’s experience, we should expect that many supposed AI results are scientifically false.

Also - no, it is not at level of psychology when it comes to the replication crisis. A lot of code (though, unfortunately, not all) is shared online, by the authors or other contributors, and people do replicate it (or there is an absence of replication, which also conveys a message).

Radim · on July 16, 2018

Speaking as both a ML researcher and an applied ML business owner (one who hired stared at one point — hi Piotr :), I respectfully disagree.

The replication crisis in "AI" may not be as bad as psychology (I wouldn't know), but it's not great. Sadly, my brain has somehow learned to equate "SOTA" with "hot-stitched crap, stay away". Too many painful lessons.

On the subject of publishing code: this is useful to the degree that it removes bad faith as the possible reason for the lack of replicability. But otherwise helps little in practical terms. You just have the privilege to sift through the bugs and bad design in more close-up.

"I am afraid you are right. I used to reach ~72% via the given random seed on an old version of pytorch, but now with the new version of pytorch, I wasn't able to reproduce the result. My personal opinion is that the model is neither deep or sophisticated, and usually for such kind of model, tuning hyper parameters will change the results a lot (although I don't think it's worthy to invest time tweaking an unstable model structure)."

= quote [1] for one of the "new SOTA" papers from NLP (WikiQA question answering), where the replication scores came out 62% instead of claimed 72%.

I generally call this the "AI Mummy Effect" — looks great but crumbles to dust on touch.

[1] https://github.com/pcgreat/SeqMatchSeq/issues/1

stared · on July 16, 2018

Hi Radim!

To make it clear, I am not happy with the current state of reproducibility in AI. Yet, it is still better that in all disciplines I interacted with (quantum physics, mathematical psychology). There the standard prectice was to not include any code, even if the paper bases on it.

Vide my answer to "Why are papers without code but with results accepted?" (https://academia.stackexchange.com/questions/23237/why-are-p...).

So, I was so happy to see that in Deep Learninig a lot of code appears on GitHub (I am the most happy if it appears in different frameworks, implemented y different people).

Dirty code provides limited value. It's hard to learn from it, it's hard to re-use it, and its performance may depend on the phase of the Moon (and system setting, software versions, etc). Yet, IMHO, is much better than no code. It is not only about good faith, but about including all details. Some of them may seem unimportant (even to the author), yet crucial for the results.

The next level is resonably well written code, with clear environment setting (e.g. Dockerfile/requirements.txt), and the dataset. Otherwise it is hard to proof it against "on my environment it works":

> where the replication scores came out 62% instead of claimed 72%

jononor · on July 16, 2018

The cited example was attempted reproduced and found wanting. A plausible reason for this was discovered, and is accessible online. This in much better than no replication being performed at all. But yes, we need to continuously up the game. Adopting a standard that ensures model stability would be a good next target, and not accepting papers that don't uphold it.

I'd also like to see open-source runnable code the default for paper acceptance, with some sort of 'punishment' for not having it. Maybe even make it prerequisite of empirical papers.

arvinjoar · on July 16, 2018

FYI: The author of this blog post also wrote "How to do research at the MIT AI Research Lab". While it's fair to suspect he may not be fully up to scratch with how things are done, he should have a pretty good overview

stared · on July 16, 2018

My criticism is towards this paper, not necessarily - the author. Surely, he knows something about AI (otherwise it would be impossible write anything gaining such publicity) and philosophy (AFAIK it is his field).

Though, even if someone is accomplished scientist in a given field, it does not mean they are incapable of making (to put it mildly) questionable statements (Noam Chomsky on data-driven NLP, Judea Pearl on Deep Learning, Roger Penrose on quantum measurement and consciousness; from historical - Albert Einstein on quantum physics).

Yet, there are many errors which won't be noticed by newcomers, but are demonstrably false for researchers and practitioners. It is dangerous as novices may be prone to "appeal to authority" and mistake witty style for knowledge.

Don't take me wrong - I am all for sharing ideas, even half-baked. But I think that it works well better when there isn't artificially boosted confidence.

YeGoblynQueenne · on July 16, 2018

>> (...) Noam Chomsky on data-driven NLP, (...)

If you can excuse the slightly combative tone, data-driven (i.e. statisical) NLP is a big potato and Chomsky was dead on the money: you can model text, with enough examples of text, but you can't model language. Because text is not language.

Which is why we have excellent dependency parsers that are useless outside the Brown corpus (if memory serves; might be the WSJ) and very successful sentiment classifiers for very specific corpora (IMDB), etc, but there is no system that can generate coherent language that makes sense in a given conversational context and even the most advanced models can't model meaning to save their butts. And don't let me get started on machine translation.

Like I say - apologies for the combative tone, but in terms of overpromising, modern, statistical NLP takes the biscuit. A whole field has been persisting with a complete fantasy -that it's possible to learn language from examples of text- for several decades now, oblivious to all the evidence to the contrary. A perfect example of blindly pursuing performance on arbitrary benchmarks, rather than looking for something that really works.

stared · on July 18, 2018

I started with a combative tone, so well - no apologies needed.

Well, still - current translation systems are data-driven, without exception, vide http://norvig.com/chomsky.html.

And LSTMs are awesome at picking grammar, even not one considered English grammar (line braking patterns, proper names, markup for links, etc). Vide http://karpathy.github.io/2015/05/21/rnn-effectiveness/

There are other issues like keeping track of the context, in which they suck (as of now). And right now it is like text-skimming quality, rather than "understanding" of text.

For understanding meaning, it seems that text is not enough, we need embodied cognition. Not necessarily walking robots (though, it might help) but being able to combine various senses. Some concepts are rarely communicated explicitly with words (hence - learning from an arbitrarily large text corpus may not suffice), but we have enough of experience from vision, touch etc.

Since I am mostly into DL for vision (though some interest in cognitive science), I got a lot of insight of the current SOTA (and its limitations) in NLP from http://www.abigailsee.com/2017/08/30/four-deep-learning-tren.... See also:

> while word embeddings capture certain conceptual features such as “is edible”, and “is a tool”, they do not tend to capture perceptual features such as “is chewy” and “is curved” – potentially because the latter are not easily inferred from distributional semantics alone.

Aqua · on July 16, 2018

On the one hand you make this sound extremely bad, while at the same time you describe it as just "making questionable statements".

Also, maybe I misunderstood the analogy, but I think you're being very unfair putting Albert Einstein who was wrong on quantum physics in the same basket as Roger Penrose with his view on consciousness, which may be questionable, but hasn't been disproved.

stared · on July 16, 2018

You are right that I shouldn't have put them in the same basket.

While Penrose's ideas on consciousness are not considered mainstream (neither by cognitive scientists nor quantum physicists), they don't fall in the infertile basket of:

- people gravitating to the state of science they were "raised into"

- people talking about things they are don't mastered

In this case it is a healthy scientific peculiarity. And who knows, it may turn out true. Or false, yet fertile. As ideas of faster-than-light communication with quantum states - which was flawed, yet gave birth to quantum information (more to this story, and an interesting overlap of non-science and science, in http://www.hippiessavedphysics.com/).

laichzeit0 · on July 16, 2018

> “This year, we’re getting Z% correct, whereas last year we could only get (Z-ε)%” does sound like progress. But is it meaningful?

This is one thing that bother's me a lot when I read published work. I have a feeling this is a result of everyone using the same benchmark datasets, so it inevitably becomes more of an _engineering_ exercise rather than scientific progress.

In NLP the difference between a publishable result and one that is not is often to squeeze out a few extra (Z-ε)% by throwing in an attention mechanism and ensembles to your new super duper improved SOTA architecture.

This is the problem of "replicability", which really requires more than just the same benchmark dataset used over and over again. The author seems to touch on this point later on.

Then there's the issue of "reproducibility". Very few researchers seem to publish their code with instructions how to build and re-create their results. What an awful lot of time is wasted trying to reproduce results. Here's a good example: https://groups.google.com/forum/#!topic/word2vec-toolkit/Q49...

joejerryronnie · on July 16, 2018

I am not trained in any type of field that is remotely related to AI research or engineering. Outside of some basic ML projects at work, I am not well versed in the practical application of AI technologies. But I do wonder a few things about AI:

- Have we made real, technological progress over the last 50 years or are we just leveraging far greater computing power and the ability to collect much larger data sets to run statistical analysis on?

- Will general purpose AI consist of essentially layers and layers of AI's that can handle progressively more abstract inputs, models, and patterns? For instance, the lowest level AI is what we see today - a very powerful tool but bound to a specific use case. One layer up may be able to combine inputs from a dozen 1st tier AI's to generalize a tiny bit more on the individual use cases and can deal with a tiny bit more ambiguity. One level up will evaluate inputs from a dozen level 2 AI's and so on. With the final top layer (perhaps millions of levels up) resembling general purpose processing similar to a human brain. What if this model ended up producing true general purpose AI, but the amount of input synthesizing and modeling required so much processing power that the speed at which general purpose AI could operate is no faster than a human brain?

- Can we achieve general purpose AI through purely algorithmic means, or will we need to implement a hybrid biological model to achieve real breakthroughs? If we could accomplish this, would we understand the detailed mechanisms of the biological component of the hybrid or would it forever remain a black box that we just tap into?

Anyway, not sure this adds a whole lot to the specific discussion on how best to measure AI progress, but they're questions I've been pondering lately.

pnloyd · on July 16, 2018

Ya that's kind of an interesting question. With Moore's law starting to approach it's physical limitations it would seem AGI wouldn't be feasible with current algorithms.

Not to mentions as your describing.. those "millions" of layers of narrow AI's sounds like impossible amount of work to do..

I don't think very many machine learning experts really believe that those techniques will lead to AGI.

taeric · on July 16, 2018

I'm curious just how true it is that Moore's law is starting to approach physical limitations. I just recently listened to some of Feynman's speeches collected in [1]. One of them was about how to place all of the works of an encyclopedia onto the tip of a pin.

Did it cover anything we couldn't do today? No. But that was the point. Just using a somewhat naive view of the physical matter that you would be putting something on, it was possible to go quite dense. Imagine if we started going even denser.

Do I suspect we are approaching limits? Certainly. Question is more of just how much further we can go. And will we need a dramatic shift of any sort before we could realize some extra distance?

[1] https://www.audible.com/pd/Science-Technology/The-Pleasure-o...

ghaff · on July 16, 2018

>I'm curious just how true it is that Moore's law is starting to approach physical limitations.

Well, Moore's "Law" in the narrow sense has been, to a large degree, about CMOS process scaling and that's clearly running into physical limits.

There are other levers to get better economical performance--some of which come at the cost of extra work in software. For certain workloads, GPUs and TPUs have been an important work around. There almost certainly are further optimizations involving stacking and interconnects. Probably other application-tailored designs (which then have to have software tailored for them individually).

But CMOS scaling has been such a powerful lever that there's legitimate concern that it may not be possible to replicate that kind of advance using other techniques.

taeric · on July 16, 2018

Fair. In the original context, it is clear they were on CMOS, and yes we do seem to be nearing these limits quite quickly.

I'm curious if/when we could/should move off of current CMOS techniques.

Erlich_Bachman · on July 16, 2018

> Will general purpose AI consist of essentially layers and layers of AI's that can handle progressively more abstract inputs, models, and patterns?

This will very likely be the case. It has already sort of been the case in the AI that we have already built. The best understanding we have about the real brain - is something similar to that case.

Notably, we also want it to be the case, we will try to make it work first in that way, before any other way, because that's the only way for us to be able to understand it. We would rather have a machine that we can still understand at least on some levels, instead of one which is just a black box. So of course we will want to split it up into several levels which will be easier to understand, at least individually.

> Can we achieve general purpose AI through purely algorithmic means

No one knows. The only way to know is to actually build something, then progress slowly by iterating and doing more experimentation, trying to implement new ideas etc. Focus the attention on the most lucrative ideas. This is what the AI community is doing right now.

> What if this model ended up producing true general purpose AI, but the amount of input synthesizing and modeling required so much processing power that the speed at which general purpose AI could operate is no faster than a human brain?

I don't think it is even a common idea that it is a specific goal of general AI research to make it "faster" than a human brain. Faster for what? Thinking faster? Being smarter - sure? Being more agile in terms of not requiring sleep, or being ever-vigilant while things like driving for exapmle, sure. But faster? What would that even mean in practice? We already have machines that are faster than humans at physical tasks, they don't require GAI.

Producing a brain which works at the same speed as a human would still be a tremendous achievement and would transform the life on earth as we know it. Even if it would be 10 times slower than a human, it would be an achievement.

Even if this is a slow AI, we could just copy it multiple times and make them work together. Like people are working together now. If you only have 10 people in the world sufficiently knowledgable in a field, they can only do so much work and progress. If you could copy each one of them 1000 times, those copies could pursue different paths simultaneously and the progress would be much faster. Or why not copy a perfect store clerk so that no people would need to do that anymore?

Ability to backup and restore certain snapshots of a brain would be amazing.

And of course, if there is a physical way to speed up a brain - it would certainly be much easier once you have a working brain, than right now would proper understanding of how it even works.

joejerryronnie · on July 16, 2018

>I don't think it is even a common idea that it is a specific goal of general AI research to make it "faster" than a human brain. Faster for what? Thinking faster? Being smarter - sure? Being more agile in terms of not requiring sleep, or being ever-vigilant while things like driving for exapmle, sure. But faster? What would that even mean in practice? We already have machines that are faster than humans at physical tasks, they don't require GAI.

I think a large concern regarding AI is the notion that machines will develop the ability to process information, make decisions, and evolve their own capabilities at speeds exponentially faster than humans can comprehend, much less react to - leading to some horrible dystopia or extinction level event. Futurists have projected that by the end of this century, all major/strategic business decisions will be made solely by AI entities. I can see the distribution of multiple, slower AI's being more powerful than a single faster AI, but only if the multitude worked together toward a common goal rather than at odds with each other.

>Even if this is a slow AI, we could just copy it multiple times and make them work together. Like people are working together now. If you only have 10 people in the world sufficiently knowledgable in a field, they can only do so much work and progress. If you could copy each one of them 1000 times, those copies could pursue different paths simultaneously and the progress would be much faster. Or why not copy a perfect store clerk so that no people would need to do that anymore?

Is it a foregone conclusion that true general purpose AI will ultimately result in sentience? If yes, once sentience is attained, would the AI's "agree" to do anything we asked of them? Would sentient AI's work as a hive mind or would they be as fractious as current human politics (I would hate to meet the first "Trumpian" AI in a dark alley)?

mlthoughts2018 · on July 16, 2018

> “It’s not scientific progress unless you understand where the improvement is coming from.”

I don’t agree with this. If you can chronicle improvement, that is progress. Giving a satisfying linguistic description of that improvement, when possible, might be more progress, but merely documenting it is extremely important scientific progress in its own right.

Overall this essay was extremely hard to read and should cut down about 75% of the content. The whole wolpertinger thing is nothing but a distraction. Just say AI is a mixture of disciplines and serves a mixture of outcomes. It only takes away from the essay to act like you’re being literary or nuanced with the wolpertinger thing when all it does is subtract from the arguments.

And to boot, after so many words, the final advice is extremely hollow... literally just saying,

> “And so we should try to do better along lots of axes.”

How should we improve? I guess by “doing better” on “multiple axes.”

The section on “antidotes” is hardly better, saying:

> “I will suggest two antidotes. The first is the design practice of maintaining continuous contact with the concrete, nebulous real-world problem. Retreating into abstract problem-solving is tidier but usually doesn’t work well.“

Except this is already what basically everyone tries to do. Research labs try to maintain direct contact with state of the art benchmark tasks on a wide variety of data sets. And often they work extremely hard to produce results robust across several tasks and several data sets.

And in various other fractured or specific cases, the researchers are very clear up-front they are solving one particular, ultraspecific problem in the scope of the paper.

(Unfortunately the second antidote is more “wolpertinger”... ugh.)

amasad · on July 16, 2018

> And to boot, after so many words, the final advice is extremely hollow... literally just saying, > > “And so we should try to do better along lots of axes.” > How should we improve? I guess by “doing better” on “multiple axes.”

That's not what the final advice is, the author is suggesting the use of "meta-rationality":

> "AI is a wolpertinger: not a coherent, unified technical discipline, but a peculiar hybrid of fields with diverse ways of seeing, diverse criteria for progress, and diverse rational and non-rational methods. Characteristically, meta-rationality evaluates, selects, combines, modifies, discovers, creates, and monitors multiple frameworks."

Although not expanded on in this essay, it seems like the whole blog is dedicated to the topic.

mlthoughts2018 · on July 16, 2018

> "That's not what the final advice is, the author is suggesting the use of "meta-rationality""

I think you mis-read that section of the essay, because the whole conclusion of the meta-rationality section was the quote that I already gave in my comment, “And so we should try to do better along lots of axes.”

Literally, that is the sum-up of advice in the lone section of the essay that possibly has any call to action or advice. It gives a fairly quick and superficial overview of meta-rationality (which is OK), but does not say anything at all about putting it into practice except for "doing better" on "multiple axes" (literally, this is all it says).

So when you say the "final advice" is meta-rationality -- that's already what I was talking about. That's exactly the part where the essay fails to give any type of actionable payoff at all.

aisofteng · on July 16, 2018

“The whole wolpertinger thing” is a metaphor and a literary device. This isn’t a technical manual.

I don’t know why you found this hard to read. The writing is clear and understandable. The dismissiveness of your comment and the fact that its out-of-handedness is based on nothing objective suggests that you’re not the target audience, especially since, in contrast to your comment, the thoughts in this article are researched, decently sophisticated, and exposed in a discursive manner.

mlthoughts2018 · on July 16, 2018

I found the article to be meandering, unclear, messy, and not based on an active appraisal of the way progress in AI work already is judged. I don’t know why you claim my comment has “dismissiveness” — it does not. Pointing out that the failures of the writing or arguments makes it hard to read and lacking in any useful conclusion or call to action is not dismissive at all. On the contrary, I gave up a lot of time to interact with the essay by reading it and reflecting on it. It’s just not a good essay.

AndrewKemendo · on July 16, 2018

not based on an active appraisal of the way progress in AI work already is judged.

I don't think that's accurate at all. From the article:

"Adjacent to engineering is the development of new technical methods. This is what most AI people most enjoy. It’s particularly satisfying when you can show that your new system architecture does Z% better than the competition."

The ImageNet competition results from 2012 was the major turning point that exploded AI research, specifically in that computer vision was able to beat human level classification. Similarly for Chess previously and more recently Go.

Goodfellow's work with GANs and Pearl's work in Baysian Causality are the only major exceptions I see right now that are not based on the competitive improvement around a baseline. No other major scientific field approaches it this way.

mlthoughts2018 · on July 16, 2018

I disagree very strongly. Many fields over long periods of the history of science have oriented themselves around benchmark problems.

Some things which come to mind are:

- C. elegans for connectomics

- Drosophila experiments for a wide range of biology benchmarks

- even previously in computer vision there was the so-called "chair challenge" [0], and dozens and dozens of canonical face detection, object detection, and segmentation data sets used frequently as benchmarks across many papers

- in Bayesian statistics there are various canonical data sets for evaluating theoretical improvements in hierarchical models and general regression

- in finance there is CRSP and the Kenneth French Data Library

It's very common across many fields to orient around benchmark problems and data sets, and it has been for a really long time. This is not at all new with ImageNet, not even just in the tiny world of computer vision.

[0]: < http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.226... >

ordu · on July 16, 2018

> If you can chronicle improvement, that is progress.

Yes, but not a scientific one.

mlthoughts2018 · on July 16, 2018

No, it is scientific progress, specifically. Science is about reproducability, not explainability (which, in many cases, is not even possible).

taeric · on July 16, 2018

Amusingly, then, many cases are still not science. Not uncommon for attempts at reproducing to fall short of what was originally claimed. :)

mlthoughts2018 · on July 16, 2018

That makes no sense. Attempting reproducibility and discovering that it cannot be obtained is also science. It's like you are trying to say science can only be defined by positive results, not negative results.

Science can be defined by positive or negative results... it's just that the single thing science "is about" (whether a positive instance or negative instance) is reproducibility. If you can reliably reproduce a behavior that you cannot explain, that's hugely scientific. If you can show reproducibility wasn't achieved in some certain conditions, that's also hugely scientific. Reproducibility is the thing.

taeric · on July 16, 2018

That is science, yes. But if you are not attaining reproducible results, it is hard to argue you are making progress.

I agree that not all experiments should be in the positive or negative category. Ideally a solid mix. However, the spin of this and other stories is that pretty much no studies are seeing healthy reproduction. And nobody can really say why. Outside of p hacking and such. Which is fairly universally agreed as not progress.

mlthoughts2018 · on July 16, 2018

I agree totally with what you say here, but circling back to the original quote from the essay, the author tries to say that any type of result which you cannot explain is not scientific progress, and that is what I disagreed with originally. Often we can get reproducible results about repeatable behavior or phenomena that "just work" without a solid, low-level, reductionist explanation about why, and in those cases, it's totally OK and still counts as valid progress. Similarly, when we try to do reproducibility studies and we cannot replicate a result, that is progress in the sense of ruling out a result, or shifting the burden of evidence back on the original researcher and casting proper doubt on something. Not usually as exciting or effectful as positive results, but progress nonetheless.

I totally agree we live in a world where publication incentives create perverse anti-science problems, with file drawer bias, p-hacking, falsifying data, etc.

I'm just saying that in the essay, the author seems to go waaay too far in claiming that it can only be scientific if you can tack on some type of "explanation" (which, we could even debate what that means and how you could know if you have the 'right' or 'complete' explanation).

taeric · on July 17, 2018

I think you're raising a fair point. I was won over by the distinction of scientific versus engineering progress. The idea being that scientific progress had to have added something to our scientific understanding. An example I used in a sibling post was how you don't necessarily learn more about ballistics if the only way you could hit targets was a more powerful gun.

That said, I have to grant that is probably too reductionist. I think I like the idea, as it is just trying to be specific with types of progress, but I don't know if that is an accepted standard, or just one being proposed.

sgt101 · on July 16, 2018

If the subject is seen as a system rather than a thing you can make some progress. On the one hand we have Artificial Intelligence, where insights from cognition and biology are used to model and explain reasoning and behaviour. On the other we have AI where people use the outcome of Artificial Intelligence research with other pragmatically selected technical components to develop technology. I think that there is interdependence and exchange between the two, but they have different methodologies and processes. I also think that huge trouble is created when members of one community use the other communities clothes and achievements. For example Artificial Intelligence researchers talking about the practical impact of AI and near term application of their work and conversely AI researchers claiming that their work "works like a brain".

ggm · on July 16, 2018

Most (human) languages have a set of rules and a variant set of variances ("i before e except after c, and this catalog of things which we pretend don't exist")

so ML and NLP are cases of things which are ameanable to rules based systems, and because english corpus exists widely can be tested openly, against each other and a common norm of comprehension (for english speakers)

generalized AI does not lie here: systems which uncover the grammer rules and exceptions do not generalize to systems which uncover rules in Law, or Equity, or financial trading, or other things. Yes, you can train nets. But the commonality here, is you can train, not that emergent AI is found.

(not an AI person, strongly anti-AI perspective from life experience in compsci)

fny · on July 16, 2018

Is there such a thing as AGI really though instead of collections of modules that are trained and may have no meaningful mode of interaction? What does NLP have to do with the mathematics of financial trading?

AGI has always felt to me as a large scale interdisciplinary/intercomputational activity, much in the same way that most human intelligence derived from years of intergenerational and interpersonal intellectual development.

AGI will never be one but many: many intelligences and systems interacting to produce something of utlitarian value.

ggm · on July 16, 2018

much in the same way that most human intelligence derived from years of intergenerational and interpersonal intellectual development.

wow! I had no idea the archeology on early hominids was that good... (yes, implicit /s)

taeric · on July 16, 2018

This treats human language as a somewhat static thing that can be fully specified. Which, interestingly, avoids a large portion of human languages in poetry and metaphor. A bit like judging story telling by only accounting true stories, thus completely ignoring all of fiction. Saying nothing of false stories purported as true.

It is quite possible that understanding is nothing more than a statistical relating of otherwise unrelated things. Some things, we accept as being 100% related. But, quite a lot of things are just "highly related." Where "highly" is probably not specified.

toolslive · on July 16, 2018

For typical problems (playing chess, tissue segmentation, translation, ...) you have the quality of the solution (sometimes difficult to measure) versus the cost of achieving it (energy/enthropy and maybe time). The cost is important.

jmickey · on July 16, 2018

Relevant - AI Progress Measurement page by the EFF: https://www.eff.org/ai/metrics

tim333 · on July 16, 2018

This seems to an article written by a philosophical type who doesn't really understand contemporary AI from a technical point of view. I'm not sure it's terribly useful.

rrherr · on July 16, 2018

About the author:

“I did a PhD in artificial intelligence at MIT. My undergraduate degree was in math. I’ve also studied cognitive science, biochemistry, Old English and Ancient Greek literature. None of that qualifies me to write Meaningness, but it may explain a certain STEM-ish orientation, decorated with occasional literary jokes. ...

I have founded, managed, grown, and sold a successful biotech informatics company. That may explain a certain practical orientation, and lack of interest in philosophical theories that depend on the world being very unlike the way it appears.”

https://meaningness.com/about-my-sites

visarga · on July 16, 2018

I expected better insights from a person who has a PhD in AI and spans multiple other fields. I think there is a gap between philosophy and AI, and philosophy has a lot of catching up to do. I'm especially looking at the "hard problem" and "qualia". In the meantime AI people implemented a lot of the activities of the mind and philosophers are still stuck on these toy concepts that don't go so far. Instead, they should focus on the implications of agent-environment learning - reinforcement learning and evolutionary strategies - as the true sources of human intelligence.

tim333 · on July 16, 2018

Ah maybe not then.

nyolfen · on July 16, 2018

the site this blog is hosted on is really great for explaining a lot of aspects of postmodernism to engineering-oriented or analytical thinkers. i got totally sucked in by it over my christmas vacation two years ago and read the whole thing.

montenegrohugo · on July 16, 2018

I loved this

Sorry tim! :P