To me, this entire piece reads like goal-post moving, lacks understanding of the field, and is incredibly premature.
> I am confused as to why they continue to spend so much time on building Deep RL systems that beat games with clearly defined rules and point systems. I think everyone gets it now, with enough time, money, and computers almost brute forcing every single possible action, Deep RL can beat almost any game.
Right, before AlphaGo, superhuman Go was decades away. Then it became obvious and therefore easy. He brings up a laundry-folding robot almost as if discrete games can be trivially applied to robotics. If you think about it for one second: Shogi, the game with the largest action space that AlphaZero plays, has about 10^5 actions. Now consider a robot with 3 joints whose motors can each move along 3 axes. Discretizing the continuous action space into 10 buckets per axis yields an action space of 10^9, something AlphaZero will never handle. In that sense, the entire analogy is a straw man, and everyone in the field knows it. This doesn't even touch the fact that 10 buckets would be useless for fine motor control.
Since they published this, there have been great advancements in learning action embeddings for robotic tasks. With a couple of expert examples, an algorithm can learn to open a door effectively. That's not that far from folding laundry.
Personally, my entire PhD thesis attempts to solve the exact deficiencies they raise. Simply put, the work goes on. Instead of saying that the field is useless, try to understand and contribute to it. At the very least, give technical reasons why it is a dead-end path.
Again, this is goal-post moving. Just because we are not there yet, it does not mean it will not happen. We just solved it for one door, now we need to generalize.
Why do you have to start your comment by accusing me of moving the goal posts? I don't even like ball games. Why is it so hard to have a decent conversation on the internet?
It's a serious concern and I stand by it. It's perfectly possible, even easy, to make algorithms, systems, programs, what have you, that can solve one problem, perform one task, and do nothing else. Hell, you can train a dog to perform one task. Sit boy! Boy sits. And that's all it knows how to do. 80% of AI research output, ever since Dartmouth, has been like that. 99% of AI research output in the last ten years is like that. My entire phylum of AI research (I'm a PhD student studying Inductive Logic Programming and, even knowing nothing about that, the name should give you an idea) is roundly dismissed today as a "failure" because it couldn't get from "we can solve an instance of this problem" to "we can solve any instance of this problem" quickly enough. Researchers promised it was just a matter of "generalising", like you say, tuning and tuning until everything worked in the real world as well as it worked in blocks worlds, but the sources of funding didn't listen and then we had a winter [1].
Why is what you're doing any different when you look at it from the present? We haven't seen the future yet. In the present, you have a robot that can open one door. In the future, you have a lot of doors that wait for robots to open them.
Actually, looking at your paper, you don't even have a robot that can open one door. You have a simulated robot opening a simulated door. Talk about moving the goal posts!
Have you seen the Pythons' silly walk sketch? "It's not particularly silly, is it?" "Yes, I think with government backing I can make it very silly" [2]. That's all you got there so far.
Edit: Sorry, it's probably not your paper. But it's as far from folding laundry as you can go. I don't even see the simulated robot folding simulated laundry anywhere yet and even "simulated cloth" is a joke - the simulation itself (of the behaviour of cloth) is not even close to the real world. Simulating a robot opening a simulated door might get your paper published, but simulating a robot folding simulated shirts won't even get you a simulated paper.
I am just a layman interested in the field, but it sort of makes sense that you can have logical rules and inference. I mean, the whole point of AI as usually conceptualized was that it understood logic (insert your favourite Dr Who/Star Trek joke here). But briefly speaking, what happened? I get the charm of ML/DL, but it's essentially data fitting or something. Wouldn't you expect logic in AI? What went wrong, and what is the state of the art now?
Artificial Intelligence, as a field of research, was basically created by a man named John McCarthy, who came up with the name and convened the first academic workshop on the subject (in Dartmouth, in 1956). John McCarthy (who is also known as the creator of Lisp) was the student of Alonzo Church, who gave his name to the Church-Turing Thesis and is remembered for his description of the Lambda calculus, a model of computation equivalent to universal Turing machines. These are all logicians, btw, mathematicians steeped in the work of the founders of the field of mathematical logic as it was developed mainly in the 1920's by Frege, Hilbert, Gödel, Russell and Whitehead, and others (of course there was much work done before the 1920's, e.g. by Boole, Leibniz... Aristotle... but it only really got properly systematised in the 1920's).
So it makes sense that the field of AI was closely tied with logic, for a long time: because its founder was a logician.
There were different strains of research in subjects and with goals adjacent to and overlapping with AI, as John McCarthy laid them down. For example, statistical pattern recognition or cybernetics (the latter intuition I owe to a comment by Carl Hewitt here, on HN). And of course we should not forget that the first "artificial neuron", described by Pitts and McCulloch in 1943 (long before everything else of the sort), was a propositional logic circuit, because at the time the models of human cognition were based on the propositional calculus.
The problem with McCarthy's programme, of logic-based AI, is that logic is full of incompleteness and NP-hardness results, and the more expressive power you demand from a logic language, the more of those endless pits of undecidability and despair you fall down. So work in the field advanced veerrryyy slooowwllyyy and of course much of it was the usual kind of incremental silly-walk stuff I point out in the comment above. To make matters worse, while significant academic results drip-dripped slowly and real-world applications were few and far between, you had prominent researchers overpromising and launching themselves into wild flights of fancy with abandon based on very early actual results (certainly not McCarthy! and I don't want to point fingers but cough Marvin Minsky cough).
Then this wretched dog by the name of Sir James Lighthill :spits: was commissioned by the British Science Research Council, the holders of the Spigot of the Tap of Funding, to write a report on the progress of AI research, and the report was full of misunderstandings and misrepresentations and the Spigot was turned off and logic-based AI research died. Not just logic-based AI - Lighthill's :spits: report is the reason the UK doesn't have a robotics sector to speak of today. Then, while all this was happening in the US and Europe, the Japanese caught the bug of logic-based AI and they decided to spend a bunch of money to corner the market for computers like they had with electrical devices and automobiles, and they instituted something called the Fifth Generation Computer project (special-purpose computer architectures to run logic programming languages). That flopped and it took with it one of the few branches of classical AI that had actually delivered, automated theorem proving.
The first link I posted in the comment above is to a televised debate between Lighthill on the one side and, on the other side, John McCarthy, Donald Michie (dean of AI in the UK) and, er, some other guy from cognitive science in the US; I always forget his name and I'm probably doing him a disservice. You may want to watch that if you are curious about what happened. Pencil pushers without an inkling of understanding killed logic-based AI research is what happened. That was a bit like the comet that killed the dinosaurs and gave the mammals a chance. Meaning the research directions adjacent to AI, like probabilistic reasoning, connectionism and pattern recognition, found an opening and they took their chance and dominated research since then. I am well aware of the connotations of my own metaphor. What can I say? Dinosaurs are cool. Ask any five-year-old to explain why.
Thank you for the detailed reply. Sadly, killing off logic-based AI isn't the only crime we can attribute to pencil pushers. But I had an intuition I wanted to ask about.
Can a sort of type-theory-based approach (I mean in the programming sense) be used to reason about law and the like?
Suppose one article says a man should pay 20% income tax, but some other act of some other law says a man shall pay 30% income tax. Obviously I'm just giving an example, but I mean long, cryptic legalese busting in general. Can we detect contradictions, or show that a law or agreement (say, something like the TPP) is inconsistent, by defining types and transactions?
I'm sorry I couldn't word it better, but I hope you get the gist of what I'm saying.
I haven't really looked into that kind of thing. There's work I'm aware of in legal argumentation with logic programming and First Order Logic in general, for example:
It is relatively easy to have a decent conversation on the internet. But when I state "a door" and you immediately ask about "any door", that is clearly moving the objective in the face of the current state of affairs. If that is an accusation, then I am guilty.
I make and made no claims about RL being the end-all be-all. In fact, I am quite modest and would say that any researcher claiming their field is the path forward is being incredibly egotistical and is unlikely to be right. However, to me, there is still progress being made, and so there is no good reason to stop. Like everything, there is plenty of value in exploring until we understand completely why it is a bad path.
I think you may have missed the point of my post where I state that the entire field knows that RL is very far from something like folding laundry. Again, techniques are still being developed. But to state that "simulated" environments are a cop-out is very silly, since AlphaZero was trained in a "simulated" environment, to say nothing of the many algorithms being trained in simulated virtual environments for self-driving cars.
Where did I 'state that "simulated" environments are a cop-out'? Can you see what I'm complaining about? How can we have a productive conversation when I don't even know what you'll be responding to when I make a comment?
You think there's any comparison between "simulating" a chessboard and simulating a road, or a door?
I think you are underplaying the difficulties of training robots with reinforcement learning and overplaying the utility of training robots in simulated environments to perform tasks in the real world. The latter in particular just doesn't work well.
Reinforcement learning is promising and its current marriage with deep learning is interesting, but no field of research goes anywhere without accepting its limitations and working to overcome them. I understand what you say in your original comment that there's work being done to address limitations, but the paper you linked is not that work. It's just business as usual - the kind of work that gets published, but doesn't change anything.
I will address your points here, but you should really review your own comments for the bad-faith discussion you accuse me of.
> Sorry, it's probably not your paper. But it's as far from folding laundry as you can go. I don't even see the simulated robot folding simulated laundry anywhere yet and even "simulated cloth" is a joke - the simulation itself (of the behaviour of cloth) is not even close to the real world. Simulating a robot opening a simulated door might get your paper published, but simulating a robot folding simulated shirts won't even get you a simulated paper
You are right that you never said "cop-out", but I am just summarizing your words. Your complaint is that RL doesn't tackle training in the real world; in some way, RL is "neglecting the responsibility of training a real physical robot". That is the definition of a cop-out. To which I have already openly admitted that training a physical robot, while the long-term vision, is not the intermediate goal at the moment.
No, there is no comparison between simulating a chessboard and simulating real-world physics. All I am stating is that the simulation just needs to be realistic enough. The problem is not the fact that it is simulated. If your complaint is that the physics engines are not good enough to train RL agents, that is completely orthogonal to the field of RL; work is being done by those folks as well. Again, it also ignores every self-driving company that uses simulated virtual environments to train/test their algorithms.
Again, I am not overplaying or underplaying anything. I have yet to make any claim other than "RL people know we are very far from anything robotics related". I don't know how that can possibly be construed as overplaying.
If you feel that the LASER paper accomplished nothing, then we just fundamentally disagree. They did something that no one had ever done before in the field of RL. Is that not the definition of original research?
But you're misrepresenting my comments again! I never said that training in simulation is a "cop-out", because I don't think it's a cop-out!
I agree that simulation results can be real results. I concede that the LASER paper made an original contribution to knowledge. My complaint is that you overplay the result in the LASER paper when you say that "with a couple of expert examples, an algorithm can learn to open a door effectively" but omit to mention that this happened in simulation.
There's more to say here that I'd like to put into words at some point, but I don't think I'm in the right mood right now. I apologise if you were hoping for some more insightful remarks from me. Thanks for the conversation - I appreciate the time you took to give your point of view and share your expertise.
The GP specified a robot with three joints in his illustrative example. Each joint has three axes (physical dimensions) along which it can move. Each movement has an arbitrary scale of 0-10, since the GP discretized continuous space into 10 buckets for the sake of illustration. So there are 10 choices for each of 3 joints for each of 3 dimensions. That comes out to 10 choices for Joint 1, Dimension 1 x 10 choices for Joint 1, Dimension 2 x … x 10 choices for Joint 3, Dimension 3, which is 10^9.
You have 3 joints with 3 dimensions. So 9 variables you can control. If each variable has 10 choices, and you need to make a choice for each, that is 10^9 possible combinations.
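If it helps to see the arithmetic spelled out, here is a throwaway snippet using the GP's assumed numbers (3 joints, 3 axes per joint, 10 buckets per axis):

    # Discretized action-space size under the GP's illustrative assumptions.
    joints, axes_per_joint, buckets = 3, 3, 10
    controlled_variables = joints * axes_per_joint      # 9 independent variables
    joint_actions = buckets ** controlled_variables     # one bucket choice per variable
    print(controlled_variables, joint_actions)          # 9 1000000000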
"3 joints" -> What are the 3 dimensions that joints move along? I naively would have thought one or two (ie. how much spin on one axis and how much spin on the other axis) just by naively looking at my elbow for a second.
For an arm of fixed length, polar coordinates have 2 dof.
Yes, there are multiple types of joints. In this case, you could think of a ball joint like a shoulder or hip. They move "up-down" and "left-right" but also rotate.
I think the problem is exaggerated. Even with three ball joints, the action space is not that large, since there are constraints on the velocity of the joints: they have to move gradually. So the actual action space is a lot smaller. A lot of RL problems have similar continuity constraints, because in the real world we are dealing with time-series signals. I am not an expert in the RL domain yet, so I am open to any opinion.
Makes me wonder if anyone has looked into using genetic algorithms combined with RL where the genetics determine the reward function.
This seems to be how humans have evolved. Ultimately, all living animals are here based on only one reward function: the ability to have had an uninterrupted chain of reproduction. Our nervous system provides stimuli, and our brain's chemicals provide positive or negative rewards (pleasure or pain) that optimize us toward taking actions that result in an uninterrupted chain of reproduction (it's why sex feels good and putting your hand on a stove feels bad).
Presumably, both the reward function within our brain, as well as the signal it interprets (nervous system) evolved to find a more optimal combination of inputs and reward scalars for each input to maximize for this singular goal (reproduction).
Maybe we need to frame RL goals in much simpler terms, and allow genetic algorithms to evolve their own inputs and reward functions on their own.
RL is one of my weakest fields of knowledge in the AI field, so I'm sure some of this has been tried before, I'm curious how much and what the results have been.
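In case it helps make the idea concrete, here is a minimal, purely illustrative sketch of the outer/inner loop structure being described. All names are made up; train_policy and evaluate_fitness are toy stand-ins for an inner RL training run and the single "true" objective (survival/reproduction), not anyone's published method:

    import random

    # Toy stand-ins: a "policy" is just the reward weights echoed back, and
    # "fitness" is a hidden objective the outer loop can only observe by evaluation.
    HIDDEN_GOAL = [0.5, -1.0, 2.0, 0.0]

    def train_policy(reward_weights):
        return reward_weights  # stand-in for an inner RL training run

    def evaluate_fitness(policy):
        return -sum((p - g) ** 2 for p, g in zip(policy, HIDDEN_GOAL))

    def evolve_reward_weights(n_generations=30, pop_size=16, n_weights=4):
        population = [[random.gauss(0, 1) for _ in range(n_weights)]
                      for _ in range(pop_size)]
        best = None
        for _ in range(n_generations):
            scored = sorted(((evaluate_fitness(train_policy(w)), w) for w in population),
                            reverse=True)
            best = scored[0]
            parents = [w for _, w in scored[:pop_size // 2]]   # truncation selection
            population = [[x + random.gauss(0, 0.1) for x in random.choice(parents)]
                          for _ in range(pop_size)]            # Gaussian mutation
        return best

    print(evolve_reward_weights())

The point of the structure is that the reward weights themselves are what evolves; the inner learner never sees the true objective directly.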
>genetic algorithms combined with RL where the genetics determine the reward function.
I have been working on this problem for years (2+ as researcher, 2 as PhD student).
The main issue is that evolution is both massively parallel and had plenty of runtime to get to human level intelligence.
The person that pushes this evolution/evolved reward point is Andrew G. Barto and his students/collaborators over the years.
Satinder Singh in particular is actively working on gradient based algorithms to find rewards (e.g. https://arxiv.org/abs/2102.06741)
> Maybe we need to frame RL goals in much more simple terms, and allow genetic algorithms to evolve their own inputs and reward functions on their own.
I was checking HN while working on the current iteration of this algorithm (gradient-based; the genetic version was my master's thesis). The main complexity is figuring out:
1) What are the sub-goals, e.g. grasping things
2) How to solve those sub-goals, e.g. motor control
3) How to do something useful, e.g. surviving
Balancing those three processes is the current hurdle.
Also, evolution isn't trying to get to human-level intelligence. It's just one out of millions of adaptations that work, it's recent, and it's rare. Change Earth's parameters a little over the past several million years, and maybe we don't evolve.
> The main issue is that evolution is both massively parallel and had plenty of runtime to get to human level intelligence.
How many entities are we talking about for substantial evolution? I know that there have been 100 billion "humans" (not that it's so clear-cut) alive, so guessing this is on the order of ~trillions of entities to simulate some evolution for (but maybe I'm really underestimating the early tail of tons and tons of microorganisms and small short-lived life that got us to this point).
Is the bandwidth of evolution that much larger than what we could possibly simulate with computation, especially for a much simpler world/task than "generally survive"?
The purpose of any scientific field is to generate knowledge, i.e. to actually understand the conceptual underpinnings of something like intelligence.
This idea that all that's necessary for engineering intelligence is to throw some chemicals into a bucket and turn the heat on is a bad one. By that logic you could just write a universe simulator, wait a million years, and maybe solve AI as a side challenge: if AI is just evolution and genetics, and genetics is just physics, then just solve physics and we're good to go.
It's like someone trying to build a bridge by cobbling things together until it stands up and then praying that it doesn't fall down. That's not how engineering works, obviously, but that's the attitude we have towards AI.
What AI needs at this point is the very opposite: an actual theory of intelligence at a high level, because we haven't really made progress on that front in decades.
The problem with this view is that the outside world has such an immensely vast amount of data to it that the problem becomes uncomputable.
It took ~3 billion years of real evolution to reach Humans. This evolution occurred on a planet scale, including naturally formed barriers which rose and fell, changes to climatic conditions, and even a few stellar events to shake up evolution. There isn't even a compelling reason to think that Humans couldn't have arisen anytime within the last ~300 million years. Implying that the probabilities of intelligent life emerging are low, or that the conditions are poorly understood and rare.
Effectively each attempt to learn an agent which has to interact with the real world runs into these problems. The solution is to make the reward more complex and the simulated environment more realistic - both actions which increase the computational costs of the problem faster than the improvements arrive.
Well, the failure of multilayer perceptrons to converge into useful models was because of limited compute, which was eventually solved with the advent of powerful GPUs and CUDA, which popularised repurposing parallel computation meant for graphics rendering for the linear operations used during backpropagation.
Maybe the problem isn't that the RL algorithm is wrong, but that it just doesn't work without a 3 billion year, atom resolution, planet scale computer.
> and allow genetic algorithms to evolve their own inputs and reward functions on their own
I've been playing with genetic algorithms for years now as a hobby and this type of approach was a dead end for me: the GA entities would just "game the system", as it were, and would min/max in surprising ways.
My latest genetic algorithm creation https://littlefish.fish has performed far better at pattern recognition than I expected. I really think they've got massive potential.
I think the stove and sex examples are on the right track but these qualities are also what every animal experiences. Well... judging by the face of a dog when he's humping your leg, I'm sure it feels good for him.
Anyways, I think there's another ingredient missing that we humans uniquely have. I think that ingredient is the fear of death: the knowledge that, for all our intellect and powers as humans, we will inevitably die. It's better summed up by terror management theory, I believe.
Dogs have achieved an unbroken chain of survival dating back as far as your ancestors have, so they've achieved the same survival goals as you.
They've managed to do so without our intellectual abilities, which goes to prove that our goal of making RL algorithms "smart" is malformed, since high-level, complex and abstract reasoning skills apparently aren't a necessary prerequisite to survival, at least in our Earth environment.
One benefit of genetic algorithms is that they can handle multiple objectives, as the NSGA-II algorithm does. I used it to evolve a neural net in my master's thesis.
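For anyone curious, the primitive that NSGA-II (and multi-objective evolution generally) builds on is Pareto dominance. A tiny sketch with made-up objective pairs follows; this is only the dominance check, not the full algorithm with non-dominated sorting and crowding distance:

    def dominates(a, b):
        # a Pareto-dominates b (maximizing): no worse on every objective,
        # strictly better on at least one.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    candidates = {"net1": (0.92, 0.10), "net2": (0.90, 0.30), "net3": (0.85, 0.05)}
    pareto_front = [k for k, v in candidates.items()
                    if not any(dominates(w, v) for j, w in candidates.items() if j != k)]
    print(pareto_front)  # ['net1', 'net2'] - net3 is dominated by net1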
I find these kind of articles just perplexing. Research is incremental, tiny steps pushing the boundaries of knowledge. DeepMind has done things that were thought to be decades away using deep reinforcement learning. These research advancements may or may not end up being important for AGI in the future, but that's just what research is.
It's refuting the premise that supervised RL becomes less supervised because you put the feedback in a handcrafted function and use a neural network. Deep RL in its current state should be grouped with supervised RL, in other words (which is why I personally think that imitation learning is a great way forward, in contrast with the author). The issue is the amount of interactive tweaking and the lack of a natural reward function that prevents deep RL from being unsupervised.
AlphaZero is not supervised in the sense of learning from known correct actions (earlier versions of AlphaGo did learn from online games). So although it needed human supervision, sure, it didn't need us to provide correct answers.
The author's point does somewhat stand that you don't have the problem of reward engineering in board games, so they are a dead end from that point of view - they skirt around the core problem instead of tackling it.
AlphaZero only works on video games. If you remove its ability to judge progress by game score, which is a reward function (though not the one used, due to delayed reward issues), then it's not capable of finding its feedback. It only works in constructed environments where the environment provides the reward function implicitly. Maybe we can video-game-ify laundry folding sufficiently? I'm doubtful.
Nothing perplexing about it: there have been a multitude of grandiose promises from the AI field, and I think people are just getting tired of it, so they expect more radical results at this point in time.
Research is indeed incremental and revolutions only happen after critical mass has been accumulated in one or more areas, leading to a breakthrough that wasn't possible before. Sure.
And since that's true, let's just temper the expectations of the wider public. Investors and governments might need the grandiose claims in order for the area to receive money but everybody else needs a balanced and objective take on the question "Where is the area right now?"
If it can't fold laundry, or cook by physically picking stuff up from the fridge, well, let's just say it out loud and be done with it. That way nobody will be perplexed. :P
I don't agree with this article but it is not perplexing at all. Dead ends exist. The universe is highly, highly limited and everything eventually has a dead end. The question is, are we there yet?
For certain things yes, for other things no. But to assume there is never a dead end and that everything can be overcome through incremental development and research is patently a false assumption. There are many examples of dead ends within research and development.
Thus, in short, his proposal is likely wrong, but it is not a perplexing proposal. Nor is his proposal guaranteed to be wrong; there is a possibility he may be right. For example, Elon predicted self-driving would be a finished problem within a year. Guess what?
I actually sort of dislike this whole "perplexing" attitude that some people have. It's like, yeah, his opinion seems wrong or his opinion is not the norm, but there's no need to treat it as if it's "perplexing." It's like you're observing animal behavior in a lab and you're so "perplexed" at how someone can have a differing opinion.
People can have differing opinions and sometimes these opinions can be right and overturn an existing paradigm.
Instead of saying you find someone perplexing or strange, just say you disagree. It's more civil and it respects the underdogs of the past who fought against overwhelming odds to change entire schools of thought and bring our knowledge closer to answering the ultimate question.
So perplexing how some people are so rude nowadays. See what I did there?
I actually find this technique used a lot on HN. They disagree with someone but they want to insult them without violating HN rules, so they treat the person as if they're some kind of lab experiment and observe how their behavior is so "strange" or "perplexing". The admins likely fail to see just how insulting these kinds of comments are.
Perplexing is when someone jumps off a cliff while detonating a stick of dynamite. Someone with a differing opinion is NOT perplexing.
"I find it so perplexing that someone would think that... despite that... " and so on.
Really people should call it out. It's rude and manipulative.
It's possible they are actually perplexed. At least in my experience, there are a good number of people who can't hold in their head different perspectives on a topic. They just can't. The world is simple and black and white in their mind. And they aren't necessarily just plain old dumb, and they aren't uneducated. They just see black and white everywhere.
The world is full of gradients but also binary (aka black & white) systems. Charge is binary, computers are binary, life and death are binary. Systems can be either binary or gradients. If one person believes something is binary or discrete, that's his opinion, and it's not automatically wrong.
This assumption that gradients are everywhere/ubiquitous/superior is not correct.
No wait let me reframe what I said.
How is it that someone can exist who can't comprehend the fact that many things in the universe aren't gradients and that they are in fact black and white? That is SO perplexing. It's baffling to me how someone can think like that and basically walk up to someone, examine them like some sort of subhuman, and announce that this other person is so perplexing because they don't think like them??
Like, how come these people don't announce these things in public? They don't go up to someone's face and tell them that their opinions and behavior are so baffling? Why? Maybe it's because most people are aware that saying something like this is offensive. So they save it for HN, where they can announce this garbage all the time without retribution from the other party.
It's just so perplexing to me how you, daniel, are unable to comprehend this. You say these "black and white" people are everywhere. Do you walk up to them in real life and tell them how "perplexing" you think they are? No. You don't. But you probably do it all the time on HN. So baffling, this behavior.... I mean, you're not necessarily just plain old dumb and you aren't uneducated... yet you still do this on HN and can't see that it's an insult on HN just like it is in real life. So baffling that a human exists that thinks like this. I am truly perplexed.
This conclusion was arrived at because you found it perplexing that someone had a binary opinion on a topic and refused to consider a gradient.
Why would someone having a binary opinion on a topic be perplexing? Binary things exist. Thus it would only be perplexing if you felt that binary things didn't exist. That's the logic derived from reading your statements. Nothing is assumed here, it is a logical derivation.
Of course, reality is far more nuanced than that. You are fully aware that binary things exist. You're not stupid. But then again, neither is the person you're "perplexed" about. He's fully aware that gradients exist as well. Nobody is actually so stupid that they believe binary things don't exist, nor is anyone really so stupid as to believe gradients don't exist. Such a belief is completely ludicrous. We ALL know this. There is nothing to be "perplexed" about here.
The true nature of what's going on is that both of you only had a difference of opinion. But instead of discussing it in a civil way you decided to call anyone with a differing opinion than you "perplexing." Your opponent was OBVIOUSLY not someone who sees the world in black and white, just like you are OBVIOUSLY not someone who only sees the world in gradients.
But it gets even more nuanced than that. I'm willing to bet you weren't even aware you were being insulting at the time. Let me make this clearer: you don't suffer from brain damage, so you're also 100% aware that calling someone's behavior "perplexing" to their face is insulting, whether it'd be on HN or in real life.
You know this, you're aware of this yet at the very moment when you called someones behavior "perplexing" on HN you lost all awareness. The human mind is biased and contradictory. It lies to others and to itself to justify things such that certain actions can be taken at certain times.
Your brain was too busy constructing a retort to use against an opponent that you never were able to see the hypocrisy within yourself. Would I call that perplexing? No. It's actually normal. Lots of people use the "perplexing" tactic on HN, and likely all of them have the same hypocritical blindness to the rudeness of such an action. It's not you who was biased it's always your opponent who sees everything as black and white. People are ironically biased towards always seeing other people as biased rather than themselves.
In fact, the people who are the least biased aren't the people calling out others, the person who is the least biased is the person who is aware of their own biases.
Well. Now you're aware. And hopefully you'll wise up and be less rude. Instead of asking me to re-read your content, why don't you re-read my content. The whole point of the example is to illustrate the hypocrisy and unreasonableness and rudeness of the "perplexing" tactic. Yeah, I'm fully aware you're not so stupid as to think the world is never black and white, the whole point was to illustrate how you're likely fully aware that your opponent isn't so stupid as to think the world can never be a gradient.
"Everywhere" is in my sentence. Some people see black and white everywhere. That's perplexing. Given the word "everywhere" is there, I'm going to guess that you can now see that your derivation isn't logical.
It's an internet forum, so no big deal, but you've gone on quite an accusatory rant without actually reading my comment properly.
Yeah, it isn't a big deal. That's why you're able to call someone perplexing to their face on an internet forum. You're clearly not doing this in real life.
>Some people see black and white everywhere.
Very unlikely someone is actually like this. We see gradients with our physical eyes in brightness and saturation; we hear gradients in sound volume. Our human bodies are tuned to analyze and perceive gradients. It is fundamentally impossible to see black and white everywhere.
These people only see black and white in certain topics you are discussing with them, and it is not automatically "perplexing" that they disagree with you on those topics.
People like you are everywhere as well. What type of person are you? Someone who accuses people who disagree with you as people who see things only in terms of black and white.
>It's an internet forum, so no big deal, but you've gone on quite an accusatory rant without actually reading my comment properly.
Not only am I accusatory. But I'm accusing you of something that is 100% true. And I would tell you this to your face in real life. Think about it.
In men, high levels of endogenous testosterone (T) seem to encourage behavior intended to dominate – to enhance one's status over – other people. Sometimes dominant behavior is aggressive, its apparent intent being to inflict harm on another person, but often dominance is expressed nonaggressively. Sometimes dominant behavior takes the form of antisocial behavior, including rebellion against authority and law breaking. Measurement of T at a single point in time, presumably indicative of a man's basal T level, predicts many of these dominant or antisocial behaviors. T not only affects behavior but also responds to it. The act of competing for dominant status affects male T levels in two ways. First, T rises in the face of a challenge, as if it were an anticipatory response to impending competition. Second, after the competition, T rises in winners and declines in losers.
Why isn't the person I responded to replying to what I wrote? Probably because he's been dominated.
This post serves to win the game and lay out ammunition for others to use in the future.
Every culture and conversational arena has rules that must be obeyed when playing the dominance game. By pointing out how someone is violating the spirit of HN etiquette I dominate the other party by elucidating how the other person isn't playing by the rules. I can link to this thread in the future for anyone who wants to call someone "perplexing" in the future.
You should note that dominance games exist in both men and women. Everybody flexes sometimes but most people are never self aware about it.
You should also know that relating gender to behavior even though it's scientifically valid is a cultural no no. Expect to lose the dominance game should you ever go this route as you'll be labelled as sexist or people will become incredulous. "Are you literally implying men and women can behave differently? That's preposterous!!"
There's a subtle veiled insult here when you called me a woman. It's also a little off topic, but whatever. The obvious move would be to call you sexist as the rules of our modern culture say it's an easy win. I can also just be silent and you automatically lose as others vote you down. But you aren't technically wrong about the differences between male and female behavior. So I won't take that route because it's cheap.
Either way you are completely off. I'm a dude. I'm a heterosexual male doing what heterosexual male humans typically do in the wild. I seek to dominate ass holes who call others perplexing. I dominate by being fucking completely right and not by using cheap conversational tactics. This entire post was the typical male testosterone fueled maneuver. I'm curious as to why you weren't able to see that? Maybe it's because you're female? Go read a psychology book on gender. Maybe that will help you understand the male mind better.
There is that lovely quote floating around that change is slower than people expect in the short term but faster in the long term.
It is far too early to write off plain ol' deep reinforcement learning as a failure. It hasn't yet been 5 years since AlphaGo really shocked the unwary, and that was rather cutting edge in 2016.
I think short- and long-term predictions are wrong in different ways. Short-term predictions often fail to come true, leading to the diagnosis of slowness. Long-term predictions often fail to even come false, because the world has changed so much the prediction is no longer applicable. In these cases, change was faster than expected, but not in a way that settles the original point (often pertaining to progress or some other value judgement).
I first saw the quote in something by Robert Cringely. (The Internet claims that Cringely attributed it to Amara, but I couldn't find the original source. Maybe "Accidental Empires"?)
Maybe quoting Roy Amara, a 1960s Stanford Computer scientist. He said "we overestimate the impact of technology in the short-term and underestimate the effect in the long run" [0].
I dimly remember similar sentiments in Arthur C. Clarke's Profiles of the Future (1962) [1], where he talked about "Hazards of Prophecy", where predictions suffered from either "failure of imagination" (predicted too little change) or "failure of nerve" (could/should have foreseen far-reaching change, but chickened out and wrote down a watered-down version).
As ACC said it "The failure of nerve seems to be the more common; it occurs when even given all the relevant facts the would-be prophet cannot see that they point to an inescapable conclusion."
'We consider the following shift in paradigm: instead of training a policy through conventional RL algorithms like temporal difference (TD) learning [6], we will train transformer models on collected experience using a sequence modeling objective. This will allow us to bypass the need for bootstrapping for long term credit assignment – thereby avoiding one of the “deadly triad” [6] known to destabilize RL. It also avoids the need for discounting future rewards, as typically done in TD learning, which can induce undesirable short-sighted behaviors. Additionally, we can make use of existing transformer frameworks widely used in language and vision that are easy to scale, utilizing a large body of work studying stable training of transformer models.'
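For readers who haven't followed this line of work, the rough shape of the idea (a sketch of my own with toy numbers, not code from the paper) is to compute returns-to-go, interleave them with states and actions into one long token sequence, and train a causal transformer to predict the action tokens - ordinary next-token prediction over logged experience, with no TD bootstrapping:

    # Toy logged trajectory: (state, action, reward) triples.
    trajectory = [((0.0, 1.0), 2, 0.0),
                  ((0.5, 0.9), 1, 0.0),
                  ((0.9, 0.2), 0, 1.0)]

    # Return-to-go at step t = sum of rewards from t onward (no discounting).
    rewards = [r for _, _, r in trajectory]
    returns_to_go = [sum(rewards[t:]) for t in range(len(rewards))]

    # Flatten into the interleaved sequence the transformer is trained on:
    # ..., R_t, s_t, a_t, R_{t+1}, s_{t+1}, a_{t+1}, ...
    sequence = []
    for (state, action, _), rtg in zip(trajectory, returns_to_go):
        sequence += [("return_to_go", rtg), ("state", state), ("action", action)]

    # Training targets: predict each action token from everything before it.
    targets = [(i, tok) for i, tok in enumerate(sequence) if tok[0] == "action"]
    print(returns_to_go)  # [1.0, 1.0, 1.0]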
Yuck. I hate how expensive transformers are. You can see it clearly in transformers vs GANs: GANs can generate a frame in about 20ms, whereas it takes seconds or more to make a frame with transformers. It's not even clear that it's necessarily better quality.
That said, I doubt we’ll be able to make an RL GAN, so maybe this is the best way. Though now I wonder how well an RL GAN might work…
I think you are conflating "Transformers" and "autoregressive models". Transformers are a general purpose architecture for transforming sequences into other sequences with self-attention. AR models / GANs are frameworks for generative modeling. The model architecture is almost entirely orthogonal to the generative framework.
You can use transformers as part of GANs [1], and you can even use them as discriminative models for images [2].
Yeah, I work on on-device audio, so definitely agree on the expense problem. In audio we've now got a few different approaches that work really well for sequence modeling, and we're constantly finding cool ways to make inference run faster.
Reframing reinforcement learning as sequence modeling /should/ make it possible to reuse (m)any of the approaches we use for audio, including GANs. Generative audio is nicely analogous to RL problems: there's complex state and interesting predictive distributions, which shift subtly over time, and you need to combine short-term good behavior (good individual samples) in a reasonable way to get good long-term behavior (matching mel spectra).
Aren't actor critic algorithms very close to GANs already? You have a generator/actor/policy that produces data and a discriminator/critic/q that says if the data is good or bad. The critic trains on the data generated by the actor and some extra info given by the user (rewards or example data) and the actor learns from the signal given by the critic.
There has been a lot of discussion in certain AI research communities recently about whether or not the entire idea of reinforcement learning is even necessary. Self-supervised sequence models seem to represent the best path forward for general-purpose problem-solving agents because we can essentially just keep improving them by increasing parameter counts. There was a previous HN discussion about the idea here: https://news.ycombinator.com/item?id=27659526
Those models require a _lot_ of training data to exist. And I have never seen a supervised sequence model achieve superhuman performance at anything, the way AlphaGo did.
I wish the author had picked a title like "Single-task reinforcement learning is a waste" or "Reward function engineering is a waste".
At the very end of his essay he mentions some directions: homeostasis, Friston's free-energy principle, and predictive processing. I agree that all of these are very interesting. A few steps less ambitious is recent work on reinforcement learning to reach desired outcomes without specifying reward functions. All of these seem to require more focus on learning the model of dynamics in the relevant domain (what happens when I take action A from state S?) and less on value / policy learning.
Ok great, but why can't that happen in the context of "deep reinforcement learning"? For complex environments with partial observability, don't we probably want something like states represented in an embedding? As we improve our model, is there anything broken about the approach of optimizing a differentiable function?
"There are likely better approaches to deep RL, and they include ..." seems like a better framing than "deep RL is a waste of time."
I have to disagree with a good chunk of this article.
The article completely misses the main advantage (to me) of reinforcement learning:
Reinforcement learning allows you to optimise on non-differentiable outcomes.
I can't differentiate real life, but I want to optimise a process within real life. This feels tantalisingly close to AGI. If I can figure out a reward function, I can use reinforcement learning.
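To make that concrete, here is a minimal score-function (REINFORCE-style) sketch on a made-up black-box reward; the whole point is that the reward is only ever evaluated, never differentiated:

    import random

    def reward(x):
        # Black-box, non-differentiable outcome: we can only evaluate it.
        return 1.0 if abs(x - 3.0) < 0.5 else 0.0

    mu, sigma, lr = 0.0, 2.0, 0.1     # Gaussian "policy" over a 1-D action
    for step in range(5000):
        grad, batch = 0.0, 32
        for _ in range(batch):
            x = random.gauss(mu, sigma)
            # Score-function term: d/dmu log N(x; mu, sigma) * reward(x)
            grad += (x - mu) / sigma ** 2 * reward(x)
        mu += lr * grad / batch       # ascend the estimated gradient
    print(round(mu, 2))               # ends up near 3.0; reward() is never differentiated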
Yes, this requires a reward function to be defined. Yes, this is a challenge to AGI. But to say that the big labs are not aware that this is a challenge to AGI is unfair. DeepMind is actively investigating open ended learning: https://deepmind.com/research/publications/open-ended-learni....
Just because the labs haven't tackled all the questions doesn't mean that they're not busy tackling difficult questions.
DeepMind just published a paper "Reward is enough":
Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.
> DeepMind just published a paper "Reward is enough"
That paper does not mean that DeepMind believes that reward is enough. It is one of their many papers that seeks to espouse a point of view. It does not mean it is their exclusive point of view.
For example, DeepMind are actively hiring for researchers in open-ended learning (check their website). This is not a new thing for them, e.g. they were quickly on the ball with apprenticeship learning too (https://arxiv.org/abs/1706.06617).
My guess: a function that you can evaluate but that you cannot differentiate. Differentiate being "know the slope".
Differentiable functions are great because you can run gradient descent on them in a very optimized way. Example: if your objective is to have a very high value, search in the direction that has a mounting slope.
Though maybe I'm missing something 'cos it seems to me you can run gradient descent on non-differentiable functions. It just requires more evaluations.
> Though maybe I'm missing something 'cos it seems to me you can run gradient descent on non-differentiable functions
Isn't that exactly the point though? If you don't have an analytical solution for the gradient of the loss (reward) wrt the parameters - yes - you could brute force a numerical solution but as the number of parameters grows that quickly becomes infeasible. Approaches such as RL and GA provide a more intelligent way to search the parameter space.
Neural network models have millions of parameters, which means you are trying to optimize in spaces with millions of dimensions. If you can't use differentiation you get hit really hard with the curse of dimensionality.
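As a back-of-the-envelope illustration of that cost (toy code, not anyone's actual training setup): a forward-difference gradient needs one extra function evaluation per parameter, so a million-parameter model would need roughly a million forward passes per update.

    def finite_difference_grad(f, params, eps=1e-5):
        # Numerical gradient: n + 1 evaluations of f for n parameters.
        base = f(params)
        grad = []
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            grad.append((f(bumped) - base) / eps)
        return grad

    print(finite_difference_grad(lambda p: sum(x * x for x in p), [1.0, 2.0, 3.0]))  # ~[2, 4, 6]
    for n_params in (10, 1_000, 1_000_000):
        print(n_params, "parameters ->", n_params + 1, "evaluations per gradient step")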
> So now we have the top machine learning research institutes, DeepMind and OpenAI, still spending the majority of their time and resources on Deep RL
DeepMind has diversified at least some since 2019, and I'm fairly confident that OpenAI is spending more resources on huge transformer models than on RL these days.
Which is really the only thing that has changed, since even in 2019 there were at least a dozen world-class institutions doing AI/ML research aimed at addressing issues raised in this blog post (and others).
The blog post is accurate about OpenAI/Deepmind c. 2019, but is wrong about the overall composition of research effort in the field c. 2019. Outside of two small and very new labs, most ML research wasn't focused on RL, and most RL research wasn't focused on DRL as a silver bullet.
Sort of the west coast SV version of only paying attention to work out of MIT and Stanford and therefore missing most of the interesting things happening in the world.
I think when we look at large-scale applications in the context of games (and not just super-expensive showcases), like the Stockfish chess engine, we see that it's not primarily about depth, it's about architecture design. Reference: start here https://stockfishchess.org/blog/2021/stockfish-14/ and go down the rabbit hole...
In the broadly useful domain of recommender systems (which typically make use of some type of RL-like feedback loop, but can be implemented using simple clustering approaches), at least in 2019, neural network-based approaches didn't seem to fare too well, either: https://arxiv.org/pdf/1907.06902.pdf (arXiv pre-print, but this is an award-winning paper).
Since then, it seems that researchers are moving away from getting deeper and deeper (the low-hanging fruit), and try to be more creative instead: new architectures, combining symbolic (logic-based) and sub-symbolic (ML-based) AI, etc.
Most of the improvements in the real world have come from improving data representations for perception (improvements in transformers, self-supervised learning), so, so far, the article seems right.
The article is about the failure of reinforcement learning to make it out of games and into any kind of real-world task, so I think AlphaGo doesn't really change the argument.
AlphaFold is from DeepMind; it uses modern neural networks but not reinforcement learning. DeepMind is not simply wasting money; they are doing important AI research in other areas as well.
Deep-learned convolutional nets work wonders for visual recognition. Visual recognition via "the old AI" looked impossible by 1980, but today it looks easy.
(I see the visual segmentation models for self-driving cars from the "autonomous systems lab" in the next building over and think... It would be so easy to make something that honks for cyclists.)
Text analysis, reinforcement learning, etc. seem to be areas where deep learning might very well reach a plateau. In the case of images the meaning is not changed by a random "hot pixel", but changing one letter in a sentence can reverse the meaning of the sentence, and changing one piece position in a chess game is the difference between a win and a loss. These "binary" situations aren't a good match with the assumptions of continuity, differentiability and such that neural networks depend on.
> Text analysis, reinforcement learning, etc. seem to be areas where deep learning might very well reach a plateau... changing one piece position in a chess game is the difference between a win and a loss. These "binary" situations aren't a good match with the assumptions of continuity, differentiability and such that neural networks depend on.
Deep learning can't do board games like chess and go?!
Wait... are you using pretty darn subtle dry sarcasm to argue that deep learning won't reach a plateau?
I think that Monte Carlo tree search is pretty cool even with lightweight playouts (pick a random move).
The neural net by itself is a "half-baked" chess or go player; it needs the MCTS to be a strong player. (MCTS plus lightweight playouts can beat me at chess, if not at go.)
Same with text-analysis, code generation and such. If you can build a hybrid system where the neural net comes up with half-baked answers that can be corrected by a system which is capable of comprehending things like "well-formed" and "valid" then you could be cooking with gas.
What I am seeing though is that people aren't "beginning with the end in mind" the way the Wright Brothers did with flying, rather they are throwing stuff at the wall and seeing what sticks.
Reward function design and overfitting or “cheating” (optimizing to some incidental thing instead of the real problem) is a major reason genetic and evolutionary AI approaches never took off. I don’t think anyone ever figured out how to make reward function design easier or make any kind of unsupervised evolutionary learning work.
I played with EC a lot in college and these systems were almost comically good at the “cheating” part of overfitting. I watched evolving programs do things like learn the scheduling behavior of the OS kernel (because the reward function was threaded) or the disk timing differences resulting from where different parts of the sample set were stored on the drive. They could guess the answer by inferring load time.
Has there been any work on "growth"-based RL models? Like how a human baby starts off with a small brain and little capability to move around, but slowly starts to roll over, gains head control, crawls and eventually walks, grasps and develops fine motor control as their brain and physical abilities grow together.
The current method looks to me like starting from scratch with a fully capable human with a huge untrained brain instead of progressively expanding their actuators and control plane
I recommend Peter Hiesinger's 'The Self-Assembling Brain' if you're interested in a neurobiologist's perspective on this problem and its importance to AI.
You'll only hear one side of the coin. I would also want to hear from people who understand AI/ML and have decided not to use it in an application domain. That's probably the most valuable info, knowing when not to use a tool.
I understand ML reasonably well, and have decided to use classical AI instead, because I want real-time performance on low-end hardware (and I also want to be able to predict and extend the operating parameters). It's a lot harder, though, because I'll have to understand the problems at a fundamental (mathematical) level better than I currently do, and all the time I can hear a little voice saying “a neural network could do this in half an hour of work and a week of training”.
It's birds and airplanes: airplanes don't have feathers, but they solve different problems. Aerodynamics is a science, actual airplanes are engineered. But aerodynamics as a science evolved more slowly than actual flying airplanes; that's where we're at with ML.
Put your money where your mouth is. Train a homeostatic surprise minimization model to beat OpenAI’s PPO at Dota 2. The beautiful thing about benchmarks is that you can easily persuade everyone just by beating them.
You’re implying that it’s either AGI or nothing and I can’t agree. There are plenty of applications and use cases for ML out there. Should that be called “AI”? I don’t care too much personally. Is it overhyped? Sure. But there is enough useful stuff there that I don’t expect another winter or AGI any time soon.
I think AI is over-hyped now in exactly the same way that the dot com bubble over-estimated the impact of the internet at exactly the same time that most people were under-estimating the impact of the internet.
"In 2020 everyone will laugh at the idiots investing in the dot coms, and also three of the five largest companies in the world will be internet companies founded after 1990" would've seemed like a contradiction. Both sides were wrong in the late 90s.
The techno-hippies talking about AGI are insane, and so were the techno-hippies who imagined the internet would connect the world and solve all our problems.
The MBAs are selling bullshit as always and a lot of it'll go bust.
But there's a lot of value in the past decade of advancements in ML and a lot more to come, with a lot of chaff in the wheat.
The future will be shaped by ML more than the average person thinks, and those changes will be more prosaic than the hypesters would have us imagine, and those changes will be huge in ways that people kind of see now but take on an unexpected shape.
I think in 2050 we'll be nowhere close to fully autonomous robotics, but also the combined forces of the USA military will in aggregate have a robotic land/air/sea fighting force that's 1) larger than most of the world's militaries and 2) capable of toppling a nation-state like Iraq or Afghanistan completely autonomously (ie, without any remote control).
I think in 2050 we won't have robotaxis, but every warehouse and port in the developed world will be a nearly lights-out operation.
I think in 2050 we'll still have wait staff and baristas but most non-sitdown food prep establishments will have at most one employee.
I think in 2050 chatbots will still be useless for replacing callcenter work but also video games will have incredibly immersive social environments that are at least as stimulating as real social interactions.
Making technological predictions on such a timescale is folly.
1821 is two centuries away - the minimum necessary to use a plural form of the noun "century". For a person of 1821, the technologies that we rely upon routinely would be completely unknown. What would they recognize? Not cars, not computers, not electrical appliances and light, not pretty much anything in our households save furniture.
Of all the tech that you can meet in an American street of 2021, the only thing that would be somewhat familiar to people of 1821 would be guns.
I'm not sure how you reach that conclusion when we keep doing crazy things that haven't ever been possible before with AI literally like every 6 months right now.