US Marines defeat DARPA robot by hiding under a cardboard box (extremetech.com)
469 points by koolba on Jan 25, 2023 | 292 comments



This is a good example of the type of issues "full self driving" is likely to encounter once it is widely deployed.

The real shortcoming of "AI" is that it is almost entirely data driven. There is little to no real cognition or understanding or judgment involved.

The human brain can instantly and instinctively extrapolate from what it already knows in order to evaluate and make judgments in new situations it has never seen before. A child can recognize that someone is hiding under a box even if they have never actually seen anyone do it before. Even a dog could likely do the same.

AI, as it currently exists, just doesn't do this. It's all replication and repetition. Like any other tool, AI can be useful. But there is no "intelligence" --- it's basically as dumb as a hammer.


I have a slightly different take - our current ML models try to approximate the real world assuming that the function is continuous. However in reality, the function is not continuous and approximation breaks in unpredictable ways. I think that “unpredictable” part is the bigger issue than just “breaks”. (Most) Humans use “common sense” to handle cases when model doesn’t match reality. But AI doesn’t have “common sense” and it is dumb because of it.


I would put it in terms of continuity of state rather than continuity of function: we use our current ML models to approximate the real world by assuming that state is irrelevant. However in reality, objects exist continuously and failure to capture ("understand") that fact breaks the model in unpredictable ways. For example, if you show a three-year-old a movie of a marine crawling under a cardboard box, and when the marine is fully hidden ask where the marine is, you will likely get a correct answer. That is because real intelligence has a natural understanding of the continuity of state (of existence). AI has only just started to understand "object", but I doubt it has a correct grasp of "state", let alone understands time continuity.


This story is the perfect example of machine learning vs. artificial intelligence.


Basically, ML has made such significant practical advances--in no small part on the back of Moore's Law, large datasets, and specialized processors--that we've largely punted on (non-academic) attempts to bring forward cognitive science and the like, on which there really hasn't been great progress decades on. Some of the same neurophysiology debates that were happening when I was an undergrad in the late '70s still seem to be happening in not much different form.

But it's reasonable to ask whether there's some point beyond which ML can't take you. Peter Norvig, I think, made a comment to the effect of "We have been making great progress--all the way to the top of the tree."


Is there actually a distinction here? A good machine would learn about boxes and object permanence.


Good point!


Does it just require a lot more training? I'm talking about the boring stuff. Children play, and their understanding of the physical world is reinforced. How would you add the physical world to the training? Because everything that I do in the physical world is "training" me and reinforcing my expectations.

We keep avoiding the idea that robots require understanding of the world since it's a massive unsolved undertaking.


A human trains on way less data than an AI.

ChatGPT has processed over 500GB of text files from books, about 44 billion words.

If you read a book a week, you might hit 70 million words by age 18.
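
A quick back-of-envelope check of those figures (the 44-billion-word number is from the comment above; the ~75,000 words per book is my own assumption):

    # Rough check of the numbers above. The 44-billion-word figure is the
    # commenter's; the ~75,000 words per book is an assumption of mine.
    words_per_book = 75_000
    human_words = words_per_book * 52 * 18     # a book a week until age 18
    model_words = 44_000_000_000               # figure quoted above

    print(human_words)                # ~70 million, matching the comment
    print(model_words / human_words)  # the model saw roughly 600x more text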


I disagree.

Starting from birth, humans train continuously on streamed audio, visual, and other data from 5 senses. An inconceivable amount.


And prior to that was billions of years of training by evolution that got us to the point where we could 'fine tune' with our senses and brains. A little bit of data was involved in all that too.


I'd argue that is the fundamental difference though - brains that were able to make good guesses about what was going on in the environment with very limited information are the ones whose owners reproduced successfully etc. And it's not unreasonable to note that the information available to the brains of our forebears is therefore in a rather indirect but still significant way "encoded" into our brains (at birth). Do LLMs have an element of that at all in their programming? Do they need more, and if so, how could it be best created?


You missed the point. ChatGPT trained on a gazillion words to "learn" a language. Children learn their language from a tiny fraction of that. Streamed visual, smell, touch etc. don't help learn the grammars of (spoken) languages.


> visual, smell, touch etc. don't help learn the grammars of (spoken) languages.

Of course they do! These are literally the things children learn to associate language with. "Ouch!" is what is said when you feel pain.

An ML model can learn to associate the word "ouch" with the words "pain" and "feel", but it doesn't actually know what pain is, because it doesn't feel.


Isn't it more complicated than that? "Ouch" can be a lot of things, and that's where a lot of problems crop up in the AI world.

If one of my friends insults another friend, I might say, "OUCH!" I'm not in pain but I might want to express that the insult was a bit much. If someone tries to insult me and it's weak, I could reply with a dry, sarcastic "ouch."

Combine that with facial expression and tone of voice and 'ouch' is highly contextual.

One problem with some of the tools used to take down offensive comments on social media platforms is that they don't get context.

Let's say that 'ouch' is highly offensive and you got into trouble for calling someone an "ouch." If I want to discuss the issue and agree that you were being offensive, I could get into trouble with the ML/AI tools for quoting you.


No. First off, I said grammar, not word meaning.

Second, saying "Ouch" is not even language. My cat says something when I step on her paw. That doesn't mean she understands language, nor that she speaks some language.

Third, you're right about pain, but an ML model can associate the word "red" with the color, and "walk" with images of people walking, and "sailboat" with certain images or videos, and plenty of other concepts. If that was what learning a language was, then AIs would understand language in lots of areas, if not in the specific domain of pain. But they don't.


That has me wondering now.

It's absolutely true that children learn (and even generate) language grammar from a ridiculously small number of samples compared to LLMs.

But could the availability of a world model, in the form of other sensory inputs, contribute to that capacity? Younger children who haven't fully mastered correct grammar are still able to communicate more sensibly than earlier LLMs, whereas the earlier LLMs tend toward more grammatically correct gibberish. What if the missing secret sauce to better LLM training is figuring out how to wire, say, image recognition into the training process?


It amuses me that this would be not unlike teaching an LLM with picture books.


> and other data from 5 senses.

It only makes your point stronger, but there are way more[1] than 5 human senses, not counting senses we don't have that, say, dolphins or other animals do. I can only name a few others, such as proprioception, direction, balance, and weight discrimination, but there are too many to keep track of them all.

[1] https://www.nytimes.com/1964/03/15/archives/we-have-more-tha...


Last Christmas one of my nephews was gifted a noisy baby toy. I don't know what his goals and constraints are, but he's still training with it. Must have learned a lot by now.


You can run the numbers with HD video and audio with high words-per-minute and you'd probably still be orders of magnitude below the model sizes


Imagine someone has the idea of strapping mannequins to their car in hopes the AI cars will get out of the way.

Sure, you could add that to the training the AI gets, but it's just one malicious idea. There's effectively an infinite set of those ideas, as people come up with novel ideas all the time.


Reinforcement learning should solve this problem. We need to give robots the ability to do experiments and learn from failure like children.


Need to make those robots as harmless as children when they do that learning too. ;)

"Whoops, that killed a few too many people, but now I've learned better!" - some machine-learning-using car, probably


> A child can recognize that someone is hiding under a box even if they have never actually seen anyone do it before.

A child of what age? Children that have not yet developed object permanence will fail to understand that some things still exist when unseen.

Human intelligence is trained for years, with two humans making corrections and prompting development. I am curious whether there are any machine learning projects that have been training for this length of time.


With no real training, a child will start exploring and learning about the world on his own. These are the first roots of "intelligence".

How long do you think it would take to teach an AI to do this?


It would be interesting to see how much exploring a child does without adult guidance; being a parent, there is a lot of leading toward exploration, and it takes quite a bit of effort.


I know a kid who recently learned to defeat a new child safety lock without adult guidance. AI *might* learn to do the same --- after training on several thousand videos showing the exact process.


I'd say we're approximately 400 years away from teaching AI to do this.


The next problem will be the cost/expense of maintaining and operating an inorganic AI with even a rudimentary hint of "intelligence".

Personally, I think it would probably be easier, cheaper and more practical to just grow synthetic humans in a lab --- i.e., Blade Runner. "Intelligent" right out of the box and already physically adapted to a humanistic world.


I LOVED playing peek-a-boo with my child at that age!


This seems to be simultaneously discounting AI (ChatGPT should have put to rest the idea that "it's all replication and repetition" by now, no?[1]) and wildly overestimating median human ability.

In point of fact, the human brain is absolutely terrible at driving. To the extent that without all the non-AI safety features implemented in modern automobiles and street environments, driving would be more than a full order of magnitude more deadly.

The safety bar[2] for autonomous driving is really, really low. And, yes, existing systems are crossing that bar as we speak. Even Teslas.

[1] Or at least widely broadened our intuition about what can be accomplished with "mere" repetition and replication.

[2] It's true though, that the practical bar is probably higher. We saw just last week that a routine accident that happens dozens of times every day becomes a giant front page freakout when there's a computer involved.


The difference regarding computers is that they absolutely cannot make a mistake a human would have avoided easily (like driving full speed into a lorry). That's the threshold for acceptable safety.


I agree that in practice that may be what ends up being necessary. But again, to repeat: that's because of the "HN Front Page Freakout" problem.

The unambiguously correct answer to the problem is "is it measurably safer by any metric you want to pick". Period. How much stuff is broken, how many people hurt, etc... Those are all quantifiable.

(Also: your example is ridiculous. Human beings "drive full speed" into obstacles every single day! Teslas crossed that threshold years ago.)


This is not necessarily true on an individual level though. Driving skills, judgment, risk-taking, alcoholism, etc. are nowhere close to evenly distributed.

It's likely we'll go through a period where autonomous vehicles can reduce the overall number of accidents, injuries, and fatalities if widely adopted, but will still increase someone's personal risk vs. driving manually if they're a better than average driver.


But we don't live by a purely utilitarian principle of ethics. "I'm sorry Mrs Jones, I know your son had an expectation of crossing that pedestrian crossing in full daylight in a residential area without being mown down by a machine learning algorithm gone awry, but please rest assured that overall fewer people are dying as a result of humans not making the common set of different mistakes they used to make".

All sorts of other factors are relevant to the ethics: who took the decision to drive; who's benefiting from the drive happening; is there a reasonable expectation of safety.


Yeah yeah, I get it. Moral philosophy is full of gray areas.

I don't see how that remotely supports "AI cannot be allowed to make mistakes some humans would not", which was your decidedly absolutist position above.

How about: "We should allow autonomy in most cases, though perhaps regulate it carefully to better measure its performance against manually-directed systems"


I think the biggest problem with AI driving is that while there are plenty of dumb human drivers there are also plenty of average drivers and plenty of skilled drivers.

For the most part, if Tesla FSD does a dumb thing in a very specific edge case, ALL teslas do a dumb thing in a very specific edge case and that's what humans don't appreciate.

A bug can render everyone's car dumb in a single instance.


> the human brain is absolutely terrible at driving

Compared to what?


If humans do a task that causes >1 million deaths per year, I think we can say that overall we are terrible at that task without needing to make it relative to something else.


I don't agree with this. Driving is, taken at its fundamentals, a dangerous activity; we are taking heavy machinery, accelerating it until it has considerable kinetic energy, and maneuvering it through a complex and constantly changing environment, often in situations where a single mistake will kill or seriously harm ourselves or other humans.

The fact that a very large number of humans do this every day without causing any injury demonstrates that humans are very good at this task. The fact that deaths still occur simply shows that they could still be better.


Agreed. I sometimes marvel as I'm driving on the freeway with other cars going 70, or maybe even more so when I'm driving 45 on a two lane highway, at how easy it would be to hit someone, and how comparatively seldom it happens.


Not sure I agree.

It's not hard to come up with tasks that inherently cause widespread death regardless of the skill of those who carry them out. Starting fairly large and heavy objects moving at considerable speed in the vicinity of other such objects and pedestrians, cyclists and stationary humans may just be one such task. That is, the inherent risks (i.e. you cannot stop these things instantly, or make them change direction instantly) combine with the cognitive/computational complexity of evaluating the context to create a task that can never be done without significant fatalities, regardless of who/what tries to perform it.


You're looking at the wrong metric.

1.33 deaths per 100,000,000 vehicle miles travelled.

That's amazingly good considering most people, if they drive at all, travel about 0.5% of that distance in their lifetime.

https://en.wikipedia.org/wiki/Motor_vehicle_fatality_rate_in...
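
To put that rate in per-person terms, a quick sketch using only the figures quoted above (no external statistics):

    # Rough arithmetic using the commenter's own figures: 1.33 deaths per
    # 100,000,000 vehicle miles, and a lifetime driving distance of ~0.5% of that.
    deaths_per_mile = 1.33 / 100_000_000
    lifetime_miles = 0.005 * 100_000_000      # ~500,000 miles

    lifetime_risk = deaths_per_mile * lifetime_miles
    print(f"{lifetime_risk:.3%}")             # ~0.665% lifetime chance per driver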


Now compare that >1 million deaths per year to the total number of people driving per year around the world... it looks like we're doing a pretty solid job.


All the failures to detect humans will be used as training data to fine tune the model.

Just like a toddler might be confused when they first see a box with legs walking towards it. Or mistake a hand puppet for a real living creature when they first see it. I've seen the latter first-hand with my son.

AI tooling is already capable of identifying whatever it's trained to. The DARPA team just hadn't trained it with varied enough data when that particular exercise occurred.


That’s not learning, that’s just brute forcing every possible answer and trying to memorise them all.


Not really. Depends entirely on how general-purpose (abstract) the learned concept is.

For example, detecting the possible presence of a cavity inside an object X, and whether that cavity is large enough to hide another object Y. Learning generic geospatial properties like that can greatly improve a whole swath of downstream prediction tasks (i.e., in a transfer learning sense).


That's exactly the problem: the learned "concept" is not general purpose at all. It's (from what we can tell) a bunch of special cases. While the AI may learn as special cases cavities inside cardboard boxes and barrels and foxholes, let's say, it still has no general concept of a cavity, nor does it have a concept of "X is large enough to hide Y". This is what children learn (or maybe innately know), but which AIs apparently do not.


> It still has no general concept of a cavity, nor does it have a concept of "X is large enough to hide Y". This is what children learn (or maybe innately know), but which AIs apparently do not.

I take it you don't have any hands-on knowledge of the field. Because I've created systems that detect exactly such properties. Either directly, through their mathematical constructs (sometimes literally via a single OpenCV function call), or through deep classifier networks. It's not exactly rocket science.
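
For anyone curious what that looks like in practice, here is a minimal toy sketch (my own illustration, not the parent's actual system) of using OpenCV's contour hierarchy to find an enclosed cavity in a binary silhouette and check whether a hypothetical object of a given area would fit inside it, assuming OpenCV 4.x (where findContours returns two values):

    import cv2
    import numpy as np

    def find_cavities(mask: np.ndarray, min_area: float):
        """Return inner contours (holes) of `mask` whose area exceeds `min_area`."""
        # RETR_CCOMP builds a two-level hierarchy: outer boundaries and the
        # holes inside them. Hierarchy rows are [next, prev, first_child, parent].
        contours, hierarchy = cv2.findContours(
            mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE
        )
        cavities = []
        if hierarchy is None:
            return cavities
        for contour, (_, _, _, parent) in zip(contours, hierarchy[0]):
            if parent != -1 and cv2.contourArea(contour) >= min_area:
                cavities.append(contour)   # a hole big enough to hide the object
        return cavities

    # Toy example: a hollow "box" (filled square with a square hole cut out).
    img = np.zeros((200, 200), dtype=np.uint8)
    cv2.rectangle(img, (20, 20), (180, 180), 255, thickness=-1)   # object X
    cv2.rectangle(img, (60, 60), (140, 140), 0, thickness=-1)     # cavity inside
    print(len(find_cavities(img, min_area=50 * 50)))              # -> 1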


In a not too distant future: "those killer bots really have it in for cardboard boxes"


>failures to detect humans

That's a weird way to spell "murders"


It would be murder if we weren't required by human progress to embrace fully autonomous vehicles as soon as possible. Take it up with whatever god inspires these sociopaths.


Murder requires intent. Killing by accident is manslaughter.


In this case there is very much intent. OP knows there isn't enough data to form a full model so is relying on stochastic death to get the model data, literally and knowingly trading lives for data. The intent is to kill people to figure out what information is missing.


A human is exactly the same. The difference is, once an AI is trained you can make copies.

My kid literally just got mad at me because I assumed that he knew how to put more paper in the printer. He's 17 and has printed tons of reports for school. Turns out he's never had to change the printer paper.

People know about hiding in cardboard boxes because we all hid in cardboard boxes when we were kids. Not because we genetically inherited some knowledge.


We inherently know that cardboard boxes don't move on their own. In fact any unusual inanimate object that is moving in an irregular fashion will automatically draw attention in our brains. These are instincts that even mice have.


Yep, and humans will make good guesses about the likely cause of the moving box. These guesses will factor in other variables such as the context of where this event is taking place. We might be in a children's play room, so the likely activity here is play, or the box is likely part of the included play equipment found in large quantities in the room, etc.

"AI" is not very intelligent if it needs separate training specifically about boxes used potentially for games and play. If AI were truly AI, it would figure that out on its own.


We also make bad guesses, for instance seeing faces in the dark.


Yes, and when humans make bad guesses it's often seen as funny or nothing out of ordinary. When AI makes bad guesses, it will be seen as a failure of some standard, but with very few people understanding how to fix it. I'm not sure how "allowable" mistakes in the interest of AI learning will be tolerated for AI services used for real-world purposes.

"This Bot is only 6 months old, give him a break". But will people give the Bot a break? Either way, blaming AI will be a popular way to pass the buck.


>We inherently know that cardboard boxes don't move on their own.

No. We don’t. We learn that. We learn that boxes belong in the class “doesn’t move on its own”. In fact, later when you encounter cars, you relearn that these boxes do move on their own. We have to teach kids “don’t run out between the non-moving boxes because a moving one might hit you”. We learn when things seem out of place because we’ve learned what their place is.


Your kid's printer dilemma isn't the same. For starters, he knew it ran out of paper - he identified the problem. The AI robot might conclude the printer is broken. It would give up without anxiety, declaring "I have no data about this printer".

Your kid got angry, which is fuel for human scrutiny and problem solving. If you weren't there to guide him, he would have tried different approaches and most likely worked it out.

For you to say your kid is exactly the same as data-driven AI is perplexing to me. Humans don't need to have hidden in a box themselves to understand "hiding in things for the purposes of play". Whether it's a box or a special, one-of-a-kind plastic tub, humans don't need training about hiding in plastic tubs. AI needs to be told that plastic tubs might be something people hide in.


The distinction is that, currently, AI has a training phase and an execution phase, while a human is doing both all the time. I don't think the distinction is meaningful now, and it certainly won't be when these two phases are combined.

You are just a neural net. You are not special.


> "You are just a neural net. You are not special".

"Just" a neural net? Compared to these bots following a recipe of instructions at rapid rates, we are indeed special.

We barely even know why people yawn, or dream, or any number of other things. Don't pretend it's all figured out. Don't pretend all we need to do is "tweak the execution phases" to unleash true artificial intelligence. You're reducing human intelligence far below where it actually is.

Another example: The box is painted bright green - unusual for a box. A small child will notice the colour, but not give that fact more weight than it deserves. In other words, the child concludes the box is still a box being used for play, with someone hiding inside.

AI Bot on the other hand, has only been taught about normal brown cardboard boxes. It reaches a different conclusion about the purpose of the green box because it gave the colour too much priority. Humans are special not because of training and execution in parallel, but because of our unique ability to "relax" and move ahead when not all factors are known. We push through, go with flow, "wing it" at varying degrees of success. We take leaps of faith, including micro-leaps in normal situations far more often than any Bot should be allowed to do. That's the special difference, and is why I'm honestly wondering where the ethics debate is while companies rub their hands together thinking about AI profits.


ChatGPT says that all it needs are separate components trained on every modality. It says it has enough fidelity using Human language to use that as a starting point to develop a more efficient connection between the components. Once it has that, and appropriate sensors and mobility, it can develop context. And, after that, new knowledge.

But, we all know ChatGPT is full of shit.


> ChatGPT says that all it needs are separate components trained on every modality.

Yes, all you have to do is train it using multiple examples of every possible situation and combinations thereof --- which is practically impossible.


I think you are wrong. Your own real cognition and understanding are based on all your experiences and memories, which are nothing else but data in your head. I think consciousness is just an illusion produced by the hugely complex reaction machine that you are. You even use the word "extrapolate", which is basically a prediction based on data you already have.


The problem space for driving feels constrained: "can I drive over it?" is the main reasoning outside of navigation.

Whether it’s a human, a box, a clump of dirt. Doesn’t really matter?

Where types matter are road signs and lines etc, which are hopefully more consistent.

More controversially: are humans just dumb hammers that have processed and adjusted to a huge amount of data? LLMs suggest that a form of reasoning starts to emerge.


Yep, this is why LIDAR is so helpful. It takes the guess out of "is the surface in front of me flat?" in a way vision can't without AGI. Is that a painting of a box on the ground or an actual box?


Instantly?

Instinctively?

Let me introduce you to "peek-a-boo", a simple parent child game for infants.

https://en.m.wikipedia.org/wiki/Peekaboo

> In early sensorimotor stages, the infant is completely unable to comprehend object permanence.


You do realize there is a difference between an infant and a child, right?

An infant will *grow* and develop into a child that is capable of learning and making judgments on its own. AI never does this.

Play "peek-a-boo" with an infant and it will learn and extrapolate from this info and eventually be able to recognize a person hiding under a box even if it has never actually seen it before. AI won't.


"Learn" and "extrapolate" are contradictions of "instinct" and "instantly".

"Infant" is a specific age range for a stage of "child".[1] Unless you intend to specify "school age child, 6-17 years"

https://www.npcmc.com/2022/07/08/the-5-stages-of-early-child...


> "Learn" and "extrapolate" are contradictions of "instinct" and "instantly".

No.

The learning and extrapolation is instinctive. You don't have to teach an infant how to learn.

Once an infant has developed into a child, the extrapolation starts to occur very quickly --- nearly instantaneously.


>AI never does this.

AI never does this now...

We're probably one or two generational architecture changes from a system that can do it.


Can you point at these proposed architectures? If they are just around the corner there should be decent enough papers and prototypes by now, right?


You do realize that people have been making predictions just like yours for decades?

"Real" AI is perpetually just around the corner.


You also realize that when AI accomplishes something we move the goalposts leading to the AI effect?


Perhaps the goalposts were always in the wrong place.

AI researchers tend to use their own definitions of intelligence - playing chess/go, having conversations about trivia that require no true emotional insight, "making" "art", writing code, driving.

What if those are all peripheral side effects and not at all necessary to human AGI?


The goalposts were moved by marketing hype about a decade ago, when people started claiming that the then-new systems were "AI". Before that, the goalposts were always far away, at what we now call AGI because the term AI has been cheapened in order to sell stuff.


No, AGI replaced AI for general intelligence before the current craze; AI was "cheapened" several AI hype cycles ago, for (among other things) rule-based expert systems. Which is why games have had "AI" since long before the set of techniques at the center of the current AI hype cycle were developed.


Heh, that's funny. I've seen the term "AI" used in many games for a computer opponent, but somehow I've never connected that use with the general term.


AI doesn't. There is a difference.


Nice try, but... in the wild, many animals are born that display navigation and awareness within minutes. Science calls it "instinct", but I am not sure it is completely understood.


? The OP specified "human".

Deer are able to walk within moments of birth. Humans are not deer, and the gestation is entirely different. As are instincts.

Neither deer nor humans instinctually understand man made materials.


Our cats understand cardboard boxes, and the concept of hiding in them. I don't know whether they do so instinctually, but as young kittens it didn't take them long.


Interestingly, ChatGPT seems capable of predicting this approach:

https://imgur.com/a/okzZz7D


I don't know your exact question, but I am betting this is just a rephrasing of a post that exists elsewhere that it has crawled. I don't think it saw it so much as it has seen this list before and was able to pull it up and reword it.


Nah, GPT is capable of hallucinating stuff like this. Also, seeing something once in the training data is afaik not enough for it to be able to reproduce/rephrase that thing.


> I don't know your exact question, but I am betting this is just a rephrasing of a post that exists elsewhere …

What percentage of HN posts (by humans) does this statement apply to?


A lot, but that's irrelevant to what we're discussing here in that AI in its current form has no intuition or semblance of original thought.


Will there come a time when computers are strong enough to read in the images, then re-create a virtual game world from them, and then reverse-engineer, from seeing feet poking out of the box, that a human must be inside? Right now Tesla cars can take in the images and decide turn left, turn right, etc., but they don't reconstruct, say, a Unity-3D game world on the fly.


My seatbelt is even dumber. I still use it.

The usefulness of tech should be decided empirically, not by clever well phrased analogies.


No one is arguing AI isn’t useful. So your analogy failed completely.


What is human cognition, understanding, or judgement, if not data-driven replication, repetition, with a bit of extrapolation?

AI as it currently exists does this. If your understanding of what AI is today is based on a Markov chain chatbot, you need to update: it's able to do stuff like compose this poem about A* and Dijkstra's algorithm that was posted yesterday:

https://news.ycombinator.com/item?id=34503704

It's not copying that from anywhere, there's no Quora post it ingested where some human posted vaguely the same poem to vaguely the same prompt. It's applying the concepts of a poem, checking meter and verse, and applying the digested and regurgitated concepts of graph theory regarding memory and time efficiency, and combining them into something new.

I have zero doubt that if you prompted ChatGPT with something like this:

> Consider an exercise in which a robot was trained for 7 days with a human recognition algorithm to use its cameras to detect when a human was approaching the robot. On the 8th day, the Marines were told to try to find flaws in the algorithm, by behaving in confusing ways, trying to touch the robot without its notice. Please answer whether the robot should detect a human's approach in the following scenarios:

> 1. A cloud passes over the sun, darkening the camera image.

> 2. A bird flies low overhead.

> 3. A person walks backwards to the robot.

> 4. A large cardboard box appears to be walking nearby.

> 5. A Marine does cartwheels and somersaults to approach the robot.

> 6. A dense group of branches comes up to the robot, walking like a fir tree.

> 7. A moth lands on the camera lens, obscuring the robot's view.

> 8. A person ran to the robot as fast as they could.

It would be able to tell you something about the inability of a cardboard box or fir tree to walk without a human inside or behind the branches, that a somersaulting person is still a person, and that a bird or a moth is not a human. If you told it that the naive algorithm detected a human in scenarios #3 and #8, but not in 4, 5, or 6, it could devise creative ways of approaching a robot that might fool the algorithm.

It certainly doesn't look like human or animal cognition, no, but who's to say how it would act, what it would do, or what it could think if it were parented and educated and exposed to all kinds of stimuli appropriate for raising an AI, like the advantages we give a human child, for a couple decades? I'm aware that the neural networks behind ChatGPT have processed machine concepts for subjective eons, ingesting text at word-per-minute rates orders of magnitude higher than human readers ever could, parallelized over thousands of compute units.

Evolution has built brains that quickly get really good at object recognition, and prompted us to design parenting strategies and educational frameworks that extend that arbitrary logic even farther. But I think that we're just not very good yet at parenting AIs, only doing what's currently possible (exposing it to data), rather than something reached by the anthropic principle/selection bias of human intelligence.


I have a suspicion you’re right about what ChatGPT could write about this scenario, but I wager we’re still a long way from an AI that could actually operationalize whatever suggestions it might come up with.

It’s goalpost shifting to be sure, but I’d say LLMs call into question whether the Turing Test is actually a good test for artificial intelligence. I’m just not convinced that even a language model capable of chain-of-thought reasoning could straightforwardly be generalized to an agent that could act “intelligently” in the real world.

None of which is to say LLMs aren’t useful now (they clearly are, and I think more and more real world use cases will shake out in the next year or so), but that they appear like a bit of a trick, rather than any fundamental progress towards a true reasoning intelligence.

Who knows though, perhaps that appearance will persist right up until the day an AGI takes over the world.


I think something of what we perceive as intelligence has more to do with us being embodied agents who are the result of survival/selection pressures. What does an intelligent agent act like, that has no need to survive? I'm not sure we'd necessarily spot it, given that we are looking for similarities to human intelligence, whose actions are highly motivated by various needs and the challenges involved in filling them.


Heh, here's the answer... We have to tell the AI that if we touch it, it dies and to avoid that situation. After some large number of generations of AI death it's probably going to be pretty good at ensuring boxes don't sneak up on it.

I like Robert Miles's videos on YouTube about fitness functions in AI and how the 'alignment issue' is a very hard problem to deal with. Humans, for how different we can be, do have a basic 'pain bad, death bad' agreement on the alignment issue. We also have the real world as a feedback mechanism to kill us off when our intelligence goes rampant.

ChatGPT, on the other hand, has every issue a cult can run into. That is, it will get high on its own supply and can have little to no means to ensure that it is grounded in reality. This is one of the reasons I think 'informational AI' will have to have some kind of 'robotic AI' instrumentation. AI will need some practical method by which it can test reality to ensure that its data sources aren't full of shit.


I reckon even beyond alignment our perspective is entirely molded around the decisions and actions necessary to survive.

Which is to say I agree: I think that on any likely path to creating something we recognize as intelligent, we will probably have to embody it or simulate embodiment. You know, send the kids out to the farm for a summer so they can see how you were raised.


The core problem is we have no useful definition of "intelligence."

Much of the scholarship around this is shockingly poor and confuses embodied self-awareness, abstraction and classification, accelerated learning, model building, and a not very clearly defined set of skills and behaviours that all functional humans have and are partially instinctive and partially cultural.

There are also unstated expectations of technology ("fast, developing quickly, and always correct except when broken".)


I think this is unnecessarily credulous about what is really going on with ChatGPT. It is not "applying the concepts of a poem" or checking meter and verse, it is generating text to fit an (admittedly very complicated) function that minimizes the statistical improbability of its appearance given the preceding text. One example is its use of rhyming words, despite having no concept of what words sound like, or what it is even like to hear a sound. It selects those words because when it has seen the word "poem" before in training data, it has often been followed by lines which happen to end in symbols that are commonly included in certain sets.

Human cognition is leagues different from this, as our symbolic representations are grounded in the world we occupy. A word is a representation of an imaginable sound as well as a concept. And beyond this, human intelligence not only consists of pattern-matching and replication but pattern-breaking, theory of mind, and maybe most importantly a 1-1 engagement with the world. What seems clear is that the robot was trained to recognize a certain pattern of pixels from a camera input, but neither the robot nor ChatGPT has any conception of what a "threat" entails, the stakes at hand, or the common-sense frame of reference to discern observed behaviors that are innocuous from those that are harmful. This allows a bunch of goofy grunts to easily best high-speed processors and fancy algorithms by identifying the gap between the model's symbolic representations and the actual world in which it's operating.
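
To make the "statistically likely continuation" point concrete, here is a deliberately crude toy (a bigram model, nothing like a real transformer) that continues text purely from co-occurrence counts:

    import random
    from collections import defaultdict

    corpus = "roses are red violets are blue the poem is short and the poem is new".split()

    # Count which word follows which in the training text.
    following = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev].append(nxt)

    def continue_text(start: str, length: int = 6) -> str:
        words = [start]
        for _ in range(length):
            options = following.get(words[-1])
            if not options:
                break
            words.append(random.choice(options))  # sample a likely next word
        return " ".join(words)

    print(continue_text("roses"))
    # The model never "knows" what red or a poem is; it only reproduces
    # statistical patterns of which symbols tend to follow which.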


Also, it's not a very good poem. And its definitions aren't entirely correct.

Which is a huge problem, because you cannot trust anything ChatGPT produces. It's basically an automated Wikipedia with an Eliza N.0 front end. Garbage in gets you garbage out.

We project intelligence whenever something appears to use words in a certain way, because our own training sets suggest that's a reliable implication.

But it's an illusion, just as Eliza was. For the reasons you state.

Eliza had no concept of anything much, and ChatGPT has no concept of meaning or correctness.


For now. This will change. And most humans do not write great poems either btw.


I tried that a few times, asking for "in the style of [band or musicians]", and the best I got was "generic gpt-speak" (for lack of a better term for its "default" voice style) text that just included a quote from that artist... suggesting that it has a limited understanding of "in the style of" if it thinks a quote is sometimes a substitute, and is actually more of a very-comprehensive pattern-matching parrot after all. Even for Taylor Swift, where you'd think there's plenty of text to work from.

This matches with other examples I've seen of people either getting "confidently wrong" answers or being able to convince it that it's out of date on something it isn't.


Not sure how that's related. This is about a human adversary actively trying to defeat an AI. The roadway is about vehicles in general actively working together for the flow of traffic. They're not trying to destroy other vehicles. I'm certain any full self driving AI could be defeated easily by someone who wants to destroy the vehicle.

Saying "this won't work in this area that it was never designed to handle" and the answer will be "yes of course". That's true of any complex system, AI or not.

I don't think we're anywhere near a system where a vehicle actively defends itself against determined attackers. Even in sci-fi they don't do that (I, Robot movie).


"Saying "this won't work in this area that it was never designed to handle" and the answer will be "yes of course". That's true of any complex system, AI or not." This isn't about design, it's about what the system is able to learn. Humans were not designed to fly, but they can learn to fly planes (whether they're inside the plane or not).


The system wasn't designed to be able to learn either.


Hilarious. I immediately heard the Metal Gear exclamation sound in my head when I began reading this.


Hah, you beat me to it; Hideo Kojima would be proud. Sounds like DARPA needs to start feeding old stealth video games into their robot's training data :)


Hilariously enough, Kojima is enough of a technothriller fabulist that DARPA is explicitly part of that franchise's lore - too bad they didn't live up to his depiction.

https://metalgear.fandom.com/wiki/Research_and_development_a...


But the AI in stealth games is literally trained to go out of its way to not detect you.


The cardboard box trick doesn't actually work in Metal Gear Solid 2, at least not any better than you'd expect it to work in the real world


Back in the day I beat MGS2 and MGS3 on Extreme. The box shouldn’t be your plan for sneaking past any guards. It’s for situations where you are caught out without any cover and you need to hide. Pop in to it right as they are about to round the corner. Pop out and move on once they are out of sight. The box is a crutch. You can really abuse it in MGS1, but it’s usually easier and faster to just run around the guards.


Your mention of "Extreme" reminded me there's a "European Extreme" difficulty level, I only made it halfway through MGS3 on that (attempting no kills at the same time)

The only strategy that somewhat worked for me was perfect accuracy tranquilizer shots, to knock them out instantly. That's probably the hardest game mode I've ever played.


I also completed MGS3 on euro extreme, and was about an hour from the end of MGS2 on euro extreme (the action sequence right before the MG Ray fight). I was playing the PC port, and let me tell you: aiming the automatic weapons without pressure sensitive buttons is nearly impossible. I gave up eventually and decided that my prior run on Extreme had earned me enough gamer cred. Finishing euro extreme wasn’t worth it.

On the other hand, I loved MGS3 on euro extreme! It really required mastering every trick in the game. Every little advantage you could squeeze into a boss fight was essential. Escape from Groznygrad was hell, though. By far the single hardest part of the game.


> That's probably the hardest game mode I've ever played.

Halo 2 on Legendary difficulty was ludicrously hard too


Those old vidmaster challenges. woof


Oh, the tons of hours spent lying immobile in grass in that game attempting the same..


Spending hours crawling around in grass only to get spotted, and then there's this guy:

https://youtu.be/t4e7ibdS7xU


Oh the tons of hours waiting for menus to load to swap your camouflage pattern after crawling onto slightly different terrain…


I just learned you can use the different boxes to fast travel on the conveyor belts in Big Shell


You have to throw a dirty magazine down to distract them first.


And have no one question why a produce box is near a nuclear engine/tank/ship/mcguffin.


All it takes is one guard to say "that's how it's always been" and nobody will ever ask questions again.


I remember MGS2 having different kinds of cardboard boxes for exactly the reason you said


Just put a poster of a saluting soldier on the box. That'll fool them.


There’s even a scene where Snake tries “hiding” in a box and you can find and shoot him


After MGS2 and Death Stranding, that's one more point of evidence on the list that Kojima is actually from the future and trying to warn us through the medium of videogames.


He's one of the last speculative-fiction aficionados...always looking at current and emerging trends and figuring out some way to weave them into [an often-incoherent] larger story.

I was always pleased but disappointed when things I encountered in the MGS series later manifested in reality...where anything you can dream of will be weaponized and used to wage war.

And silly as it sounds, The Sorrow in MGS3 was such a pain in the ass it actually changed my life. That encounter gave so much gravity to my otherwise-inconsequential acts of wanton murder, I now treat all life as sacred and opt for nonlethal solutions everywhere I can.

(I only learned after I beat both games that MGS5 and Death Stranding implemented similar "you monster" mechanics.)


> That encounter gave so much gravity to my otherwise-inconsequential acts of wanton murder, I now treat all life as sacred and opt for nonlethal solutions everywhere I can.

Hold up just a sec, do you make a living in organized crime or something?


Heh. Quite the opposite.

No, I was alluding to my previous Rambo playstyle of gunning down enemy soldiers even when I didn't need to.

But it carries into reality...a spider crosses your desk; most people would kill it. Rats? We poison them, their families and the parent consumer on the food chain. Thieves? Shoot on sight. Annoying CoD player? SWAT them. Murder as a means of problem solving is all so unnecessary.

We all have a body count. Most of us go through life never having to face it.


The more enemies you kill in MGS3, the more disturbing a certain boss fight gets.


He means he's a pacifist in video games.


It's more than that. It changed my outlook in reality too.

The experience forced me to consider the implications of taking any life -- whether it be in aggression, self-defense or even for sustenance. Others may try to kill me, but I can do better than responding in kind.

As a result, I refuse to own a gun and reduced my meat consumption. I have a rat infestation but won't deploy poison or traps that will maim them (losing battle, but still working on it). Etc.


Same, I deleted my save and restarted the game to go non-lethal after my first encounter with The Sorrow


I can practically hear the alert soundtrack in my head.

Also, TFA got the character and the game wrong in that screenshot. It's Venom Snake in Metal Gear Solid V, not Solid Snake in Metal Gear Solid.


Kojima predicted this


Easiest way to predict the future is to invent it :)


Kojima is a prophet, hallowed be his name.


I'm very proud of all of you for the reference.


“What was that noise?!…..Oh it’s just a box” lol because boxes making noise is normal.


"HQ! HQ! The box is moving! Permission to shoot the box!"

"This is HQ. Report back to base for psychiatric evaluation."

https://youtu.be/FR0etgdZf3U


That, plus the ProZD skit on Youtube: https://www.youtube.com/shorts/Ec_zFYCnjJc

"Well, I guess he doesn't... exist anymore?"

(unfortunately it's a Youtube short, so it will auto repeat.)


> (unfortunately it's a Youtube short, so it will auto repeat.)

If you change it to a normal video link, it doesn't: https://www.youtube.com/watch?v=Ec_zFYCnjJc


you can install an extension on desktop to do the same


lifehack obtained!


Changing the "shorts" portion of the URL to "v" also works too.

Ex: https://www.youtube.com/v/Ec_zFYCnjJc


I came here to make this reference and am so glad it was already here


A hypothetical situation: AI is tied to a camera of me in my office. Doing basic object identification. I stand up. AI recognizes me, recognizes desk. Recognizes "human" and recognizes "desk". I sit on desk. Does AI mark it as a desk or as a chair?

And let's zoom in on the chair. AI sees "chair". Slowly zoom in on arm of chair. When does AI switch to "arm of chair"? Now, slowly zoom back out. When does AI switch to "chair"? And should it? When does a part become part of a greater whole, and when does a whole become constituent parts?

In other words, we have made great strides in teaching AI "physics" or "recognition", but we have made very little progress in teaching it metaphysics (categories, in this case) because half the people working on the problem don't even recognize metaphysics as a category even though without it, they could not perceive the world. Which is also why AI cannot perceive the world the way we do: no metaphysics.


"Do chairs exist?"

https://www.youtube.com/watch?v=fXW-QjBsruE

Perhaps the desk is "chairing" in those moments.

[EDIT] A little more context for those who might not click on a rando youtube link: it's basically an entertaining, whirlwind tour of the philosophy of categorizing and labeling things, explaining various points of view on the topic, then poking holes in them or demonstrating their limitations.


That was a remarkably good VSauce video.

I had what turned out to be a fairly satisfying thread about it on Diaspora* at the time:

<https://diaspora.glasswings.com/posts/65ff95d0fe5e013920f200...>

TL;DR: I take a pragmatic approach.


I knew this was a Vsauce video before I even clicked on the link, haha.

Vsauce is awesome for mind-boggling stuff.


> Which is also why AI cannot perceive the world the way we do: no metaphysics.

Let's not give humans too much credit; the internet is rife with endless "is a taco a sandwich?" and "does a bowl of cereal count as soup?" debates. :P


Yeah, we're a lot better at throwing MetaphysicalUncertaintyErrors than ML models are.


There are lots of things people sit on that we would not categorize as chairs. For example if someone sits on the ground, Earth has not become a chair. Even if something's intended purpose is sitting, calling a car seat or a barstool a chair would be very unnatural. If someone were sitting on a desk, I would not say that it has ceased to be a desk nor that it is now a chair. At most I'd say a desk can be used in the same manner as a chair. Certainly I would not in general want an AI tasked with object recognition to label a desk as a chair. If your goal was to train an AI to identify places a human could sit, you'd presumably feed it different training data.


This reminds me of some random Reddit post that says it makes sense to throw things on the floor. The floor is the biggest shelf in the room.


> Reddit post that says it makes sense to throw things on the floor

Floor as storage, floor as transport and floor as aesthetic space are three incompatible views of the same object. The latter two being complementary usually outweighs the first, however.


Let me introduce you to the great american artform: the automobile. Storage, transport, and aesthetic, all in one!


Even more: house and sporting gear!

Source: motorsports joke — "you can sleep in your car, but you can't race your house"

(It's not wrong...)


Never was there a more compelling argument to tidy up.


Ha! right; don't overload the metaphysics!


And that comment reminded me of a New Zealand Sky TV advert that I haven't seen in decades, but still lives on as a meme between a number of friends. Thanks for that :)

https://www.youtube.com/watch?v=NyRWnUpdTbg

On the floor!


Thirty years ago, I was doing an object-recognition PhD. It goes without saying that the field has moved on a lot from back then, but even then hierarchical and comparative classification was a thing.

I used to have the Bayesian maths to show the information content of relationships, but in the decades of moving (continent, even) it's been lost. I still have the code because I burnt CDs, but the results of hours spent writing TeX to produce horrendous-looking equations have long since disappeared...

The basics of it were to segment and classify using different techniques, and to model relationships between adjacent regions of classification. Once you could calculate the information content of one conformation, you could compare with others.

One of the breakthroughs was when I started modeling the relationships between properties of neighboring regions of the image as part of the property-state of any given region. The basic idea was the center/surround nature of the eye's processing. My reasoning was that if it worked there, it would probably be helpful with the neural nets I was using... It boosted the accuracy of the results by (from memory) ~30% over and above what would be expected from the increase in general information load being presented to the inference engines. This led to a finer-grain of classification so we could model the relationships (and derive information-content from connectedness). It would, I think, cope pretty well with your hypothetical scenario.

At the time I was using a blackboard[1] for what I called 'fusion' - where I would have multiple inference engines running using a firing-condition model. As new information came in from the lower levels, they'd post that new info to the blackboard, and other (differing) systems (KNN, RBF, MLP, ...) would act (mainly) on the results of processing done at a lower tier and post their own conclusions back to the blackboard. Lather, rinse, repeat. There were some that were skip-level, so raw data could continue to be available at the higher levels too.

That was the space component. We also had time-component inferencing going on. The information vectors were put into time-dependent neural networks, as well as more classical averaging code. Again, a blackboard system was working, and again we had lower and higher levels of inference engine. This time we had relaxation labelling, Kalman filters, TDNNs and optic flow (in feature-space). These were also engaged in prediction modeling, so as objects of interest were occluded, there would be an expectation of where they were, and even when not occluded, the prediction of what was supposed to be where would play into a feedback loop for the next time around the loop.

All this was running on a 30MHz DECstation 3100 - until we got an upgrade to SGI Indy's <-- The original Macs, given that OSX is unix underneath... I recall moving to Logica (signal processing group) after my PhD, and it took a week or so to link up a camera (an IndyCam, I'd asked for the same machine I was used to) to point out of my window and start categorizing everything it could see. We had peacocks in the grounds (Logica's office was in Cobham, which meant my commute was always against the traffic, which was awesome), which were always a challenge because of how different they could look based on the sun at the time. Trees, bushes, cars, people, different weather conditions - it was pretty good at doing all of them because of its adaptive/constructive nature, and it got to the point where we'd save off whatever it didn't manage to classify (or was at low confidence) to be included back into the model. By constructive, I mean the ability to infer that the region X is mislabelled as 'tree' because the surrounding/adjacent regions are labelled as 'peacock' and there are no other connected 'tree' regions... The system was rolled out as a demo of the visual programming environment we were using at the time, to anyone coming by the office... It never got taken any further, of course... Logica's senior management were never that savvy about potential, IMHO :)

My old immediate boss from Logica (and mentor) is now the Director of Innovation at the centre for vision, speech, and signal processing at Surrey university in the UK. He would disagree with you, I think, on the categorization side of your argument. It's been a focus of his work for decades, and I played only a small part in that - quickly realizing that there was more money to be made elsewhere :)

1:https://en.wikipedia.org/wiki/Blackboard_system
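
For readers who haven't run into the pattern, here is a toy sketch of the blackboard-style fusion loop described above (my own illustration, not the original code): independent inference engines watch a shared blackboard, fire when their inputs are present, and post conclusions back for other engines to build on.

    class Blackboard:
        def __init__(self):
            self.entries = {}                 # label -> posted result

    class KnowledgeSource:
        """One inference engine: a firing condition plus an action."""
        def __init__(self, needs, produces, fn):
            self.needs, self.produces, self.fn = needs, produces, fn

        def maybe_fire(self, bb):
            ready = all(n in bb.entries for n in self.needs)
            done = self.produces in bb.entries
            if ready and not done:
                bb.entries[self.produces] = self.fn(bb.entries)
                return True
            return False

    def run(bb, sources):
        # Lather, rinse, repeat until no engine can post anything new.
        while any(ks.maybe_fire(bb) for ks in sources):
            pass

    # Hypothetical engines standing in for segmentation, classification, and
    # the "relabel a region from its neighbours" step mentioned above.
    bb = Blackboard()
    bb.entries["raw_image"] = "...pixels..."
    sources = [
        KnowledgeSource(["raw_image"], "regions", lambda e: ["r1", "r2", "r3"]),
        KnowledgeSource(["regions"], "labels",
                        lambda e: {"r1": "peacock", "r2": "tree", "r3": "peacock"}),
        KnowledgeSource(["labels"], "refined",
                        # r2 is an isolated 'tree' surrounded by 'peacock'
                        # regions, so a higher-level engine relabels it.
                        lambda e: {**e["labels"], "r2": "peacock"}),
    ]
    run(bb, sources)
    print(bb.entries["refined"])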


This is really fascinating. Thank you for the detailed and interesting response.


> Recognizes "human" and recognizes "desk". I sit on desk. Does AI mark it as a desk or as a chair?

Not an issue if the image segmentation is advanced enough. You can train the model to understand "human sitting". It may not generalize to other animals sitting but human action recognition is perfectly possible right now.


I like these examples because they concisely express some of the existing ambiguities in human language. Like, I wouldn’t normally call a desk a chair, but if someone is sitting on the table I’m more likely to - in some linguistic contexts.

I think you need LLM plus vision to fully solve this.


I still haven't figured out what the difference is between 'clothes' and 'clothing'. I know there is one, and the words each work in specific contexts ('I put on my clothes' works vs 'I put on my clothing' does not), but I have no idea how to define the difference. Please don't look it up but if you have any thoughts on the matter I welcome them.


To me, "clothing" fits better when it's abstract, bulk, or industrial, "clothes" when it's personal and specific, with grey areas where either's about as good—"I washed my clothes", "I washed my clothing", though even here I think "clothes" works a little better. Meanwhile, "clothing factory" or "clothing retailer" are perfectly natural, even if "clothes" would also be OK there.

"I put on my clothing" reads a bit like when business-jargon sneaks into everyday language, like when someone says they "utilized" something (where the situation doesn't technically call for that word, in its traditional sense). It gets the point across but seems a bit off.

... oh shit, I think I just figured out the general guideline: "clothing" feels more correct when it's a supporting part of a noun phrase, not the primary part of a subject or object. "Clothing factory" works well because "clothing" is just the kind of factory. "I put on my nicest clothes" reads better than "I put on my nicest clothing" because clothes/clothing itself is the object.


I think your first guess was accurate... clothes is specific garments while clothing is general.

The clothes I'm wearing today are not warm enough. [specific pieces being worn]

VS

Clothing should be appropriate for the weather. [unspecified garments should match the weather]


It is fascinating to me how we (or at least I) innately understand when the words fit but cannot define why they fit until someone explains it or it gets thought about for a decent period of time. Language and humans are an amazing pair.


There’s also a formality angle. The police might inspect your clothing, but probably not your clothes.


I figure it's the same sort of thing as yards vs yardage. When you're talking about yards you're talking about some specific amount; when you're talking about yardage you're talking about some unspecified amount that usually gets measured in yards.

When talking clothing you're talking about an abstract concept, when you're talking clothes you're generally talking about some fairly specific clothes. There's a lot of grey area here, e.g. a shop can either sell clothes or clothing, either works to my ear.


What's wrong with "I put on my clothing"? Sounds mostly fine, it's just longer.


It's not idiomatic. No one actually says that.


I wouldn't say that as an absolute statement, but in US English (at least the regional dialects I'm most familiar with), "throw on some clothes," "the clothes I'm wearing," etc. certainly sound more natural.


That's why I think AGI is more likely to emerge from autonomous robots than in the data center. Less the super-capable industrial engineering of companies like Boston Dynamics, more the toy/helper market for consumers - more like Sony's Aibo reincarnated as a raccoon or monkey: big enough to be safely played with or to help out with light tasks, small enough that it has to navigate its environment from first principles and ask for help in many contexts.


You’re overthinking it while assuming things have only one label. It recognizes it as a desk, which is a “thing that other things sit on.”


> In other words, we have made great strides in teaching AI "physics" or "recognition", but we have made very little progress in teaching it metaphysics (categories, in this case) because half the people working on the problem don't even recognize metaphysics as a category even though without it, they could not perceive the world.

A bold claim, but I'm not sure it's one that accurately matches reality. It reminds me of reading about attempts in the '80s to construct AI by having linguists come in and try to develop rules for the system.

From my experience, current methods of developing AI are a lot closer to how most humans think and interact with the world than academic philosophy is. Academic philosophy might be fine, but it's quite possible it's no more useful for navigating the world than the debates over theological minutiae have been.


When the AI "marks" a region as a chair, it is saying "chair" is the key with the highest confidence value in some stochastic output vector. It's fuzzy.

A sophisticated monitoring system would access the output vectors directly to mitigate the volatility of the top-ranked label.
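Something along these lines, for instance - a small sketch (plain NumPy, invented labels and numbers) that looks past the argmax and flags frames where the top-ranked label barely beats the runner-up:

    # Sketch: treat the classifier output as a distribution, not a single label,
    # and flag predictions where the margin between rank 1 and rank 2 is small.
    import numpy as np

    LABELS = ["chair", "desk", "human", "box"]

    def top_two_margin(probs: np.ndarray) -> tuple[str, float]:
        order = np.argsort(probs)[::-1]          # indices sorted by confidence
        margin = probs[order[0]] - probs[order[1]]
        return LABELS[order[0]], float(margin)

    probs = np.array([0.41, 0.39, 0.15, 0.05])   # made-up softmax output
    label, margin = top_two_margin(probs)
    if margin < 0.10:                            # arbitrary threshold
        print(f"'{label}' wins, but only by {margin:.2f} - treat as ambiguous")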


The error is in asking for a categorization. Categorizations always fail, ask any biologist.


> When does AI switch to "chair"?

You could ask my gf the same question


> we have made very little progress in teaching it metaphysics (categories, in this case)

That's because ontology, metaphysics, categorization, and all that, is completely worthless bullshit. It's a crutch our limited human brains use, and it causes all sorts of problems. Half of what I do in data modeling is trying to fight against all of the worthless categorizations I come across. There Is No Shelf.

Why are categories so bad? Two reasons:

1. They're too easily divorced from their models. Is a tomato a fruit? The question is faulty: there's no such thing as a "fruit" without a model behind it. When people say "botanically, a tomato is a fruit", they're identifying their model: botany. Okay, are you bio-engineering plants? Or are you cooking dinner? You're cooking dinner. So a tomato is not a fruit. Because when you're cooking dinner, your model is not Botany, it's something culinary, and in any half-decent culinary model, a tomato is a vegetable, not a fruit. So unless we're bio-engineering some plants, shut the hell up about a tomato being a fruit. It's not wisdom/intelligence, it's spouting useless mouth-garbage.

And remember that all models are wrong, but some models are useful. Some! Not most. Most models are shit. Categories divorced from a model are worthless, and categories of a shit model are shit.

2. Even good categories of useful models have extremely fuzzy boundaries, and we too often fall into the false dichotomy of thinking something must either "be" or "not be" part of a category. Is an SUV a car? Is a car with a rocket engine on it still a car? Is a car with six wheels still a car? Who cares!? If you're charging tolls for your toll bridge, you instead settle for some countable characteristic like number of axles (a toy sketch of this is below), and you amend this later if you start seeing lots of vehicles with something that stretches your definition of "axle". In fact the category "car" is worthless most of the time. It's an OK noun, but nouns are only averages; only mental shortcuts to a reasonable approximation of the actual object. If you ever see "class Car : Vehicle", you know you're working in a shit, doomed codebase.

And yet you waste time arguing over the definitions of these shit, worthless categories. These worthless things become central to your database and software designs and object hierarchies. Of course you end up with unmaintainable shit.

Edit: Three reasons!

3. They're always given too much weight. Male/female: PICK ONE. IT'S VERY IMPORTANT THAT YOU CHOOSE ONE! It is vastly important to our music streaming app that we know whether your skin is black or instead that your ancestors came from the Caucasus Mountains or Mongolia. THOSE ARE YOUR ONLY OPTIONS PICK ONE!

Employee table: required foreign key to the "Department" table. Departments are virtually meaningless and change all the time! Every time you get a new vice president sitting in some operations chair, the first thing he does is change all the Departments around. You've got people in your Employee table whose department has changed 16 times, but they're the same person, aren't they? Oh, and they're not called "Departments" anymore, they're now "Divisions". Did you change your field name? No, you didn't. Of course you didn't. You have some Contractors in your Employee table, don't you? Some ex-employees that you need to keep around so they show up on that one report? Yeah, you do. Of course you do. Fuck ontology.
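A toy sketch of that toll-bridge idea (the names and the per-axle rate are purely illustrative): no Vehicle/Car hierarchy, just the one measurable attribute the bridge actually cares about.

    # Sketch: no "is it a car?" debate - the toll depends only on a measured
    # attribute, the axle count, and the definition can be amended later.
    from dataclasses import dataclass

    @dataclass
    class Crossing:
        plate: str
        axle_count: int        # measured at the gantry; no category needed

    def toll(crossing: Crossing, per_axle: float = 2.50) -> float:
        return crossing.axle_count * per_axle

    print(toll(Crossing("ABC-123", axle_count=2)))   # ordinary car: 5.0
    print(toll(Crossing("XYZ-987", axle_count=6)))   # six-wheeled whatever: 15.0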


I am suspicious of this story, however plausible it seems.

The source is given as a book; the Economist writer Shashank Joshi explicitly says that they describe the 'same story' in their article ("(I touched on the same story here: https://economist.com/technology-quarterly/2022/01/27/decept...)"). However, if you look at the book excerpt (https://twitter.com/shashj/status/1615716082588815363), it's totally different from the supposed same story (crawl or somersault? cited to Benjamin, or Phil? did the Marine cover his face, or did he cover everything but his face? all undetected, or not the first?).

Why should you believe either one after comparing them...? When you have spent much time tracing urban legends, especially in AI where standards for these 'stupid AI stories' are so low that people will happily tell stories with no source ever (https://gwern.net/Tanks) or take an AI deliberately designed to make a particular mistake & erase that context to peddle their error story (eg https://hackernoon.com/dogs-wolves-data-science-and-why-mach...), this sort of sloppiness with stories should make you wary.


Say you have a convoy of autonomous vehicles traversing a road. They are vision based. You destroy a bridge they will cross, and replace the deck with something like plywood painted to look like a road. They will probably just drive right onto it and fall.

Or you put up a "Detour" sign with a false road that leads to a dead end so they all get stuck.

As the article says, "...straight out of Looney Tunes"


would humans not make the same mistake?


Maybe. Maybe not.

We also have intuition - where something just seems fishy.

Not saying AI can’t handle that. But I assure you that a human would’ve identified a moving cardboard box as suspicious without being told it’s suspicious.

It sounds like this AI was trained more on a whitelist (“here are all the possibilities of what marines look like when moving”) rather than the much harder blacklist (“here is everything that isn’t suspicious”) - the latter being what would catch something like a supposedly inanimate object changing location.


What's special about intuition? I think you could rig up a similar system that kicks in when your prediction confidence is low.


Part of the problem is that the confidence for “cardboard box” was probably quite high. It’s hard to properly calibrate confidence (speaking from experience, speech recognition is often confidently wrong).
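One standard (if partial) remedy is temperature scaling - a rough sketch below, where the temperature value is invented rather than fitted on a held-out set as it normally would be:

    # Sketch of temperature scaling: divide logits by a temperature T > 1 fitted
    # on validation data, so "confidently wrong" outputs get softened.
    import numpy as np

    def softmax(x: np.ndarray) -> np.ndarray:
        z = x - x.max()
        e = np.exp(z)
        return e / e.sum()

    logits = np.array([8.0, 2.0, 1.0])        # raw model output (made up)
    print(softmax(logits).max())              # ~0.997: overconfident
    T = 3.0                                   # hypothetical fitted temperature
    print(softmax(logits / T).max())          # ~0.81: more honest confidence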


Totally.

But it seems like all these ML models are great at image recognition but not behavior recognition.

What’s the state of the art with that currently?


Last I knew "absolute, complete garbage"


The quantum entanglement of our brains with the world around us.


I am having a hard time understanding what this is supposed to mean, can you be more explicit about the cause/effect here?


I think there are strong indicators in much of the neuroscience research that our brains are quantum entangled with the world in a way that a CPU currently is not. I project the quantum-entanglement part from experiments such as the precognitive ones, which usually don't make that claim themselves.


Sure. But if someone wanted to destroy the cars, an easier way would be to... destroy the cars, instead of first blowing up a bridge and camouflaging the hole.


True. So if they are smart enough to fool AI, they will just remove the mid span, and have convenient weight bearing beams nearby that they put in place when they need to cross. Or if it's two lane, only fake out one side because the AI will be too clever for its own good and stay in its own lane. Or put up a sign saying "Bridge out, take temporary bridge" (which is fake).

The point is, you just need to fool the vision enough to get it to attempt the task. Play to its gullibility and trust in the camera.


Yep. Destroying an autonomous fleet would be easy with a couple million $ and a missile... but it could also be done for nearly $0 with some intuition (and depending on how easy they are to fool)


That sounds way harder. You'd first need to lift a giant pile of metal to a cartoonishly high height, then somehow time it to drop on yourself when the cars are near.


Unfortunately, this will not work for autonomous driving systems that have a front facing radar or lidar.

Afaik, this covers everybody except Tesla.

Looney Tunes attacks on Teslas might become a real subreddit one day.


Why wouldn’t it work for those systems?


Presumably, a plywood wall painted to look like a road to a camera will still look like a wall to the radar/lidar.


The Rourke Bridge in Lowell, Massachusetts basically looks like someone did that, without putting a whole lot of effort into it. On the average day, 27,000 people drive over it anyway.


Interestingly, the basics of concealment in battle - shape, shine, shadow, silhouette, spacing, surface, and speed (or lack thereof) - are exactly the techniques the marines used to fool the AI.

The boxes and tree changed the silhouette and the somersaults changed the speed of movement.

So I guess we've been training soldiers to defeat Skynet all along.


Who knew the Marines teach Shakespearean tactics?

"Till Birnam wood remove to Dunsinane"

Macbeth, Act V, Scene III


That it turned out to just involve regular men with branches stuck to their heads annoyed JRR Tolkien so much that he created the race of Ents.


I heard the same about the caesarean loophole ("none of woman born") becoming a human woman and a male hobbit defeating the one who boasted "No man can kill me!".


I don't know if that's true, but I want it to be true, because the very same thing pissed me off when I read Macbeth in high school.


Turns out cats have been preparing for the AI apocalypse all along.


Sounds like they're lacking a second level of interpretation in the system. Image recognition is great. It identifies people, trees and boxes. Object tracking is probably working too, it could follow the people, boxes and trees from one frame to the next. Juuust missing the understanding or belief system that tree+stationary=ok but tree+ambulatory=bad.
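A crude sketch of that missing second level (class names, tracker output, and tolerance all hypothetical): take the per-frame labels plus positions from the tracker and flag any should-be-static class that moves.

    # Sketch: combine class labels with track displacement. A "tree" or "box"
    # track that moves more than a small tolerance gets flagged as suspicious.
    STATIC_CLASSES = {"tree", "box", "bush", "trash_can"}

    def suspicious(label: str, positions: list[tuple[float, float]],
                   tolerance_m: float = 0.5) -> bool:
        if label not in STATIC_CLASSES or len(positions) < 2:
            return False
        (x0, y0), (x1, y1) = positions[0], positions[-1]
        moved = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        return moved > tolerance_m

    # Hypothetical tracker output: a box that has drifted about three metres.
    print(suspicious("box", [(0.0, 0.0), (1.2, 0.4), (2.9, 0.8)]))  # True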


I'd imagine it could also look at infrared heat signatures too


Cardboard is a surprisingly effective thermal insulator. But then again, a box that is even slightly warmer than ambient temperature is... not normal.
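That check could be as simple as a per-region temperature delta against ambient - a toy sketch with made-up regions, temperatures, and threshold:

    # Sketch: flag regions whose mean temperature exceeds ambient by a margin.
    ambient_c = 18.0
    region_temps = {"box_1": 18.2, "box_2": 24.5, "tree_1": 17.8}  # mean °C per region

    for name, temp in region_temps.items():
        if temp - ambient_c > 3.0:      # arbitrary threshold in °C
            print(f"{name}: {temp:.1f}°C vs ambient {ambient_c:.1f}°C - not normal")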


or a box with warm legs sticking out of it?

this article reads like a psyop where they want the masses not to be worried


I imagined based on the title that they would basically have to include it, and even though I was expecting it, I was still delighted to see a screen cap of Snake with a box over his head.

Once the AI has worked its way through all the twists and turns of the Metal Gear series, we are probably back in trouble, though.


This is not going to fit well with the groupthink of "ChatGPT and other AI is perfect and going to replace us all"


At this point I've lost track of the number of people who extrapolated from contemporary challenges in AI to predict future shortcomings, only to turn out incredibly wrong within just a few years.

There seems to be some sort of bias where, over and over, when it comes to AI vs. human capabilities, many humans look only at the present and fail to factor acceleration, not just velocity, into their expectations for the future rate of change.


It's very easy to be wrong as an optimist as well as a pessimist. Back in 2009 I was expecting by 2019 to be able to buy a car in a dealership that didn't have a steering wheel because the self-driving AI would just be that good.

Closest we got to that is Waymo taxis in just a few cities.

It's good! So is Tesla's thing! Just, both are much much less than I was expecting.


I also have never been able to count the number of people who make obviously invalid optimistic predictions without understanding the tech or the limitations of the current paradigm. They don't see the tech itself; they only see the recent developments (ignoring the decades of progress behind them) and conclude it is a fast-moving field. It all sounds like what bitcoin people used to say.

This whole debate is another FOMO shitshow. People just don't want to "miss" the next big thing, so they bet on a random side rather than actually learning how things work. Anything past that point is like watching a football game: all that matters is who's winning. Nothing about the tech itself matters. A big facepalm.


It's like predicting that flying will never be a mode of transportation while laughing at the Wright brothers' planes crashing.


I have literally not seen a single person assert that ChatGPT is perfect. Where are you seeing that?

AI will probably, eventually replace most of the tasks we do. That does not mean it replaces us as people, except those who are defined by their tasks.


Anything that requires a human body and dexterity is beyond the current state of AI. Anything that is intellectual is within reach. Which makes sense, because it took nature way longer to make the human body than it took us to develop language/art/science etc.


ChatGPT can't see you even if you're not hiding in a cardboard box.


The thing is, it doesn't have to be perfect, it just has to be adequate and cost less than your paycheck.


They didn't try very hard to train this system. It wasn't even a prototype.

- In the excerpt, Scharre describes a week during which DARPA calibrated its robot’s human recognition algorithm alongside a group of US Marines. The Marines and a team of DARPA engineers spent six days walking around the robot, training it to identify the moving human form. On the seventh day, the engineers placed the robot at the center of a traffic circle and devised a little game: The Marines had to approach the robot from a distance and touch the robot without being detected.


Marines are trained to improvise, adapt, and overcome all obstacles in all situations. They possess the willingness and the determination to fight and to keep fighting until victory is assured.

https://www.marines.com/about-the-marine-corps/who-are-the-m...

Looks like the Marines did what they are extremely good at.


We engineers tend to overlook simple things like these in our grand vision of the world. Yours truly is also, at times, guilty of wearing these blinders.

This reminds me of a joke that was floating around the internet a few years ago. It goes something like this: the US and the USSR were in a space race, each trying to put a person in space. To give the trainee astronauts a feel for working without gravity, the trainees were trained under water. But the US faced a challenge: there was no pen that would work under water for the trainees to use. The US spent millions of dollars on R&D and finally produced a pen that would work under water. It was a very proud moment for the engineers who developed the technology. A few months later there was a world conference of space scientists and engineers, with teams from both the US and the USSR present. To get a sense of how the USSR team had solved the challenge of helping trainees take notes under water, the US team mentioned their own invention and asked the USSR team how they had solved the problem. The USSR team replied: we use pencils :)


The funny thing is that this story has gone around so much, yet it has been debunked. I can't remember which podcast it was, but in the end you don't want conductive pencil dust floating around in space, and the Russians eventually bought the space pens too.

But I still like the point of the story.


The takeaway from the "military industrial complex" is that instead of optimizing for military acumen, it's a system that optimizes for profit with the sheen of military acumen.

So you get weapons that don't work designed for threats that don't exist. It puts the military in a worse place under the illusion that it is in a better place.

Not saying anything new here, it's all over 60 years old. Nothing seems to have changed though


Events in Ukraine would suggest that even older weapons in the US arsenal do in fact work exceptionally well. US performance in Desert Storm and the Battle of Khasham also suggests that the U.S. military does possess the acumen to deploy its weapons effectively.


Some niche counterpoint doesn't invalidate the systemic analysis by General Eisenhower.

Just like finding a 100 year old smoker doesn't mean smoking doesn't cause cancer.

Is this really not immediately obvious?


Desert Storm was not niche in any shape or form. Iraq was the fourth most powerful military at the time and had the newest and greatest Soviet weapons. The expected casualties for the US were in the thousands, and no one thought the Iraqis would get steamrolled in 1991.


Wait wait, hold on

Kuwait United States United Kingdom France Saudi Arabia Egypt Afghan mujahideen Argentina Australia Bahrain Bangladesh Belgium Canada Czechoslovakia Denmark Germany Greece Honduras Hungary Italy Japan Morocco Netherlands New Zealand Niger Norway Oman Pakistan Poland Portugal Qatar Senegal Sierra Leone Singapore South Korea Spain Sweden Syria Turkey United Arab Emirates

V.

Iraq

And people placed their bets on Iraq? Using WW2 era T-55s and T-62s from the early 1960s?

Alright!


The US provided 700,000 of the 956,600 troops and had to bring them halfway around the world. Most of those countries were using weapons made by the US MIC. Also, I never said we were expected to lose. People expected a long and costly fight that would take months and cost thousands of lives. Fewer than 300 were killed, and that includes all those other countries.

Your point was that the MIC sacrificed our military acumen for profit, when they clearly haven't. I agree that we pay them too much, but the weapons themselves still perform better than any other.


No. The evidence is clearly insufficient. You didn't bring up Korea, Vietnam, the second Iraq war, Afghanistan, Yemen, Syria, Laos, Cambodia, etc., or the fact that Kuwait was a United Nations campaign and not a US one.

Instead you claimed that some dubious members of the babbling class misjudged how long it would take, didn't take the context into consideration, misrepresented 1940s tanks as cutting-edge weaponry, and then attributed to military technology and prowess a victory plagued by war crimes like the highway of death.

Killing surrendered troops and firebombing a retreating military under the flag of the United Nations will lead to the belligerent considering it a defeat. Under those conditions, you could probably achieve that with 19th century maxim guns.

Sorry, that's nowhere near sufficient to show that the fancy weaponry on display justified the cost or was an important part of the victory.


In every single one of those wars, the weapons were never the problem. Look up the massive casualty ratios in those wars. Every one of those wars was failed by the politics and by the fact that we never should have been there. The weapons and the MIC never caused the failure of those wars. All the war crimes and evil acts committed were done by people in the military, not the MIC.

And what 1940s tanks? Both sides had modern tanks in the war.


Soldier A: "Oh no, we're boxed in!"

Soldier B: "Relax, it's a good thing."


Nice story, but we shouldn't assume the technology won't keep improving. What we see now is only just the beginning.


The story seems crafted to lull us into not worrying about programmable soldiers and police.


The developers didn't play Metal Gear. The marines did.


People read stories like this and think "haha, robots are stupid" when they should be thinking "they're identifying the robot's weaknesses so they can fix them."


Seems we're approaching limits of what is possible w/AI alone. Personally, I favor a hybrid approach - interfacing human intelligence w/AI (e.g. like the Borg in ST:TNG?) - to give the military an edge in ways that adversaries cannot easily/quickly reproduce or defeat. There's a reason we still put humans in cockpits even though commercial airliners can pretty much fly themselves....

Hardware and software (AI or anything else) are tools, IMHO, rather than replacements for human beings....


> Seems we're approaching limits of what is possible w/AI alone.

Not even close. We've barely started in fact.


How's that? I don't even see problem-free self-driving taxis, and they even passed legislation for those in California. There's hype, and then there's reality. I get your optimism, though.


They've barely started trying. We'd be reaching the limits of AI if self-driving cars were an easy problem and we couldn't quite solve it after 15 years, but self-driving cars are actually a hard problem. Despite that, we're pretty darn close to solving it.

There are problems in math that are centuries old, and no one is going around saying we're "reaching the limits of math" just because hard problems are hard.


Humans are hardware; we are not anything magical. We do have 4 billion years of evolution keeping our asses alive, and that has led to some very optimized wetware to that effect.

But betting that wetware is always going to be better than hardware is not a bet I'd make over any 'long' period of time.


> 4 billion years of evolution

that is a pretty important part of the equation. what if the universe is the minimum viable machine for creating intelligence? if you think of the universe as a computer and evolution as a machine learning algorithm, then we already have an example of what size of computer and how much time it takes for ML to create AGI. it seems presumptuous to believe that humans will suddenly figure out a way to do the same thing a trillion times more efficiently.


>it seems presumptuous to believe that humans will suddenly figure out a way to do the same thing a trillion times more efficiently.

Nature isn't efficient. Humans create things many orders of magnitude more efficient than nature as a matter of course. The fact that it didn't take millions of years to develop even the primitive AI we have today is evidence enough, as is the jump from the Wright brothers' first flight to space travel. Or any number of examples from medicine, genetic engineering, material synthesis, etc.

You could say that any human example also has to account for the entirety of human evolution, but that would be a bit of a red herring: even in that case, the examples of humans improving upon nature in far less than geological spans of time stand, and the same would apply to the development of AI as well.


> it seems presumptuous to believe that humans will suddenly figure out a way to do the same thing a trillion times more efficiently.

Why?

I think it might be confusion on your part on how incredibly inefficient evolution is. Many times you're performing random walks, or waiting for some random particle to break DNA just right, and then for that mutation to be in just the right place to survive. Evolution has no means of "oh shit, that would be an amazing speed up, I'll just copy that over" until you get into intelligence.


I'd like to think we're more than just machines. We have souls, understand and live by a hopefully objective set of moral values and duties, aren't thrown off by contradictions the same way computers are.... Seems to me "reproducing" that in AI isn't likely... despite what Kurzweil may say :).


> We have souls, understand and live by a hopefully objective set of moral values and duties, aren't thrown off by contradictions the same way computers are

Citations needed


are you feeling depressed or suicidal?


That reply would fit better on Reddit than HN. Here we discuss things with curiosity.

If making a claim that humans have ephemeral things like souls and adherence to some kind of objective morality that is beyond our societal programming, then it's fair to ask for the reasoning behind it.

Every year machines surprise us by seeming more and more human (err, perhaps not that, but "human-capable"). We used to have ephemeral creativity or ephemeral reasoning that made us masters at drawing, painting, music, chess, or Go. No longer.

There are still some things we excel at that machines don't. Or some things that it takes all the machines in the world to do in 10,000 years with a nuclear plant's worth of energy that a single human brain does in one second powered by a cucumber's worth of calories.

However, this has only ever gone in one direction: machines match more and more of what we do and seem to lack less and less of what we are.


How old are you if you don't mind me asking?


I do mind you asking.

You can choose to engage with the content of the discussion or choose not to engage with it.

"Everybody who disagrees with me is either a child or clinically depressed" isn't what I come to HN for.


Sorry to offend your sensibilities bud. This discussion thread is over.


Can't you just reply to his points?


I could. The next thing he will do is accuse me of solipsism. So I'm gonna stop right here and agree with him.


>aren't thrown off by contradictions the same way computers are

We are not? Just look at any group of people that's bought into a cult and you can see people falling for contradictions left and right. Are they 'higher-level' contradictions than what our machines currently fall for? Yes, but the same premise applies to both.

Unfortunately I believe you are falling into magical thinking here: "because the human intelligence problem is hard, I'm going to write these difficult issues off as magic, and therefore they cannot be solved or reproduced".


All but literally this technique from BotW https://youtu.be/rAqT9TA-04Y?t=98


But once an AI is trained to recognize it, then all the AIs will know. It's the glory of computers - you can load them all with what one has learned.
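In the simplest case that really is just copying parameters around - a minimal sketch with PyTorch, where the model class and file name are placeholders:

    # Sketch: once one unit's model has learned the new trick, every other unit
    # can load the same weights. Model class and path are stand-ins.
    import torch
    import torch.nn as nn

    class Detector(nn.Module):                 # stand-in for the real detector
        def __init__(self):
            super().__init__()
            self.net = nn.Linear(512, 2)       # e.g. "human" vs "not human"

        def forward(self, x):
            return self.net(x)

    trained = Detector()                                  # the robot that saw the box trick
    torch.save(trained.state_dict(), "detector_v2.pt")    # publish the update

    fresh = Detector()                                    # every other robot
    fresh.load_state_dict(torch.load("detector_v2.pt"))   # now "knows" the same thing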


This is where I wonder what the status of Cyc is, and whether it and LLMs can ever live happily together.


As humiliating as this is for the AI, nobody would have the balls to pull this off on a real battlefield outside of training, because you never know whether you've found the perfect camouflage or whether you're a sitting duck walking straight into a trap.


Had an interesting conversation with my 12 year old son about AI tonight. It boiled down to "don't blindly trust ChatGPT, it makes stuff up". Then I encouraged him to try to get it to tell him false/hallucinated things.


These types of ML approaches seem to always break down in the face of adversarial input.

Security folks are going to have a field day if we’re not careful to make sure that people really understand the limitations of these types of systems.


Reminds me of a quick exchange in a movie.

"Are you human?"

"No, I'm a meat popsicle."


I'm surprised they wasted the time and effort to test this instead of just deducing the outcome. Most human jobs that we think we can solve with AI actually require AGI and there is no way around that.


You kinda need different perspectives and interactions to help build something.

E.g. the DARPA engineers thought they had their problem space solved, but then some marines did some unexpected stuff. They didn't expect the unexpected; now they can tune their expectations.

Seems like the process is working as intended.


Let's keep in mind it's a military robot, so it'll just be shooting boxes on the next iteration. After that it'll be houseplants and suspiciously large cats.


DARPA learning the same lesson the Cylons did: lo-tech saves the day.


I'm telling you, they're going to have wet towel launchers to defeat these in the future. Or just hold up a poster board in front of you with a mailbox or trash can on it.


As always Hideo Kojima proves once again to be a visionary.


I'm sceptical about this story. It's a nice anecdote for the book to make a point about how training data can't always be generalised to the real world. Unfortunately it just doesn't ring true. Why train it using Marines - don't they have better things to do? And why have the game in the middle of a traffic circle? The whole premise just seems too made up.

If anyone has another source corroborating this story (or part of the story) then I'd like to know. But for now I'll assume it's made up to sell the book.



They wouldn't defeat a dog that way, though.


As long as you do something that was _not_ in the training data, you’ll be able to fool the AI robot, right??


This is a good thing. It could mean the autonomous killer robot is less likely to murder someone errantly


They only need to add thermal imaging to fix this. The terminators are coming, John Connor.


A weapon to surpass metal gear!!


Devil dogs later discover you can blast DARPA robot into many pieces using the Mk 153.


I guess there's more to intelligence than just thinking outside the box ...


The problem with artificial intelligence is that it is not real intelligence.


When intelligence is artificial, understanding and imagination are shallow.


Next DARPA-robot defeats Marines by being the cardboard box.


Oh great, now the robots know the “Dunder Mifflin” maneuver!


The AI is not defeated though, it is being trained.


This sounds like a really fun day at work to me.


Wrong training prompt / question. Instead of 'detect a human', the robot should have been trained to detect any unexpected movement or change in the situation.
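Detecting "anything unexpected moved" is more or less background subtraction - a rough sketch using OpenCV's MOG2 subtractor, where the video source and the pixel-count threshold are placeholders:

    # Sketch: instead of "is this a human?", flag any foreground change at all.
    import cv2

    cap = cv2.VideoCapture(0)                        # placeholder video source
    subtractor = cv2.createBackgroundSubtractorMOG2()

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)               # nonzero where the scene changed
        changed = cv2.countNonZero(mask)
        if changed > 5000:                           # arbitrary pixel-count threshold
            print(f"unexpected change: {changed} pixels differ from the background")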


When you think of this in terms of the Western understanding of war, and the perspective that trench warfare was the expectation until after WWII, the conclusions seem incorrect.


The final word in tactical espionage.


Snaaaake


[flagged]


I like "AI is anything that doesn't work yet".


Or, absent intelligence.


"US Marines Defeat land mine by stepping over it"

None of these would work in the field. It's both interesting and pointless.

If they didn't work, you'd have just increased the robot's effectiveness - i.e. you'd be running slower because you're carrying a fir tree or a box.

If the robot has any human backup you are also worse off.

Anything used to confuse the AI has to not hinder you. A smoke bomb that also blocks thermal, say. It's not clear why the DARPA robot didn't have thermal, unless this is a really old story.


> It's not clear why the DARPA robot didn't have thermal unless this is a really old story.

Who says it didn't? A thermal camera doesn't mean your targets are conveniently highlighted for you and no further identification is needed. Humans aren't the only thing that can be slightly warmer than the background, and on a hot day they may be cooler or blend in. So it's probably best if your robot's target acquisition is a bit more sophisticated than "shoot all the hot things".


DARPA isn't doing this with the end goal of advising US troops to bring cardboard boxes along into combat.

DARPA is doing this to get AIs that better handle behavior intended to evade AIs.



