I don't know either but I do know that this collision has already got infinity times more press coverage and police attention than that time a wrong-way driver hit me on Howard and fled the scene. It's a problem with the whole AV discourse.
I can name two times in the last five years when, while biking, I had a worse encounter than the one described (though I reserve judgment on fault until more details come out), including one that required hospitalization, where the human driver decided to drive away.
The frustrating bit about AV discourse is that AVs are held to the standard of a perfect driver. Which I don't actually object to--all drivers should be compared to perfection. But it's enraging when people do so for AVs without comparing them against human drivers, which, as everyone should know, fall far short of perfection. And with AVs we have the capability to iteratively improve them toward perfection, while there's no way to do the same with humans.
SewerAI | SewerAI.com | UX Designer | Bay Area (some days remote) | FT | $130K - $160K + Equity (series A/B company)
We help cities manage their sewer infrastructure. We have a data platform that manages terabytes of video data and an AI tool that helps cities find problems in their sewer inspection videos. Our AI is focused on computer vision, not LLMs. Our data platform helps cities and private businesses run their inspections quickly, easily, and accurately.
We are looking for a UX designer who can help drive customer productivity and be the voice of the customer.
Please reach out to jobs at sewerai dot com
I have a very similar issue and likely the same cause. What has helped for me: cortisone shots (temporary relief but not a fix), losing weight, core-strengthening exercises, and switching to a standing desk 100% of the time. YMMV. 43-year-old male, lifelong developer.
Prompt:
Here is a riddle. It is a common riddle but with some changes that make it more difficult. You are an alien that will live for at least 10000 years. You have no sense of temperature. you can not feel hot or cold. you have eyes and can see. you are in a house. downstairs are 3 light switches that control 3 light bulbs that are upstairs. Each light bulb will last for exactly 3 years. you can tell by looking at a bulb if it has burned out after 3 years. devise a strategy to determine which switch controls which light.
ChatGPT 4
The standard solution to this problem involves using the heat generated by the bulbs, but since you as an alien can't feel temperature, we need to utilize the fact that these bulbs have a specific lifespan of 3 years. Here's a strategy:
Turn on the first switch and wait for 3 years. After 3 years, turn it off.
Right after turning off the first switch, turn on the second switch.
Leave the third switch off throughout this process.
Now, go upstairs to inspect the bulbs:
The bulb that is off and has burned out is controlled by the first switch. This bulb was turned on for 3 years, so it has reached the end of its lifespan and burned out.
The bulb that is on is controlled by the second switch. You turned this on right after turning off the first switch, so it's still functioning.
The bulb that is off and hasn't burned out is controlled by the third switch. This switch was never turned on, so the bulb is still in a brand-new condition.
This strategy ensures that you can confidently identify which switch controls which bulb.
It's telling that it didn't pick up on the fact that the whole "once you look, you can't touch the switches anymore" constraint isn't in this version of the riddle. I mean, the obvious strategy in this case is: turn on the first switch, go upstairs, look at the bulbs. Go back downstairs and try the second switch. You've now got them all mapped out in about 30 seconds.
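For what it's worth, the "flip and look" approach is easy to sanity-check in a few lines of Python. This is just a toy sketch; the hidden wiring dict and function names here are made up for illustration:

    import random

    # Toy check of the "flip and look" strategy: nothing in this version of
    # the riddle stops you from walking up and down the stairs, so two trips
    # (plus elimination) identify all three switches.
    def solve(hidden_wiring):
        """hidden_wiring maps switch index -> bulb index (unknown to the solver)."""
        switches = [False, False, False]

        def observe():
            # "Walk upstairs": see which bulbs are currently lit.
            return {hidden_wiring[s] for s, on in enumerate(switches) if on}

        mapping, known = {}, set()
        for s in range(2):              # the third switch follows by elimination
            switches[s] = True
            newly_lit = observe() - known
            mapping[s] = newly_lit.pop()
            known.add(mapping[s])
        mapping[2] = ({0, 1, 2} - known).pop()
        return mapping

    wiring = dict(enumerate(random.sample(range(3), 3)))
    assert solve(wiring) == wiring

No three-year wait, and no sense of temperature required.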
Using GPT-4, I asked if there was a way to do it in less than 3 years, but it couldn't figure this out even when I told it you can look and use the switches as much as you want. Instead it suggested turning on a switch for 10 minutes, then using your "excellent alien vision" to determine which 3-year-lifespan bulb has 10 minutes of wear on it.
Makes me think GPT-4 doesn't really have better reasoning; it just looks like better reasoning because it's been fed way more data.
Variants of common riddles actually can be solved with GPT-4, but you have to rewrite them so they don't look like the riddle it remembers (sometimes it's as easy as changing names to something completely different). Turns out language models trust their memory quite a bit.
Slightly related - they won't actually use the results of tools if those results differ a lot from what the model expects the output to be - https://vgel.me/posts/tools-not-needed/
All they have is memory, either in the weights or in the input prompt. To the extent that these models appear to reason, it is precisely in the ability to successfully substitute information from the prompt into reasoning patterns from the training data. It shouldn't be any surprise that this fails when patterns in the prompt strongly condition the model to reproduce particular patterns of reasoning (e.g., many words in the riddle indicate a well-known riddle, but the details are different).
I know the impulse to anthropomorphize is almost impossibly seductive, but I find that the best way to understand and use these models is to remember: they are giant conditional probability distributions for the next token.
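To make that framing concrete, here's a toy bigram model in Python. It's nowhere near what a transformer does (a real LLM conditions on long contexts with learned weights), but the sampling step has the same shape: a conditional distribution over the next token given the context, then a draw. The tiny corpus is just an arbitrary example.

    import random
    from collections import defaultdict, Counter

    # Toy "next-token" model: count which word follows which in a tiny corpus,
    # then repeatedly sample from P(next | previous).
    corpus = "the bulb that is on is controlled by the second switch".split()

    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def sample_next(word):
        dist = counts[word]
        tokens, weights = zip(*dist.items())
        return random.choices(tokens, weights=weights)[0]

    word, out = "the", ["the"]
    for _ in range(8):
        if word not in counts:   # no observed successor, stop generating
            break
        word = sample_next(word)
        out.append(word)
    print(" ".join(out))

Everything the generator can produce is determined by the counts it has seen and the word it is conditioned on, which is the point being made about memory plus prompt.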
Code is often just a sequence of steps (sometimes with comments to indicate goals). As such, it is just another form of patterns of reasoning. Many chains of thought that you would utilize in code are useful skeletons to think about other things.
Structure. GPT has seen lots of logical constructions/arguments for things. These are either explicit in code's documentation or implicit in the code itself (code is often a linear sequence of steps building toward, for example, a return value). ChatGPT learns patterns like this. A prompt may condition the generator to produce something like one of these patterns with elements from the prompt substituted into the generated text. This works relatively often, but fails exactly in the case where the prompt so strongly indicates a pattern that won't work for the prompt given.
I won't say these models can't reason per se, but they can only reason using their memories and the prompt. There is nothing else for them to compute on.
In a hand-wavy kind of way, when ChatGPT fails at a riddle phrased so as to seem similar to a common riddle, you're seeing overfitting. But given the quantity of data these models consume, it's hard to imagine how to test for overfitting, because the training data contains things similar to almost anything you can imagine. Because of that I'm still very suspicious of claims that they "reason" in any strong sense of the word.
But if you try very hard you can find "held out" data, and when you test on it, GPT-4 stops looking so smart:
1. The paper in question demonstrates a formal duality between the transformer architecture and gradient descent. If you take this to indicate that the model reasons in some way, then it would be true of the smallest GPT as well as the largest (it is, after all, a consequence of the architecture rather than anything the model has learned to do per se). In any case, the fact that the model can perform the equivalent of a finite number of gradient-like steps on its way to calculating its final conditioned probabilities doesn't really suggest to me that the model reasons in a general way.
2. You are right that no one disputes the model's ability to memorize (and rephrase). What is at question here is whether the model can reason. If it can do 10 code questions it has seen before but fails to do 10 it hasn't (of similar difficulty) then it strongly suggests that it isn't reasoning its way through the questions, but regurgitating/rephrasing.
>If it can do 10 code questions it has seen before but fails to do 10 it hasn't (of similar difficulty) then it strongly suggests that it isn't reasoning its way through the questions, but regurgitating/rephrasing.
First of all, coding is one area where expecting a perfect answer on the first pass makes no sense. That GPT-4 didn't one-shot those problems doesn't mean it can't solve them.
Moreover, all this says, if true, is that GPT-4 isn't as good at coding as initially thought. Nothing else. It doesn't mean it doesn't reason. There are many other tasks where GPT-4 performs about as well on out-of-distribution/unseen data.
This really just makes it seem like it's not reasoning at all. The trick (or rather un-trick) here is that you can look at the bulbs as many times as you want. Even if I explicitly tell GPT-4 that, it doesn't get it.
Maybe I'm missing something, but this seems like a really poorly worded riddle. Or just a bad riddle.
Given what you stated, the solution would be: walk upstairs and look at the 3 bulbs, walk downstairs and flip one of the switches, then walk upstairs and see which bulb changed on/off. Repeat for the other two switches.
This takes roughly 5 minutes, rather than your solution, which takes 3 years.
Ah thanks, that clears it up. The formatting made me miss that that was the ChatGPT response; I thought it was the user sharing the solution they were looking for.
This is the crux of the problem for GPT: enough of the text is from the “standard” riddle that it cannot break from it to actually answer the question.
I’m seeing this all the time when trying to get GPT to answer certain kinds of problems. It’s heavily biased towards the “correct” answer, even when the prompt presents directly contradictory information.
The reply is funny. I told ChatGPT that it can make multiple trips and that it can see both the switches and the light bulbs at the same time.
ChatGPT keeps insisting that you have to turn on a switch, wait a few minutes, flip that switch off, turn another switch on, then go look at and feel the bulbs. The classic solution to the riddle.
But if you can see both the switches and the light bulbs, you turn on one switch and see which bulb turns on. You then turn on the 2nd switch and see which light turns on. You are done. No waiting needed ;-)
Actually, this is a really good point. Asking a question means that you have an internal model of the world and you realize that some of the data is way off. Is it even possible to train an LLM to ask a question? This also gives me an idea for a future where most online text is generated by AI: we will play the game of answering questions with questions, and then we know we are human!
Are you looking for: turn one light on, walk upstairs, and note the light that is on. Walk downstairs, turn another light on, walk upstairs, and note the new light that is on. Now you know which switches control which lights, and it only took you 5 minutes instead of 6 years.
SewerAI | Walnut Creek CA | Bay Area / Remote | AI Engineer & Backend Engineer | https://www.sewerai.com/
SewerAI is an AI and data management company that is improving the speed and effectiveness, and reducing the cost, of infrastructure management for cities and engineering firms. Our cutting-edge AI lowers the total cost of ownership of cities’ infrastructure by reducing the dirty, dull, and dangerous work that must be done to maintain it.
Come join a growing and A-round-funded startup to help provide real value to the world.
SewerAI is a funded startup (with real revenues) that is helping cities manage their physical infrastructure. Yes we are really named SewerAI and our first product is AI to find problems in sewer inspection videos. Our second product is a cloud data management platform to help cities manage all of their data. We sell these products both to cities and the private service firms that support them.
We are looking for our first design hire. We worked with a consulting firm to roll out the first version of our site, but we would now like to bring that expertise in-house. We are looking for someone who can roll up their sleeves and work with our product and engineering teams to add new features and capabilities to the site. There is a lot of work to do!
We are a technology company that is looking to make improvements in the physical world, helping the environment and the citizens these cities serve.