Seems like the person who wrote the blog works in "classical" deep learning. So do I, so here's the fairest take I can come up with: "AI" has for recent memory been a marketing term anyway. Deep learning and variations have had a good run at being what people mean when they refer to AI, probably overweighting towards big convolution based computer vision models.
Now, "AI" in people's minds means generative models.
That's it, it doesn't mean generative models are replacing CNNs, just like CNNs don't replace SVMs or regression or whatever. It's just that pop culture has fallen in love with something else.
Spot on. I work with deep learning systems in industrial control, and generative models are simply ill-suited for this sort of work. Wrong tool for the job.
But neither the traditional nor generative models are "AI" in the sense that normal people think when they hear "AI".
To me, what’s exciting about ChatGPT-type tech is that it can be the “coordinator” of other models.
Imagine asking an AI assistant to perform a certain industrial control task. The assistant, instead of executing the task “itself”, could figure out which model/system should perform the task and have it do it. It could then even monitor the task and check its completion.
Also, even if a LLM could do that, so could a shell script, without the risks involved in using "AI" for it, or for now the ridiculous external dependence that would involve.
I wonder if in 10 years people will be stuck debugging Rube-Goldberg machines composed of LLM api calls doing stuff that if-statements can do, probably cobbled together with actual if-statements
> I wonder if in 10 years people will be stuck debugging Rube-Goldberg machines composed of LLM api calls doing stuff that if-statements can do, probably cobbled together with actual if-statements
Sounds like an extension of https://en.wikipedia.org/wiki/Wirth%27s_law.
How many times have I done some simple arithmetic by typing it into my browser's bar and checking out the google calculator results? When a generation ago I would have plugged it into a calculator on my desk (or done it in my head, for that matter...). I would be entirely unsurprised to hear that in another generation we're using monstrously complicated "AI" systems to perform tasks that could be done way more simply/efficiently just because it's convenient.
My son regularly uses Alexa as a calculator, and also asks Alexa all kinds of things without a thought as to whether the output triggers a simple pattern match and gets fed to a specialised process or triggers a web search or is processed some other way. It's all conversational anyway. So the day Amazon plugs an LLM into it, it's not a given he'll even notice the difference for some time.
It's not wrong. It's how modern systems operate. E.g. look at Google's SayCan (https://say-can.github.io/) which operates exactly like this (an LLM ordering a Robot around).
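For anyone who hasn't read the SayCan page, here is a minimal sketch of the idea as I understand the published description: the language model scores how useful each skill would be for the instruction, a separate value function scores how likely the skill is to succeed from the robot's current state, and the system picks the skill with the best product. The skill names and all numbers below are made up for illustration, and `pick_skill` is a hypothetical helper, not anything from Google's code.

```python
# Hypothetical numbers: in a real system the first score would come from an LLM
# and the second from a learned value function over the robot's current state.

def pick_skill(llm_scores, affordances):
    """Return the skill with the highest combined score, plus all scores.

    llm_scores:  skill -> how useful the language model rates the skill (0..1)
    affordances: skill -> how likely the skill is to succeed right now (0..1)
    """
    combined = {
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


llm_scores = {"find a sponge": 0.50, "pick up the sponge": 0.40, "go to the table": 0.10}
affordances = {"find a sponge": 0.90, "pick up the sponge": 0.10, "go to the table": 0.80}

best, combined = pick_skill(llm_scores, affordances)
print(best)
```

The point of the product is that the LLM never commands the robot directly: a skill the model likes but the robot cannot currently perform (picking up a sponge it hasn't found yet) loses to one that is both relevant and feasible.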
With the limit of 25k words it might actually be reasonable to test out a prompt for an expert system… but I’d still leave reasoning to something else, for now. Z3, prolog or some forward chaining tool like clips, but have the LLM hallucinate some of the rules?
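A rough sketch of that split, assuming the rules are plain "if all of these facts, then this fact" pairs: the LLM would only propose the rule list (which a person can read and audit), and a small deterministic forward-chaining loop does the actual reasoning. The rule contents and fact names here are hypothetical, and this toy engine is nowhere near what CLIPS, Prolog or Z3 offer.

```python
def forward_chain(facts, rules):
    """Apply rules until no new facts appear.

    facts: iterable of strings that are known to hold
    rules: list of (premises, conclusion); premises is a set of strings
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only when every premise is already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules, as if proposed by an LLM and then reviewed by a person.
rules = [
    ({"temperature_high", "pressure_rising"}, "risk_of_overheat"),
    ({"risk_of_overheat"}, "open_relief_valve"),
    ({"door_open"}, "pause_conveyor"),
]

derived = forward_chain({"temperature_high", "pressure_rising"}, rules)
print(sorted(derived))
```

Even if some of the generated rules are wrong, the failure mode is an auditable bad rule rather than an unexplained bad answer.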
LLMs are already taking over these sorts of systems in industry.
There are lots of systems where you're taking some information about a user and making a best guess at what action the system should take. Even without a need for super high accuracy these rule systems can get surprisingly complex and adding in new possible decisions can be tricky to maintain. In LLM world you just maintain a collection of possible actions and let the LLM map user inputs to those.
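To make "maintain a collection of possible actions" concrete, here is a minimal sketch. The action names are hypothetical, and `fake_llm` is a keyword-matching stand-in for the real model call so the example runs without any API; the part that matters is that the model's answer is checked against the known actions before anything runs.

```python
from typing import Callable, Dict

# Hypothetical actions; each is just a function taking the user's text.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "reset_password": lambda text: "Sent a password reset link.",
    "refund_order": lambda text: "Opened a refund request.",
    "escalate_to_human": lambda text: "Handed off to a support agent.",
}


def dispatch(user_input, choose_action, fallback="escalate_to_human"):
    """Ask the model for an action name, but only run names we know about."""
    name = choose_action(user_input, sorted(ACTIONS))
    if name not in ACTIONS:
        name = fallback  # the model invented or misspelled an action
    return name, ACTIONS[name](user_input)


def fake_llm(user_input, action_names):
    """Stand-in for the model call: a keyword match, not a real LLM."""
    text = user_input.lower()
    if "password" in text:
        return "reset_password"
    if "refund" in text or "money back" in text:
        return "refund_order"
    return "delete_everything"  # simulates the model inventing an action


print(dispatch("I forgot my password", fake_llm))
print(dispatch("What's the weather?", fake_llm))
```

Adding a new decision is then one more entry in `ACTIONS` rather than another branch in a rule tree, which is the maintenance win being described.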
Sure, maybe you can use a shell script, but now the AI assistant can write it based on your verbal/text description, and then the assistant can also run it for you after you’ve checked it.
What you are saying is: “why use the washing machine, if my clothes are even cleaner when I wash them myself - I also use less detergent and less water”.
Spare me the shitty analogies. We write shell scripts because it’s cheap, fast, and the behavior is very predictable.
Like it or not, an AI’s behavior is a black box and can’t be “proven” to execute exactly the same every time for the scenarios you are targeting.
A shell script will do exactly what it has been written to do every time, unless tampered with. And if changes need to be made, it can be done quickly without need for retraining, god knows how long that would take for an AI to learn something new. God help you if you need to maintain “versions” of your AI, trained for different things.
Face it, AI are pointless and slow for certain classes of problems.
> A shell script will do exactly what it has been written to do every time, unless tampered with.
Or unless some magic environment variable changes, or one of the runtime dependencies changes, or it is run on a different operating system, or permissions aren't set up right, or one of its tasks errors out.
Shell scripts are digital duct tape, the vast majority of shell scripts do not come close to being reliable software.
> god knows how long that would take for an AI to learn something new
Did you watch OpenAI's demo yesterday? They pasted in new versions of API docs and GPT4 updated its output code. When GPT forgot a parameter, the presenter fed back the error message and GPT added the parameter to the request.
The big thing everyone in this single thread is missing is that AI is a metaheuristic.
I wouldn't expect to use AI to run_script.py. That's easy. I'd expect it to look at the business signals and do the work of an intern. To look at metrics and adjust some parameters or notify some people. To quickly come up with and prototype novel ways to glue new things together. To solve brand new problems.
To do the work of an intern an AI must go on Jira, read a ticket, then find the appropriate repositories where code needs to be modified, write tests for its modification, submit for code review, respond to feedback in code review, deploy its changes.
It always feels achievable in five years. People were saying exactly this 30 years ago.
Sooner or later it may (or may not) be a true statement, but it's awfully hard for me to say that it's any different right now than it has been before.
I've had ChatGPT write code from vague statements that got close enough that it'd take the typical intern days of research to figure out. I've also had it fail spectacularly before being prompted more extensively. But there are tasks I'd rather hand off to ChatGPT already today than hand to an intern, because it does the job faster and is able to correct misunderstandings and failures far faster.
E.g. I posted a while back how I had it write the guts of a DNS server. It produced a rough outline after the first request, and would fill out bit by bit as I asked it to elaborate or adjust specific points. The typical intern would not know where to start and I'd need to point them to the RFC, and they'd go off and read them and produce something overwrought and complex (I've seen what even quite experienced software devs produce when given that task; and I know how much work it took me the first time I did it).
So it may not exactly replace an intern, in that there are classes of problems that require low-level reasoning and a willingness and ability to go off and research that it's just not set up for yet and that will be harder to replace. But the problem set will change. Both in that what gets to the intern will be things where LLMs don't produce good results fast enough (I wouldn't ask an intern to do something that ChatGPT can do well with little prompting), and in that interns will be more likely to go off and learn a bit and then spend more time prompting LLMs, and in that sense produce more value than they could before.
What's different is that each subtask now feels like a weekend hackathon. Plus a bunch of engineering to ensure high quality results and build the appropriate UI/UX for human drivers.
You could solve 5% of cases now and over five years drive it up beyond 100% (where you're getting new customers and startups that had never even tried the previous methods).
What software do people envision themselves creating with these AI slaves?
More shitty CRUD apps? Which get easier and easier to pump out everyday with the growing numbers of frameworks, libraries, copying and pasting snippets from stack overflow?
Or will AI really write all the code for all our critical systems, operating systems, financial markets, planes, factories, vehicles, spacecraft? And will they do it so confidently and accurately that humans can safely just forget how to code?
Sure, perhaps. But by then AI will also be so advanced and independent in its problem solving that it will have no need to listen to human prompts.
What developer working on anything meaningful does not rely on documentation? You certainly have to make the documentation available similar to how you would have to make it "available" to an LLM. I think you might be missing the point about what the potential use-cases for these systems are.
I think you're fighting an uphill battle because of what you picked to defend here - shell scripts are very easy to write, and I have a hard time imagining a future where someone tells an LLM,
"Write me a shell script that runs run_control.py with the speed argument set to one hundred."
That's a trivial example, sure. Think of saying "make this image more red, flip it vertically, then crop the bottom 25%, and finally output it with 80% compression as a jpg". It would take 15 minutes to figure out how to write that with ImageMagick, or you could just have an LLM do it for you.
I had to try, seems reasonable, haven't tested the code
As an AI language model, I do not have direct access to image editing software, but I can provide you with a shell script that could achieve the described image manipulation using ImageMagick, assuming it is installed on your system.
Here's an example shell script that you could use as a starting point:
#!/bin/bash
# Set the input image filename
input_file="input.jpg"
# Set the output image filename
output_file="output.jpg"
# Increase the redness of the image
convert "$input_file" -modulate 100,150,100 -channel R -evaluate multiply 1.2 +channel temp1.jpg
# Flip the image vertically
convert temp1.jpg -flip temp2.jpg
# Crop the bottom 25% of the image
convert temp2.jpg -gravity South -chop 0x25% temp3.jpg
# Compress the image to 80%
convert temp3.jpg -quality 80 "$output_file"
# Clean up temporary files
rm temp*.jpg
To use this script, save it to a file (e.g., "image_manipulation.sh"), make it executable (e.g., "chmod +x image_manipulation.sh"), and run it in the same directory as your input image. The output image will be saved in the same directory with the name "output.jpg".
Please note that this is just an example script and may need to be modified depending on your specific image manipulation needs.
This is a prime example of what I mean by my remark above that the original commenter was totally undermining their own argument by using "shell script" instead of "python script".
Shell scripts can't create intermediate bitmap images in memory and operate on them with normal drawing and filtering apis like opencv, without lossily serializing and deserializing them as jpg files, or even operate on json structures directly.
It would be a much stronger argument if your example used python and opencv or any other normal library, instead of incredibly inefficient and hard to write and maintain invocations to full blown unix commands run in separate processes instead of direct library calls to manipulate in-memory images.
There's no reason "AI" code generation has to use the worst possible language and technique to generate code to solve a problem.
It's like having a food replicator from Star Trek: TNG, and asking it to make you rancid dog food instead of lobster.
Not to weigh in on any other aspect of this discussion, but when you say:
> I have a hard time imagining a future where someone tells an LLM,
"Write me a shell script that runs run_control.py with the speed argument set to one hundred."
I'll point out that we already live in a world where single lines of pure function code are distributed as NPM packages or API calls.
It’s not ‘write me a shell script to run this python code’, it’s ‘okay, the test part looks good, run the print again with the feed speed increased to 100, and make six copies. And Jarvis, throw a little hot-rod red on it.’
I've been a developer for a long-ass time, though I don't have super frequent occasion where I find it worthwhile to write a shell script. It comes up occasionally.
In the past 2 weeks I've "written" 4 of them via ChatGPT for 1-off cases I'd have definitely found easier to just perform manually. It's been incredible how much easier it was to just get a working script from a description of the workflow I want.
Usually I'd need to double check some basic things just for the scaffolding, and then, maybe double check some sed parameters too, and in one of these cases look up a whole bunch of stuff for ImageMagick parameters.
Instead I just had a working thing almost instantly. I'm not always on the same type of system either, on my mac I asked for a zsh script but on my windows machine I asked for a powershell script (with which I'd had almost no familiarity). Actually I asked for a batch file first, which worked but I realized I might want to use the script again and I found it rather ugly to read, so I had it do it again as a powershell script which I now have saved.
Sure though, someone won't tell an LLM to write a shell script that just calls a python script. They'd have it make the python script.
I think one effect of LLMs and their limited context will be the end of DRY. I’ve already found myself getting gpt to write stuff for me that could have been part of or leveraged existing code with a little more thinking. But the barrier to just starting from scratch to do exactly what I want, right now, just got a whole lot lower.
What? There are a lot of non-coders out there, and they could absolutely ask an LLM to create scripts for them to run. In fact I, along with a few of my friends, already do this. I recently asked ChatGPT to figure out how to integrate two libraries together after I copy-pasted the docs from each (now possible with the GPT-4 32k token limit).
You are getting a surprising amount of backlash from this, but I think you are right. There may be better tools for the job, but general tools tend to win out as they get "good enough"
mediocre is acceptable for most things. i'd rather have 1000 free photos from my wedding than 32 perfect ones. i still ended up with more than 32 perfect ones.
The central question is that a controller is assumed to be specifiable and thus formally verifiable through model checking in principle.
With a neural network you have a black box and for example with ChatGPT it doesn't even have a specification. It turns the verification process upside down.
I'm not sure how the likes of ChatGPT could accomplish that even in theory, but I won't say it's not possible at some point in the future. Gpt itself, perhaps, someday.
Already ChatSpot is doing it. Their system is essentially a ChatGPT-enhanced Hubspot management system using chatux.
ChatSpot can understand your commands and then perform actions in the system for you, for example add a lead, change their contact info, write a blog post, publish it, add an image…
Edit: but if you connected it with physical actions, it could control your house, maybe check your smart refrigerator, order food on Instacart, send you recipe, schedule the time to cook in your calendar, request an Uber to pick you up from work, invite someone over, play music…
Perform tasks that humans do now, but at scale, automatically.
We are going to be able to automate everything and anything with the proper feedback loops.
For example, you could have an app that writes itself, deploys itself, tests itself, receives feedback, updates itself based on the feedback, writes additional tests, does CI/CD.
At that point you will be just creating and directing. Or you can choose whatever you actually want to execute.
And then if those same kind of processes are given access to physical tools, they could do all of our manufacturing, design and build their own machines and infrastructure.
We could essentially collaborate with our systems in the most amazingly seamless way.
I'm curious about your work, because I worked on something similar during my grad school. What kind of applications in industry do you use deep learning systems for? Process control?
Yes, process control. It's used in coordination with vision systems to analyze work pieces, determine the best way of processing them, and direct other machinery how to do that processing.
That's cool. If you don't mind me asking, would you have any introductory material that I could read on this? Even a website or a blog post would be great.
In my grad school, we were working on something similar - using computer vision to analyze reactor flows to then change process variables. The results would be fed back into the system for RL. Too bad the project sorta froze after I graduated.
Your grad school project sounds very similar, yes, although we work with discrete objects rather than fluids. Fluid dynamics is much, much more complicated.
We actually use more than one neural network for this. The software is designed so the NN component is a plugin. The reason we do this is because some types of neural nets work better for some tasks than others.
Most (but not all) of our nets are convolutional.
Since you've already done some work with this sort of thing, I'm unsure about what level of overview would be of value to you, but this looks reasonable for a technically competent person who is new to the topic:
like, regression, sure - because it's a tool to measure how well a hypothesis (polynomial function) matches the data (points.) and CNNs are still foundational in computer vision. but the first and last time I heard of SVMs was in college, by professors who were weirdly dismissive of these newfangled deep neural networks, and enamored by the "kernel trick."
but aren't SVMs basically souped up regression models? are they used in anything ML-esque, i.e. besides validating a hypothesis about the behavior of a system?
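Since the "kernel trick" keeps coming up, a small sketch of what it actually is might help. An SVM is a maximum-margin classifier rather than a regression model, and the trick is that a kernel function computes a dot product in a higher-dimensional feature space without ever building that space. The example below checks this for the degree-2 polynomial kernel on 2-D inputs; the points are arbitrary and `feature_map` is written out by hand purely for the comparison.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, y) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose dot product equals poly_kernel."""
    x1, x2 = x
    return (x1 * x1, 2 ** 0.5 * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, y)                     # never leaves 2-D
explicit = dot(feature_map(x), feature_map(y))   # goes through 3-D
print(implicit, explicit)
```

Both routes give the same number (up to floating point), which is why a linear method can draw curved decision boundaries at the cost of a kernel evaluation per pair of points.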
> but the first and last time I heard of SVMs was in college, by professors who were weirdly dismissive of these newfangled deep neural networks, and enamored by the "kernel trick."
LOL. Exact same experience in my college courses. Glad to know it's universal.
The Generative AI is the AI for the masses. While people were getting overhyped with all the possibilities and promises of AI and deep learning etc. it is for the first time that they can also tinker and get surprised by its results. People feel creative interacting with it.
Isn’t most of the mathematics of AI old, as in really old?
Regression, both linear and logistic are from the mid 1800s to early 1900s. Neural networks, at least the basics are from around 1950.
What has really changed is the engineering, the data volume and the number of fields we can apply the mathematics to. The math itself (or what is the basis of AI) is really old.
backpropagation didn't get solved until the '80s, weirdly. before then people were using genetic algorithms to train neural networks.
and it was only in the last decade that the vanishing gradients problem was tamed.
my impression is that ML researchers were stumbling along in the mathematical dark, until they hit a combination (deep neural nets trained via stochastic gradient descent with ReLU activation) that worked like magic and ended the AI winter.
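A toy illustration of why the activation choice matters for vanishing gradients, under heavy simplifying assumptions: one path through the network, all weights equal to 1, and the same pre-activation value at every layer. Backpropagation multiplies one activation derivative per layer; the sigmoid's derivative is never above 0.25, so the product collapses with depth, while ReLU's derivative is exactly 1 for positive inputs.

```python
import math


def sigmoid_grad(z):
    """Derivative of the logistic sigmoid at z (at most 0.25)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU at z (1 for positive inputs, else 0)."""
    return 1.0 if z > 0 else 0.0


def gradient_through(depth, act_grad, z=0.5):
    """Product of activation derivatives along one path of `depth` layers.

    Weights are taken to be 1 and every pre-activation is the same z, which
    is a big simplification made only to isolate the activation's effect.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= act_grad(z)
    return grad


for depth in (1, 5, 20):
    print(depth, gradient_through(depth, sigmoid_grad), gradient_through(depth, relu_grad))
```

Real networks have weights, normalization and skip connections that all affect this too, so treat it as intuition rather than the whole story.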
Right, and the practice of neural networks has significantly overshot the mathematical theory. Most of the aspects we know work and result in good models have poorly understood theoretical underpinnings. The whole overparameterized thing for example, or generalization generally. There's a lot that "just works" but we don't know why, thus the stumbling around and landing on stuff that works
As the old joke goes, "AI" is anything that doesn't work yet.
Once an "AI" system becomes reliable, we quickly take it for granted and it no longer seems impressive or interesting. It's just a database. Or an image classifier. Or a chatbot.
I'd argue a database is possibly as far from an AI as you get. Indexing and data structures and storage systems that go into them are very deterministic data structures you can model out and roughly know the behaviour of before writing a single line of code. Image classifiers and Chatbots you don't know what you're getting out till you train it and deploy it.
People calling neural-net classifiers "old-school" AI confused me. For a second I thought they were talking about the really old "expert systems" with everything being a pile of hard-coded rules.
It still feels like there's a place for these rule based systems(Prolog?) to at least place some constraints on the output of non-deterministic, generative AI. If nothing else, have a generative AI generate the ruleset so you have some explicit rules you can audit from time to time.
Yeah, i think one potential way to use blackbox ai in newer systems is having guardrails that are validated as safe (but perhaps non-optimal) and ensuring that the ai only takes actions within that space. Obviously this is a hard problem, but it might open the doors for policies (in self-driving cars, for example) to be entirely ai driven.
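A minimal sketch of the guardrail idea for a single numeric setpoint, assuming the safe envelope has been validated separately and is just a range plus a rate limit. The `Envelope` fields and the numbers are hypothetical; the black-box model proposes whatever it likes and a few deterministic lines decide what actually reaches the actuator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical, separately validated limits for one actuator."""
    min_speed: float
    max_speed: float
    max_step: float  # largest allowed change per control tick


def guarded_action(proposed, current, env):
    """Clamp the model's proposed speed into the validated envelope."""
    # Window allowed this tick: absolute limits intersected with the rate limit.
    low = max(env.min_speed, current - env.max_step)
    high = min(env.max_speed, current + env.max_step)
    # Clamp the proposal into that window.
    return min(max(proposed, low), high)


env = Envelope(min_speed=0.0, max_speed=100.0, max_step=5.0)
print(guarded_action(proposed=250.0, current=98.0, env=env))
print(guarded_action(proposed=40.0, current=42.0, env=env))
```

The clamp is the part you can verify formally; the model inside it only affects how good the behaviour is, not whether it stays in bounds.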
A friend of mine was just telling me how he asked GPT-3 to write a simple program in Prolog and it seemed to get it right. He didn't try compiling it, but he has enough experience w/ Prolog to say that it was more or less correct.
I'm pretty cynical on LLMs(i.e. they're not intelligent and won't take all our jobs soon), but am coming around on their importance and capabilities.
I'm not sure it's overrated, but the concerns are very real.
We love the model because it speaks our language as if it's "one of us", but this may be deceiving, and the complete lack of model for truth is disturbing. Making silly poems is fun but the real uses are in medicine and biology, fields that are so complex that they are probably impenetrable to the human mind. Can Reinforcement learning alone create a model for the truth? The Transformer does not seem to have one, it only works with syntax and referencing. How much % of truthfulness can we achieve, and is it good enough for scientific applications? If a blocker is found in the interface between the model and reality, it will be a huge disappointment
Without sensing/experiencing the world, there is no truth.
The only truth we can ever truly know, is the present moment.
Even our memories of things that we “know” that happened, we perceive them in the now.
Language doesn’t have a truth. You can make up anything you want with language.
So the only “truth” you could teach an LLM, is your own description of it. But these LLMs are trained on thousands or even million different versions of “truth”. Which is the correct one?
There is a paper showing you can infer when the model is telling the truth by finding a direction in activation space that satisfies logical consistency properties, such as that a statement and its negation have opposite truth values. Apparently we can detect even when the model is being deceitful.
Another approach - a model can learn the distribution: is this fact known or not in the training set, how many times does it appear, is the distribution unimodal (agreement) or multi-modal (disagreement or just high variance)? Knowing this, a model can adjust its responses accordingly, for example by presenting multiple possibilities or by declining to answer instead of hallucinating when there is no information.
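One cheap way to approximate the second approach from the outside, sketched under a loud assumption: instead of inspecting the training distribution (which we can't see), ask the same question several times and look at how the sampled answers are distributed. One dominant answer is treated as agreement, several well-supported answers as disagreement, and too little or too scattered evidence as "don't know". The thresholds and the `respond` helper are arbitrary illustrative choices.

```python
from collections import Counter


def respond(samples, min_samples=5, agree=0.7, support=0.2):
    """Decide how to answer from repeated samples of the same question.

    samples: list of answer strings (hypothetically drawn from a model)
    Returns (mode, answers) where mode is "answer", "present_options" or "abstain".
    """
    if len(samples) < min_samples:
        return ("abstain", [])  # not enough evidence either way
    counts = Counter(samples)
    total = len(samples)
    top, top_n = counts.most_common(1)[0]
    if top_n / total >= agree:
        return ("answer", [top])  # one clear mode
    # No clear mode: surface every answer with meaningful support.
    options = [a for a, n in counts.most_common() if n / total >= support]
    if options:
        return ("present_options", options)
    return ("abstain", [])  # answers are all over the place


print(respond(["Paris"] * 9 + ["Lyon"]))
print(respond(["1912"] * 5 + ["1913"] * 5))
print(respond([str(i) for i in range(10)]))
```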
I think for practical purposes you could hold that text from wikipedia or scientific papers is true, for example. The issue I think OP is referring to is whether an LLM can refer back to these axiomatically true sources to ground and justify its outputs like a human would.
If you can trust the model is at least as accurate as wikipedia then it becomes a drop in replacement for every task you do that requires wikipedia.
There are a whole range of tasks that can’t be done today with an LLM because of the hallucination issues. You can’t rely on the information it gives you when writing a research paper, for example.
For starters because one of the first products people decided to use these models for is a search engine, and I don't think it is a stretch to argue that search engines should have a positive relationship, rather than indifference, towards facts and the truth.
You can make up any reality that you want, just consume these first and don’t ask me where I found them.
In all seriousness though, what you are asking is whether an objective reality exists, which is not a settled debate. There is also the whole solipsism thing, though many disregard it as a valid view of the world because it can be used to justify anything and is not a particularly interesting position.
Of course there is also the whole local realism thing with QM and of course the whole relativity thing and time flowing at different speeds destroying a universal “now”.
Then there is the whole issue with our senses being fallible and our brains hallucinating reality in a manner that is as confident as GPT3.5 is when making up facts.
In fact, it’s all just information and information doesn’t need a medium.
Our senses lie to us all the time. What we perceive may have strong to almost no correlation to reality. Can you see in the ultraviolet? No human can. Flowers look completely different. Same goes for sounds and smells.
It can be exact and self-consistent; you can teach the rules of mathematics. There are some things that are provably unprovable, but that's a known fact.
In exact domains you can often validate the model with numerical simulations, or use the simulations for reinforcement learning or evolution. The model can learn from outcomes, not only from humans. In biology it is necessary to validate experimentally, like any other drug or procedure.
there seems to be accumulating evidence that "finding the optimal solutions" means (requires) building a world model. Whether it's consistent with ground truth probably depends on what you mean by ground truth.
Given the hypothesis that the optimal solution for deep learning presented with a given training set, is to represent (simulate) the formal systemic relationships that generated that set, by "modeling" such relationships (or discovering non-lossy optimized simplifications),
I believe an implicit corollary is that the fidelity of simulation is only bounded by the information in the original data.
Prediction: a big enough network, well enough trained, is capable of simulating with arbitrary fidelity, an arbitrarily complex system, to the point that lack of fidelity hits a noise floor.
The testable bit of interest being whether such simulations predict novel states and outcomes (real world behavior) well enough.
I don't see why they shouldn't, but the X-factor would seem to be the resolution and comprehensiveness of our training data.
I can imagine toy domains like SHRDLU which are simple enough that we should be able to build large models well enough already to "model" them and tease this sort of speculation experimentally.
> there seems to be accumulating evidence that "finding the optimal solutions" means (requires) building a world model.
Was this ever in doubt? This has been the case forever (even before "AI"), and I thought it was well-established. The fidelity of the model is the core problem. What "AI" is really providing is a shortcut that allows the creation of better models.
But no model can ever be perfect, because the value of them is that they're an abstraction. As the old truism goes, a perfect map of a terrain would necessarily be indistinguishable from the actual terrain.
Not sure why but I find this incredibly insightful…
> Prediction: a big enough network, well enough trained, is capable of simulating with arbitrary fidelity, an arbitrarily complex system, to the point that lack of fidelity hits a noise floor.
That is a pretty good description of human brains/bodies. You could also say that quantum physics is where our noise floor might be.
Here's an alternative to a model for truth. There is no truth, only power. Suppose we completely abandon logical semantics and instead focus on social semantics. Instead of the usual boolean True/False variables and logic relations, we'll have people-valued variables and like/dislike relations: a system entirely for reasoning about how much pull and persuasion is present, without ever circuiting down to any ground-truth reasons. In other words, a bullshit reasoning system. Can ordinary truth reasoning be jerryrigged out of this system?
This was rhetorical. My point was that a system or model which cares about something other than the truth can, upon reaching a certain level of sophistication, handle reasoning about truth. E.g., an AI that cares entirely about approval for what it says rather than the actuality of what it says could still end up reasoning about truth, given that truth is most heavily correlated with approval. I reject the premise that there has to be an a priori truth model under the hood.
RLHF is basically just applying social power to the machine. It’s used for good (ChatGPT won’t help you spread Nazi memes) and hegemony (ChatGPT won’t help you overthrow capitalism).
We are all struck with the novelty of generative AI, it needs time to settle. People will throw the universe at the wall and see what really sticks.
To my mind generative AI is great at finding needles in the haystack of stuff we already know. Of course it just as often gives you a fake needle right now, just to see if you notice.
On the other hand "traditional"/predictive AI is often better at the things we don't already know or understand.
I mean, the only thing GPT does is predict the next word, which makes it not so different from a compression algorithm. And diffusion models (the image generating stuff) are essentially fancy denoisers.
Depending on how you assemble the big building blocks, you get generation or you get prediction.
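The link between next-word prediction and compression can be shown with a toy: if a model assigns probability p to the word that actually comes next, an ideal coder needs about -log2(p) bits for it, so better prediction means fewer bits. The sketch below uses a bigram counter as the predictor, which is a deliberately crude stand-in and nothing like GPT's mechanism; the sentence and the add-one smoothing are arbitrary choices.

```python
import math
from collections import Counter, defaultdict


def train_bigram(words):
    """Count which word follows which. A toy predictor, nothing like GPT."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def code_length_bits(words, follows, vocab_size):
    """Ideal number of bits to encode words[1:] given the model's predictions."""
    bits = 0.0
    for prev, nxt in zip(words, words[1:]):
        counts = follows[prev]
        # Add-one smoothing so unseen words still get nonzero probability.
        p = (counts[nxt] + 1) / (sum(counts.values()) + vocab_size)
        bits += -math.log2(p)
    return bits


text = "the cat sat on the mat and the cat sat on the hat".split()
vocab = set(text)
model = train_bigram(text)

with_model = code_length_bits(text, model, len(vocab))
uniform = (len(text) - 1) * math.log2(len(vocab))  # every word equally likely
print(round(with_model, 1), round(uniform, 1))
```

This cheats in two ways worth naming: the model is scored on the text it was trained on, and the bits needed to store the model itself are ignored.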
GPT-3.5 is not a Markov chain, this is trivially true. While ‘predicts the next word’ is true, the mechanism of it is of interest and that is most certainly not trivial.
Depends how far you take the word 'fundamental', on the one hand yeah most DL systems are trying to predict something, and they generally have some concept of compression built in. But in terms of the steps to curate a dataset, train, test, iterate and actually use the model for a given end goal - they are pretty fundamentally different.
I think the thing is though in Large multi models you give it all the data and test it against everything. And it generally does better across most of the benchmarks.
That depends entirely on the use-case - for example if you wanted to build an AI to operate a self-driving car, just training on unlabelled data scraped from the internet is only going to get you so far. It doesn't learn how to do EVERYTHING (not yet at least).
That's not wrong, but an ideal autocompleter is a near-omniscient superintelligence. "The optimal approach to curing Alzheimer's is ______". "The proof of the Riemann hypothesis is as follows: ______". "The best way for me to improve my life is _______".
I think the big difference is just being an Autocompleter is less concerned with generating something that is truthful, as in reflects the real world as we understand it described by physics, vs simply spitting out something that sounds good.
Although we do have a litmus test in asking it "What is the meaning of life the universe and everything?"
Yes, exactly. An autocompleter is saying what the next words probably would be, not what they should be. It's like a chess program that tries to find the most likely move that a human would make in the position rather than the best move.
That’s why I said “supervised” - in other words, someone competent in the domain and context is examining the output and correcting or discarding as necessary before use.
When the generative model is autoregressive (autocomplete), it can easily be used as a predictor. All of the state of the art language models are tested against multiple choice exams and other types of prediction tasks. In fact, it's how they are trained...masking - https://www.microsoft.com/en-us/research/blog/mpnet-combines...
For GPT4: "Pricing is $0.03 per 1,000 “prompt” tokens (about 750 words) and $0.06 per 1,000 “completion” tokens (again, about 750 words)."
Meanwhile, there are off-shelf models that you can train very efficiently, on relevant data, privately, and you can run these on your own infrastructure.
Yes, GPT4 is probably great at all the benchmark tasks, but models have been great at all the open benchmark tasks for a long time. That's why they have to keep making harder tasks.
Depending on what you actually want to do with LMs, GPT4 might lose to a BERTish model in a cost-benefit analysis--especially given that (in my experience), the hard part of ML is still getting data/QA/infrastructure aligned with whatever it is you want to do with the ML. (At least at larger companies, maybe it's different at startups.)
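To put rough numbers on that cost-benefit point, here is a back-of-the-envelope sketch using only the GPT-4 prices quoted above. The workload (one million requests a month, 500 prompt tokens and 200 completion tokens each) is a made-up assumption for illustration, not a measurement.

```python
# Prices quoted above, in USD per 1,000 tokens.
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06


def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of a single API call at the quoted per-token prices."""
    prompt_cost = (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K
    completion_cost = (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K
    return prompt_cost + completion_cost


# Hypothetical workload, chosen only to make the arithmetic concrete.
REQUESTS_PER_MONTH = 1_000_000
per_request = request_cost(prompt_tokens=500, completion_tokens=200)
monthly = per_request * REQUESTS_PER_MONTH

print(f"per request: ${per_request:.3f}")
print(f"per month:   ${monthly:,.0f}")
```

Whether that beats a self-hosted BERT-style model depends on your own infrastructure and labeling costs, which this sketch does not try to estimate.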
The real innovation will come once someone uses a generative AI to make something, and then uses a predictive AI to rate its accuracy, making it go again until it passes the predictive AI.
Basically a form of adversarial training/generation.
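The generate-then-rate loop described above can be sketched in a few lines. This is only a minimal illustration: `generate` and `score` are hypothetical stand-ins for a generative model and a predictive model, and the toy versions below are hard-coded so the example is deterministic.

```python
from typing import Callable, Optional


def generate_until_accepted(
    generate: Callable[[], str],
    score: Callable[[str], float],
    threshold: float,
    max_attempts: int = 10,
) -> Optional[str]:
    """Return the first candidate whose score reaches the threshold, else None."""
    for _ in range(max_attempts):
        candidate = generate()
        if score(candidate) >= threshold:
            return candidate
    return None  # Give up rather than loop forever.


# Toy stand-in for the generator: replays a fixed list of candidates.
_candidates = iter(["2 + 2 = 5", "2 + 2 = 3", "2 + 2 = 4"])


def toy_generate() -> str:
    return next(_candidates)


# Toy stand-in for the predictive "rater": 1.0 for the right answer, else 0.0.
def toy_score(text: str) -> float:
    return 1.0 if text.endswith("= 4") else 0.0


result = generate_until_accepted(toy_generate, toy_score, threshold=1.0)
print(result)
```

The `max_attempts` cap matters in practice: if the rater never accepts anything, you want a clear failure instead of an unbounded bill.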
Bilateral "thinking" makes sense, and you can even feed generative AI back into itself for simple error correction.
I believe that we'll see the most success/accuracy once you have generative AI compare itself to itself, monitored by a GAN, which then spits out its answer while retaining some knowledge as to how it came to the conclusion. A tricameral mind.
I hadn't thought about human feedback being an adversarial system, but I guess that makes sense, since it's basically a classifier saying "you got this wrong".
I think there was a breakdown in communication here.
If I train a classic deep net as a classifier and there are 5 possible classes, it will only ever output those 5 classes (unless there's a bug).
With ChatGPT, for example, it could theoretically decide to introduce a 6th class - what I would call an alien failure mode, even if you explicitly told it not to.
I think formally / provably constraining the output of LLM APIs will help mitigate these issues, rather than needing to use an embedding API / use the LLM as a featurizer and train another model on top of it.
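A small sketch of the contrast being drawn here, under stated assumptions: the five labels are made up, `classify` stands for the argmax over a fixed-size output layer, and `parse_llm_label` is a hypothetical post-hoc check on free-form text. It is not a formal guarantee, just the cheapest form of output constraint.

```python
from typing import Optional, Sequence

# Hypothetical closed label set for a 5-class problem.
CLASSES = ("cat", "dog", "bird", "fish", "horse")


def classify(logits: Sequence[float]) -> str:
    """Argmax over a fixed-size output: the result is always one of CLASSES."""
    if len(logits) != len(CLASSES):
        raise ValueError("expected one logit per class")
    best = max(range(len(CLASSES)), key=lambda i: logits[i])
    return CLASSES[best]


def parse_llm_label(raw_text: str) -> Optional[str]:
    """Map free-form model text onto the label set; None flags a '6th class'."""
    cleaned = raw_text.strip().lower().rstrip(".")
    return cleaned if cleaned in CLASSES else None


print(classify([0.1, 2.0, -1.0, 0.0, 0.5]))  # always a member of CLASSES
print(parse_llm_label(" Dog. "))             # normalised to a known label
print(parse_llm_label("lizard"))             # out-of-set answer is rejected
```

The classifier cannot leave the label set by construction, while the text model can only be checked after the fact, which is the "alien failure mode" gap described above.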
Formal proof is problematic because English has no formal specification. Some people are working on this; it's a nascent area bringing formal methods (model checking) to neural network models of computation. But an interesting fundamental issue arises there: if you can't even specify the design intentions, then how do you prove anything about it?
I'm working on an old-school AI personal project right now. I don't know how long that lasts. The generative stuff is more and more tempting. It rewards the horrible micromanager in me like nothing else.
Yes! Just like HN is anti blockchain but super pro AI. It seems most applications of generative AI at scale will have a huge negative impact on society, far worse than anything blockchain could have brought about.
"So has generative AI been overhyped?
Not exactly. Having generative models capable of delivering value is an exciting development. For the first time, people can interact with AI systems that don’t just automate but create an activity of which only humans were previously capable."
Good answer but I feel that most users/people do not understand the difference between generative and predictive machine learning and that will probably cause unpredictable failures and false flags.
So yes it has been overhyped in my opinion
IMO, it has been underhyped. We're seeing things with LLMs that a decade ago I'd have said were multiple decades out, if not more.
We're just years into generative approaches. And I think we'll see more combinations of methods used in the future.
The goal of AI has never been to build an all knowing perfect system. It has also never been to replicate the way the human brain works. But it's been to build an artificial system that can learn -- and AGI specifically to be able to give the appearance of human learning.
I feel like we've turned this corner where the question now is, "Can we build something that knows everything that has been documented and can also synthesize and infer all of that data at a level of a very smart human". The fact that this has become the new bar is IMO one of the biggest tech changes in history. Not the biggest, but up there.
Trying to imagine this stuff being even more hyped and I just don't think it's possible. People around here are practically ready to sell their first born child to OpenAI/Microsoft at this point.
> Can we build something that knows everything that has been documented and can also synthesize and infer all of that data at a level of a very smart human
The word "know" is doing some heavy lifting there, as is "synthesize" and "infer".
By "know" I meant has access to. This is a very "database" sense of the word "know".
By "infer" and "synthesize" I meant the standard human definitions of those words. In my interactions with relatively bright people, they really expect ChatGPT to be able to synthesize text at the level of a very sharp HS/college student. They don't want simple regurgitation of a text or a middle school analysis -- they want/expect ChatGPT to analyze nuance, and pull in its vast database to make connections to things that maybe aren't apparent at first glance.
The bar has raised so high so quickly -- it's crazy.
I think the issue is more with people marketing/talking about them as "AI". When I think AI I think of something like Skynet. I would assume something like Skynet would be good at chess, able to generate new text, and synthesize new images. I think when shown novel algorithms that can do those things and told by the people selling the algorithms that they are "AI", it's hard to disagree since they quack like an AI so it's easy to accept that these are the same "artificial intelligence" concept in our brains which we previously only had examples of from fiction.
Basically I think it's overhyped by the use of the term "AI" and how readily we accept it generally. Some aspect of them being generative models could have been the term used to market/describe them, but instead a much broader term is used.
There is much more to generative models than building out language models and image models.
Generative models are about characterising probability distributions. If you ever predict more than just the average of something using data, then you are doing generative modelling.
The difference between generative modelling and predictive modelling is similar to the difference between stochastic modelling and deterministic modelling in the traditional applied mathematical sciences. Both have their place. Neither is overrated.
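A tiny illustration of that distinction, using only the Python standard library. The data are invented, and fitting a normal distribution is an assumption made for the sake of the example: the "predictive" view keeps a single number, while the "generative" view keeps a distribution you can sample from and query.

```python
import statistics

# Made-up measurements, purely for illustration.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]

# Deterministic / "predictive" view: a single point estimate (the average).
point_estimate = statistics.mean(data)

# Stochastic / "generative" view: characterise the whole distribution.
# Assumption: the data are roughly normal, so mean and stdev describe them.
model = statistics.NormalDist(mu=point_estimate, sigma=statistics.stdev(data))

# With a distribution in hand you can generate new plausible values...
samples = model.samples(5, seed=0)

# ...and answer questions the average alone cannot, e.g. a 90% interval.
low, high = model.inv_cdf(0.05), model.inv_cdf(0.95)

print(f"point estimate: {point_estimate:.2f}")
print(f"90% interval:   {low:.2f} to {high:.2f}")
print("samples:", [round(s, 2) for s in samples])
```

The same split scales up: a regression head that outputs one value is the first view, and a model that outputs parameters of a distribution (or samples) is the second.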
Or could it be that it was always going to end up as a black box? We may never truly understand the inner workings, even as it solves every problem that couldn't previously be solved with hard-coded algorithms. It's literally calling on higher-dimensional egregores for answers, or some blood-magic genie.
As stated by John McCarthy--"I invented [AI] because we had to do something when we were trying to get money for a summer study" (the Lighthill debate)--this article passes the AI sniff test, or "please remember us predictive AI folks when you go to dole out your money" as all that is solid melts into PR.
"Don't be dazzled by AI computer vision's creative charm! Classical computer vision, though less flashy, remains crucial for solving real-world challenges and unleashing computer vision's true potential."
Meant for those in classical computer vision before ML ate the field.
> I’m not sure I understand a definition of AI that doesn’t include the ability to generate things.
It depends how you define "generate." For example, is software that controls a robot arm generating anything? I guess it's generating the movements of the arm. But when people use the term "generative" with regards to machine learning models right now, they generally mean content—e.g. text or images for consumption.
Generative has a more technical meaning than that.
Generative AI is essentially the opposite of a classifier. You give it a prompt that could mean many different things, and it gives you one of those things. A robotic arm could use generative AI, because there are many different sets of electrical signals that would result in success for, say, catching a ball.
Classification is an example of a non-generative AI in that there is only 1 correct answer, but it still requires machine learning to acquire the classification function.
You can use AI to validate things, i.e. to check that they conform to some specification.
You may twist the language to say that they are generating a list of validations and errors, but even then it's definitely a different use case than merely creating new items.
I would add that there are logic, deductive, and constraint systems that are more classical and work in some areas. It is not about a single method, but we should be aware that AI is a superset of what we see.
TLDR; Don't be dazzled by generative AI's creative charm! Predictive AI, though less flashy, remains crucial for solving real-world challenges and unleashing AI's true potential. By merging the powers of both AI types and closing the prototype-to-production gap, we'll accelerate the AI revolution and transform our world. Keep an eye on both these AI stars to witness the future unfold.
There's about 10 percentage points of improvement left (i.e., from 80% to 90%) before it starts to stagnate. We've seen the same with predictive models benchmarked on ImageNet et al.
It's funny to me we look at GPT4 scoring high on all these tests and think it's worth anything when educators and a lot of us here have been lamenting the standardized tests since Bush made them a preeminent feature of our country's education system. They are not a good measure of intelligence. They measure how well you can take a test.
Funny -- I literally had someone tell me this same thing this morning... but the exact same guy last week was arguing with me against the reduced importance of these same tests for college admissions. Last week he was arguing how critical these tests were for the college admissions process, but this morning the same tests are basically worthless.
Not saying you hold the same opinions -- but I wouldn't be surprised if people's take on these tests is more about what is convenient for their psyche than any actual principled position.
In principle I agree. On one hand, we can positively conclude that IQ is indeed important, but at the same time we are horrible at measuring it. That being said, there is a country mile of difference in these tests' suitability for the purposes they are being used for.
We mean beating humankind at the task, swiftly followed by humankind declaring that task wasn't a sign of proper intelligence anyway, and moving the goalposts to a different field.
There's no way there's only 10% left to improve in those models. New versions are coming out regularly that are clearly improved. Midjourney v5 and GPT-4 were just released showing huge improvements, for example.
Not only that, but the innovation around this tech is also just getting started. It's immediately applicable for business use. The classical techniques still have their uses, of course.
It's not that there's only 10% left to improve. It's that the data needed, compute requirements, and model size are as intensive getting from 0 to 80 as they are getting from 80 to ~85 or ~90. See https://paperswithcode.com/sota/image-classification-on-imag...
> People are excited because there's so much room to improve
That is hype due to OpenAI's excellent marketing and it is clearly overrated. Microsoft essentially has acquired OpenAI and is using AI safety and competition excuses to close source everything and sell their AI snake-oil.
> these are still early days.
Neural networks are not a new concept, and LLMs still share the same perennial problems as neural networks. Nor is the way they are trained new; it hasn't changed in a decade. That explains the lack of transparent reasoning and the sophistry they generate, all in exchange for more data and more GPUs incinerating the planet to produce a black-box 'AI' model that can easily be confused by adversarial attacks.
No, but the first perceptrons from the 1960s famously couldn't solve the XOR problem; people threw a hidden layer in there and fixed it, and now we're in the 'how many layers can we jam in there' phase.
My point being that although neural networks are not new, people keep adding fun new things to them to create novel features.
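For anyone who hasn't seen the XOR fix spelled out, here is a minimal sketch. The weights are picked by hand rather than trained, and the step activation is an assumption chosen for readability: two hidden units compute OR and AND, and the output unit fires for "OR but not AND", which is XOR. No single threshold unit on the raw inputs can do this, because XOR is not linearly separable.

```python
def step(x: float) -> int:
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if x > 0 else 0


def xor_net(a: int, b: int) -> int:
    """Two-layer network with hand-picked weights that computes XOR."""
    h_or = step(a + b - 0.5)    # hidden unit 1: fires if at least one input is 1
    h_and = step(a + b - 1.5)   # hidden unit 2: fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # output: OR and not AND


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Everything since is, loosely speaking, more of those hidden units plus a way to learn the weights instead of writing them down.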