Of course, it will require exponential data for zero shot. The keyword here is zero shot. If you think about it for a second, this applies to humans too. We also need exponential training data to do things without examples.
When we learn the grammar of our language, the teacher does not stand in front of the class and recite a large corpus of ungrammatical sentences; only the correct ones are in the training set.
When we learn to drive, we do not need to crash our car a thousand times in a row before we start to get it.
When we play a new board game for the first time, we can do it fairly competently (though not as well as experienced players) just by reading and understanding the rules.
I must say, understanding how transformers work is arguably the most important research problem in history, assuming that AGI can be achieved just by scaling up current LLMs on text, video, audio, etc.
"Better than the average human at most profitable tasks" is a much lower bar than most people on HN might think.
I have vendors who, instead of filling out a web form that remembers their inputs and eventually even fills everything out for them, print it out and fax it back in.
We're probably only about 2-3 years away from transformers being self-optimizing enough in prompts and evaluations to outpace the average worker in most tasks in most roles. (It won't necessarily be that much cheaper after the multiple passes and context windows required, and crucially probably won't be better at all tasks in most roles.)
If you define AGI as "better than any human at profitable tasks" or "better than average at all tasks" then yes, we're a long way off and transformers alone probably won't get us there.
> "Better than the average human at most profitable tasks"
I think the HN crowd forgets that what really runs the world are min wage workers running around and doing real world things, not code monkeys and glorified typewriters filling Excel sheets. So yes, replacing the bullshit jobs we invented to keep people busy will be relatively easy; that is, if you don't account for the fact that you'll now have to create bullshit+ jobs to keep them busy.
And even then we're far away. Sure, it can shit out code for a todo webapp and create semi-realistic images of a monkey eating a burrito, but that's about it. More than a year ago someone here bet against me that ChatGPT would revolutionise the world within the year; nothing really happened. Geeks are excited, execs are buying the hype, tons of money is changing hands, yet there was no fourth industrial revolution.
What happened though is that the web is flooded with absolutely useless content, Amazon is full of AI-generated books, students rely more and more on ChatGPT to generate homework, theses, "find" solutions, &c. It might very well end up being a net negative for the average joe in the long run.
>I think the HN crowd forgets that what really runs the world are min wage workers running around and doing real world things, not code monkeys and glorified typewriters filling Excel sheets.
This is not true at all. How many products do you use that come primarily from minimum wage workers?
If the few people responsible for keeping Google Maps running stopped working, the GDP loss would be much bigger than if orders of magnitude more minimum wage workers did the same.
Farm work, especially work that doesn't require specialization (planting, maintaining, harvesting), is pretty much minimum wage work where I live, in Spain. Minimum wage here is ~1300 EUR/month. But it also differs wildly by region, as some regions are really poor while others are (relatively) rich.
Besides the farm work, there are food-processing workers (cutting, cleaning, basically assembly lines), packaging, workers at warehouses, people who work the counters of the store, and all the support roles for those positions. If you go out to eat, you have all the restaurant personnel to take into account as well.
There is a lot of low skilled labor that goes into making what we eat today. I'm not sure how you could possibly claim that none of those people are on minimum wage.
Not all of the work you cited is essential. Would society crumble without retail?
Minimum wage in Spain is significantly more money than anything I've made in my life. It's a very comfortable position for the vast majority of the world.
>There is a lot of low skilled labor that goes into making what we eat today. I'm not sure how you could possibly claim that none of those people are on minimum wage.
People doing essential work that isn't trivially replaceable have the bargaining power to charge more than the minimum wage in a moderately free labor market, and usually they do.
> Not all of the work you cited is essential. Would society crumble without retail?
Did I miss the part where the other comment mentioned retail, or where you respond to the half dozen other examples of essential work?
> Minimum wage in Spain is significantly more money than anything I've made in my life. It's a very comfortable position for the vast majority of the world.
Instead of moving the bar some more, could you just define what minimum wage would be an acceptable bar for you in this conversation?
France is one of the biggest agricultural powers in Europe, if not the biggest, yet most farmers can't even earn the equivalent of a 35-hour minimum wage while working 80+ hours a week.
20% of them live in poverty, and half of them make less than 22k euros a year.
Truck drivers earn between min wage and 150% of min wage while being on the road every day with no social life; they drive 8 hours per day and sleep in their fucking truck while some code monkey makes 300k+/year coding memeojis at Apple. Guess which ones will be automated first by OpenAI lmao
>Truck drivers earn between min wage and 150% of min wage
Where are you getting this information? It's absolutely wrong. Long haul truckers (the ones you're saying don't have social lives because they drive 8 hours per day) make $71,196 on average in the US[1].
He is talking about France in the sentence before. There are barely any truckers in Germany with German nationality; they are simply not competitive. Same goes for package delivery.
Just imagine what would happen to a trucker's salary in the US if it were to create a unified market with Mexico and all of Central America.
It's not necessarily a bad thing. Economies of Eastern European countries have been growing after all, and Western Europe does not have enough workers because of its demographics anyway. My take is that everybody is winning and there is less poverty than before, but some side effects look ugly for a while.
The people picking up trash in my street stopped working for 2 days and it looks like I live in some third world country now; two fucking days and it looks like I live in the middle of an open-air dump.
If trucks stopped deliveries every city would die in a week
If construction workers stopped building / maintaining we'd be in deep shit in 6 months or less
If the people in warehouses stopped working for a week the economy would tank like it rarely does
Nurses, doctors, bus/tram/train drivers, police, firefighters, ambulances, janitors, trash pickers, plumbers, sewer workers, electricians, people taking care of water treatment plants, power plants, teachers, social workers, ...
You could delete facebook, openai, instagram, twitter, netflix, tesla and 90% of startups from the face of the earth right now and I'd have the exact same life as yesterday. Remove any of the people I mentioned above and society would crumble in no time
And none of these are even remotely close to being automated at all, nobody cares about most of these jobs. But hey, here is a dancing kangaroo: https://www.youtube.com/watch?v=Zuivg5rz_aA
>You could delete facebook, openai, instagram, twitter, netflix, tesla and 90% of startups from the face of the earth right now and I'd have the exact same life as yesterday. Remove any of the people I mentioned above and society would crumble in no time
Growing up, my trash was picked up by a human, and the truck crew had two or three people on it jogging house to house to pick up trash as the driver slow-rolled through the neighborhood.
Now my trash is serviced by one person who mostly never leaves the cab and who would be better described as a skilled machine operator than as a menial labor role. The work isn't completely automated but technology has reduced the job market for trash truck crews by two-thirds. I'm guessing the barrier is higher now too, requiring training and certifications to run the robotics on the truck instead of physical fitness being the primary prior qualification.
Even further, maybe the world would actually be better without these companies.
Now that there are great inventions like TikTok, teenagers are depressed as hell, and they don't go to meet each other to play soccer together, because the "social" networks are giving the illusion of having that connection.
> I think the HN crowd forgets that what really runs the world are min wage workers running around and doing real world things
Is this really true? It's certainly a nice soundbite when you're making class arguments or trying to dunk on the "HN crowd", but I think it falls apart under any level of scrutiny.
Who keeps your lights on? Who drives your bus? Who manages your sewage? Who teaches your kids? Who builds your roads? None of them make minimum wage and would probably be a little insulted to be characterized as such.
It's pretty reductionist to call anyone outside our realm of tech a "min wage worker"; they're just workers like you or me. I think it's a pretty stupid and pointless exercise to subdivide people into useful and non-useful workers, serving no purpose but to further feed the smugness of HN AI skeptics.
I think this comment focuses too much on the “minimum wage” aspect; the core of the argument is that those are roles not at risk from AI in its present state, not necessarily the compensation aspect.
I'm actually all in on people making up new definitions for vague terms at the start of an argument as long as they're explicit about it.
And I particularly like this one, which is much more clearly measurable. If you feel the term AGI is taken, maybe we should coin this one as APGI or something like that.
It's only in contention because 1 of the sides has a tonne of money, a hurt ego, and is willing to pay lawyers to argue the sky is red in order to get revenge on his former colleagues. I don't think anyone would seriously claim OpenAI has achieved AGI today.
No, that's the economically dominating definition. The philosophical one will happen much later or may never happen, but human society may change beyond recognition with the first one alone.
"The philosophical one" seems to get updated with every new breakthrough. 20 years ago, GPT3 would have been considered AGI (or "strong AI", as we called it back then).
Dennett describes it as real magic. The magic that can be performed is not considered real magic (it's merely a trick of confidence), whereas real magic is that which couldn't possibly be done.
My understanding is that AGI has no formal definition as it means different things to different people.
The poster here created his own definition, but what is wrong with that? He set a very specific bar to achieve, something that most "high-level" thinkers in the space have not really done. Isn't the point of discourse to bring your ideas to the table?
This is the correct way to approach the answer to when and how to achieve AGI. Otherwise please present here your Engineer Specification for defining AGI...
There is no single definition of AGI. Performing most intellectual tasks humans perform today is both general and a form of intelligence, so I too agree with it.
Robotics is more important to AGI, because the bulk of human intelligence comes from manipulating and navigating the physical world, which includes a large amount of social interaction. LLMs are tools to assist humans; they aren't automating most jobs away anytime soon.
> I have vendors who instead of filling out a web form which remembers their inputs and eventually even fills everything out for them instead print it out and fax it back in.
Somewhere along the way we built computers so intuitive that people find printing and faxing easier than our web apps. This isn't completely the fault of any single web app; users have a lot of learned avoidance because of bad experiences with many apps.
In the end completely automating the job ended up being easier than building a good interface for a human to do the job.
It needs to be better than an average expert. Humans are general intelligences, since you can train a human to do anything, so a generally intelligent machine needs to be trainable up to the level of human experts; matching an average untrained human isn't worth much.
Which might turn out to be correct. It might be wrong also. We have no priors for AGI developing, only NGI, and we know precious little about how to achieve NGI too, except the bedroom way.
We already understand how transformers work: their architecture can learn to approximate a very large class of functions on sequences (specifically continuous sequence-to-sequence functions with compact support: https://arxiv.org/abs/1912.10077). They can do this more accurately than previous architectures like RNNs because transformers don't "forget" any information from prior items in the sequence. Training transformers to predict the next item in sequences eventually forces them to learn a function that approximates a world model (or at least a model of how the world behaves in the training text/data), and if they're large enough and trained with enough data then this world model is accurate enough to be useful for us.
If you're asking for understanding the actual internal world model they develop, it's basically equivalent to trying to understand a human brain's internal world model by analysing its neurons and how they fire.
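To make the "don't forget prior items" point concrete, here is a minimal single-head causal self-attention in NumPy (my own illustrative sketch, not code from the linked paper; all names and sizes are made up): every position attends over the entire prefix, rather than pushing the past through a fixed-size recurrent state the way an RNN does.

    # Minimal sketch: single-head causal self-attention (illustrative only).
    import numpy as np

    def causal_self_attention(x, Wq, Wk, Wv):
        """x: (seq_len, d_model) embeddings; Wq/Wk/Wv: (d_model, d_head) projections."""
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(k.shape[-1])                     # (seq_len, seq_len)
        scores[np.triu(np.ones_like(scores), k=1) == 1] = -np.inf   # mask out the future
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                          # softmax over all past tokens
        return w @ v                                                # each output mixes the whole prefix

    rng = np.random.default_rng(0)
    d_model, d_head, seq_len = 16, 8, 5
    x = rng.normal(size=(seq_len, d_model))
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    print(causal_self_attention(x, Wq, Wk, Wv).shape)               # (5, 8)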
I hope you are not part of the founding team but if you are, you truly are doing your startup a disservice. Sharing your startup/ideas is great but doing it in the form of an advertisement "underlying approach introduced in Reexpress as among the more significant results of the first quarter of the 21st century" is just weird.
I'll defend the idea that it was obvious. (Although, it wasn't obvious to me until someone pointed it out, so maybe that's not obvious.)
If you watch this video[0], you'll see in the first frame that there is a clear boundary between learning rates that converge and those that don't. Ignoring this paper for a moment, what if we zoom in really, really close to that boundary? There are two possibilities: either (1) the boundary is perfectly sharp no matter how closely we inspect it, or (2) it is a little bit fuzzy. Of those two possibilities, the perfectly sharp boundary would be more surprising.
I don't think it's obvious per se, but people who have studied numerical methods at the graduate level have likely seen fractal boundaries like this before - even Newton's method produces them [0]. The phenomenon says more about iterative methods than it says about neural networks.
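For a self-contained taste of that (an illustrative Python sketch, not the paper's experiment): Newton's method on z^3 - 1 = 0 converges to one of three roots, and which root an initial guess reaches depends on the starting point in a famously fractal way near the basin boundaries.

    # Newton's method on z^3 - 1: fractal basin boundaries (illustrative sketch).
    import numpy as np

    roots = np.array([1.0, -0.5 + 0.8660254j, -0.5 - 0.8660254j])

    def newton_basin(z, iters=60):
        """Index of the root that Newton's iteration reaches from starting point z."""
        for _ in range(iters):
            z = z - (z**3 - 1) / (3 * z**2)
        return int(np.argmin(np.abs(roots - z)))

    # Walk a short segment of starting points; the chosen root flips as the
    # starting point crosses the boundary region between basins.
    for xr in np.linspace(-0.2, 0.2, 9):
        z0 = complex(xr, 0.3)
        print(f"z0 = {z0:.3f} -> root {newton_basin(z0)}")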
I think the "obvious" comment was a bit snarky, but out of curiosity, I posed the question to the Groq website, which happens to be on the front page right now. (It claims to run Mixtral 8x7B-32k at 500 T/s.)
And indeed, the AI response indicated that the boundary between convergence and divergence is not well defined, has many local maxima and minima, and could be quote: "fractal or chaotic, with small changes in hyperparameters leading to drastically different outcomes."
> 5. It has deep implications for the trajectory of a technology that many see as heralding a revolution at least as significant as — if not more than — agriculture or industry, with truly existential implications for humanity.
Yes, this is likely to be one of the most important events in human history. We are living through a special period of evolution on Earth.
In another thread such a measurement has been discussed.
Unfortunately such measurements are not enough for definitive conclusions, because all that they can show is that below some temperature the resistance becomes smaller than the error of the measuring equipment.
Moreover, because the samples obtained so far are extremely inhomogeneous, they do not show a definite transition temperature, which would have made superconductivity much more plausible.
In any case, the results obtained so far are enough for showing that it is worthwhile to invest time and money for developing a method of producing samples that are larger and more pure, because whatever they are, they must have unusual properties.
The best would be to develop a method for making monocrystals, because this material has an asymmetric crystal structure, which might have very anisotropic properties.
The most interesting properties of semiconductors could not be discovered during the first century after their discovery by Faraday, because they were not available as pure monocrystals. Only after methods for growing pure crystals were developed during WW2 (for radar diodes) did it become possible to measure the intrinsic properties of semiconductors and to design new devices using them, and then the semiconductor industry grew exponentially.
With this kind of non-metallic superconductor, the problems may be similar.
"Analysing the agent’s internal representations, we can say that by taking this approach to reinforcement learning in a vast task space, our agents are aware of the basics of their bodies and the passage of time and that they understand the high-level structure of the games they encounter."
Wow, really amazing if true.
P.S.: After looking into their paper, it's not that impressive. They use the agent's internal states (LSTM cells, attention outputs, etc.) to predict whether it is early in the episode, or whether the agent is holding an object.
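For anyone curious what that looks like in practice, here is a rough sketch of the probing idea (all data and labels below are synthetic placeholders, not DeepMind's code or dataset): record the agent's internal state at each timestep, then fit a simple classifier to predict a property such as "is the agent holding an object". High probe accuracy means that property is linearly decodable from the internal state.

    # Linear probe sketch on made-up "hidden states" (illustrative only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(5000, 256))                    # stand-in for LSTM states
    holding = (hidden[:, :8].sum(axis=1) > 0).astype(int)    # fake "holding object" label

    X_tr, X_te, y_tr, y_te = train_test_split(hidden, holding, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("probe accuracy:", probe.score(X_te, y_te))        # high accuracy => info is encoded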
> it's not that impressive. They use agent's internal states (LSTM cells, attention outputs, etc.) to predict whether it is early in the episode, or whether the agent is holding an object.
That seems like a decent definition of awareness to me. The agent has learned to encode information about time and its body in its internal state, which then influences its decisions. How else would you define awareness? Qualia or something?
I think it would be perfectly reasonable to describe any RNN as being "aware" of information that it learned and then used to make a decision.
"Possess awareness" seems like loaded language though, evoking consciousness. In that direction I'd just quote Dijkstra: "The question of whether a computer can think is no more interesting than the question of whether a submarine can swim."
"Aware" is probably overly anthropomorphized language there. What they mean to say is that all these things have become parameterized within the model.
It would be interesting to see what would happen if they added social dynamics between the agents...like some space for theory of mind (what is that agent thinking), mimicry, communication, etc.
From the article: "Because the environment is multiplayer, we can examine the progression of agent behaviours while training on held-out social dilemmas, such as in a game of “chicken”. As training progresses, our agents appear to exhibit more cooperative behaviour when playing with a copy of themselves. Given the nature of the environment, it is difficult to pinpoint intentionality — the behaviours we see often appear to be accidental, but still we see them occur consistently."
I think this is a key insight. Human exceptionalism is, in my opinion, an extremely flawed assertion based on a sample size of one, yet it is widely accepted. Actual evidence does not support the idea that awareness of self and other “hallmarks of intelligence” require anything more advanced than an insect, or perhaps even fungi.
This is a well-known problem. The noise is due to mu-law compression. The 16-bit audio samples are compressed to 8, 9, or 10 bits before being fed to the neural net, because predicting a categorical distribution over 2^16 values requires too many parameters. The noise was also present in samples from the famous WaveNet from DeepMind (they used 8-bit mu-law).
There are two ways to avoid this: 1. predict the 8 high (coarse) bits and 8 low (fine) bits separately, as in the original WaveRNN paper.
2. use a mixture of logistic distributions as the predictive output, as in the recent Lyra vocoder from Google.
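For reference, here is a quick sketch of the mu-law companding step being described (the standard mu-law formulas, not necessarily the authors' exact preprocessing): 16-bit samples get companded and quantized to 256 levels, so the network's softmax only needs 256 classes, at the cost of audible quantization noise.

    # Mu-law encode/decode sketch (standard formulas; mu = 255 gives 8-bit codes).
    import numpy as np

    def mu_law_encode(x, mu=255):
        """x in [-1, 1] -> integer codes in [0, mu]."""
        y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
        return ((y + 1) / 2 * mu + 0.5).astype(np.int32)

    def mu_law_decode(codes, mu=255):
        y = 2 * (codes.astype(np.float64) / mu) - 1
        return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

    x = np.linspace(-1.0, 1.0, 5)
    codes = mu_law_encode(x)
    print(codes)                     # 5 codes in [0, 255]
    print(mu_law_decode(codes) - x)  # reconstruction error = the quantization noise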
How does the number of parameters scale with resolution?
Specifically, how much slower this would be if the audio was, say, 10 bits?
I recall a lab exercise in college where we were supposed to increase the resolution of a quantizer until we reached a decent tone and 10 bits were the point at which we reached satisfying quality.
It is a single matrix multiplication to predict the probabilities of all possible outputs. For example, with a hidden state of 1024 dimensions and an 8-bit output, that is 1024x256 parameters; a 10-bit output needs 1024x1024 params.
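Spelling out that arithmetic (the 1024-dimensional hidden state is just the example value from the comment above): the output projection is hidden_dim x 2^bits, so the parameter count grows exponentially with bit depth, which is why going to full 16-bit outputs is impractical.

    # Output-layer parameter count vs. bit depth (illustrative arithmetic).
    hidden_dim = 1024
    for bits in (8, 10, 16):
        print(f"{bits:2d}-bit output: {hidden_dim * 2**bits:>12,} parameters")
    # 8-bit: 262,144   10-bit: 1,048,576   16-bit: 67,108,864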