
tl;dr: Mamba is not as good as the transformer.


Can you elaborate more?


Of course, it will require exponential data for zero-shot. The key phrase here is zero-shot. If you think about it for a second, this applies to humans too: we also need exponential amounts of training data to do things without examples.


When we learn the grammar of our language, the teacher does not stand in front of the class reciting a large corpus of ungrammatical sentences; only the correct ones are in the training set.

When we learn to drive, we do not need to crash our car a thousand times in a row before we start to get it.

When we play a new board game for the first time, we can do it fairly competently (though not as well as experienced players) just by reading and understanding the rules.


Please help yourself and do a quick Google search on "zero-shot" and "few-shot" learning.


You could explain them instead of being snarky.

https://xkcd.com/1053/


I must say, understanding how transformers work is arguably the most important research problem in history, assuming that AGI can be achieved just by scaling up current LLMs on text, video, audio, etc.


That is a very big assumption.


It really depends on the definition.

"Better than the average human at most profitable tasks" is a much lower bar than most people on HN might think.

I have vendors who, instead of filling out a web form that remembers their inputs and eventually even fills everything out for them, print it out and fax it back in.

We're probably only about 2-3 years away from transformers being self-optimizing enough in prompts and evaluations to outpace the average worker in most tasks in most roles. (It won't necessarily be that much cheaper after the multiple passes and context windows required, and crucially probably won't be better at all tasks in most roles.)

If you define AGI as "better than any human at profitable tasks" or "better than average at all tasks" then yes, we're a long ways off and transformers alone probably won't get us there.


> "Better than the average human at most profitable tasks"

I think the HN crowd forgets that what really runs the world are minimum-wage workers running around and doing real-world things, not code monkeys and glorified typists filling in Excel sheets. So yes, replacing the bullshit jobs we invented to keep people busy will be relatively easy; that's if you don't account for the fact that you'll now have to create bullshit+ jobs to keep them busy.

And even then we're far away. Sure, it can shit out code for a todo webapp and create semi-realistic images of a monkey eating a burrito, but that's about it. More than a year ago someone bet against me here that ChatGPT would revolutionise the world within a year. Nothing really happened: geeks are excited, execs are buying the hype, tons of money is changing hands, yet there was no 4th industrial revolution.

What did happen is that the web is flooded with absolutely useless content, Amazon is full of AI-generated books, and students rely more and more on ChatGPT to generate homework, theses, "find" solutions, etc. It might very well end up being a net negative for the average Joe in the long run.


>I think the HN crowd forgets that what really runs the world are min wage workers running around and doing real world things, not code monkeys and glorified type writers filling excel sheets.

This is not true at all. How many products do you use that come primarily from minimum wage workers?

If the few people responsible for keeping Google Maps running stopped working, the GDP loss would be much bigger than if orders of magnitude more minimum-wage workers did the same.


Forget "how many", without those people, you'd end up without food on your table. Then the rest of the products you use wouldn't matter.


Also not true. I know the big farms of my state and they don't depend on minimum-wage workers, and I live in a third-world country.

I doubt agricultural work or truck driving are minimum-wage jobs where you live. Assuming US: https://www.nass.usda.gov/Charts_and_Maps/Farm_Labor/fl_allw...


Farm work, especially work that doesn't require specialization (planting, maintaining, harvesting), is pretty much minimum wage work where I live, in Spain. Minimum wage here is ~1300 EUR / month. But it also differs wildly by region here, as some regions are really poor while others rich (relatively).

Besides the farm work, there are food-processing workers (cutting, cleaning, basically assembly lines), packaging, workers at warehouses, people who work the counters at stores, and all the support roles for those positions. If you go out to eat, you have all the restaurant personnel to take into account as well.

There is a lot of low-skilled labor that goes into making what we eat today. I'm not sure how you could possibly claim that none of those people are on minimum wage.


Not all of the work you cited is essential. Would society crumble without retail?

Minimum wage in Spain is significantly more money than anything I've made in my life. It's a very comfortable position for the vast majority of the world.

>There is a lot of low skilled labor that goes into making what we eat today. I'm not sure how you could possibly claim that none of those people are on minimum wage.

People doing essential work that isn't trivially replaceable have the bargaining power to charge more than the minimum wage in a moderately free labor market, and usually they do.


> Not all of the work you cited is essential. Would society crumble without retail?

Did I miss the part where the other comment mentioned retail, or where you respond to the half dozen other examples of essential work?

> Minimum wage in Spain is significantly more money than anything I've made in my life. It's a very comfortable position for the vast majority of the world.

Instead of moving the bar some more, could you just define what minimum wage would be an acceptable bar for you in this conversation?


Yes, you missed retail; read it again.

https://uk.indeed.com/career/warehouse-worker/salaries

Do you really need me to Google every single essential position known before conceding that society is not maintained by minimum wage workers?


No, but I sure would like it if you defined it a bit better, since apparently we're now talking about UK numbers.


France is one of, if not the, biggest agricultural powers in Europe, yet most farmers can't even generate the equivalent of a 35-hour minimum wage while working 80+ hours a week.

20% of them live in poverty, and half of them make less than 22k euros a year.

Truck drivers earn between minimum wage and 150% of minimum wage, while being on the road every day and having no social life; they drive 8 hours per day and sleep in their fucking truck, while some code monkey makes 300k+/year coding memojis at Apple. Guess which ones will be automated first by OpenAI lmao.


>Truck drivers earn between min wage and 150% of min wage

Where are you getting this information? It's absolutely wrong. Long-haul truckers (the ones you're saying don't have social lives because they drive 8 hours per day) make $71,196 on average in the US[1].

[1] https://www.ziprecruiter.com/Salaries/LONG-HAUL-Truck-Driver...


He is talking about France in the sentence before. There are barely any truckers in Germany with a German nationality. They are simply not competitive. Same goes for package delivery.

Just imagine what would happen to a trucker's salary in the US if it were to create a unified market with Mexico and all of Central America.

https://en.wikipedia.org/wiki/2004_enlargement_of_the_Europe...

It's not necessarily a bad thing. Economies of Eastern European countries have been growing, after all, and Western Europe does not have enough workers because of its demographics anyway. My take is that everybody is winning and there is less poverty than before, but some side effects look ugly for a while.


>half of them make less than 22k euros a year

I'd be extremely happy making this amount. Some people are just accustomed to easier lives.

>while some code monkey makes 300k+/year coding memeojis at apple.

A meme position for a privileged caste, statistically irrespective of skill, in an institution that can piss money away on anything and still succeed.


Take a look at what happened to farm employment figures over the last 100 years.

It was a good thing.


The people picking up trash on my street stopped working for 2 days and it looks like I live in some third-world country now; two fucking days and it looks like I live in the middle of an open-air dump.

If trucks stopped deliveries every city would die in a week

If construction workers stopped building / maintaining we'd be in deep shit in 6 months or less

If the people in warehouses stopped working for a week the economy would tank like it rarely does

Nurses, doctors, bus/tram/train drivers, police, firefighters, ambulances, janitors, trash pickers, plumbers, sewer workers, electricians, people taking care of water treatment plants, power plants, teachers, social workers, ...

You could delete Facebook, OpenAI, Instagram, Twitter, Netflix, Tesla, and 90% of startups from the face of the earth right now and I'd have the exact same life as yesterday. Remove any of the people I mentioned above and society would crumble in no time.

And none of these are even remotely close to being automated at all; nobody cares about most of these jobs. But hey, here is a dancing kangaroo: https://www.youtube.com/watch?v=Zuivg5rz_aA


Are any of the positions you cited minimum wage workers where you live? Again, assuming US:

https://money.cnn.com/2016/02/24/news/economy/trash-workers-...

https://money.usnews.com/careers/best-jobs/garbage-collector...

>You could delete facebook, openai, instagram, twitter, netflix, tesla and 90% of startups from the face of the earth right now and I'd have the exact same life as yesterday. Remove any of the people I mentioned above and society would crumble in no time

Yes because you picked non-essential work. (?)


> Yes because you picked non-essential work. (?)

Then again, tell me who we're automating out of the workforce right now? Trash pickers or code monkeys? Truck drivers or artists?


Growing up, my trash was picked up by a human, and the truck crew had two or three people on it jogging house to house to pick up trash as the driver slow-rolled through the neighborhood.

Now my trash is serviced by one person who mostly never leaves the cab and who would be better described as a skilled machine operator than as a menial labor role. The work isn't completely automated but technology has reduced the job market for trash truck crews by two-thirds. I'm guessing the barrier is higher now too, requiring training and certifications to run the robotics on the truck instead of physical fitness being the primary prior qualification.


All of them? We've been working on and succeeding at automating physical tasks for decades.


Essential and non-replaceable are different concepts.


Even further, maybe the world would actually be better without these companies.

Now that there are great inventions like TikTok, teenagers are depressed as hell, and they don't go to meet each other to play soccer together, because the "social" networks are giving the illusion of having that connection.


> I think the HN crowd forgets that what really runs the world are min wage workers running around and doing real world things

Is this really true? It's certainly a nice sound bite when you're making class arguments or trying to dunk on the "HN crowd", but I think it falls apart under any level of scrutiny.

Who keeps your lights on? Who drives your bus? Who manages your sewage? Who teaches your kids? Who builds your roads? None of them make minimum wage and would probably be a little insulted to be characterized as such.

It's pretty reductionist to call anyone outside our realm of tech a "min wage worker"; they're just workers like you or me. I think it's a pretty stupid and pointless exercise to subdivide people into useful or non-useful workers, serving no purpose but to further stroke the smugness of HN AI skeptics.


I think this comment focuses too much on the "minimum wage" aspect. The core of the argument is that those roles are not at risk from AI in its present state, not necessarily the compensation aspect.


>there was no 4th industrial revolution.

Yet. The industrial revolution didn't happen in a year either.


"More than a year ago"? Really? What did anyone think was going to happen in a year?

This sort of thing usually takes longer than you expect it to, and then it usually happens faster than you expect it to.


This all reminds me of my asshole great uncle making fun of me as a teenager circa 1997 while I was at the computer and on the internet.

He sarcastically asked me, "Can you meet girls on that?" and then laughed.

He wasn't wrong in the short term but laughably wrong in the long term.


> "Better than the average human at most profitable tasks"

This is not the definition of AGI. You can't just make up a random definition to fit your argument, lol.


I'm actually all in on people making up new definitions for vague terms at the start of an argument as long as they're explicit about it.

And I particularly like this one, which is much more clearly measurable. If you feel AGI is taken, maybe we should coin this one as APGI or something like that.


I don't think the main intention was to define AGI but to zoom in on an interpretation of AGI that would provide enough value to be revolutionary.


What is the definition? The definition of AGI is one of the central points of contention in the biggest industry legal battle.


It's only in contention because one of the sides has a ton of money, a hurt ego, and a willingness to pay lawyers to argue the sky is red in order to get revenge on his former colleagues. I don't think anyone would seriously claim OpenAI has achieved AGI today.


No, what they have is several narrow ASIs.


No, that's the economically dominant definition. The philosophical one will happen much later or may never happen, but human society may change beyond recognition with the first one alone.


"The philosophical one" seems to get updated with every new breakthrough. 20 years ago, GPT3 would have been considered AGI (or "strong AI", as we called it back then).

https://en.wikipedia.org/wiki/Artificial_general_intelligenc...


Dennett describes it as real magic. The magic that can be performed is not considered real magic (it's merely a trick of confidence), whereas real magic is that which couldn't possibly be done.


My understanding is that AGI has no formal definition as it means different things to different people.

The poster here created his own definition, but what is wrong with that? He set a very specific bar to achieve, something that most "high-level" thinkers in the space have not really done. Isn't the point of discourse to bring your ideas to the table?


This is the correct way to approach the question of when and how to achieve AGI. Otherwise, please present here your engineering specification for defining AGI...

On a timeframe for achieving AGI: https://youtu.be/cEg8cOx7UZk?t=1658


This is literally Sam Altman's definition:

https://www.businessinsider.com/sam-altman-thinks-agi-replac...


There is no single definition of AGI. Performing most intellectual tasks humans perform today is both general and a form of intelligence, so I too agree with it.


Robotics is more important to AGI, because the bulk of human intelligence comes from manipulating and navigating the physical world, which includes a large amount of social interaction. LLMs are tools to assist humans. They aren't automating most jobs away anytime soon.


> I have vendors who instead of filling out a web form which remembers their inputs and eventually even fills everything out for them instead print it out and fax it back in.

Somewhere along the way we built computers so unintuitive that people find printing and faxing easier than our web apps. This isn't completely the fault of any single web app; users have a lot of learned avoidance because of bad experiences with many apps.

In the end completely automating the job ended up being easier than building a good interface for a human to do the job.


AI as a reconstruction of UX is one of the more interesting angles in the next few years.


It needs to be better than an average expert. Humans are general intelligences since you can train a human to do anything, so a generally intelligent machine needs to be trainable to the level of human experts; matching an average untrained human isn't worth much.


Which might turn out to be correct. It might be wrong also. We have no priors for AGI developing. Only NGI, and we know precious little about how to achieve NGI too, except the bedroom way.


We have a lot of priors - everything we've ever done has not produced AGI.

Maybe scaling transformers is the best way forward. I'm hopeful. But it's a big assumption that it will produce AGI.


In vitro fertilization too!


We already understand how transformers work: their architecture can learn to approximate a very large class of functions-on-sequences (specifically continuous sequence-to-sequence functions with compact support: https://arxiv.org/abs/1912.10077). It can do it more accurately than previous architectures like RNNs because transformers don't "forget" any information from prior items in the sequence. Training transformers to predict the next item in sequences eventually forces them to learn a function that approximates a world model (or at least a model of how the world behaves in the training text/data), and if they're large enough and trained with enough data then this world model is accurate enough for them to be useful for us.
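
To make the "doesn't forget prior items" point concrete, here is a minimal sketch (NumPy only, toy dimensions, not any particular model's code) of single-head causal self-attention, where every position directly mixes information from all earlier positions instead of squeezing the history into one recurrent state:

```python
# Toy single-head causal self-attention (illustrative dimensions only).
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model). Each position attends to itself and all earlier positions."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv                 # project inputs to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq_len, seq_len) pairwise similarities
    mask = np.triu(np.ones_like(scores), k=1)        # block attention to future positions
    scores = np.where(mask == 1, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the allowed (past) positions
    return weights @ v                               # each output is a mixture of all past values

rng = np.random.default_rng(0)
seq_len, d = 5, 8
x = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)    # (5, 8): position i sees positions 0..i directly
```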

If you're asking for understanding the actual internal world model they develop, it's basically equivalent to trying to understand a human brain's internal world model by analysing its neurons and how they fire.


>Training transformers to predict the next item in sequences eventually forces them to learn a function that approximates a world model

There is absolutely no proof for this statement.


Have you used ChatGPT? It can do (at least) simple reasoning, for example simple spatial reasoning or simple human behavior reasoning.


I suggest you start here:

https://ahtiahde.medium.com/limits-of-turing-machines-and-al...

If you don't have a CS background, I would suggest reading the wikipedia entries referenced in the medium article as well.


> it's basically equivalent to trying to understand a human brain's internal world model by analysing its neurons and how they fire

A major challenge here is that it's very hard/expensive to analyze these neurons, especially at any kind of scale within one human.

Not so with LLMs.


We shouldn't assume that either of those tasks is impossible


>most important research problem in history

That has to be some extremely narrowed version of all research that has happened (or will happen?)


Considering that knowing how knowing works is at the top of the ordo cognoscendi, it's not that narrow.


>assuming that AGI can be achieved by just scaling up current LLM models on text, video, audio, etc.

Is any sane person actually trying this?

I can't imagine an LLM ever going AGI.


> the most important research problem in history

Probably not.

> assuming that AGI can be achieved by just scaling up current LLM models

Lmao.


> Probably not.

Lmao.




I hope you are not part of the founding team, but if you are, you truly are doing your startup a disservice. Sharing your startup/ideas is great, but doing it in the form of an advertisement ("underlying approach introduced in Reexpress as among the more significant results of the first quarter of the 21st century") is just weird.


Yeah, I am all in for hustling but this post is way over the top, particularly for this forum.


Given his username, I'd say... he is.

Agreed: disclaimer needed.


Re: this is an advertisement for a product by its own employees.


I am not trying to downplay the contribution of the paper, but isn't it obvious that this is the case?


I'll defend the idea that it was obvious. (Although, it wasn't obvious to me until someone pointed it out, so maybe that's not obvious.)

If you watch this video[0], you'll see in the first frame that there is a clear boundary between learning rates that converge and those that don't. Ignoring this paper for a moment, what if we zoom in really, really close to that boundary? There are two possibilities: either (1) the boundary is perfectly sharp no matter how closely we inspect it, or (2) it is a little bit fuzzy. Of those two possibilities, the perfectly sharp boundary would be more surprising.

[0]: https://x.com/jaschasd/status/1756930242965606582
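
For a sense of what that looks like in miniature, here is a hedged sketch (a made-up two-parameter model and thresholds, nothing from the paper): sweep the learning rate, record whether plain gradient descent converges or blows up, and count how many times the label flips along the sweep. A perfectly sharp boundary would flip exactly once.

```python
# Sweep learning rates on a tiny nonlinear least-squares problem; mark converge vs. diverge.
import numpy as np

def converged(lr, steps=500):
    w = np.array([1.5, -0.8])                         # fixed initialization
    x = np.linspace(-1, 1, 32); y = np.tanh(2.0 * x)  # toy regression target
    for _ in range(steps):
        err = w[0] * np.tanh(w[1] * x) - y
        grad = np.array([np.mean(2 * err * np.tanh(w[1] * x)),
                         np.mean(2 * err * w[0] * x / np.cosh(w[1] * x) ** 2)])
        w = w - lr * grad
        if not np.all(np.isfinite(w)) or np.abs(w).max() > 100:
            return False                              # diverged
    return True

lrs = np.linspace(0.1, 20.0, 2000)
labels = np.array([converged(lr) for lr in lrs])
print("boundary crossings:", np.sum(labels[1:] != labels[:-1]))  # >1 means the boundary is not sharp
```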


I don't think it's obvious per se, but people who have studied numerical methods at the graduate level have likely seen fractal boundaries like this before - even Newton's method produces them [0]. The phenomenon says more about iterative methods than it says about neural networks.

[0] https://en.wikipedia.org/wiki/Newton_fractal
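
A minimal sketch of the Newton fractal idea (NumPy only, arbitrary grid size and iteration count): color each starting point in the complex plane by which cube root of 1 Newton's method converges to; the boundaries between the basins are fractal.

```python
# Newton's method on f(z) = z^3 - 1: assign each start point to the root it converges to.
import numpy as np

roots = np.array([1.0, -0.5 + 0.8660254j, -0.5 - 0.8660254j])  # cube roots of unity

def newton_basins(n=400, iters=40):
    re, im = np.meshgrid(np.linspace(-1.5, 1.5, n), np.linspace(-1.5, 1.5, n))
    z = re + 1j * im
    for _ in range(iters):
        z = z - (z**3 - 1) / (3 * z**2)                        # one Newton step, vectorized over the grid
    return np.argmin(np.abs(z[..., None] - roots), axis=-1)    # index of the nearest root

basins = newton_basins()
print(np.bincount(basins.ravel()))  # three basins of roughly equal area, separated by a fractal boundary
```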


Not only is it not obvious; it is not known to be true.


Obvious to whom?


I think the "obvious" comment was a bit snarky, but out of curiosity, I posed the question to the Groq website which currently happens to be on the front page right now. (It claims to run Mixtral 8x7B-32k at 500 T/s)

And indeed, the AI response indicated that the boundary between convergence and divergence is not well defined, has many local maxima and minima, and could be quote: "fractal or chaotic, with small changes in hyperparameters leading to drastically different outcomes."


Even if OpenAI falls apart, this is still a good move.


> 5. It has deep implications for the trajectory of a technology that many see as heralding a revolution at least as significant as — if not more than — agriculture or industry, with truly existential implications for humanity.

Yes, this is likely to be one of the most important events in human history. We are living through a special period of evolution on Earth.


So this is what it looks like inside a car's dreams.


Do autopilots dream of electric cars?


Can anyone conduct a proper measurement of resistivity versus temperature and determine whether it's a superconductor or not?


In another thread such a measurement has been discussed.

Unfortunately such measurements are not enough for definitive conclusions, because all that they can show is that below some temperature the resistance becomes smaller than the error of the measuring equipment.

Moreover, because the samples obtained so far are extremely inhomogeneous, they do not show a definite transition temperature, which would have made superconductivity much more plausible.

In any case, the results obtained so far are enough to show that it is worthwhile to invest time and money in developing a method of producing samples that are larger and purer, because whatever they are, they must have unusual properties.

The best would be to develop a method for making monocrystals, because this material has an asymmetric crystal structure, which might have very anisotropic properties.

The most interesting properties of semiconductors could not be discovered during the first century after their discovery by Faraday, because they were not available as pure monocrystals. Only after methods for growing pure crystals were developed during WW2 (for radar diodes) did it become possible to measure the intrinsic properties of semiconductors and to design new devices using them, and the semiconductor industry then grew exponentially.

With this kind of non-metallic superconductor, the problems may be similar.


"Analysing the agent’s internal representations, we can say that by taking this approach to reinforcement learning in a vast task space, our agents are aware of the basics of their bodies and the passage of time and that they understand the high-level structure of the games they encounter."

Wow, really amazing if true.

P.S.: After looking into their paper, it's not that impressive. They use the agent's internal states (LSTM cells, attention outputs, etc.) to predict whether it is early in the episode, or whether the agent is holding an object.
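
For anyone unfamiliar with that kind of analysis, it's essentially a linear probe. A hedged sketch (made-up data shapes, scikit-learn; the real states would come from the agent's LSTM cells / attention outputs) looks roughly like this:

```python
# Linear probe: predict an environment property from the agent's frozen internal states.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(10_000, 256))          # stand-in for recorded agent states
holding_object = hidden_states[:, :8].sum(axis=1) > 0   # stand-in for logged ground-truth labels

X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, holding_object, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out accuracy is read as "the internal state encodes this property".
print("probe accuracy:", probe.score(X_te, y_te))
```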


> it's not that impressive. They use agent's internal states (LSTM cells, attention outputs, etc.) to predict whether it is early in the episode, or whether the agent is holding an object.

That seems like a decent definition of awareness to me. The agent has learned to encode information about time and its body in its internal state, which then influences its decisions. How else would you define awareness? Qualia or something?


By that definition wouldn't a regular RNN or LSTM also possess awareness?


I think it would be perfectly reasonable to describe any RNN as being "aware" of information that it learned and then used to make a decision.

"Possess awareness" seems like loaded language though, evoking consciousness. In that direction I'd just quote Dijkstra: "The question of whether a computer can think is no more interesting than the question of whether a submarine can swim."


Ooh, that’s a great quote.

I’d say that it’s no less interesting, either.


"Aware" is probably overly anthropomorphized language there. What they mean to say is that all these things have become parameterized within the model.


It would be interesting to see what would happen if they added social dynamics between the agents...like some space for theory of mind (what is that agent thinking), mimicry, communication, etc.


From the article: "Because the environment is multiplayer, we can examine the progression of agent behaviours while training on held-out social dilemmas, such as in a game of “chicken”. As training progresses, our agents appear to exhibit more cooperative behaviour when playing with a copy of themselves. Given the nature of the environment, it is difficult to pinpoint intentionality — the behaviours we see often appear to be accidental, but still we see them occur consistently."


There is also some other work from deepmind in this direction: https://deepmind.com/research/publications/machine-theory-mi...


The main question, of course, being: aren't we anthropomorphizing ourselves too much?


I think this is a key insight. Human exceptionalism is, in my opinion, an extremely flawed assertion based on a sample size of one, yet it is widely accepted. Actual evidence does not support the idea that awareness of self and other "hallmarks of intelligence" require anything more advanced than an insect, or perhaps even fungi.


When people say this kind of stuff, I wonder whether there might not be philosophical zombies among us.


[ ] To prove that you are human please describe how you are observably different from embodied, general, adaptive agents in 200 words.


This is a well-known problem. The noise is due to mu-law compression. The 16-bit audio samples are compressed to 8, 9, or 10 bits before being fed to the neural net. The reason is that predicting a categorical distribution over 2^16 values requires too many parameters. The noise was also present in samples from the famous WaveNet from DeepMind (they used 8-bit mu-law).

There are two ways to avoid this: 1. predict the 8 high (coarse) bits and the 8 low (fine) bits separately, as in the original WaveRNN paper; 2. use a mixture of logistic distributions as the predictive output, as in the recent Lyra vocoder from Google.
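
As a rough illustration of the companding step (a sketch of generic 8-bit mu-law, not this project's exact code):

```python
# 8-bit mu-law companding: squeeze 16-bit samples into 256 classes for a categorical output.
import numpy as np

MU = 255

def mulaw_encode(x16):
    """x16: int16 samples -> class indices in [0, 255]."""
    x = x16.astype(np.float64) / 32768.0                       # scale to [-1, 1)
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)   # logarithmic compression
    return np.clip((y + 1) / 2 * MU + 0.5, 0, MU).astype(np.uint8)

def mulaw_decode(idx):
    y = 2 * idx.astype(np.float64) / MU - 1
    x = np.sign(y) * ((1 + MU) ** np.abs(y) - 1) / MU          # inverse expansion
    return np.clip(x * 32768.0, -32768, 32767).astype(np.int16)

samples = np.array([-32768, -1000, -10, 0, 10, 1000, 32767], dtype=np.int16)
print(mulaw_decode(mulaw_encode(samples)))  # small samples survive well; large ones are quantized coarsely
```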


How does the number of parameters scale with resolution?

Specifically, how much slower would this be if the audio were, say, 10 bits?

I recall a lab exercise in college where we were supposed to increase the resolution of a quantizer until we reached a decent tone, and 10 bits was the point at which we reached satisfying quality.


It is a single matrix multiplication to predict the probabilities of all possible outputs. For example, with a hidden state of 1024 dimensions and an 8-bit output, it is 1024x256 parameters. A 10-bit output will need 1024x1024 params.
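
Roughly, the output projection is hidden_dim x 2^bits, so as a quick back-of-the-envelope (using the 1024-dimensional state mentioned above, purely illustrative):

```python
# Output-head parameter count for a categorical next-sample predictor: hidden_dim x 2**bits.
hidden_dim = 1024
for bits in (8, 10, 16):
    n_classes = 2 ** bits
    print(f"{bits:>2}-bit audio: {hidden_dim} x {n_classes} = {hidden_dim * n_classes:,} weights")
```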

