They’re both exploring the same space of optimizing the memory needed by the KV cache, which is essentially another name for the context window (no one elides the KV cache, since otherwise you’re doing O(n^2) math to do attention). They’re exploring different approaches to the same goal, and it may be possible to apply both simultaneously to reduce the attention mechanism to almost zero memory usage, which would be really cool, but I’m curious how they compare against each other individually.
The only memory mechanism within an LLM, as far as I know, is the attention mechanism, where it compares all previous tokens to generate a probability distribution for the next token. The attention mechanism has a thing called a KV cache that takes the O(n^2) matrix math per generated token down to O(n) by caching and reusing results computed for previous tokens. The number of tokens the context covers is called the context window (e.g. 128k for Llama).
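To make that concrete, here's a minimal single-head sketch of KV-cached decoding in plain NumPy (toy dimensions, not any particular framework or model):

```python
import numpy as np

d = 64                      # head dimension (toy value)
k_cache, v_cache = [], []   # the KV cache: one K and one V vector per previous token

def decode_step(q, k, v):
    """Attention output for one newly generated token.

    Without the cache you'd recompute K and V for every previous token at
    every step (O(n^2) work over the whole sequence); with the cache each
    step just appends one K/V pair and does O(n) work against the cache.
    """
    k_cache.append(k)
    v_cache.append(v)
    K = np.stack(k_cache)                 # (n, d)
    V = np.stack(v_cache)                 # (n, d)
    scores = K @ q / np.sqrt(d)           # (n,) dot-product attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over all cached tokens
    return weights @ V                    # (d,) attention output

out = decode_step(*np.random.randn(3, d))  # one toy decode step
```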
The articles use very similar verbiage.
> The context window can be considered the model’s working memory
Snip
> Universal transformer memory optimizes prompts using neural attention memory models (NAMMs), simple neural networks that decide whether to “remember” or “forget” each given token stored in the LLM’s memory.
snip
> Meanwhile, by discarding unnecessary tokens, NAMM enabled the LLM model to save up to 75% of its cache memory while performing the tasks.
You just have to be familiar with the wording in the space and read enough literature. Here’s more direct wording from the NAMM paper:
> NAMMs use evolution to optimize the performance of LMs by pruning their KV cache memory. Evolved NAMMs can be zero-shot transferred to other transformers, even across input modalities and task domains.
This is all related work on shrinking the size of the KV cache as the context grows, both for memory reasons and because it has a speed-up effect, since you’re no longer attending over all the tokens (O(n) -> sublinear in the size of the context).
Context is critical for the LLM answering correctly and remembering all the information given to it plus everything it has said. Typical limits for open models these days are 128k, but with techniques like this it could scale even further, allowing better performance on things like code completion.
I thought the context would also have floating point numbers so that tokens would be included in a more fuzzy way, and that when requests are sent it would result in loading slightly different tokens into the cache. Yeah my understanding certainly is limited and I’d like to study it more. Thanks for the response, I see more similarity now.
The word you're looking for is latent space, and yes, everything in the compute graph, including the context cache and the compute itself, happens in latent space. Literal input tokens are first converted to latent space through the embedding layer, and literal output tokens are generated by converting the last compute tensor into token probabilities and taking the most probable token. Everything in the middle, though, happens in the "floating point" latent space.
When you hear something like "it's attending all previous tokens", IMHO that's not strictly the correct explanation, since you're attending through latent space, which doesn't actually correspond 1:1 with tokens but is a multidimensional representation of that token and all preceding tokens as understood by that attention head. But conceptually that's how it's described, because the size of your context goes up by one tensor for every token you process, even though applying attention actually ends up changing all tensors in the KV cache (hence self-attention).

Also important to note that each attention head within each layer has its own KV cache. LLMs are an autoregressive family of models where the output of each layer feeds into the input of the next, and each layer performs attention. That's another reason why it's not strictly correct to think of tokens as making up your context: there are actually many, many contexts within a transformer model. That's why your 128k context window can be ~15 GiB for a naive inference implementation: 128k (128 * 1024) tokens of context * 1024-element tensor * 2 bytes per element * 8 attention heads * 8 layers (or something along those lines). And that's what this work is talking about shrinking (as does HeadKV).
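To make the arithmetic concrete, here's that back-of-envelope calculation in Python (the head count, layer count, and per-head tensor size are the illustrative numbers from above, not any specific model's):

```python
# Rough KV-cache size for a naive inference implementation,
# using the illustrative numbers above (not a real model's config).
context_len = 128 * 1024   # 128k-token context window
head_dim    = 1024         # elements per cached tensor
bytes_per   = 2            # 2 bytes per element (fp16/bf16)
n_heads     = 8            # attention heads per layer
n_layers    = 8            # layers

total_bytes = context_len * head_dim * bytes_per * n_heads * n_layers
print(f"{total_bytes / 2**30:.0f} GiB")  # 16 GiB, i.e. the rough ~15 GiB figure above
```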
> tokens would be included in a more fuzzy way, and that when requests are sent it would result in loading slightly different tokens into the cache
The entire process of an LLM is generally 100% deterministic given the same inputs and a fixed seed for the RNG (modulo bugs in the inference math or in the HW/SW of the accelerator). Some inference implementations don't guarantee this property in the face of concurrent requests, and you can't control the seed for hosted LLMs, which is why you seem to get random responses for the same query.
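As a toy illustration of the fixed-seed point (plain NumPy sampling, not a real inference stack):

```python
import numpy as np

def sample_next_token(logits, seed):
    """Sampling is a deterministic function of (logits, seed)."""
    rng = np.random.default_rng(seed)
    p = np.exp(logits - logits.max())
    p /= p.sum()                          # softmax -> probability distribution
    return rng.choice(len(p), p=p)        # "random" draw, fixed by the seed

logits = np.array([2.0, 1.0, 0.5, 0.1])
# same inputs + same seed -> the same token every time
assert sample_next_token(logits, seed=42) == sample_next_token(logits, seed=42)
```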
The KV cache feels more like a graph to me, like in the RDF sense. Each parameter could be numbered and given a URL it seems. I have some studying to do. I think building a simple neural net and looking at raw data for context in whatever LLM I’m playing with in Ollama are good things to try.
This isn't like lossless compression. Both techniques involve throwing lots of information away, with the justification that doing so does not significantly affect the end result.
The extent to which using both the techniques together will help will depend on how much overlap there is between the information each ends up discarding.
Modern LLMs are still quite inefficient in their representation of information. We're at like the DEFLATE era and we've still yet to invent zstd where there's only marginal incremental gains; so right now there's a lot of waste to prune away.
Is it possible that after 3-4 years of performance optimizations, both algorithmic and in hardware efficiency, it will turn out that we didn’t really need all of the nuclear plants we’re currently in the process of setting up to satisfy the power demands of AI data centers?
Nobody is building nuclear power plants for data centres. A few people have signed some paperwork saying that they would buy electricity from new nuclear plants if they could deliver it at a certain price, a price, mind you, that has not been achieved before. Others are trying to restart an existing reactor at Three Mile Island (a thing that has never been done before, and likely won't be done now, since the reactor was shut down for being too expensive to run).
And certainly nobody is building one in the next 3-4 years; they'd be lucky to finish the paperwork in that time.
Electric cars are causing exactly the same problem.
Also, the "RECs" appear to be based on a lie. Overall, an increase in load can only be green if new green generation is added to service that load.
Except the alternative to electric cars is petrol/diesel cars, which are worse than electric cars even when the electricity comes from gas or coal. The pollution no longer occurs in population zones, and the grid can be cleaned up without changing the car.
The alternatives for these data centres are either build renewables or not build the data centres, both of which are better.
> Nobody is building nuclear power plants for data centres. A few people have signed some paperwork saying that they would buy electricity from new nuclear plants if they could deliver it at a certain price, a price mind you that has not been done before.
Not building new, but I think Microsoft paying to restart a reactor at Three Mile Island for their datacenter is much more significant than you make the deals sound:
Microsoft isn’t paying to restart it. A PPA is a contract saying they will purchase electricity at a specified price for a fixed term. Three Mile Island needs to be able to produce the electricity at the specified price for Microsoft to buy it. If it’s above that price, Microsoft is off the hook.
> Constellation closed the adjacent but unconnected Unit 1 reactor in 2019 for economic reasons, but will bring it back to life after signing a 20-year power purchase agreement to supply Microsoft’s energy-hungry data centers, the company announced on Friday.
The reactor they’re restarting was operational just five years ago. It’s not a fully decommissioned or melted down reactor and it’s likely all their licensing is still valid so the red tape, especially environmental studies, is mostly irrelevant. Getting that reactor back up and running will be a lot simpler than building a new one.
The largest problem will be finding qualified and vetted personnel. All the people who worked at the plant when it closed five years ago had to find jobs elsewhere. Even though the plant was an important employer in Middletown, I don't know if those former employees will be willing to quit their current jobs to go back, especially if there is a risk the plant will just be shut down again when it once again becomes too expensive to operate.
Look, I agree that nuclear is difficult, but Google and Microsoft have publicly committed to those projects you’re mentioning. I don’t understand your dismissive tone that all of it is hogwash? This is one of those HN armchair comments.
My tone is because this is a simple predatory delay strategy.
Tomorrow, tomorrow, I’ll decarbonize tomorrow.
Instead of paying to buy wind and solar plants, which can go up today they are signing a meaningless agreement for the future.
A PPA isn’t worth the paper it’s written on if the seller can’t produce electricity at the agreed upon price by the date required.
Take Three Mile Island. It was closed in 2019 since it was uneconomical to run. Since then renewables have continued getting substantially cheaper, while the reactor has been in the process of decommissioning.
Instead of spending money on building wind and solar, Microsoft saw how well Vogtle went and decided that another first-of-its-kind nuclear project is the best way to make it appear like they’re doing something.
The logic is pretty straightforward; I’m not sure what your complaint is. They don’t need the power now, but they calculate that they’d need much more power in the future than non-nuclear forms of power generation would be able to give them in the same timeframe.
The US is already adding record amounts of solar and wind power to replace coal and natural gas plants. What makes you believe Microsoft can just buy more renewable energy? Are you privy to the terms and conditions of the PPA?
The alternative to the Three Mile Island restart would be to add natural gas plants, or to buy renewable energy at a higher price. I’m sure they have a plan B.
I've been told my entire life that it's too late for nuclear, we should have been building them 20 years ago.
I think now's fine, even if it takes time. these companies already buy a ton of power from renewable sources, and it's good to diversify - nuclear is a good backup to have.
what does it mean that "the west tried" - was it a technical failure or was it that people didn't want it in their backyard? just because people hate something doesn't mean that they don't need it. children hate spinach.
There was talk of an ongoing nuclear renaissance in the early 2000s. [1]
American companies and utilities announced 30 reactors. Britain announced ~14.
We went ahead and started construction on 7 reactors in Vogtle, Virgil C. Summer, Flamanville, Olkiluoto and Hanhikivi to rekindle the industry. We didn't believe renewables would cut it.
The end result of what we broke ground on is 3 cancelled reactors, 3 reactors which entered commercial operation in the 2020s and 1 still under construction.
The rest are in different states of trouble with financing with only Hinkley Point C slowly moving forward.
In the meantime renewables went from barely existing to dominating new capacity (TWh) in the energy sector.
Today renewables make up 2/3rds of global investment in the energy sector.
The failure of nuclear power is that it is horrifically expensive and the timelines are insane compared to the competition.
Steam locomotives technically work, but, like nuclear power, they are uncompetitive.
Lately nuclear power has caught the imagination of conservative politicians as a method to delay the renewable disruption of the fossil industry and have an answer to climate change.
When their plans, like in Australia, get presented they don’t care the slightest about nuclear power and it is only a method to prolong the life of the coal and gas assets.
> American companies and utilities announced 30 reactors. Britain announced ~14.
Lots of projects get announced, they aren't meant to be promises.
> The end result of what we broke ground on is 3 cancelled reactors, 3 reactors which entered commercial operation in the 2020s and 1 still under construction.
So there are three operational reactors and another one almost ready. I'm surprised we got that after Fukushima.
> Today renewables make up 2/3rds of global investment in the energy sector.
So we should not invest in anything else?
> Steam locomotives technically work, but are like nuclear power uncompetitive.
This is a terrible analogy.
> Lately nuclear power has caught the imagination of conservative politicians as a method to delay the renewable disruption of the fossil industry and have an answer to climate change.
People who have been advocating for more nuclear power should stop because it is a conservative issue now?
Which would have moved forward towards completion if the economic calculus made sense.
We should of course continue with basic research. But without some incredible breakthrough, nuclear power will only serve climate change deniers’ agenda of delaying the renewable buildout.
This is what you sign up for when proposing investing in nuclear power in 2024:
> The opposition last week released modelling of its “coal-to-nuclear” plan that would slow the rollout of renewable energy and batteries and instead rely on more fossil fuel generation until a nuclear industry could be developed, mostly after 2040.
in other words, it's too late to build nuclear, let's bury our heads in the sand and hope somehow we have enough renewable in 20 years and we're not still using the coal/gas.
The bury our heads in the sand part seems to be you projecting.
The research disagrees with you. Whenever newly built nuclear power is included in the analysis, the result becomes prohibitively expensive.
> Focusing on the case of Denmark, this article investigates a future fully sector-coupled energy system in a carbon-neutral society and compares the operation and costs of renewables and nuclear-based energy systems.
> The study finds that investments in flexibility in the electricity supply are needed in both systems due to the constant production pattern of nuclear and the variability of renewable energy sources.
> However, the scenario with high nuclear implementation is 1.2 billion EUR more expensive annually compared to a scenario only based on renewables, *with all systems completely balancing supply and demand across all energy sectors in every hour*.
> For nuclear power to be cost competitive with renewables an investment cost of 1.55 MEUR/MW must be achieved, which is substantially below any cost projection for nuclear power.
It may cost more, but it is constant generation, and we should invest in as many carbon neutral alternatives as possible that are feasible. The fact that you have a political opposition to it because of conservative opportunists using it for their own agenda is irrelevant.
That is a downright hostile environment for nuclear power, which relies on being able to output at 100%, 24/7, all year round, just to be "only" horrifically expensive.
In the land of infinite resources and infinite time "all of the above" is a viable answer. In the real world we neither have infinite resources nor infinite time to fix climate change.
Let's focus our limited resources on what works, and instead spend the big bucks on decarbonizing truly hard areas like aviation, construction, shipping and agriculture.
'Plenty of places' is not all places, and you want to completely count out a significant energy-generating option because you are annoyed that it doesn't agree with your politics. If it isn't feasible, they won't build it. By going around advocating against it, you are doing the same thing that happened in the 70s and 80s: removing a perfectly valid energy option that we need, demand which will otherwise be met some other way if not provided, almost always with fossil fuels. If you can guarantee that every place for all time will be fine with renewables, I'd like to see it; otherwise, why not step back and let engineers and scientists evaluate instead of grandstanding against an option?
What places aren’t covered by the spectrum with Denmark for higher latitudes and Australia for the near the equator?
I’m advocating against wasting public money on nuclear power pretending it is a solution to climate change.
Have at it with your own money.
I already provided you with the scientists and engineers, but you seem to have completely disregarded them because they did not align with what you wanted.
I can do it again:
The research disagrees with you. Whenever newly built nuclear power is included in the analysis, the result becomes prohibitively expensive.
> Focusing on the case of Denmark, this article investigates a future fully sector-coupled energy system in a carbon-neutral society and compares the operation and costs of renewables and nuclear-based energy systems.
> The study finds that investments in flexibility in the electricity supply are needed in both systems due to the constant production pattern of nuclear and the variability of renewable energy sources.
> However, the scenario with high nuclear implementation is 1.2 billion EUR more expensive annually compared to a scenario only based on renewables, *with all systems completely balancing supply and demand across all energy sectors in every hour*.
> For nuclear power to be cost competitive with renewables an investment cost of 1.55 MEUR/MW must be achieved, which is substantially below any cost projection for nuclear power.
I agreed that it costs more and read the study you linked. You are having a hard time accepting that some people might have a different opinion than you and are taking it like they are being obstinate. Sorry it costs more, but I don't think we need to be uniformly opposed to a viable option due to cost.
I'm not making this political; I said the politics are irrelevant. I am not advocating for more nuclear. I am advocating for keeping options on the table regardless of politics or cost, because the issue is important to the progress of our species, and condensing things down to single studies and talking points is short-sighted. We have been down that road, it didn't work; let's not bind our hands needlessly.
in practice, 20 years of walking away from nuclear meant that Germany brought coal-fired stations back last year. I'm sure renewables will stop it happening again in 20 years _this time_.
Germany brought a few coal plants out of mothballs to prevent the collapse of the French grid when half the French nuclear fleet was offline at the height of the energy crisis.
Which then were promptly mothballed again when the French got their nuclear power under control.
Have you considered googling and checking your assumptions? May help clear up the cynical misunderstandings you appear to have.
If you had, you would’ve read that both Microsoft and Google invest heavily into wind and solar, and that Google is the largest corporate purchaser of renewables in the world. I’m not advocating for these companies, just trying to show that tech is one of the few industries that does actually care and invest into clean energy.
> Have you considered googling and checking your assumptions? May help clear up the cynical misunderstandings you appear to have.
I don't have any such misunderstanding. Perhaps consider seeing my original comment which links to an article describing Google building out solar and wind farms for its data centres.
My cynicism, which I argue is well founded, is based on tech companies signing such agreements with nuclear companies, especially when it involves doing things that have never been done before (restarting reactors and building economical SMRs, see NuScale...).
All these agreements are likely to amount to nothing more than positive PR, greenwashing, or predatory delay. Yes, they also build out solar and wind, but their nuclear PPAs are given equal standing with projects which actually are likely to be built; so instead of having to build more solar and wind today for more real money, they can promise to buy nuclear tomorrow for no cost today.
Not one of your comments amidst this sprawling thread has a single positive fact in it. You’re blindly arguing “nuclear bad” and claiming that nuclear PPAs amount to nothing based on zero evidence? The ones we’re all discussing are the first of their kind…
> Google and Microsoft have publicly committed to those projects you’re mentioning
Google and Microsoft, or their current CEOs, today?
Amazon's CEO committed to their office employees having flexibility regarding their workplace only about two years ago, yet here we are now, with said employees soon having the flexibility to be in the office 5 days a week, or quit the company.
CEO promises are not worth the screen time they're provided.
Have these companies signed contracts with major penalties if they back out? Those would basically be the only "close to" unbreakable bonds for them.
Because it's all marketing and greenwashing. They are training these models today using fossil fuels. By the time those nuclear reactors are online, they will have gobbled up literally every human creation to train their models multiple times and dried up several water sources.
I feel like taking Google’s commitment to something seriously is one of those things that I can very uncontroversially respond to with “is this your first day?”
All but the biggest Google fanboys know that Google is incredibly indecisive and will cut plans at a moment’s notice.
Yeah, but what I found thought-provoking is: what if you send the data center up along with the solar panels? No need to transmit power down to Earth. I guess then it becomes a heat dissipation, hardware upgrade, and maintenance problem. But again, thought-provoking.
Heat dissipation becomes a _huge_ problem when you deploy a data center inside a perfect insulator, the vacuum of space.
Currently about a third of the energy consumption of a data center is spent on cooling (heat dissipation)? And that's with the use of a huge heat sink, the Earth.
Plus, I feel like GP hasn't ever seen an actual data center. One does not simply strap one on top of a rocket (even a SpaceX Starship) and toss it into LEO.
No. This is a classic case of Jevons paradox: increased efficiency in resource use can lead to increased consumption of that resource, rather than decreased consumption.
Example:
1. To decrease total gas consumption, more fuel efficient vehicles are invented.
2. Instead of using less gas, people drive more miles. They take longer road trips, commute farther for work, and more people can now afford to drive.
3. This increased driving leads to higher overall gasoline consumption, despite each car using gas more efficiently.
It is a paradox because there is an apparent contradiction in the fact that higher efficiency leads to higher consumption. By definition the opposite should be true.
I don't share that intuition. If I earn more money, I won't necessarily save more. I'll buy better food, better clothes, better everything and live a materially more prosperous life. My savings rate may even go down. Or up. It depends on the specifics. When a tech gets more efficient, it causes people to do more. To shape their surroundings and bend reality more to their will. If you can travel easier, you can realize your travel wishes better.
Also I think this will play out for AI as a productivity multiplier. Instead of people having less work there will be more to do since more things are worth doing now. For the following few years at least.
Because economists only call it demand to the extent that people are willing and able to make a purchase.
If someone has a need or wants something real bad but can't afford to buy the desired quantity at the prevailing price then economists don't call it demand.
Yes, the term is a bit clumsy. The way I think of it, people have desires (to drive on the highway), but are dissuaded from doing so by disincentives (it’s too busy). Adding a lane reduces the disincentive, so that latent desire is satisfied, until it reaches a new equilibrium.
But sometimes the social environment adapts and now you have to drive that amount because it got factored in and things are now built further away, so whether you want to go far is not up to you. See long commutes becoming the norm. Now, arguably long commutes lead to better job allocations and more efficient land use as people can live in one place and work in any of the workplaces within a large radius. So it's a bit more complicated than "want", but ultimately more value seems to be produced.
Is more value produced or are costs just shifted off the balance sheet onto the public commons? Driving instead of walking/public transit has certainly been profitable for some people/companies. But it has also been less than ideal from a public health standpoint. And the time spent commuting is unpaid, so while the business saves money on rent, the increase in travel time is still a cost borne by society as a whole. I would describe this as the opposite of 'efficient land use' personally.
But in the case of highways, they probably would have still gotten where they want to go by another route. The folly is treating highway capacity as being a "market" when really the decision making is much more dynamic and nuanced.
Like a market it's very complicated with many feedbacks and value judgements. For example "How much of my time is it worth sitting in traffic to get to my preferred store across town vs the closer one?"
It's a bit like queueing. The cost isn't monetary.
OK, so maybe instead of calling it "induced" demand, call it "latent demand" if you prefer.
People will do whatever's more convenient, so if you make driving far more convenient than everything else ("cheaper"/"more available"), they will drive.
However convenience should not be the only factor for social decisions. To take this to extremes, it would be much more convenient for J. Doe to steal a car than to buy it, so we definitely do not want to make theft convenient.
Congrats, you have independently reinvented the Hardware Overhang hypothesis: that early AGI could be very inefficient, undergo several optimization passes, and go from needing a datacenter of compute to, say, a single video game console's worth: https://www.lesswrong.com/posts/75dnjiD8kv2khe9eQ/measuring-...
In that scenario, you can go from 0 independent artificial intelligences to tens of millions of them, very quickly.
it would seem perfectly reasonable to expect the first AIs to be very unoptimized and if the AIs are any good they will be able to optimize themselves a lot and even help design ASICs to help run them.
Are we setting up nuclear plants for AI data centers? If so, I see that as a win all around. We need to rely more on nuclear power, and I'll take whatever we can get to push us in that direction.
No, as the things using that power get better (newer models keep getting less garbagey) and cheaper (faster hardware and more efficient use of power), people will keep coming up with more things to use them for.
Jevons paradox says as things get more efficient, usage goes up. In this case, even if AI data centers don't pan out, I think we'll still find use for the electricity they generate.
We don't even know a lower bound for matrix multiplication tighter than O(n^2). Naive is O(n^3), Strassen is O(n^2.8). And those are simple, low-level kernels. At the higher level we also do not know tight lower bounds. But we do know some loose bounds from nature, e.g. how much data and energy a human consumes over a lifetime.
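For reference, the O(n^3) figure comes from the naive triple loop; a toy Python version:

```python
def naive_matmul(A, B):
    """Naive matrix multiply: three nested loops, hence O(n^3) scalar multiplies."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

assert naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```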
No, not really. AWS getting more power and space efficient chips didn't reduce total power demand, they just added more cores.
Even if the data centers didn't keep up with available capacity, energy-demanding industries move to and expand near sources of power, as aluminum production does.
Jevons Paradox will take care of it[1]. The more efficiently a resource is used, the more demand there is for it.
The grave implication of Jevons paradox is that the fundamental conflict between sustainability and economic progress is not resolved solely by using resources more efficiently. It's a theory of supply chain constraints essentially. Once a resource is used more efficiently, its use is increased until the next most economically constrained resource hits its economically useful limit.
I guess power demands will slowly grow. The same happened with compute in general. Compared to 1960, we have several orders of magnitude more compute but also several orders of magnitude more efficient compute. Data centers are currently about 0.4% of total energy use (electricity is about 20% of total energy use and of the electricity about 2% goes to data centers, so 20% * 2% = 0.4%).
It seems (feels?) likely that demand for LLMs is elastic, especially when it comes to specialized niches. Lower power requirements just mean we run more of them in parallel for stuff, so power needs are going to keep growing anyway.
That'd give a lot of extra power which could be used for other (and probably better) purposes, so I'd say let them build those plants. The more power available, the better, after all?
It's called the rebound effect: at no point in modern history has efficiency reduced our energy needs; we just use the extra energy to either run more of the same thing or run other things.
And that’s what matters the most! To me, at small model sizes (1-8B), anyway. A few thousand tokens already bog my RAM down quite a lot and I’d love to have more. I’d go as far as saying that context greatly determines LLM capability at this point.
The title is perfect. Their typical audience probably understands "memory" better than "context window", but if you've actually deployed these systems it's not difficult to go the other way, from "memory" to "context window", since the context window specifically is known to take additional VRAM on top of the model itself.
It’s mind bogglingly crazy that language models rivaling ones that used to require huge GPUs with a ton of VRAM to run now run on my upper-mid-range laptop from 4 years ago. At usable speed. Crazy.
I didn’t expect capable language models to be practical/possible to run locally, much less on hardware I already have.
I would argue that our sensor package is losing its lead very quickly-- audio performance is already on par with current tech and image processing is closing the gap very quickly as well (it helps a lot that silicon-based technology is much less constrained on bandwidth). Tactile sensing is still lightyears ahead, and I don't see that situation improving anytime soon...
If you don't care about docker packages being used as installers and your home directory invisibly used to store massive weight files in exchange for not having to deal with learning any configuration: ollama or lmstudio.
If you just want to play for a bit: llamafile
If you want granular control with ease of execution in exchange for having to figure out what the settings mean and figure out which weights to download: koboldcpp. (check out bartowski on huggingface for the weights)
These are all based on llama.cpp as a backend, by the way.
I run ollama off a symlink to an external volume. It just feels neater that way, and can run any GGUF off of HuggingFace. I would like to know what configuration I'm missing out on, though.
Given that the algorithms powering present LLM models hadn't been invented ten years ago, I have to think that they are (potentially) far from optimal.
Brains have gone through millions of iterations where being efficient was a huge driver of success. We should not be surprised if someone finds a new ML method that is both wildly more efficient and wildly more effective.
Very clever, very meta, and it seems to work really well.
The two big take-aways for me are:
* It's possible to train a model to learn to summarize context from the attention matrix, based only on dot-product scores (k @ q.T * mask), regardless of how tokens are embedded.
* Once the model is trained, it will work with any attention matrix, even if it's the attention matrix of another model.
I've added this to my ever-growing list of things to try.
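For anyone else who wants to poke at it, here's a rough sketch of the shape of the idea: score each cached token using only the attention scores, then keep or drop it. This is my own toy placeholder, not the paper's actual NAMM architecture:

```python
import numpy as np

def prune_kv_cache(K, V, Q, keep_threshold=0.0, scorer=None):
    """Keep/forget each cached token based only on attention scores.

    K, V: (n, d) cached keys/values; Q: (m, d) recent query vectors.
    The scorer only ever sees the dot-product scores (k @ q.T), never the
    token embeddings themselves, which is what lets a trained scorer
    transfer across models.
    """
    scores = K @ Q.T                        # (n, m) raw attention scores
    if scorer is None:
        token_scores = scores.mean(axis=1)  # placeholder "network": mean score per token
    else:
        token_scores = scorer(scores)       # e.g. a small trained network
    keep = token_scores > keep_threshold    # boolean remember/forget decision
    return K[keep], V[keep]

# toy usage: prune a random 10-token cache against 3 recent queries
K, V, Q = np.random.randn(10, 64), np.random.randn(10, 64), np.random.randn(3, 64)
K_pruned, V_pruned = prune_kv_cache(K, V, Q)
```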
Only the model’s view; it doesn’t have to be yours, just like you can participate in a long conversation without perfect memory that might in retrospect differ slightly from a recording.
It's for KV caching. In most conversations that will mean inference. But you can do reinforcement learning using sampled sequences, and you could use KV caching to speed that up too, so that would be an instance where training could get a slight boost.
I dunno, software seems to be getting worse, hardware is getting more expensive and both Microsoft and Apple are distracted by AI, not to mention NVIDIA who seem to have bet the farm on Deus Ex Shovel
Hardware is still getting cheaper all the time as far as I can tell.
Though I had thought you were talking about stuff like eg producing more corn on a given piece of land, or making more furniture from less wood or so. Or even just making better batteries and solar cells.
Oh, I don’t know, how about reducing the search space/accelerating the search speed for potential room temperature superconductors? Or how about the same for viable battery chemistries?
Yeah, but it’s not an even sharing of resources. LLMs are consuming a vast amount of human attention (no pun intended) at the expense of technological pursuits that will more certainly generate value. As far as can be told LLMs are reaching a plateau in terms of real world value, batteries and superconductors have calculably more potential.
Ok, done. I can report to you that it helped me cut down my personal search space. Imagine what such a tool could do in the hands of a subject matter expert with rudimentary critical thinking ability and the faintest hint of a grasp of using the scientific method to verify claims, wow..
You're making a fundamental error in your reasoning. An LLM's training corpus being fixed doesn't limit the system's total information processing capability when used as a tool by a human researcher. While the LLM itself can't generate truly novel information (per the data processing inequality), a human researcher using it as a dynamic search and analysis tool can absolutely generate new insights and discoveries through their interaction with it. The human-LLM system is open, not closed.
This is analogous to how a calculator cannot output any number that isn't computationally derivable from its programming, yet humans using calculators have discovered new mathematical proofs. The tool augments human capability without being AGI.
Your argument is essentially claiming that because a microscope can't generate new cellular structures, it can't help biologists make new discoveries.
You have a fundamental misunderstanding of how LLMs work, which is why you think they are magical.
Of course if you play the LLM Pachinko machine you can get all sorts of novel output from it, but it’s only useful for certain tasks. It’s great for translation, summarizing (also a kind of translation), and to some degree it can recall from its training corpus an interesting fact. And yes, it can synthesize novel content such as poetry, or adapt an oft-used coding pattern in a flavor specified by a prompt.
What it can’t do is come up with a new idea. At least not in a way better than rolling a dice. It may come up with an idea that you, dear reader, may not have encountered, which makes it great for education.
I don’t have anything more to say, but you’re welcome to continue this discussion with an agent of your choice.
Google Trends make it seem like we're out of the exponential growth phase for LLMs-- search interest is possibly plateauing.
A decline in search interest outside of academia makes sense. The groups who can get by on APIs don't care so much how the sausage is made and just want to see prices come down. Interested parties have likely already found tools that work for them.
There's definitely some academic interest outside of CS in producing tools using LLMs. I know plenty of astro folks working to build domain specific tools with open models as their backbone. They're typically not interested in more operational work, I guess because they operate under the assumption that relevant optimizations will eventually make their way into public inference engines.
And CS interest in these models will probably sustain for at least 5-10 more years, even if performance plateaus, as work continues into how LLMs function.
All that to say, maybe we're just seeing the trend die for laypeople?
Well, Google Search trends are also only an imperfect proxy for what we are actually interested in.
Eg tap water is really, really useful and widely deployed. Approximately every household is a user, and that's unlikely to change. But I doubt you'll find much evidence of that in Google Search trends.
Well, Gary Marcus, a non-layperson, is helping spread the word that an AI winter is again upon us.
But maybe statistical learning from pretraining is near its limit: not enough data, or not enough juice to squeeze more performance out of averages.
Though with all the narrow AIs, it does seem plausible that you could cram everything those narrow AIs can do into one big goliath model. I wonder if reinforcement learning and reasoning can manage to keep the exponential curve of AI going, even if there are hiccups in the short term.
The difficulty of shoehorning LLMs, as they are, into any and every daily task without a hitch might be behind the temporary hype dying down.
But "Large language model" as a topic in Google Trends is still at its peak. Maybe everyone who would be the audience is already knowledgeable about LLMs, so why would Google Search trends be able to keep rising?
ChatGPT is at its peak, and something like Claude is still rising.
True. Microsoft's all in, Apple's all in, Nvidia is selling shovels, insurance companies are all in, police and military are all in, education is all in, office management is all in. Who is left to pump the line up?
Rubbish. I built a pipeline to handle document classification that successfully took care of ~70TB of mostly unstructured and unorganized data, by myself, in a couple weeks, with no data engineering background whatsoever. This was quite literally impossible a couple years ago. The amount of work that saved was massive and is going to save us a shit ton of money on storage costs. Decades worth of invoices and random PDFs are now siloed properly so we can organize and sort them. This was almost intractable a few years ago.
We came up with different categories of tags. I should clarify, the AI didn't actually do the sorting, it did tagging so sorting was tractable. After the tagging it's just a matter of grouping, either by algorithm or human.
But obviously it would be far from the accuracy an LLM can achieve, e.g. generating search keywords, tags, and other kinds of metadata for a given document.
Yup that's exactly it. By being able to tag things with all sorts of in house meta data we were then able to search and group things extremely accurately. There was still a lot of human in the mix, but this made the whole task going from "idk if we can even consider doing this" to "great, we can break this down and chip away at it over the next few months/throw some interns at it".
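A rough sketch of that tag-then-group shape, with a dummy stand-in for the model call (all function names here are hypothetical, not whatever was actually used):

```python
from collections import defaultdict

def llm_tag(text: str) -> list[str]:
    """Stand-in for the real LLM call; a crude keyword check so the sketch runs."""
    tags = [t for t in ("invoice", "contract", "report") if t in text.lower()]
    return tags or ["uncategorized"]

def group_documents(documents: dict[str, str]) -> dict[str, list[str]]:
    """Tag each document, then group paths by tag.

    The LLM only does the tagging; the grouping/sorting is plain code
    (or a human), as described above.
    """
    groups = defaultdict(list)
    for path, text in documents.items():
        for tag in llm_tag(text):
            groups[tag].append(path)
    return dict(groups)

print(group_documents({"a.pdf": "Invoice #123 for services", "b.pdf": "Meeting notes"}))
# {'invoice': ['a.pdf'], 'uncategorized': ['b.pdf']}
```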
Yeah, I don't know. To me, hearing arguments that this was already done by ML algorithms is like hearing "moving from place A to B existed already before cars". But it seems like a common sentiment. So much of what simpler ML attempted required a massive amount of training and training data specific to your domain before you could use it, while an LLM can do it out of the box and actually consider nuance.
I think organizing and structuring data from unorganized data from the past is a massive use case that seems heavily underrated by so many right now. People spend a lot of time on figuring out where to find some data, internally in companies, etc.
Sure, there’s lots of room for LLMs in helping to do clerical work, HR, that kind of thing. I was actually thinking of the direct management of funds and investments. So yeah, like probably all businesses, the ancillary functions can probably improve productivity using Generative AI with a minimal hit to quality.
You are right about the clerical work, but even pure finance is a lot more than 'direct management of funds and investments'. Have a look at Matt Levine's Money Stuff newsletter for a taste.
And I'm not quite sure why you mention determinism in the grandfather comment? Finance people have been using Monte Carlo simulations for ages. (And removing non-determinism from LLMs by fixing the seed of any pseudo-random number generator used wouldn't really change anything, would it?)
At the end of the day, the hard limit in finance is defaulting. Everything outside that is financial poetry (or engineering :-p).
I know every segment of finance loves to pretend that's not the case, because their jobs (and high salaries) frequently rely on that not being true (see the subprime mortgage crisis).
> At the end of the day, the hard limit in finance is defaulting. Everything outside that is financial poetry (or engineering :-p).
You are forgetting all about regulations and taxation (and how to work with / around them). And how to cleverly read documents and exploit loopholes in contracts.
There's so much more to finance.
(And for eg stocks or commodities, there's not even any notion of defaulting. Defaulting only really makes sense when you have fixed obligations. 'Fixed income' is only one part of finance.)
Finance is all in on reading 10-Ks and generating summaries. If you have decisions in mind, I’ll be referring to IBM 1979 slide until an HR LLM fires me.
[1] https://arxiv.org/html/2410.19258v3