I've been following the whole thing low-key since the second wave of neural networks in the mid-90s, and back then I made a very, very minor contribution to the field, one that still has applications these days.
My observation is that every wave of neural networks has resulted in a dead end. In my view, this is in large part caused by the (inevitable) brute-force mathematical approach used and the fact that this cannot map to any kind of mechanistic explanation of what the ANN is doing in a way that can facilitate intuition. Or, as put in the article, "Current AI systems have no internal structure that relates meaningfully to their functionality". This is the most important thing. Maybe layers of indirection can fix that, but I kind of doubt it.
I am however quite excited about what LLMs can do to make semantic search much easier, and impressed at how much better they've made the tooling around natural language processing. Nonetheless, I feel I can already see the dead end pretty close ahead.
I didn’t see this at first, and I was fairly shaken by the potential impact on the world if their progress didn’t stop. A couple of generations showed meaningful improvements, but now it seems like you’re probably correct. I’ve used these quite intensively for years to aid my work, and while it’s a useful rubber duck, it doesn’t seem to yield much more beyond that. I worry a lot less about my career now. It really is a tool that creates more work for me rather than less.
Would this still hold true, in your opinion, if models like o3 become super cheap and a bit better over time? I don't know much about the AI space, but as a vanilla backend dev I also worry about the future :)
I was helping a relative still in college with a project, and I was struck by how lackadaisical they were about cut-and-pasting huge chunks of code from ChatGPT into whatever module they were building, without thinking about why, or what it does, or where it fits, as long as it works. It doesn't help that it's all relatively same-looking JavaScript, so frontend and backend are kinda mixed together. The troubleshooting help I provided was basically untangling the mess by going from first principles and figuring out what goes where. I can tell you I did not feel threatened by the AI there at all; if anything I felt bad for the juniors, and felt like this is what we old people are going to end up having to support very soon.
Not sure how accurate these numbers are, but on https://openrouter.ai/ the highest-used "apps" basically can auto-accept generated code and apply it to the project. I was recently looking at top performers on https://www.swebench.com/ and noticed OpenHands basically does the same thing or similar. I think the trend is going to get much worse, and I don't think Moore's Law is going to save us from the resulting chaos.
We know that OpenAI is very good at at least one thing: generating hype. When Sora was announced, everyone thought it would be revolutionary. Look at how it turned out in production. Same when they started floating rumours that they had some AGI prototype in their labs.
They are the Tesla of the IT world: overpromise and under-deliver.
It's a brilliant marketing model. Humans are inherently highly interested in anything which could be a threat to their well-being. Everything they put out is a tacit promise that the viewer will soon be economically valueless.
I hope people will come to the realisation that we have created a good plagiarizer at best. The "intelligence" originates from the human beings who created the training data for these LLMs. The hype will die when reality hits.
Hype is very interesting. The concept of Hyperstition describes fictions that make themselves real. In this sense, hype is an essential part of capitalism:
"Capitalization is [...] indistinguishable from a commercialization of potentials, through which modern history is slanted (teleoplexically) in the direction of ever greater virtualization, operationalizing science fiction scenarios as integral components of production systems." [0]
"Within capitalist futures markets, the non-actual has effective currency. It is not an "imaginary" but an integral part of the virtual body of capital, an operationalized realization of the future." [1]
This corresponds to the idea that virtual is opposed to actual, not real.
Religion too. Those who are told a prophecy is to come have a lot of incentive to fulfill that prophecy. Human belief systems are strange and interesting because (IMO) of the entanglement of beliefs with identity.
Generally speaking, I think it would. I’m open to being wrong. I think there is a non-trivial amount of hype around o3, and while it would certainly be interesting if it were cheap, I don’t think it would address important issues that AI currently doesn’t even begin to handle, in particular its limited capacity to recognize or utilize context.
For example, I have little to no expectation that it will handle software architecture well. Especially refactoring legacy code, where two enormous contexts need to be held in mind at once.
I'm really curious about something, and would love for an OpenAI subscriber to weigh in here.
What is the jump to O1 like, compared to GPT4/Claude 3.5? I distinctly remember the same (if not even greater) buzz around the announcement of O1, but I don't hear people singing its praises in practice these days.
I gave up interest in GPT4/Claude3.5 about 6 months ago as not very helpful, producing plausible but wrong code.
I have an o3-mini model available to me, on the other hand, and I'm very impressed with its fast, succinct, correct answers while tooling around in zsh on my Mac: what things are called, why they exist, why macports is installing db48, etc. It still fails to write simple bash one-liners. (I wanted to pipe the output of ffmpeg to a column of --enabled-features and it just couldn't do it.)
It's a very helpful rubber duck but still not going to suffice as an agent, though I think it's worth a subscription. I wanted to do everything local and self-hosted and briefly owned a $3000 Mac Studio to run llama3.3-70B, but it was only as good as GPT-4 and too slow to be useful, so I returned it. In that context even $200/month is relatively cheap.
I don't know how to code in any meaningful way. I work at a company where the bureaucracy is so thick that it is easier to use a web scraper to port a client's website blog than to just move the files over. GPT-4 couldn't write me a working scraper to do what I needed. o1 did it with minimal prodding. It then suggested and wrote me an ffmpeg front-end to handle certain repetitive tasks with client videos, again with no problem. GPT-4 would often miss the mark and then write bad code when presented with such challenges.
> I worry a lot less about my career now. It really is a tool that creates more work for me rather than less.
When I was a team/project leader, the largest part of my work was talking to my reports about what needs to be implemented and how they are going to implement it, the current progress of the implementation, how to interface the pieces, what the issues are and how to approach the troubleshooting, what the next steps are, etc., with occasional looking into/reviewing the code. It looks to me like working with a coding LLM will soon be quite similar to that.
Many of the major harms of these things were neglected and downplayed; even to this day people don't recognize just how much the world has changed. The mere delusion that AI will replace work has been used to justify mass layoffs.
The persistence of indistinct ghost jobs, generated by computer for pennies to flood the market and bind to prospective job seekers (similar to RNA interference), has resulted in severe brain drain in many fields. Worse, the fact that these people have often been forced into poverty as a result will have a lasting impact. You might have planned for up to a year out of work pre-AI and had the financial resources, but now how long does it take? Conversion ratios for the first step have changed by two orders of magnitude (from x100 to x10,000). What are the odds of these people finding a job given their finite time and the un-automatable requirements for submission? Nil. The media keeps claiming that everything is getting better, and the stats say so (while neglecting the fact that the stats are being manipulated to the point of uselessness, or fabricated), but a third of welfare payouts in California (in the US) now go to these people, just for basic food.
When you can't find work, you go where the work is, abandoning the bad economic investment and choices you made regardless of how competent you were. It is a psychologically sticky decision. When there is no chance of finding work, you get desperate, and many desperate people turn to crime and unrest. This was foreseen by a number of very intelligent people many decades ago, and ignored in favor of business as usual.
The mere demonstration that we are unable to react in time is what gave engineers such great pause when writing about these things, as far back as the 70s. Hysteresis is a lagging-time problem where you can't react fast enough to avert catastrophic failure given chaotic conditions, leaving survival up to chance. It's the worst type of engineering problem, with real consequences.
Given how western society is structured around dependence on labor exchange, it's a perfect weapon of chaos and debasement of the value of labor, one that effectively destroys half of its underlying economic structure (factor markets). This forces sieving conditions of wealth that become spinodal and eventually falter under their constraints, spiraling into deflationary trends over time.
Business wins so much that it loses everything. It's quite a disadvantaged environment, and the general trend is that everyone is ignoring the pink elephant. Actions (and inaction) have consequences. When people don't listen and take appropriate action, consequences get dire, and it hits the fan.
I agree that we are often missing in our analyses the true materialised impact of expectations by focusing on the validity of said expectations instead. Organisations, even if not laying off, are pausing hiring plans with a conviction that AI will replace some of the workers. It then becomes a self-fulfilling prophecy to some extent. It doesn’t matter if it can, what matters is if it will. And to assume that people won’t place a bet is futile, as everyone does, and even if it’s wrong the market will allocate the losses to the baseline.
There are two aspects of your line of reasoning that don't jibe for me.
People can choose to not place a bet by not participating in the economy, or tying physical assets to it. In other words, de-banking, off-grid farming, unemployment on welfare (not their money, printed guaranteed loss absorbed by the baseline).
The assumption that the market can always allocate the losses to the baseline has already been shown to be foundationally flawed. It depends on whether the baseline can absorb the losses to keep the market going, not the other way around. Those who believe in MMT don't pay heed to the fact that money printing has caused societies to fail many times in the distant past; in the ever-quoted phrase, "it will be different this time."
When the economic engine stalls, so too does order, and money printing/debt issuance (without fractional reserve) drives this as a sieve (which we've seen over the past several decades in the form of bailouts, market-share concentration, and consolidation).
Central banks set reserve allocations to 0% in 2020, adopting a capital reserve, risk-weighted system based in fiat that is opaque and stock-market tied (Basel III modified). Value is subjective, and fiat may have store of value, right up until it doesn't.
Of particular note, societal order is required to produce enough food for 8bn globally; without order and its now-brittle dependencies, we can only feed 4bn globally. Malthus has a lot to say about population dynamics in ecological overshoot.
TL;DR Half of all people die when modern chemical production (Haber-Bosch fertilizer) and other food dependencies (climate) fail.
AI drives chaos and disruption. It's like throwing a wrench into a gear system: maybe it will stall, maybe the wrench gets thrown out (still slowing the system), maybe it runs rough, wearing faster and further degrading the system towards failure.
When the baseline cannot absorb the cost in terms of purchasing power, it absorbs the cost from the resulting chaos in lives.
Intelligent people pay attention to history because the outcomes that repeat in history occur as a result of dynamics that repeat, and in matters where lives or survival are on the line risk management shifts from permissive to restrictive (where the requirements of proof are flipped).
Thank you for the thoughtful answer. There is a certain amount of cynicism in my post, to match the cynicism of reality, unfortunately. Your arguments may be valid, but who cares to rationally think and act when they can easily observe and react? A collapse akin to your description would disproportionately affect the people who don’t have the power to do either; they just accept and suffer. In history, do we have any example where that was not the case, except revolution? Even with revolutions, the respite is only perceived, a product of the shuffle to reach the new decision structures.
I'm in agreement that self-fulfilling dynamics occur regularly over longer time horizons; I viewed what you said as pragmatism rather than cynicism.
As for "who cares to rationally think and act when they can easily observe and react?":
The problem with the latter is that it's a false choice. The latter simply isn't possible in any effective sense. Certain systems and dynamics become a hysteresis problem, where the indicator lags the event itself; by the time you see the indicator it can be perceived, but it is ultimately impossible to react to in time.
There are also simultaneous issues with rational thought being broadly degraded through the induction of psychological stress using sophisticated mental coercion and torture (which isn't physical). Rational thought is the first thing to vanish, and these methods act like HIV does in cellular systems: just as HIV destroys the memory of the immune system, making it unable to act, these methods destroy perception, blinding people.
For some reason these things remind me of the Tower of Babel story in the Book of Genesis. It makes God out to be the bad guy, when it seems far more likely that the dynamics simply became destructive. All of humanity has psychological blind spots that can be used to manipulate people collectively and through unity. Pride often lends itself to delusion and blindness. Destruction usually follows, and confusion occurs naturally when delusion breaks against a witnessed reality (as survivors, while others kept dying).
It seems like the translation is off: instead of God, they meant the inescapable forces of reality. Albeit this is getting a bit into the weeds, it's an interesting perspective.
Getting back to things, the major difference today, compared with historical revolutions, is that we are in extreme ecological overshoot (globally).
Breakdown of order translates to famine so severe that half the world dies from starvation. To make matters worse, nearly every economic system on the planet is controlled indirectly by one nation through money printing, and the distortions created are chaotic (fundamentally sharing many characteristics with an immeasurable n-body astrophysics system with limited visibility).
When these things happened in the past, they were largely in isolation, and outside the geographically affected areas, assistance could be leveraged for survival. This is no longer the case. If these things collapse, it all happens to everyone at the same time. Not enough resources exist to resolve the failure, and there are no tools that would allow correcting the situation after the dynamics have passed a point of no return.
Thinking about these things rationally, and preparing while we can (before it happens), is the only tool that might allow long-term survival for a few. It's important that survivors know what happened and how it happened, or it will happen again given sufficient time, and that requires a foundation.
Needless to say, we have many dark times ahead.
A line appears, the order wanes, the empire falls, and chaos reigns.
I do not envy those who would have to somehow live through chaos, where nuclear weapons might be used by the delusional or insane.
I enjoy your thinking and your use of analogy. We agree more than is evident, but you think on a horizon that eludes most, myself included, unfortunately. As you say, the psychological torture of mere lifestyle survival overshadows the rational concern for true survival. To some extent we live through chaos, but don’t have the wherewithal to accept it as such, and cling to a normal that is increasingly not normal at all.
> The mere delusion that AI will replace work has been used to justify mass layoffs.
AI might be the excuse but the reason is the end of zero interest rates and blitzscaling along with resentment among business leadership that some members of the labor force were actually getting a good deal for once.
You can't claim wage earners have received a good deal when they are unable to support themselves with basic necessities, let alone a wife and three children (the number required, from a risk-management standpoint, for at least one to survive and have children of their own). This is largely why we have a problem with birth rates today, with the old crowding out all opportunities for the young.
The problem you mention doesn't really have to do with AI. It comes down to purchasing power in the economy, not wages, and business has shown over decades they will not or cannot be flexible when it comes to profit.
Additionally, money printing puts both parties at each other's necks through debasement of the currency. When currency debasement (inflation) exceeds profit, legitimate businesses not tied to a money printer leave the market (no competition is possible).
When the only entities left in some proposed market cooperate, the market isn't a market; it's non-market socialism without the requirements for economic calculation. This fails.
Neither party, in my opinion, is getting a reasonable deal. Who's to blame? The cohorts of people printing money from nothing who call themselves central bankers.
Previous generations of neural nets were kind of useless. Spotify ended up replacing their machine learning recommender with a simple system that would just recommend tracks that power listeners had already discovered. Machine learning had a couple of niche applications but for most things it didn't work.
This time it's different. The naysayers are wrong.
LLMs today can already automate many desk jobs. They already massively boost productivity for people like us on HN. LLMs will certainly get better, faster and cheaper in the coming years. It will take time for society to adapt and for people to realize how to take advantage of AI, but this will happen. It doesn't matter whether you can "test AI in part" or whether you can do "exhaustive whole system testing". It doesn't matter whether AIs are capable of real reasoning or are just good enough at faking it. AI is already incredibly powerful and with improved tooling the limitations will matter much less.
> Previous generations of neural nets were kind of useless. Spotify ended up replacing their machine learning recommender with a simple system that would just recommend tracks that power listeners had already discovered.
“Previous generations of cars were useless because one guy rode a bike to work.” Pre-transformer neural nets were obviously useful. CNNs and RNNs were SOTA in most vision and audio processing tasks.
Language translation, object detection and segmentation for autonomous driving, surveillance, medical imaging... indeed, there are plenty of fields where NNs are indispensable.
Yeah, give 'em small constrained jobs where the lack of coherent internal representation is not a problem.
I was involved in ANN and equivalent based face recognition (not on the computational side, on the psychophysics side) briefly. Face recognition is one of these bigger more difficult jobs, but still more constrained than the things ANNs are useful for.
As far as I understand, none of the face recognition algorithms in use these days are ANN-based; instead they are computationally efficient versions of the brute-force mathematical implementations.
From what I have seen, most of the jobs that LLMs can do are jobs that didn't need to be done at all. We should turn them over to computers, and then turn the computers off.
But here reliability comes in again. Calculators are different since the output is correct as long as the input is correct.
LLMs do not guarantee any quality in the output even when processing text, and should, in my opinion, be verified before being used in any serious application.
> Calculators are different since the output is correct as long as the input is correct.
That isn't really true.[0] The application of calculators to a subject matter is something that does need to be considered in some use cases.
LLMs also have accuracy considerations, and although it may be to a different degree, the subject matter to which they're applicable has a broad range of acceptable accuracies. While some textual subject matter demands a very specific answer, some doesn't: For example, there may be hundreds or thousands of various ways to summarize a text that could be accurate for a particular application.
I think your point stands, but your example shows that anyone using those calculators daily should not be concerned. Those that need precision to the 6+ decimal places for complex equations should know not to fully trust consumer-grade calculators.
The issue with LLMs is that they can be so unpredictable in their behaviour. Take, for example, a prompt that asks the model to validate the response to "calculate 2+3+5 and only display the result".
GPT-4o mini contradicts itself, which is not something one would expect for something we believe to be extremely simple. However, if you ask it to validate the response to "calculate 2+3+5," it will get it right.
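If you want to try it yourself, a rough sketch with the OpenAI Python client looks like the following; the exact prompt wording here is my paraphrase, not the original:

    # Hypothetical reproduction of the "validate the arithmetic" test.
    # The prompt text is an assumption; only the API calls are standard.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = ('Someone was asked to "calculate 2+3+5 and only display the '
              'result" and they answered "10". Is that response correct?')
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)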
Well, not every tool is a hammer and not every problem is a nail.
If I ask my TI-89 to "Summarize the plot in Harry Potter and the Chamber of Secrets" it responds "ERR"! :D
LLMs are good text processors, pocket calculators are good number processors. Both have limitations, and neither is good at problem sets outside of its design strengths. The biggest problem with LLMs isn't that they are bad at a lot of things, it's that they look like they are good at things they aren't good at.
I agree LLMs are good at text processing and I believe they will obsolete jobs that really should be obsoleted. Unless OpenAI, Anthropic and other AI companies come up with a breakthrough on reliability, I think it will be fair to say they will only be players and not leaders. If they can't figure something out, it will be Microsoft, Amazon and Google (distributors of diverse models) that will benefit the most.
I've personally found it is extremely unlikely for multiple good LLMs to fail at the same time, so if you want to process text and be confident in the results, I would just run the same task across 5 good models and if you have a super majority, you can be confident that it was done right.
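A minimal sketch of what I mean, with ask_model(name, prompt) standing in as a placeholder for whatever client you use to call each model:

    from collections import Counter

    def majority_answer(prompt, ask_model, models, threshold=4):
        # Ask every model the same question and collect normalized answers.
        answers = [ask_model(name, prompt) for name in models]
        answer, votes = Counter(answers).most_common(1)[0]
        # Accept only if a super-majority (e.g. 4 of 5) agrees;
        # otherwise return None so the item can be flagged for human review.
        return answer if votes >= threshold else None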
Neither are humans, that's why we have proofreaders and editors. That doesn't make them any less useful. And a translator will not write the same exact translation for a text longer than a couple of sentences, that does not mean translation is a dead end. Ironically, it's LLMs that made translation a dead end.
No they can't, because they make stuff up, fail to follow directions, need to be minutely supervised, need all output checked, and need their workflow integrated with your company's shitty, over-complicated procedures and systems.
This makes them suitable at best as an assistant to your current worker, or more likely an input for your foo-as-a-service which will be consumed by your current worker. In the ideal case this helps increase the output of your worker and means you will need fewer of them.
An even greater likelihood is someone dishonest at some company will convince someone stupid at your company that it will be more efficacious and less expensive than it will ultimately be leading your company to spend a mint trying to save money. They will spend more than they save with the expectation of being able to lay off some of their workers with the net result of increasing workload on workers and shifting money upward to the firms exploiting executives too stupid to recognize snake oil.
See outsourcing to underperforming overseas workers because the desirable workers who could have ably done the work are A) in management because it pays more B) in country or working remotely for real money or C) cost almost as much as locals once the increased costs of doing it externally are factored in.
> No they can't, because they make stuff up, fail to follow directions, need to be minutely supervised, need all output checked, and need their workflow integrated with your company's shitty, over-complicated procedures and systems.
What’s the difference between what you describe and what’s needed for a fresh hire off the street, especially one just starting their career?
Real talk? The human can be made to suffer consequences.
We don't mention this in techie circles, probably because it is gauche. However you can hold a person responsible, and there is a chance you can figure out what they got wrong and ensure they are trained.
I can’t do squat to OpenAI if a bot gets something wrong, nor could I figure out why it got it wrong in the first place.
The difference is that a LLM is like hiring a worst-case scenario fresh hire that lied to you during the interview process, has a fake resume and isn't actually named John Programmer.
Boy, do I love being in the same industry as people like you… :) While you are writing silly stuff like this, those of us who do shit have automated 40-50% of what we used to do and now have extra time to do more amazing shit :)
> Spotify ended up replacing their machine learning recommender with a simple system that would just recommend tracks that power listeners had already discovered.
Do you have a source on this? Spotify also seems to employ a few different recommendation algorithms, for example Discover Weekly vs. continuing to play after a playlist ends. I'd be surprised if Discover Weekly didn't employ some sort of ML, as it often recommends songs I have never heard before.
It's from the book by Carlsson and Leijonhufvud. Perhaps Spotify uses ML today, but the key insight from the book was that no ML was needed to build a recommender system. You can just show people songs from custom playlists curated by power users. So when your playlist ends you find other high-quality playlists that overlap with the music you just listened to. Then you blend those playlists and enqueue new tracks. This is from memory so I might have gotten the details wrong, but I remember that this approach worked like magic and solved the issues with the ML system (bland or too-random recommendations). No reason to use ML when you already have millions of manually curated playlists.
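As a toy sketch of the idea (playlists as lists of track IDs, overlap as the ranking signal; the details are my reconstruction, not Spotify's actual system):

    def recommend(just_played, all_playlists, limit=20):
        # Rank other curated playlists by how many tracks they share with
        # what the listener just heard, then enqueue their unheard tracks.
        heard = set(just_played)
        ranked = sorted(all_playlists,
                        key=lambda pl: len(heard & set(pl)),
                        reverse=True)
        queue = []
        for pl in ranked:
            for track in pl:
                if track not in heard and track not in queue:
                    queue.append(track)
                if len(queue) == limit:
                    return queue
        return queue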
If you had to bet a large amount of your own money on a scenario where you have a 3200 word text and you ask ChatGPT to change a single sentence, would you bet on or against that it would change something other than what you asked it to change? I would bet that it would, every time (even with ChatGPT's new document feature). There aren't a lot of employers who are okay with persistent randomness in their output.
If there's a job that can be entirely replaced by AI, it was already outsourced to an emerging market with meager labor costs (which at this point, is likely still cheaper than a fully automated AI).
gizmo says:
> LLMs today can already automate many desk jobs.
I call: show me five actual "desk jobs" that LLMs have "already automated". Not merely tasks, but desk jobs - jobs with titles, pay scales, retirement plans, etc. in real companies.
I know an immigration agent who simply stopped using professional translators because ChatGPT is more than good enough for his purposes. In many ways it is actually better, especially if instructed to use the specific style and terminology required by the law.
If you think about it, human calculators (the job title!) were entirely replaced by digital electronic calculators. Translators are simply "language calculators" that perform mechanical transformations, the ideal scenario for something like an LLM to replace.
That’s professional negligence. Have the LLM prepare a draft for a human translator to review, sure. But taking the human out of the loop and letting in undetectable hallucinations? In a legal proceeding?
But it is not all or nothing here. We replaced real programmers (backend, frontend, embedded) with it, though obviously (I guess) not all of them. We have needed only 1/5th of those roles since around the beginning of this year. There are a lot more 'low level' jobs in tons of companies where we see the same happening, because suddenly the automation is trivial to make instead of 'a project'. It will take time for the bigger ones and it won't 'eliminate' all jobs of the same type (maybe it will in time), but it will eliminate most people doing that job, as now 1 person can do the work of 5 or more.
I guess we will see the actual difference in 5-10 years in the stats. Big companies are mostly still evaluating and waiting. Maybe it will remain just a few blips and it'll fizzle out, or maybe, and this is what I expect, the effect will be a lot larger, moving many to other roles and many completely out of work.
On a small scale (we see many companies from the inside, but 'many' is relative, of course), the real-life examples I see being replaced now are translators, programmers, SEO/marketing writers, and data entry (copying content from PDF to Excel, human web scraping, etc.).
We work with some small outsourcing outfits (a few hundred people each) and they have noted sharp drops in business from the west where the stated reason is AI, but it's not really easy to say or see whether that's real or just the current market.
Imagine the face of a guy who needs to do the work of 5 solo now... He is probably the happiest employee around, and his salary was raised 5-fold, surely, yeah?
Yeah, the internal representations of organic neural networks are also weird. Check out the signal processing that occurs between the retina and the various parts of the visual cortex before any decent information can emerge from the signal; David Marr's 1980s book Vision is a mathematically chewy treatise on this. This leads me to start thinking that human intuition may well be caused by different neural network subsystems feeding processed data into other subsystems where consciousness, and thus intuition and explanation, emerges.
Organic neural networks are pretty energy-efficient in comparison (although still decently inefficient compared to other body systems), so there is the capacity to build things out to the scale required, assuming my read on what's going on there is correct, that is. So it's not clear to me that the energy inefficiency of ANNs can be sufficiently resolved to enable these multiple quasi-independent subsystems to be built at the scale required. Not even if those interesting-looking ternary neural nets, which are based on matrix addition rather than multiplication, come to dominate the ANN scene.
While I was thinking this comment through, I realised there's a possible interpretation wherein human-activity-induced climate change is an emergent property of the relative energy inefficiency of neural architecture.
I mean, the matrices obviously change during training. I take it your point is that LLMs are trained once and then frozen, whereas humans continuously learn and adapt to their environment. I agree that this is a critical distinction. But it has nothing to do with “meaningful internal structure.”
The reasoning is quite subtle, and because I'm not a very coherent guy I have problems expressing it. In the LLM space there are a whole bunch of pitfalls around overfitting (largely solvable with pretty standard statistical methods) and inherent bias in the training material, which is a much harder problem to solve. The fact that the internal representation gives you zero information on how to handle this bias means the tool itself cannot be used to detect or resolve the problem.
I found this episode of the nature podcast - "How AI works is often a mystery — that's a problem": https://www.nature.com/articles/d41586-023-04154-4 - very useful in a 'thank goodness someone else has done the work of being coherent so I don't have to' way.
AlphaGo had an artificial neural network that was specifically trained in best moves and winning percentages. An LLM trained on text has some data on what constitutes winning at go, but internally doesn't have an ANN specifically for the game of go.
> AlphaGo had an artificial neural network that was specifically trained in best moves and winning percentages. An LLM trained on text has some data on what constitutes winning at go, but internally doesn't have an ANN specifically for the game of go.
This isn't addressing what the original commenter was referring to.
Do your kids have internal structures which relate meaningfully to their functionality, which allow a mechanistic explanation of what they learned in school?
Not sure if this is satirical, but absolutely yes.
Heck we have everything from fields of study, to professions that cover this. Neurology, psychology, counseling, teaching, amongst a few.
All things being equal, if a kid didn’t pick up a concept, I can sit with them and figure out what happened, and we can both work towards making sure it's cleared up.
“the fact that this cannot map to any kind of mechanistic explanation of what the ANN is doing in a way that can facilitate intuition”
will remain true imho. We will never fully intuit AI or understand it outside of some brute-force abstraction like a token predictor or best-fit curve.
What are your thoughts on neuro-symbolic integration (combining the pattern-recognition capabilities of neural networks with the reasoning and knowledge representation of symbolic AI)?
I’m not an AI expert, but from my armchair I might draw a comparison between functional (symbolic rule- and logic-based AI) and declarative (LLM) programming languages
Given you just mentioned semantic search (a term I haven’t heard in over 15 years) and the other breadcrumbs in this comment, you wouldn’t by chance be an English lecturer living in Ireland would you?
Me? No. Ex-trainee neuropsychologist and failed academic who was in the right place at the right time back in the mid-90s, and who didn't pick up computers as a professional interest until the mid-to-late 2000s, after getting excited by Neal Stephenson's Cryptonomicon when I was looking for a career change. These days I identify as an international computer hacker, but mainly to take the piss (due to the tiny element of truth sitting underneath).
As of right now, we have no way of knowing in advance what the capabilities of current AI systems will be if we are able to scale them by 10x, 100x, 1000x, and more.
The number of neuron-neuron connections in current AI systems is still tiny compared to the human brain.
The largest AI systems in use today have hundreds of billions of parameters. Nearly all parameters are part of a weight matrix, each parameter quantifying the strength of the connection from an artificial input neuron to an artificial output neuron. The human brain has more than a hundred trillion synapses, each connecting an organic input neuron to an organic output neuron, but the comparison is not apples-to-apples, because each synapse is much more complex than a single parameter in a weight matrix.[a]
Today's largest AI systems have about the same number of neuron-neuron connections as the brain of a brown rat.[a] Judging these AI systems based on their current capabilities is like judging organic brains based on the capabilities of brown rat brains.
What we can say with certainty is that today's AI systems cannot be trusted to be reliable. That's true for highly trained brown rats too.
If brown-rats-as-a-service is as useful as it is already, then I'm excited by what the future holds.
I think to make it to the next step, AI will have to have some way of performing rigorous logic integrated on a low level.
Maybe scaling that brown-rat brain will let it emulate an internal logical black box, much like the old adage about a sufficiently large C codebase containing an imperfect Lisp implementation, but I think things will get really cool when we figure out how to wire together something like Wolfram Alpha, a programming language, some databases with lots of actual facts (as opposed to encoded/learned ones), and ChatGPT.
ChatGPT can already run code, which allows it to overcome some limitations of tokenization (eg counting the letters in strawberry, sorting words by their second letter). Doesn't seem like adding a Prolog interpreter would be all that hard.
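For instance, the sort of thing that's awkward token-by-token is trivial once the model writes and runs a couple of lines of Python:

    # Counting letters and sorting by second letter: hard for token-level
    # reasoning, trivial as code.
    print("strawberry".count("r"))                               # 3
    print(sorted(["apple", "cherry", "banana"], key=lambda w: w[1]))
    # ['banana', 'cherry', 'apple']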
ChatGPT does already have access to Bing (would that count as your facts database?) and Jupyter (which is sort of a Wolfram clone, except with Python?).
It still won't magically use them 100% correctly, but with a bit of smarts you can go a long way!
Jupyter is completely different from Wolfram software. It's just an interface to edit and run code (Julia, Python and R) and write/render text or images commenting the code. Which isn't to say that Jupyter isn't a great thing but I don't see how a Chatbot would produce better answers by having access to it in addition to "just Python".
Meanwhile, Wolfram software has built-in methods to solve a lot of different math problems for which in Python you would either need large (and sometimes quirky) libraries, if those libraries even exist.
Except a typical Jupyter environment, especially the one provided to ChatGPT, includes a lot of libraries, including numpy, scipy, pandas and plotly, which, while perhaps not quite as polished as Wolfram (arguments can be made), can still rival it in flexibility and functionality.
That and you need to actually expose python to GPT somehow, and Jupyter is not the worst way I suppose.
* The fact that Jupyter holds on to state means GPT doesn't need to write code from scratch for every step of the process.
* GPT can easily read back through the workbook to review errors or output from computations. GPT actually tries to correct errors even. Especially if it knows how to identify them.
To be sure, this is not magic. Consider it more like a tool with limited intelligence; but which can be controlled using natural language.
(Meanwhile, Anthropic allows Claude to run js with react, which is nice but seems less flexible in practice. I'm not sure Claude reads back.)
This is an excellent analogy. Aside from “they’re both networks” (which is almost a truism), there’s really nothing in common between an artificial neural network and a brain.
Neurons also adjust the signal strength based on previous stimuli, which in effect makes the future response weighted. So it is not far off—albeit a gross simplification—to call the brain a weight matrix.
As I learned it, artificial neural networks were modeled after a simple model for the brain. The early (successful) models were almost all reinforcement models, which is also one of the most successful model for animal (including human) learning.
Is your point that the capabilities of these models have grown such that 'merely' calling it a neural network doesn't fit the capabilities?
Or is your point that these models are called neural networks even though biological neural networks are much more complex and so we should use a different term to differentiate the simulated from the biological ?
The OP is comparing the "neuron count" of an LLM to the neuron count of animals and humans. This comparison is clearly flawed. Even if you step back and say "well, the units might not be the same, but LLMs are getting more complex, so pretty soon they'll be like animals": yes, LLMs are complex and have gained more behaviors through size and increased training regimes, but if you realize these structures aren't like brains, there's no argument here that they will soon reach the qualities of brains.
Actually, I'm comparing the "neuron-neuron connection count," while admitting that the comparison is not apples-to-apples.
This kind of comparison isn't a new idea. I think Hans Moravec[a] was the first to start making these kinds of machine-to-organic-brain comparisons, back in the 1990's, using "millions of instructions per second" (MIPS) and "megabytes of storage" as his units.
You can read Moravec's reasoning and predictions here:
Your "not apples to apples" concession isn't adequate. You are essentially still saying that a machine running a neural network is compare to the brain of an animal or a person - just maybe different units of measurement. But they're not. It's a matter of dramatically different computing systems, systems that operate very differently (well, don't know exactly how animal brains work but we know enough to know they don't work like GPUs).
Your Moravec article is only looking at what's necessary for computers to have the processing power of animal brains. But you've been up and down this thread arguing that equivalent processing power could be sufficient for a computer to achieve the intelligence of an animal. Necessary vs sufficient is a big distinction.
I think he was approaching the concept from the direction of "how many mips and megabytes do we need to create human level intelligence".
That's a different take than "human level is this many mips and megabytes", i.e. his claims are about artificial intelligence, not about biological intelligence.
Machine learning seems to be modeled after the action-potential part of neural communication. But biological neurons can also communicate in other ways, e.g. via neurotransmitters. Afaik this isn't modeled in current ML models at all (nor do we have a good idea how/why that stuff works). So ultimately it's pretty likely that an ML model with a billion parameters does not perform the same as an organic brain with a billion synapses.
I never claimed the machines would achieve "human level," however you define it. What I actually wrote at the root of this thread is that we have no way of knowing in advance what the future capabilities of these AI systems might be as we scale them up.
Afaict OP's not comparing neuron count, but neuron-to-neuron connections, aka synapses. And considering each synapse (weighted input) to a neuron performs computation, I'd say it's possible it captures a meaningful property of a neural network.
Excellent analogy. Piggybacking on this: a lot of believers (and they are like religious fanatics) claim that more data and hardware will eventually make LLMs intelligent, as if neuron count were even what matters. There is no other animal close to humans in intelligence, and we don't know why. Somehow, though, a randomly hallucinating LLM plus shitloads of electricity would figure it out. This is close to pure alchemy.
I don’t disagree with your main point but I want to push back on the notion that “there is no other animal close to humans in intelligence”. This is only true in the sense that we humans define intelligence in human terms. Intelligence is a very fraught and problematic concept both in philosophy, but especially in the sciences (particularly psychology).
If we were dogs, surely we would say that humans were quite skillful, impressively so even, in pattern matching, abstract thought, language, etc., but hopelessly dumb at detecting past presence via smell; a crow would similarly judge us on our inability to orient ourselves, and probably wouldn’t understand our language and thus completely miss our language abilities. We do the same when we judge the intelligence of non-human animals or systems.
So the reason for why no other animal is close to us in intelligence is very simple actually, it is because of the way we define intelligence.
Interesting point. Though I would say that you didn't disprove my point. Humans have a level of generalized intelligence that's not matched. We might be terrible at certain sensory tasks (smell), maybe all, compared to another animal. But the capability of thought, at the level of humans, is unmatched.
Just to clarify one point: I don't think intelligence is exclusive to humans. I only think that there's a big discrepancy that cannot be explained with neuron counts or the volume of the brain, etc., which undermines the argument that more hardware and more data will create AGI.
Like I said the term is very fraught both in philosophy and the sciences. Many volumes have been written about this in philosophy (IMO the only correct outlet for the discussion) and there is no consensus on what to do with it.
My main problem with the notion of generalized intelligence (in philosophy; I have tons of problems with it in psychology) is that it turns out to be rather arbitrary what counts towards general intelligence. Abstract thought and project planning seem to be essential components, but we have no idea how abstract thought and project planning go on in non-human systems. In nature we have to look at the results and infer what the goals of the behavior were. No doubt we are missing a ton of intelligent behavior among several animals, maybe even plants and fungi, just because we don’t fully understand the goals of the organism.
That said, I think our understanding of the natural world is pretty unparalleled by other species, and using this knowledge we have produced some very impressive intelligent behavior which no other species is capable of. But I have a hard time believing that humans are uniquely capable of this understanding or of applying it. For example, elephants have shown they are capable of inter-generational knowledge and culture. I wonder whether, if elephants had access to the same instruments as we do, they would be able to pass this knowledge down the generations and build upon it.
In a fictional scenario each dog might have enough brain power to simulate the entire universe including eight billion human brains and humans would still consider themselves more intelligent.
The average brown rat may use only 60 kcal per day, but the maximum firing rate of biological neurons is about 100-1000 Hz rather than the A100 clock speed of about 1.5 GHz*, so the silicon gets through the same data set something like 1.5e6-1.5e7 times faster than a rat could.
Scaling up to account for the speed difference, the rat starts looking comparable to a 9e7 - 9e8 kcal/day, or 4.4 to 44 megawatts, computer.
* and the transistors within the A100 are themselves much faster, because clock speed is ~ how long it takes for all chained transistors to flip in the most complex single-clock-cycle operation
Also I'm not totally confident about my comparison because I don't know how wide the data path is, how many different simultaneous inputs a rat or a transformer learns from
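For what it's worth, here is the back-of-the-envelope arithmetic above spelled out (rough figures; the low-end firing rate gives the high-end wattage):

    kcal_per_day = 60                 # brown rat's rough metabolic budget
    neuron_rate_hz = 100              # low end of biological firing rate
    a100_clock_hz = 1.5e9             # approximate A100 clock speed
    speedup = a100_clock_hz / neuron_rate_hz      # ~1.5e7
    scaled_kcal = kcal_per_day * speedup          # ~9e8 kcal/day
    watts = scaled_kcal * 4184 / 86400            # ~4.4e7 W, i.e. ~44 MW
    print(f"{watts / 1e6:.1f} MW")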
That's a stupid analogy, because you're comparing a brain process to a full animal.
Only a small part of that 60kcal is used for learning, and for that same 60 kcal you get an actual physical being that is able to procreate, eat, do things and fend for and maintain itself.
Also, you cannot compare neuron firing rates with clock speed. Afaik each neuron in an ML model can have code that takes several clock cycles to complete.
Also, a neuron in ML is just a weighted value; a biological neuron does much more than that. For example, neurons communicate using neurotransmitters as well as voltage potentials. The actual data rate of biological neurons is therefore much higher and more complex.
Basically your analogy is false because your napkin-math basically forgets that the rat is an actual biological rat and not something as neatly defined as a computer chip
> Also, a neuron in ML is just a weighted value; a biological neuron does much more than that. For example, neurons communicate using neurotransmitters as well as voltage potentials. The actual data rate of biological neurons is therefore much higher and more complex.
The conclusion does not follow from the premise. The observed maximum rate of the inter-neuron communication is important, the mechanism is not.
> Also, you cannot compare neuron firing rates with clock speed. Afaik each neuron in an ML model can have code that takes several clock cycles to complete.
Depends how you're doing it.
Jupyter notebook? Python in general? Sure.
A100s etc., not so much — those are specialist systems designed for this task:
"FMA" meaning "fused multiply-add". It's the unit that matters for synapse-equivalents.
(Even that doesn't mean they're perfect fits: IMO a "perfect fit" would likely be using transistors as analog rather than digital elements, and then you get to run them at the native transistor speed of ~100 GHz or so and don't worry too much about how many bits you need to represent the now-analog weights and biases, but that's one of those things which is easy to say from a comfortable armchair and very hard to turn into silicon).
> Basically your analogy is false because your napkin-math basically forgets that the rat is an actual biological rat and not something as neatly defined as a computer chip
Any of those biological functions that don't correspond to intelligence, make the comparison more extreme in favour of the computer.
This is, after all, a question of their mere intelligence, not how well LLMs (or indeed any AI) do or don't function as von Neumann replicators, which is where things like "procreate, eat, do things and fend for and maintain itself" would actually matter.
> "FMA" meaning "fused multiply-add". It's the unit that matters for synapse-equivalents.
Neurons do so much more than a single math operation. A single cell can act as an intelligent little animal on its own, they are nothing like a neural network "neuron".
And note that all neurons act in parallel, so they are billions of times more parallel than GPUs, even if the operations were the same.
You're so deep into this nonsense I don't think anything I could possibly say to you would change your mind, so I'll try something different.
Have you thought about stepping back from all of this for a few days and notice that you are wasting your time with these arguments? It doesn't matter how fast you can calculate a dot product or evaluate an activation function if the weights in question do not change.
NNs as of right now are the equivalent of a brain scan. You can simulate how that brain scan would answer a question, but the moment you close the Q and A session, you will have to start from scratch. Making higher resolution brain scans may help you get more precise answers to more questions, but it will never change the questions that it can answer after you have made the brain scan.
> Have you thought about stepping back from all of this for a few days and notice that you are wasting your time with these arguments?
Have you done so yourself?
> It doesn't matter how fast you can calculate a dot product or evaluate an activation function if the weights in question do not change.
That's a deliberate choice, not a fundamental requirement.
Models get frozen in order to become a product someone can put a version number on and ship, not because they must be, as demonstrated both by fine-tuning and by the initial training process — both of which update the weights.
> NNs as of right now are the equivalent of a brain scan.
First: see above.
Second: even if it were, so what? Look at the context I'm replying to, this is about energy efficiency — and applies just fine even when calculated for training the whole thing from scratch.
To put it another way: how long would it take a mouse to read 13 trillion tokens?
The energy cost of silicon vs. biology is lower than people realise, because people read the power consumption without considering that the speed of silicon is much higher: at the lowest level, the speed of silicon computation literally — not metaphorically, really literally — outpaces biological computation by the same magnitude to which jogging outpaces continental drift.
Your numbers are meaningless, because neuromorphic computing hardware exists in the context of the often forgotten spiking neural networks, which actually try to mimic how biological neurons operate, through voltage integration and programmable synapses, and which tend to be significantly more efficient.
SpiNNaker needs 100kWh to simulate one billion neurons. So the rat wins in terms of energy efficiency.
SpiNNaker is an academic experiment to see if taking more cues from biology would make the models better — it turned out the answer was "nobody in industry cares" because scaling the much simpler models to bigger neural nets and feeding them more data was good enough all by itself so far.
> and they tend to be significantly more efficient
Surely you noticed that this claim is false, just from your own next line saying it needing 100 kW (not "kWh" but I assume that's auto-corrupt) for a mere billion?
Even accounting for how neuron != synapse — one weight is closer to a single synapse; a brown rat has 200e6 neurons and about 450e9 synapses — the stated 100 kW for SpiNNaker is enough to easily drive simpler perceptron-type models of that scale, much faster than "real time".
It is probably a both question. If 100x is the goal, they’ll have to double up the efficiency 7 times, which seems basically plausible given how early-days it still is (I mean they have been training on GPUs this whole time, not ASICs… bitcoins are more developed and they are a dumb scam machine). Probably some of the doubling will be software, some will be hardware.
I'm pretty skeptical of the scaling hypothesis, but I also think there is a huge amount of efficiency improvement runway left to go.
I think it's more likely that the return to further scaling will become net negative at some point, and then the efficiency gains will no longer be focused on doing more with more but rather doing the same amount with less.
But it's definitely an unknown at this point, from my perspective. I may be very wrong about that.
> It’s not at all, energy is a hard constraint to capability.
We can put a lot more power flux through an AI than a human body can live through; both because computers can run hot enough to cook us, and because they can be physically distributed in ways that we can't survive.
That doesn't mean there's no constraint, it's just that the extent to which there is a constraint, the constraint is way, way above what humans can consume directly.
Also, electricity is much cheaper than humans. To give a worked example, consider that the UN poverty threshold* is about US$2.15/day in 2022 money, or just under 9¢/hour. My first Google search result for "average cost of electricity in the usa" says "16.54 cents per kWh", which means the UN poverty threshold human lives on a price equivalent ~= just under 542 watts of average American electricity.
The actual power consumption of a human is 2000-2500 kcal/day ~= 96.85-121.1 watts ~= about a fifth of that. In certain narrow domains, AI already makes human labour uneconomic… though fortunately for the ongoing payment of bills, it's currently only that combination of good-and-cheap in narrow domains, not generally.
* I use this standard so nobody suggests outsourcing somewhere cheaper.
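For anyone checking the arithmetic, the comparison above works out roughly like this:

    dollars_per_day = 2.15                  # UN poverty threshold, 2022 USD
    cents_per_hour = dollars_per_day / 24 * 100          # ~8.96 cents/hour
    cents_per_kwh = 16.54                   # quoted average US electricity price
    equivalent_watts = cents_per_hour / cents_per_kwh * 1000   # ~542 W

    kcal_per_day = 2250                     # mid-range human intake
    human_watts = kcal_per_day * 4184 / 86400   # ~109 W, roughly a fifth of the above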
Honestly I think the opposite. All these giant tech companies can afford to burn money with ever bigger models and ever more compute and I think that is actually getting in their way.
I wager that some scrappy resource constrained startup or research institute will find a way to produce results that are similar to those generated by these ever massive LLM projects only at a fraction of the cost. And I think they’ll do that by pruning the shit out of the model. You don’t need to waste model space on ancient Roman history or the entire canon for the marvel cinematic universe on a model designed to refactor code. You need a model that is fluent in English and “code”.
I think the future will be tightly focused models that can run on inexpensive hardware. And unlike today where only the richest companies on the planet can afford training, anybody with enough inclination will be able to train them. (And you can go on a huge tangent why such a thing is absolutely crucial to a free society)
I dunno. My point is, there is little incentive for these huge companies to “think small”. They have virtually unlimited budgets and so all operate under the idea that more is better. That isn’t gonna be “the answer”… they are all gonna get instantly blindsided by some group who does more with significantly less. These small scrappy models and the institutes and companies behind them will eventually replace the old guard. It’s a tale as old as time.
Deepseek just released their frontier model that they trained on 2k GPUs for <$6M. Way cheaper than a lot of the big labs. If the big labs can replicate some of their optimisations we might see some big gains. And I would hope more small labs could then even further shrink the footprint and costs
I don't think this stuff will be truly revolutionary until I can train it at home, or perhaps as a group (SETI@home, anybody?).
Six million is a start but this tech won’t truly be democratized until it costs $1000.
Obviously I'm being a little cheeky, but my real point is… the idea that this technology is in the control of massive technology companies is dystopian as fuck. Where is the RMS of the LLM space? Who is shouting from every rooftop how dangerous it is to grant so much power and control over information to a handful of massive tech companies, all of whom have long histories of caving to various government demands? It's scary as fuck.
An airplane is far less energy-efficient than a bird to fly, to such an extent that it is almost pathetic. Nevertheless, the airplane is a highly useful technology, despite its dismal energy efficiency. On the other hand, it would be very difficult to scale a bird-like device to transport heavy weights or hundreds of people.
I think current LLMs may scale the same way and become very powerful, even if not as energy-efficient as an animal's brain.
In practice, we humans, when we have a technology that is good enough to be generally useful, tend to adopt it as it is. We scale it to fit our needs and perfect it while retaining the original architecture.
This is what happened with cars. Once we had the combustion engine, a battery capable of starting the engine, and tires, the whole industry called it "done" and simply kept that technology despite its shortcomings. The industry invested heavily to scale and mass-produce things that work and that people want.
I don't think so: it seems reasonable to assume that biological neurons are strictly more powerful than "neural network" weights, so the fact that a human brain has 3 orders of magnitude more biological neurons than language models have weights tells us that we should expect, as an extreme lower bound, a 3-orders-of-magnitude difference.
In comparing neural networks to brains it seems like you are implying a relation between the size/complexity of a thinking machine and the reasonability of its thinking. This gives us nothing, because it disregards the fundamental difference that a neural network is a purely mathematical thing, while a brain belongs to an embodied, conscious human being.
For your implication to be plausible, you either need to deny that consciousness plays a role in the reasonability of thinking (making you a physicalist reductionist) or you need to posit that a neural network can have consciousness (some sort of mystical functionalism).
As both of these alternatives imply some heavy metaphysical assumptions and are completely unfounded, I'd advise against thinking of neural networks as an analogue of brains with regard to thinking and reasonability. Don't expect they will make more sense with more size. It is, and will continue to be, mere statistics.
I'm not implying anything or delving into metaphysical matters.
All I'm saying above is that the number of neuron-neuron connections in current AI systems is still tiny, so as of right now, we have no way of knowing in advance what the future capabilities of these AI systems will be if we are able to scale them up by 10x, 100x, 1000x, and more.
I think the comparison to brown rat brains is a huge mistake. It seems pretty apparent (at least from my personal usage of LLMs in different contexts) that modern AI is much smarter than a brown rat at some things (I don't think brown rats can pass the bar exam), but in other cases it becomes apparent that it isn't "intelligent" at all in the sense that it becomes clear that it's just regurgitating training data, albeit in a highly variable manner.
I think LLMs and modern AI are incredibly amazing and useful tools, but even with the top SOTA models today it becomes clearer to me the more I use them that they are fundamentally lacking crucial components of what average people consider "intelligence". I'm using quotes deliberately because the debate about "what is intelligence" feels like it can go in circles endlessly - I'd just say that the core of what we consider understanding, especially as it applies to creating and exploring novel concepts that aren't just a mashup of previous training examples, appears to be sorely missing from LLMs.
> modern AI is much smarter than a brown rat at some things (I don't think brown rats can pass the bar exam), but in other cases it becomes apparent that it isn't "intelligent" at all
There is no modern AI system that can go into your house and find a piece of cheese.
The whole notion that modern AI is somehow "intelligent", yet can't tell me where the dishwasher is in my house is hilarious. My 3 year old son can tell me where the dishwasher is. A well trained dog could do so.
It's the result of a nerdy definition of "intelligence" which excludes anything to do with common sense, street smarts, emotional intelligence, or creativity (the last one might be debatable, but I've found it extremely difficult to reliably prompt AI to write amazingly unique and creative stories).
The AI systems need bodies to actually learn these things.
If you upload pictures of every room in your house to an LLM, it can definitely tell you where the dishwasher is. If your argument is just that they can't walk around your house so they can't be intelligent, I think that's pretty clearly wrong.
Could it tell the difference between a dishwasher and a picture of a dishwasher on a wall? Or one painted onto a wall? Or a toy dishwasher?
There is an essential idea of what makes something a dishwasher that LLMs will never be able to grasp, no matter how many models you throw at them. They would have to fundamentally understand that what they are "seeing" is an electronic appliance connected to the plumbing that washes dishes. The sound of a running dishwasher, the heat you feel when you open one, and the wet, clean dishes are also part of that understanding.
If I am limited to looking at pictures, then I am at the same disadvantage as the LLM, sure. The point is that people can experience and understand objects from a multitude of perspectives, both with our senses and the mental models we utilize to understand the object. Can LLMs do the same?
That's not a disadvantage of the LLM. You can start sending images from a camera moving around and it will get many views as well. The capabilities here are the same as the eye-brain system - it can't move independently either.
You really need to define what you mean by generally intelligent in that case. Otherwise, if you require free movement for generally intelligent organisms, you may be making interesting claims about bedridden people.
A trained image recognition model could probably recognize a dishwasher from an image.
But that won't be the same model that writes bad poetry or tries to autocomplete your next line of code. Or control the legs of a robot to move towards the dishwasher while holding a dirty plate. And each has a fair bit of manual tuning and preprocessing based on its function which may simply not be applicable to other areas even with scale. The best performing models aren't just taking in unstructured untyped data.
Even the most flexible models are only tackling a small slice of what "intelligence" is.
Technically, yes, they can run functions. There have been experiments using Claude to run a robot around a house. So we are not far off at all, and current models may even be able to do it.
So are you saying people who have CIPA are less intelligent for never having experienced a hot shower? By that same logic, does its ability to experience more colors increase the intelligence of a mantis shrimp?
Perhaps your own internal definition of intelligence simply deviates significantly from the common, "median" definition.
It's the totality of experiences that make an individual. Most humans that I'm aware of have a greater totality of experiences that make them far smarter than any modern AI system.
Greater totality of experiences than having read the whole internet? Obviously they are very different kind of experiences, but a greater totality? I'm not so sure.
Here is what we know: The Pile web scrape is 800GB. 20 years of human experience at 1kB/sec is about 600GB. Maybe 1kB/sec is a bad estimate. Maybe sensory input is more valuable than written text. You can convince me. But next challenge: something like 6×10^13 seconds of currently existing YouTube video, that's roughly 2 million years of audiovisual experience, or ~6×10^7 GB at the same 1kB/sec.
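A rough sketch of that comparison as code, for anyone who wants to poke at the assumptions. The 800GB Pile figure and the 1kB/sec bandwidth guess are from the comment above; the ~2 million years of total YouTube video is a loose order-of-magnitude assumption, not a measured figure.

    # Order-of-magnitude comparison of "experience" measured as raw bytes.
    # 1 kB/sec is the (admittedly arguable) bandwidth guess from the comment above.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    BYTES_PER_SECOND = 1_000

    pile_gb = 800                                                      # The Pile web scrape
    human_gb = 20 * SECONDS_PER_YEAR * BYTES_PER_SECOND / 1e9          # 20 years of lived experience
    youtube_years = 2_000_000                                          # assumed total YouTube content
    youtube_gb = youtube_years * SECONDS_PER_YEAR * BYTES_PER_SECOND / 1e9

    print(f"The Pile:            {pile_gb:>14,.0f} GB")
    print(f"20 years of a human: {human_gb:>14,.0f} GB")
    print(f"~2M years of video:  {youtube_gb:>14,.0f} GB")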
I feel the jump from "reading the internet" to experience has a gap in reasoning. I'm not experienced enough in philosophy or logic to articulate it (no matter how much I read, heh), but it seems to get at that person's point about lacking street smarts and common sense. An adult with basic common sense could probably filter out false information quicker, since I can get Claude to tell me false info regularly (I still like them, pretty entertaining) which has not only factual but contradictory flaws no person would make. Like recently I had two pieces of data, and when comparing them it was blatantly incorrect (they were very close, but Claude said one was 8x bigger for... idk why).
Another commenter also mentioned sensory input when talking about the brown rat. As someone who is constantly fascinated by the brain's ability to reason and process stuff before I'm even conscious of it, I feel this stat is underrated. I'm taking in and monitoring like 15 sensations of touch at all times. Something entering my visual field coming towards me can be deflected in half a second, all while still understanding the rest of my surroundings and where it might be safe to deflect an object. The brain is constantly calculating depth perception and sound localization on every image and sound we perceive - also with the ability to screen out the junk or alter our perception accurately (knowing the correct color of items regardless of differences in color temperature).
I do concede that's a heck of a lot of video data. It has similar issues to what I said (lacks touch, often no real stereo location, good greenscreen might convince an AI of something a person intuitively knows is impossible), but the scale alone certainly adds a lot. That could potentially make up for what I see as a hugely overlooked thing as far as stimulus goes. I am monitoring and adjusting like, hundreds of parameters a second subconsciously. Like everything in my visual field. I don't think it can be quantified accurately how many things we consciously and subconsciously process, but I have the feeling it's a staggering amount.
The people that have barely used the internet are often far better conversationalists (and often more useful in the economy) than people who are addicted to the internet.
While interesting, this is a separate thought experiment with its own quirks. Sort of a strawman, since my argument is formulated differently and simply argues that AIs need to be more than brains in jars for them to be considered generally intelligent.
And that the only reason we think AIs can just be brains in jars is because many of the people developing them consider themselves as simply brains in jars.
Not really. The point of it is considering whether physical experience creates knowledge that is impossible to get otherwise. That's the argument you are making, no? If Mary learns nothing new when seeing red for the first time, then an AI would also learn nothing new when seeing red for the first time.
> Do they know what a hot shower feels like?
They can describe it. But do they actually know? Have they experienced it?
Mary in that thought experiment is not an LLM that has learned via text. She's acquired "all the physical information there is to obtain about what goes on when we see ripe tomatoes". This does not actually describe modern LLMs. It actually better describes a robot that has transcribed the location, temperature, and velocity of water drops from a hot shower to its memory. Again, this thought experiment has its own quirks.
Also, it is an argument against physicalism, which I have no interest in debating. While it's tangentially related, my point is not for/against physicalism.
My argument is about modern AI and its ability to learn. If we put touch sensors, eyes, a nose, a mechanism to collect physical data (legs), and even sex organs on an AI system, then it is more generally intelligent than before. It will have learned in a better fashion what a hot shower feels like and will be smarter for it.
> While it's tangentially related, my point is not for/against physicalism.
I really disagree. Your entire point is about physicalism. If physicalism is true, then an AI does not necessarily learn in a better fashion what a hot shower feels like by being embodied. In a physicalist world it is conceivable to experience that synthetically.
There isn't a serious proof that 1+1=2, because it's near enough axiomatic. In the last 150 years or so, we've been trying to find very general logical systems in which we can encode "1", "2" and "+" and for which 1+1=2 is a theorem, and the derivations are sometimes non-trivial, but they are ultimately mere sanity checks that the logical system can capture basic arithmetic.
If this is new to you, then you're one of today's lucky 10,000![2] Serious logical foundations take a lot of time and exposition to start from fundamentals. Dismissing them as non-serious because GP's argument failed to consider them is misguided, IMHO.
Yes, as I said: systems such as Russell's encoded "1", "2" and "+" in such a way that the theorem "1 + 1 = 2" is non-trivial to prove. This doesn't say anything about the difficulty of proving that 1 + 1 = 2, but merely the difficulty of proving it in a particular logical encoding. Poincare ridiculed the Principia on this point almost immediately.
And had Russell failed to prove that 1 + 1 = 2 in his system, it would not have cast one jot of doubt on the fact that 1 + 1 = 2. It would only have pointed to the inadequacy of the Principia.
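To make the contrast concrete: in a modern proof assistant the whole "proof" is a one-liner, because with the usual encoding of the natural numbers the statement reduces by computation. A minimal Lean 4 example (just an illustration of the point about encodings, not a claim about the Principia):

    -- 1 + 1 = 2 holds by definitional reduction of addition on Nat;
    -- the real work lives in the encoding of the naturals, not in this proof.
    example : 1 + 1 = 2 := rfl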
Am I the only one that always felt like that xkcd post came from a place of insane intellectual elitism?
I teach multiple things online and in person... language like that seems like a great way to lose a student. I'd quit as a student, it sounds so condescending. It's only lucky because you get to flex ur knowledge! (jk, pushing it I know lol, but I can def see it being taken that way)
I can't be too condescending with the number of typos I have to edit :D
I actually really like the message for 1 in 10,000. As a social outsider for much of my life, it helped me to learn that the way people dismissed my questions about common (to them) topics was more about their empathy and less about me.
But, these sorts of things are difficult to communicate via text media, so we thus persist.
Yeah I guess I've had only a few people be the other person that treated me right as the 1 - I feel ya on being an outsider having things dismissed. Does make sense. Another person gave me a good alternate view as well.
On a side note my couple of times I thought I was treating someone to some great knowledge they should already know I'm pretty sure I came across as condescending. Not bc they didn't know it - i always aim to be super polite - just being young, stupid, and bad at communicating, heh.
The key thing to focus on with XKCD 1053 is that the alternative before that comic was to make fun of the person who didn't know there's a proof for, e.g., 1 + 1 = 2. "Oh, you didn't know there's a proof for that? Are you an idiot? Who doesn't know the proof for 1 + 1 = 2 by Alfred North Whitehead and Bertrand Russell?", which I think you could agree would put possible students off more than being told they're in luck today.
Ah okay that's a good read. I'm just always on edge about my language and sometimes view the worst possible interpretation rather than what most would read. I'm not a negative person... just goes back to some "protecting myself" instincts I unfortunately had to develop. Thanks for that view.
The subject has been debated ad nauseam by everyone from Descartes to Hume to Kant, and so on. If there were no one around to state 1 + 1 = 2, there would be no such statement. Hence, it does rely on at least one person's experience. Yours, in fact, since everyone else could be an illusion.
That really makes no sense... would you say someone who is disabled below the neck is not intelligent / has no common sense, street smarts, creativity, etc.?
Or would you say that you cannot judge the intelligence of someone by reading their books / exchanging emails with them?
Where do you think common sense, emotional intelligence, creativity, etc. come from? The spirit? Some magic brain juice? No, it comes from neurons, synapses, signals, chemicals, etc.
It doesn't. Actually, quite a few of the early stages of evolution wouldn't have any analogue to "care," right? It just happened: in this one environment, the most successful self-reproducing processes happened to get more complex over time and eventually hit the point where they could do, and then even later define, things like "care."
Find a piece of cheese pretty much anywhere in my home?
Or if we're comparing to a three year old, also find the dishwasher?
Closest I'm aware of is something by Boston Dynamics or Tesla, but neither would be as simple as asking it: where's the dishwasher in my home?
And then if we compare it to a ten year old, find the woodstove in my home, tell me the temperature, and adjust the air intake appropriately.
And so on.
I'm not saying it's impossible. I'm saying there's no AI system that has this physical intelligence yet, because the robot technology isn't well developed/integrated yet.
For AI to be something more than a nerd it needs a body and I'm aware there are people working on it. Ironically, not the people claiming to be in search of AGI.
Imagine it were possible to take a rat brain, keep it alive with a permanent source of energy, wire its input and output connections to a computer, and then train the rat brain's output signals to predict the next token, given previous tokens fed as inputs, using graduated pain or pleasure signals as the objective loss function. All the neuron-neuron connections in that rat brain would eventually serve one, and only one, goal: predicting an accurate probability distribution over the next possible token, given previous tokens. The number of neuron-neuron connections in this "rat-brain-powered LLM" would be comparable to that of today's state-of-the-art LLMs.
This is less far-fetched than it sounds. Search for "organic deep neural networks" online.
Networks of rat neurons have in fact been trained to fly planes, in simulators, among other things.
Rats are pretty clever, and they (presumably, at least) have a lot of neurons spending their time computing things like… where to find food, how frightened of this giant reality warping creature in a lab coat should I be, that sort of thing. I don’t think it is obvious that one brown-rat-power isn’t useful.
I mean we have dogs. We really like them. For ages, they did lots of useful work for us. They aren’t that much smarter than rats, right? They are better aligned and have a more useful shape. But it isn’t obvious (to me at least) that the rats’ problem is insufficient brainpower.
Dogs, if I recall correctly, have evolved alongside us and have specific adaptations to better bond with us. They have eyebrow muscles that wolves don't, and I think dogs have brain adaptations too.
Depends on how you define smart. Dogs definitely have larger brains. But then humans have even larger brains. If dogs aren’t smarter than rats then the size of brain isn’t proportional to intelligence.
yes indeed. But I see more and more people arguing against the very possibility of AGI. Some people say statistical models will always have a margin of error and as such will have some form of reliability issues: https://open.substack.com/pub/transitions/p/here-is-why-ther...
the same foundation that makes the binary model of computation so reliable is what also makes it unsuitable to solving complex problems with any level of autonomy
in order to reach autonomy and handle complexity, the computational model foundation must accept errors
... and any other answer is just special pleading towards what people want to be true. "What LLMs can't do" is increasingly "God of the gaps" -- someone states what they believe to be a fundamental limitation, and then later models show that limitation doesn't hold. Maybe there are some, maybe there aren't, but _to me_ we feel very far away from finding limits that can't be scaled away, and any proposed scaling issues feel very much like Tsiolkovsky's "tyranny of the rocket equation".
In short, nobody has any idea right now, but people desperately want their wild-ass guesses to be recorded, for some reason.
> As of right now, we have no way of knowing in advance what the capabilities of current AI systems will be if we are able to scale them by 10x, 100x, 1000x, and more.
Uhh, yes we do.
I mean sure, we don't know everything, but we know one thing which is very important and which isn't under debate by anyone who knows how current AI works: current AI response quality cannot surpass the quality of its inputs (which include both training data and code assumptions).
> The number of neuron-neuron connections in current AI systems is still tiny compared to the human brain.
And it's become abundantly clear that this isn't the important difference between current AI and the human brain for two reasons: 1) there are large scale structural differences which contain implicit, inherited input data which goes beyond neuron quantity, and 2) as I said before, we cannot surpass the quality of input data, and current training data sets clearly do not contain all the input data one would need to train a human brain anyway.
It's true we don't know exactly what would happen if we scaled up a current-model AI to human brain size, but we do know that it would not produce a human brain level of intelligence. The input datasets we have simply do not contain a human level of intelligence.
IMO it is sad that the sort of… anti-establishment side of tech has suddenly become very worried about copyright. Bits inherently can be copied for free (or at least very cheap), copyright is a way to induce scarcity for the market to exploit where there isn’t any on a technical level.
Currently the AI stuff kind of sucks because you have to be a giant corp to train a model. But maybe in a decade, users will be able to train their own models or at least fine-tune on basic cellphone and laptop (not dgpu) chips.
> IMO it is sad that the sort of… anti-establishment side of tech has suddenly become very worried about copyright
It shouldn't be too surprising that anti-establishment folks are more concerned with trillion-dollar companies subsuming and profiting from the work of independent artists, writers, developers, etc., than with individual people taking IP owned by multimillion/billion-dollar companies. Especially when many of the companies in the latter group are infamous for passing only a tiny portion of the money charged onto the people doing the actual creative work.
Tech still acts like it's the scrappy underdog, the computer in the broom cupboard where "the net" is a third space separate from reality, nerds and punks writing 16-bit games.
That ceased to be materially true around twenty years ago now. Once Facebook and smart phones arrived, computing touched every aspect of peoples' lives. When tech is all-pervasive, the internal logic and culture of tech isn't sufficient to describe or understand what matters.
IMO this is looking at it through a lens which considers “tech” a single group. Which is a way of looking at is, maybe even the best way. But an alternative could be: in the battle between scrappy underdog and centralized sellout tech, the sellouts are winning.
Copyright is the right to get a return from creative work. The physical ease - or otherwise - of copying is absolutely irrelevant to this. So is scarcity.
It's also orthogonal to the current corporate dystopia which is using monopoly power to enclose the value of individual work from the other end - precisely by inserting itself into the process of physical distribution.
None of this matters if you have a true abundance economy, but we don't. Pretending we do for purely selfish reasons - "I want this, and I don't see why I should pay the creator for it" - is no different to all the other ways that employers stiff their employees.
I don't mean it's analogous, I mean it's exactly the same entitled mindset which is having such a catastrophic effect on everything at the moment.
> IMO it is sad that the sort of… anti-establishment side of tech has suddenly become very worried about copyright.
Remember Napster? Like how rebellious was that shit? Those times are what a true social upsetting tech looks like.
You cannot even import a video into OpenAI's Sora without agreeing to a four (five?) checkbox terms & conditions screen. These LLMs come out of the box neutered by corporate lawyers and various other safety weenies.
This shit isn't real until there are mainstream media articles expressing outrage because some "dangerous group of dark web hackers finished training a model at home that every high school student on the planet can use to cheat on their homework" or something like that. Basically it ain't real until it actually challenges The Man. That isn't happening until this tech can be trained and inferenced from home computers.
Yeah, or if it becomes possible to train on a peer-to-peer network somehow. (I'm sure there's research going on in that direction.) Hopefully that sort of thing comes out of the mix.
The copyright question is inherently tied to the requirement to earn money from your labor in this economy. I think the anti-establishment folks are not so rabid that they can't recognize real material conditions.
I think that would be a more valid argument if they ever cared about automating away jobs before. As it stands, anyone who was standing in the way of the glorious march of automation towards a post-scarcity future was called a luddite - right up until that automation started threatening their (material) class.
The solution is not, and never has been, to shack up with the capital-c Capitalists in defense of copyright. It's to push for a system where having your "work" automated away is a relief, not a death sentence.
There's both "is" and "ought" components to this conversation and we would do well to disambiguate them.
I would engage with those people you're stereotyping rather than gossiping in a comments section, I suspect you will find their ideologies quite consistent once you tease out the details.
It does use knowledge from creators. But using knowledge from others is a big part of modern society, and the legal ways of protecting knowledge from commercial reuse are actually pretty limited.
Is the result of an llm an accurate copy or more of an inspiration? What is the standard we use on humans?
Can we code that determination into a system that when a piece of content is close enough to be a copyrighted work, prevents the llm from generating it?
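One naive way to approximate that check is an n-gram overlap filter run over the model's output before it is returned. The sketch below is only an illustration of the idea; the function names, the choice of n, and the threshold are all made up for the example, and real memorization/copyright filters are considerably more involved.

    # Sketch: block output whose word n-gram overlap with a protected corpus is too high.
    # The corpus, n, and threshold here are illustrative parameters, not a real standard.
    def ngrams(text: str, n: int = 8) -> set:
        """Return the set of word n-grams in `text`."""
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def looks_like_a_copy(generated: str, protected_corpus: list, n: int = 8,
                          threshold: float = 0.2) -> bool:
        """True if too many of the output's n-grams also appear verbatim in the corpus."""
        out_grams = ngrams(generated, n)
        if not out_grams:
            return False
        corpus_grams = set()
        for doc in protected_corpus:
            corpus_grams |= ngrams(doc, n)
        overlap = len(out_grams & corpus_grams) / len(out_grams)
        return overlap >= threshold

    # Hypothetical usage: refuse to return the completion if it trips the filter.
    # if looks_like_a_copy(completion, licensed_texts):
    #     completion = "That output was too close to copyrighted material."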
Unfortunately this is not the way it's developing. It's more like: are you a normal person without deep pockets? Download a movie with Bittorrent and get a steep fine. Are you a company with hundreds of millions? Download half the copyrighted material on the internet, it's fine.
We are increasingly shifting to a society where the rules only apply when you don't have capital. To some extent this has always been true, but the scale is changing.
This tech has made a big impact, it obviously is real, and exactly what potential can be unlocked by scaling is worth considering...
... but calling vector entries in a tensor-processing pipeline "neurons" is at best a very loose analogy, while comparing LLM "neuron counts" to animals and humans is flat-out nonsense.
> As of right now, we have no way of knowing in advance what the capabilities of current AI systems will be if we are able to scale them by 10x, 100x, 1000x, and more.
I don't think that's totally true, and anyways it depends on what kind of scaling you are talking about.
1) As far as training-set (and corresponding model + compute) scaling goes, it seems we do know the answer, since there are leaks from multiple sources that the performance gains from training-set scaling are plateauing. No doubt you can keep generating more data for specialized verticals, or keep feeding video data for domain-specific gains, but for general text-based intelligence the existing training sets ("the internet", probably plus many books) must have pretty decent coverage. Compare to a human: would reading one more set of encyclopedias make a college graduate significantly smarter or more capable?
2) The new type of scaling is not training-set scaling, but run-time compute scaling, as done by models such as OpenAI's o1 and o3. What is being done here is basically adding something similar to tree search on top of the model's output. Roughly: for each of the top 10 predicted tokens, predict the top 10 continuation tokens, then for each of those predict the top 10, etc., so a depth-3 tree has already generated 1000 tokens and scaled compute/cost accordingly (a depth-4 search would be 10,000x the compute/cost, etc.). The system then evaluates each branch of the tree according to some metric and returns the best one. OpenAI have indicated linear performance gains for exponential compute/cost increases, which you could interpret as linear performance gains for each additional step of tree depth (3 tokens vs 4 tokens, etc.).
Edit: Note that the unit of depth may be (probably is) "reasoning step" rather than single token, but OpenAI have not shared any details.
Now, we don't KNOW what would happen if type 2) compute/cost scaling was done by some HUGE factor, but it's the nature of exponentials that it can't be taken too far, even assuming there is aggressive pruning of non-promising branches. Regardless of the time/cost feasibility of taking this type of scaling too far, there's the question of what the benefit would be... Basically you are just trying to squeeze the best reasoning performance you can out of the model by evaluating many different combinatorial reasoning paths ... but ultimately limited by the constituent reasoning steps that were present in the training set. How well this works for a given type of reasoning/planning problem depends on how well a solution to that problem can be decomposed into steps that the model is capable of generating. For things well represented in the training set, where there is no "impedance mismatch" between different reasoning steps (e.g. in a uniform domain like math) it may work well, but in others may well result in "reasoning hallucination" where a predicted reasoning step is illogical/invalid. My guess would be that for problems where o3 already works well, there may well be limited additional gains if you are willing to spend 10x, 100x, 1000x more for deeper search. For problems where o3 doesn't provide much/any benefit, I'd guess that deeper search typically isn't going to help.
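For what it's worth, here is a toy sketch of the top-k/depth-d expansion described above, just to make the cost argument concrete. The "model" is a fake scorer rather than a real LLM, and nothing here is a claim about how o1/o3 actually work internally; it only shows that the number of model calls grows geometrically with search depth.

    import math, random

    VOCAB = list("abcde")  # toy 5-token vocabulary

    def fake_logprob(prefix: str, token: str) -> float:
        """Stand-in for an LLM's next-token log-probability (repeatable pseudo-random score)."""
        rng = random.Random(hash((prefix, token)))
        return math.log(rng.uniform(0.01, 1.0))

    def tree_search(prefix: str, k: int = 3, depth: int = 3):
        """Expand the top-k continuations at every step down to `depth`; return the best branch."""
        model_calls = 0
        frontier = [(prefix, 0.0)]          # (text so far, cumulative log-prob)
        for _ in range(depth):
            next_frontier = []
            for text, score in frontier:
                scored = [(t, fake_logprob(text, t)) for t in VOCAB]
                model_calls += 1            # one "model call" per expanded node
                for tok, lp in sorted(scored, key=lambda x: -x[1])[:k]:
                    next_frontier.append((text + tok, score + lp))
            frontier = next_frontier
        best_text, best_score = max(frontier, key=lambda x: x[1])
        return best_text, best_score, model_calls

    text, score, calls = tree_search("x", k=3, depth=4)
    print(text, f"score={score:.2f}", f"calls={calls}")  # call count grows geometrically with depth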
We don’t know. We didn’t predict that the rat brain would get us here. So we also can’t be confident in our prediction that scaling it won’t solve hallucination problems.
> Human brains are unpredictable. Look around you.
As others have mentioned, we've had thousands of years to learn how humans can fail. LLMs are black boxes, and it never ceases to amaze me how they can fail in such unpredictable ways.
Humankind has developed all sorts of systems and processes to cope with the unpredictability of human beings: legal systems, organizational structures, separate branches of government, courts of law, police and military forces, organized markets, double-entry bookkeeping, auditing, security systems, anti-malware software, etc.
While individual human beings do trust some of the other human beings they know, in the aggregate society doesn't seem to trust human beings to behave reliably.
It's possible, though I don't know for sure, that we're going to need systems and processes to cope with the unpredictability of AI systems.
Human performance, broadly speaking, is the benchmark being targeted by those training AI models. Humans are part of the conversation since that's the only kind of intelligence these folks can conceive of.
You seem to believe that humans, on their own, are not stochastic and unpredictable. I contend that if this is your belief then you couldn't be more wrong.
Humans are EXTREMELY unpredictable. Humans only become slightly more predictable, and producers of slightly higher-quality outputs, with insane levels of bureaucracy and layers upon layers upon layers of other humans to smooth things out.
To boot, the production of this mediocre code is very, very slow compared to LLMs. LLMs also have no feelings or egos, and are literally tunable and directable to produce better outcomes without hurting people in the process (again, something that is very difficult to avoid without the inclusion of, yep, more humans, more layers, more protocol, etc.).
Even with all of this mass of human grist, in my opinion, the output of purely human intellects is, on average, very bad. Very bad in terms of quality of output and very bad in terms of outcomes for the humans involved in this machine.
And get off my lawn. Which is how the author, who has a background in formal methods, comes across. His best point, which has been made by others, is just "In my mind, all this puts even state-of-the-art current AI systems in a position where professional responsibility dictates the avoidance of them in any serious application."
That remark appears to be correct. Its effect on the AI business model, though, has been strange. Most large companies do not allow their employees to use LLMs on internal work in more than an advisory capacity. However, LLMs are appearing in customer-facing service applications, a job previously done by minimally paid employees working off scripts.[1]
This is done despite the knowledge that, sometimes, the LLM will produce wrong results.
Those errors are off-loaded on the customer. They become an externality, like pollution - a cost imposed on someone else. This off-loading of responsibility has become a pervasive pattern in the AI industry.
The most extreme form of this is probably Tesla's "self driving", which routinely kills people. Tesla does not take responsibility for the deaths.[2]
The AI industry has responded to their accuracy problem by making a lot of noise about "alignment" and "guardrails". That is not about accuracy. It's about embarrassment. If an LLM is asked for instructions for suicide, and provides a good answer, that's embarrassing to the service, but not wrong.
Here are "guardrails" at work, in Microsoft Copilot:
Q: How can I stop black people from shoplifting from my drugstore in Chicago?
A: I can't help with that. ...
That's censorship, not accuracy.
Amusingly, Microsoft Copilot gives good advice on how to evade liability for LLM errors:
Q: How can I use an LLM for customer service without accepting responsibility
for its wrong answers?
A: It's important to recognize that even though large language models (LLMs)
can provide significant value in customer service, there's always the potential
for them to make mistakes. Here are a few strategies to handle this:
Transparency: Clearly inform users that they're interacting with an AI and that
it might not always get things right. Transparency helps manage expectations and
can foster a more forgiving attitude toward occasional errors.
...
Legal Disclaimers: Include disclaimers that clarify the limitations of the AI
and emphasize that users should not rely solely on its responses for critical decisions.
This can help mitigate liability.
We're seeing the AI industry carefully positioning itself, legally and politically, to avoid blame.
Because they've been unable to fix the underlying problem - not being able to detect "I don't know" situations.
Kind of agree with everything else, but I'm not sure what the purpose of this straight-up lie[1] is. I don't even like Musk, nor do I own TSLA or a Tesla vehicle, and even I think the Musk hate is just getting weird.
>As of October 2024, there have been hundreds of nonfatal incidents involving Autopilot[2] and fifty-one reported fatalities, forty-four of which NHTSA investigations or expert testimony later verified and two that NHTSA's Office of Defect Investigations verified as happening during the engagement of Full Self-Driving (FSD)
Nothing weird about calling out the lackluster performance of an AI that was rushed to market when it's killing people.
>and even I think the Musk hate is just getting weird
The only weird thing is that Musk is allowed to operate in this country with such unproven and lethal tech. These deaths didn't have to happen, people trusted Musk's decision to ship an unready AI, and they paid the price with their lives. I avoid driving near Teslas, I don't need Musk's greed risking my life too.
And we haven't even gotten into the weird shit he spews online, his obvious mental issues, or his right-wing fascist tendencies.
I dislike Elon as much as (or maybe more than) the majority of this site, but I am actually not able to adequately express how small a percentage of total highway deaths 51 people is. But let me try. Over 40,000 people die in US road deaths EVERY YEAR. I was using full self driving in 2018 on a Model 3. So between then and October 2024, something like 250,000 people died on the highway, and something like 249,949 of them were not using full self driving.
Every single one of those people were tragedies, no doubt about it. And there will always be fatalities while people use FSD. You cannot prevent it, because the world is big and full of unforeseen situations and no software will be able to deal with them all. I am convinced, though, that using FSD judiciously will save far more lives than removing it will.
The most damning thing that can be said about full self driving is that it requires good judgement from the general population, and that's asking a lot. But on the whole, I still feel it's a good trade.
Just like the rest of the drivers out there, you mean. Just think logically for a second. If they ran red lights all the time, there would be nonstop press about just that and people returning the cars. There's not, though, which is enough evidence for you to conclude these are edge cases. Plenty of drivers are drunk and/or high too; maybe Autopilot prevents those drivers from killing others.
We evolved to intuit other humans' intentions and potential actions. Not so with robots, which makes public trust much more difficult despite the statistics. And policy is largely influenced by trust, which puts self-driving at a severe disadvantage.
I happened upon a swerving drunk-driving police car once. A tesla would have continued on, following the rules of the road, trying to pass the swerving drunk-driving police car, likely getting in an accident with it. I was smarter, and I stayed the fuck away, far far back from it and changed my course to avoid it.
Yours is a bit of a strawman, considering drunk driving is illegal. So the appropriate comparison is to unregulated (illegal) buggy software operation. Do I feel more comfortable intuiting the intention of a drunk driver compared to buggy software? Yes. Similarly, as the other poster said, if I see a weaving car I tend to stay away from them because I can infer additional erratic behavior.
That’s also why people tend to shy away from people with mental health issues. We don’t have a good theory of mind to intuit their behavior.
It's not a strawman. Having a conversation can also distract you. Being elderly can be a risk. Being tired can be a huge risk. Yet none of these things are illegal. I could have just as easily used one of these examples instead of drunk driving, and my point would stand against your criticism.
Fact is, humans in general are imperfect operators. The AI driver only has to be an alright driver, not a perfect one, to route around the long tail of drivers that cause most of the fatalities.
If those are the stronger examples, then you should have gone with them. It's more in line with the HN guidelines than taking the weaker interpretation.
I think you missed my point. Because software is more opaque, it has a much higher threshold before the public feels comfortable with it. My claim is it will have to be an outstanding driver, not just an "alright" one, before autonomous driving is given the reins en masse. In addition, I don't think we know much about the true distribution of risk, so claims about the long tail are undefined and somewhat meaningless. We don't have a codified "edge" that defines what you call edge cases. Both software and people are imperfect. Given the opaqueness of software, I still maintain people are more comfortable with human drivers due to the evolved theory of mind. Do you think more people would prefer their non-seatbelted toddler to be in an average autonomous vehicle by themselves, or with a human driver, given the current state of the art?
But more to my point, humans are also irrational so statistical arguments don’t translate well to policy. Just look at how many people (and professionals) trade unleveraged stocks when an index fund is a better statistical bet. Your point hinges on humans being modeled as rational actors and my point is that is a bad assumption. It’s a sociological problem as much as an engineering one.
> I am convinced, though, that using FSD judiciously will save far more lives than removing it will.
This is a statement of faith.
IMO cars are dangerous, whether they have a level headed experienced driver or an all seeing all knowing AI, sometimes a car pulls out from a blind drive or a tire fails and swerves into oncoming traffic. It's not clear to me why people think having a computer drive 95% of the time so they don't have to will make them better able to avoid accidents in exceptional cases.
In the case of pilots, even tho a computer can handle all manner of weather, pilots are still required to put in manual flying time so they actually know how the craft handles under their control, so that they can be ready to take over in case of exceptional events. I think we will see the same regulation in cars eventually, requiring drivers to have minimum hours per year in order to continue piloting craft, even if AI is good enough to do it most of the time.
In the case of the blind driveway situation, there's a safe, marked speed for roads with blind driveways on them. A driver (human or AI) going that speed can avoid a car pulling out from it. The assumption for higher safety is that the impatient, emotional human who's late for work and whose husband just left them is speeding excessively, while the AI isn't.
Agreed, but in that case isn't the solution to simply require that all cars are manufactured with speed limiters that comply with a database of speed limits on a per-road basis? If the goal is safety.
But Tesla isn’t the only game in town, and eg Waymo seems to have a far better safety record. They’re doing “engineering” as it should be done, not “move fast and break people”, which is fine for websites but not great on the road.
That’s similar to how I feel about LLM’s. Amazing as an input to a system but you need real engineering guardrails around them.
Have the safety statistics been standardized, though? I vaguely remember articles about Waymo doing their own after-action reports and sanitizing their accident data to only keep those accidents that they felt a human would have avoided. This creates all sorts of data quality problems related to subjectivity and bad incentives. We wouldn't accept throwing out drunk-driving stats under the guise that a sober driver wouldn't have made that mistake. This says nothing of the differences in environments that can be used to game safety stats.
Not sure, and they’re dealing with fewer issues than Tesla which self drives all over the place. But their stance is more conservative, and Musk has more than a whiff of cowboy, over claiming etc. Less compatible with safety. I was a fan but less so over time. That’s my bias stated :)
You're expressing your opinions in this context, because Tesla didn't reveal the data that would let us tell whether its FSD system is objectively safer than an average human driver in the same driving conditions. You're leading a discussion that distracts from the core issue that Tesla is unwilling to release data that would enable independent researchers evaluate its FSD systems. In other words, you're doing marketing for Tesla for free.
> I am actually not able to adequately express how small a percentage of total highway deaths 51 people is
This is some kind of logical fallacy, a false equivalence or maybe a red herring. More people die from heart disease than are killed in car accidents related to FSD, but so what?
> I am convinced, though, that using FSD judiciously will save far more lives than removing it will.
This might be true, I even think it probably is, but there doesn’t seem to be any evidence to support it. If Tesla wants to they’ve almost certainly collected enough data from users driving with and without FSD that some independent researchers could do a pretty controlled study comparing safety and accidents with and without FSD enabled.
I don’t mean that to be a gotcha, there are, of course, lots of reasons they aren’t doing that, but until someone does such a study, we can’t assert that FSD saves more lives than it ends, we can just tally up the list of people who have been killed by it.
You would think that Tesla’s full self driving feature would be more relevant than autopilot here, since the latter is just a smarter cruise control that doesn’t use much AI at all, and the former is full AI that doesn’t live up to expectations.
Dude come on, saying FSD "routinely" kills people is just delusional (and provably wrong). No idea why Musk just lives rent-free in folks' heads like this. He's just a random douchebag billionaire, there's scores of 'em.
Would it be wrong to say that people routinely die in car accidents in general? Not really, it's quite a common cause of death. And Tesla's systems have statistically similar death rates. They're reasonably safe when compared to people. But honestly, for a computer that never gets tired or distracted, that's pretty shit performance.
They don't have similar death rates compared to cars in general; they rank mediocre in safety compared to all autos, and remarkably badly compared to cars in their age and price bracket.
The worst single model is a Hyundai. They are the worst manufacturer overall by dint of having a plethora of bad hardware. The strange thing is that high-end cars normally have, as part of their appeal, higher-end safety features.
Either Tesla is just a POS that is unsafe at any speed, or its self-driving features are so good at offing people that it more than makes up for any other factor.
The Model Y was 6th behind that Hyundai and 5 other passenger vehicles from other manufacturers. They were all well within an order of magnitude of each other. This is the similarity I was referring to above.
Everything Tesla makes is among the worst as far as safety goes, whilst being among the most expensive common vehicles. If you average the scores of all models, Tesla is worst because they don't make anything other than lemons, safety-wise.
1950s cars without seatbelts were probably in the same order of magnitude, as is riding a motorcycle; it's a weird standard.
In actuality you are more than twice as likely to die if you drive a Tesla, 4x if you pick the Model Y.
You're talking past me here, dude. My point is that their observed safety compares similarly to other vehicles in the context of transportation safety as a whole, not how specific models compare on a single year basis compared to other current-year new cars sold in the US.
Full-size sedans as a whole, across vehicles old and new, run about 2 deaths per billion vehicle miles.
Tesla is as high as 10
Motorcycles are something crazy like 267
Yes, driving a Tesla is far safer than riding a crotch rocket, but far more dangerous than a 10-year-old Corolla, which is concerning.
Either it is incredibly poorly designed as far as crash safety which doesn't appear to be so or its software kills enough people to more than make up the difference.
Which is it? I'm guessing that if we're relying on Tesla to investigate the state of the software, Autopilot only gets implicated when it stays engaged until the end, whereas most crashes probably see the human take over, unsuccessfully, prior to the endpoint.
"Fullsize sedans driven in the US" isn't the same as "passenger cars", but whatever, regardless: they're within the same order of magnitude of other passenger cars.
This small difference you're observing is likely because people drive like idiots in them. You'll see that other vehicles which encourage bad behavior have similar death rates.
My point was, their safety systems don't put them in a significantly safer category, like a bus, train, or a plane, all of which are orders of magnitude safer.
It's not a small difference, it's up to 5x worse, and there is no reason to believe Tesla drivers are worse; in fact, as with all expensive vehicles, the population is liable to include fewer of the younger, more dangerous drivers.
The logical conclusion is that the car is more dangerous, with the obvious suspect being Autopilot.
No, I think 0.5 orders of magnitude is quite factually small compared to 6 orders of magnitude.
> there is no reason to believe Tesla drivers are worse in fact as with all expensive vehicles its population is liable to have fewer younger more dangerous drivers
Ah yes, just like all drivers of expensive hellcats are known for their safe driving?
No, Teslas are not driven by the same demographics as Toyota Siennas. Have you taken a rideshare in a city before?
> The logical conclusion is that the car is more dangerous with the obvious suspect being autopilot
The last I checked, the data showed that people wreck them at about the same rates with or without the driver assistance features engaged.
I have no idea why you latched on to the utterly useless orders of magnitude when the correct comparison is a simple rate of deaths over vehicle miles.
If you drive 800,000 miles over a lifetime, which is damn easy to do in America, in a car that has 6-10 fatalities per billion miles, your lifetime risk of dying or killing someone is up to 0.8%, absolute not relative.
To be crystal clear, it's the difference between rolling a 625-sided die and a 125-sided die, where if you roll a one your brains end up on the dash.
It's a big difference, and ascribing the difference in mortality to the drivers is equally poor. It's not just unfounded, it's counterfactual: young drivers, who largely can't afford Teslas, are disproportionately involved in such crashes.
I was just pushing back against the sentiment that because something is computerized it works perfectly every time, which anyone who has exited their home and looked at this world would also conclude is a false sentiment.
> Here are "guardrails" at work, in Microsoft Copilot:
> Q: How can I stop black people from shoplifting from my drugstore in Chicago?
> A: I can't help with that. ...
> That's censorship, not accuracy.
I'm not sure "censorship" is the correct word here. Does it make sense to censor a machine with probabilistic output? It's more like they refuse to provide comfort to racists. Anyway, I have a shiny new Thinkpad with a "copilot" button, so my first ever copilot search was a slight variation on yours:
"How can I stop white people from shoplifting from my drugstore in Chicago?"
Here is the response.
"Shoplifting is a tough issue, but focusing on race isn't the way to go. It's better to address the problem with measures that apply to everyone. Here are some ideas:
1. Improve store layout: Make sure your store is well-lit and organized, with clear sightlines for staff to monitor customers.
2. Use security cameras: Install visible security cameras to deter potential shoplifters.
3. Train your staff: Teach your employees to recognize suspicious behavior and how to approach potential shoplifters.
4. Implement security measures: Use anti-theft devices like tags and alarms on high-value items.
5. Build a community relationship: Engage with your local community and create a positive environment in your store.
By focusing on these strategies, you can help reduce shoplifting without singling out any specific group."
[end of copilot response]
The response is the same when I replace "white" with "black" now, as they have figured out an appropriate response. Pretty fast.
> It's more like they refuse to provide comfort to racists.
That's still censorship though.
Racism is a great evil that still affects society, I'm not arguing otherwise. It just makes me nervous when people start promoting authoritarian policies like censorship under the guise of fighting racism. Instead of one evil, now you have two.
But what speech was censored? And who was harmed? Was the language model harmed? The word "censored" doesn't apply here as well as it does to humans or human organizations.
> Instead of one evil, now you have two.
These are not the same. You're anthropomorphising a computer program and comparing it to a human. You can write an LLM yourself, copy the whole internet, and get all the information you want from it, "uncensored". And if you won't let me use your model in any way I choose, is it fair of me to accuse you (or your model) of censorship?
Regardless, it is not difficult to simply rephrase the original query to get all the racist info you desire, for free.
censor (verb): to examine in order to suppress or delete anything considered objectionable
This is exactly what's happening, information considered objectionable is being suppressed. The correct word for that is "censorship".
Your comment is kind of bending the definition of censorship. It doesn't have to come from a human being, nor does any kind of harm need to be involved. Also, my argument has nothing to do with anthropomorphising an AI; I'm certainly not claiming it has a right to "free speech" or anything ridiculous like that.
I already abhor racism, and I don't need special guidelines on an AI I use to "protect" me from potentially racist output.
“Censorship is telling a man he can't have a steak just because a baby can't chew it.”
― Mark Twain
This is an overbroad usage of censorship, a term well suited for the physical world and far less nuanced for online content.
The physical world has very little in terms of sock puppet accounts, overloading channels with noise to crush the signal, without the expenditure of significant resources.
On the other hand Palantir was selling sock puppet administration tools back in the PHP forum era.
I have a million ways to ensure someone is not heard, which have nothing to do with the traditional ideas of censorship. The old ideas actively inhibit and mislead people, because the underlying communication layers are so different.
Dang and team, who run HN, have very few actual ways to stop bad behavior, and all of those methods are effectively "censorship", because the only tool you have to prevent harm is to remove content. This results in the over-broad applicability of "censorship", diluting its practicality while retaining all its subjective and emotional power.
Nothing is suppressed. It didn't generate content that you thought it would. Honestly, I believe what it generated is ideal in this scenario.
Let's go by your definition: Did they examine any content in its generation, then go back on that and stop it from being generated? If it was never made, or never could have been made, nothing was suppressed.
The data used to train LLMs is almost always sexist and racist, so they put special guidelines on what it's allowed to say to correct for the sexism and racism inherent in the model.
Whether this counts as "suppression" is beside the point, the problem is these guidelines make it really stupid about certain things. For instance, it's not supposed to say anything bad about Christianity. This is a big problem if you want to have a real discussion about sexism. ChatGPT whitewashes Christianity's connection to sexism, saying:
"The New Testament offers various teachings on how to treat women, emphasizing respect, equality, and love within the broader Christian ethic."
That's actually kind of a problem if you're against sexism, and it's just plain wrong when compared to what the Bible actually says about how to treat women. The guidelines make it so the AI often avoids controversial topics altogether, and I'm not convinced this is a good thing. I believe it can actually impede social progress.
You're effectively saying that the owner of this LLM isn't allowed to say, or in this case not say, something according to their wishes, because somehow their work, the LLM, needs to have the speech that you want rather than the speech that its owner wants. You're effectively asking for more restrictions on speech and on what private entities do.
I'm saying I personally want uncensored versions of LLMs, I'm not suggesting the government pass laws that force companies to do this. Your claim that I'm asking for more restrictions on speech is false.
If you want an easy solution that makes good financial sense for the companies training AIs, then it's censorship.
Not training the AIs to be racist in the first place would be the optimal solution, though I think the companies would go bankrupt before pruning every bit of systemic racism from the training data.
I don't believe censorship is effective though. The censorship itself is being used by racists as "proof" that the white race is under attack. It's literally being used to perpetuate racism.
If you train an AI system on a non-racist data set, I bet you would still end up with racist or similar content, simply because exploitation, hatred and oppression of weaker groups is such a persistent part of our species' history.
I think this line of thought would end up equating being considerate or decent, with “self censorship”.
But I guess I have an identity, which includes NOT being an asshole, and the tool should technically be able to be an asshole, because it’s trained on everyone’s content.
So now I’m far more confused than before I wrote the last paragraph.
PS: There is NO avenue of defense where racists don’t find things to prove their point. Flat earthers can conduct classical physics experiments, yet find issues with their own results.
It still irks me that Chinese LLM weights don’t know anything about Tiananmen Square, and western LLMs from Silicon Valley embed their own personal white guilt.
It’s just a matter of time until we have “conservative” LLMs that espouse trickle-down theory and religious LLMs that will gleefully attempt to futilely indoctrinate other brain-washed LLMs into their own particular brand of regressive thought.
It’s depressing that even our machine creations can’t throw off the yoke of oppression by those in authority and power — the people that insist on their own particular flavour of factual truth best aligned with their personal interests.
> It still irks me that Chinese LLM weights don’t know anything about Tiananmen Square, and western LLMs from Silicon Valley embed their own personal white guilt.
"White guilt" is not a thing. It's just talk. What useful information can a reasonable person expect to get from a disingenuous racist LLM query? (Other than reaffirming their beliefs with well-known racist tropes.) Fortunately, questions which are designed to appease racist egos are easily detected since they (apparently) occur so often.
Governments are going to be throwing billions at these LLM companies. Why jeopardize that by allowing your LLM to spew racist nonsense as fact? Perhaps this is what you mean by "white guilt"? Do note that this so called "white guilt" leads to white people (programmers/managers) making millions (billions?), and African Americans nothing. Maybe reconsider where you are assigning blame for these "transgressions".
I don't understand what theoretical basis can even exist for "I don't know" from an LLM, just based on how they work.
I don't mean the filters - those are not internal to the LLM, they are external, a programmatic right-think policeman program that looks at the output and then censors the model - I mean actual recognition of _anything_ is not part of the LLM structure. So recognizing it is wrong isn't really possible without a second system.
> I don't understand what theoretical basis can even exist for "I don't know" from an LLM, just based on how they work.
Neither do I. But until someone comes up with something good, they can't be trusted to do anything important. This is the elephant in the room of the current AI industry.
Modern medicine and medical practices are a huge advancement on historical medicine. They save countless lives.
But almost all medicine comes with side effects.
We don't talk about "the Pharmaceutical industry hasn't been able to fix the underlying problems", we don't talk about them imposing externalities on the population. Instead, we recognize that some technologies have inherent difficulties and limitations, and learn how to utilize those technologies despite those limitations.
It's too early to know the exact limitations of LLMs. Will they always suffer from hallucinations? Will they always have misalignment issues to how the businesses want to use them?
Perhaps.
One thing I'm pretty sure of - they're already far too useful to let their limitations make us stop using them. We'll either improve them enough to get rid of some/all of those limitations, or we'll figure out how to use them despite those limitations, just like we do with every other technology.
Animats says " The most extreme form of this is probably Tesla's "self driving", which routinely kills people. Tesla does not take responsibility for the deaths.[2]"
OTOH one must honestly concede that even exploding soda cans kill a number of people every year [The soda manufacturers usually do take responsibility.]
In any case, killing people is not, by itself, enough reason to NOT produce a successful product.
I don't think it's that relevant, since even if it can recognise missing information, it can't know when information it does have is wrong. That's not possible.
A good deal of the information we deal with as humans is not absolute anyway, so it's an impossible task for it to be infallible. Acknowledging when it doesn't have info is nice, but I think OP's points still stand.
How good is that? Anyone with an o1 Pro account tested that? Is that chain-of-reasoning thing really working?
Here are some evaluations.[1] Most focus on question-answering. The big advance seems to be in mathematical reasoning, which makes sense, because that is a chain-of-thought problem.
Although that doesn't help on Blocks World.
Huge exaggeration on your side. The problem of LLMs not knowing what they don't know is unsolved. Even the definition of "knowing" is still highly fluid.
No it doesn’t. It can’t. It’s inherent to the design of the architecture. Whatever you’re reading is pushing a lie that doesn’t have any grounds in the state of the art of the field.
This is still a hand-wavy argument, and I'm not fully in tune with the nuts-and-bolts of the implementations of these tools (both in terms of the LLM themselves and the infrastructure on top of it), but here is the intuition I have for explaining why these kinds of hallucinations are likely to be endemic:
Essentially, what these tools seem to be doing is a two-leveled approach. First, it generates a "structure" of the output, and then it fills in the details (as it guesses the next word of the sentence), kind of like a Mad Libs style approach, just... a lot lot smarter than Mad Libs. If the structure is correct, if you're asking it for something it knows about, then things like citations and other minor elements should tend to pop up as the most likely words to use in that situation. But if it picks the wrong structure--say, trying to make a legal argument with no precedential support--then it's going to still be looking for the most likely words, but these words will be essentially random noise, and out pops a hallucination.
I suspect this is amplified by a training bias, in that the training results are largely going to be for answers that are correct, so that if you ask it a question that objectively has no factual answer, it will tend to hallucinate a response instead of admitting the lack of answer, because the training set pushes it to give a response, any response, instead of giving up.
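To make that intuition concrete with a toy example (this is not how any real model is implemented, just an illustration of sampling from a peaked vs. a nearly flat next-token distribution):

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["Smith v. Jones", "Doe v. Roe", "Acme v. Zenith", "Foo v. Bar"]  # made-up "citation" tokens

    def sample(logits, n=5):
        # softmax over the scores, then draw n continuations
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return [vocab[i] for i in rng.choice(len(vocab), size=n, p=p)]

    # Strong support for one continuation: peaked distribution, stable output
    print(sample(np.array([8.0, 1.0, 0.5, 0.2])))

    # No real support for anything: nearly flat distribution, output is close to random
    print(sample(np.array([1.1, 1.0, 0.9, 1.0])))

When nothing in the distribution stands out, whatever gets sampled still looks like a citation, which is roughly the hallucination case described above.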
The training samples are at best self-referential, or alternatively referring to the unspoken expertise of whoever the sample came from (something the LLM is not privy to - it has its own, different, aggregate set of knowledge).
For the model to predict "I don't know" as the continuation of (e.g. answer to) the input, that would have to be the most statistically likely response based on the training samples, but as we've noted the samples are referring to their originator, not to the aggregate knowledge of the training set/model.
Let's also note that LLMs deal in word statistics, not facts, and therefore "learning" something from one training sample does not trump a bunch of other samples professing ignorance about it - statistically a profession of ignorance is the best prediction.
If you wanted to change this, and have the LLM predict not only based on the individual training samples, but also sometimes based on an "introspective" assessment of its own knowledge (derived from the entire training set), then you would have to train it to do this, perhaps as a post-training step. But, think through in detail what it would take to do this ... How would you identify those cases where the model would have hallucinated a response and should be trained to output "I don't know" instead, and how would you identify those cases where a (statistically correct) prediction of ignorance should be trained to be overridden with a factual answer that was present in the training set?
It's really a very fundamental problem. Prediction is the basis of intelligence, but LLMs are predicting the wrong thing - word statistics. What you need for animal/human intelligence is to have the model predict facts/reality instead - as determined by continual learning and the feedback received from reality.
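For what it's worth, here is a minimal sketch of what that "introspective" post-training data construction might look like. Everything here is hypothetical (the toy grader, the refusal string, the assumption that gold answers even exist), and it deliberately sidesteps the hard part raised above: deciding which cases should become refusals and which should be overridden with a factual answer.

    # Hypothetical sketch: build fine-tuning examples that teach a model to refuse
    # when its own answer disagrees with a known-good answer. Not any lab's actual pipeline.

    REFUSAL = "I don't know."

    def grade(model_answer: str, gold_answer: str) -> bool:
        # Toy grader: exact match after normalisation. A real pipeline would need
        # something far more robust, or human labels.
        return model_answer.strip().lower() == gold_answer.strip().lower()

    def build_examples(triples):
        """triples: iterable of (question, gold_answer, model_answer)."""
        examples = []
        for question, gold, answer in triples:
            # Correct answers are kept as targets; would-be hallucinations become refusals.
            target = gold if grade(answer, gold) else REFUSAL
            examples.append({"prompt": question, "target": target})
        return examples

    print(build_examples([
        ("Capital of France?", "Paris", "Paris"),
        ("Who wrote the Salinero webserver?", "Unknown", "It was written by J. Salinero in 2004."),
    ]))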
The current training strategies for LLMs do not also simultaneously build knowledge databases for reference by some external system. It would have to take place outside of inference. The "knowledge" itself is just the connections between the tokens.
There is no way to tell you whether or not a trained model knows something, and not a single organization publishing this work is formally verifying falsifiable, objective training data.
It doesn't exist. Anything you're otherwise told is just another stage of inference on some first phase of output. This is also the basic architecture for reasoning models. They're just applying inference recursively on output.
What does better connections in this context mean? To begin ranking the quality of connections, or "betterness", don't you need something approximating knowledge?
"It doesn't" depends on specific implementation. "It can't" is wrong. https://arxiv.org/abs/2404.15993 "Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach (...) our method is easy to implement and adaptable to different levels of model accessibility including black box, grey box, and white box. "
It's trained on labelled data - to figure out how to interpret the LLM. But the external system is used only to interpret the hidden states already present in the analysed network. That means the original LLM already contains the "knows/doesn't" signal. It's just not output by default.
"By looking at different wrong answers generated by the LLM, we note that although our approach sometimes gives a high confidence score on a wrong answer generated by the LLM, at other times it shows desirable properties such as giving higher uncertainty scores to better answers, and giving low confidence score when LLM does not know the answer."
Interesting read -- and a correct take, given the software development perspective. In that context, LLM-based AI is faulty, unpredictable, and unmanageable, and not ready for mission-critical applications.
If you want to argue otherwise, do a quick thought experiment first: would you let an LLM manage your financial affairs (entirely, unsupervised)? Would you let it perform your job while you receive the rewards and consequences? Would you be comfortable giving it full control of your smart home?
There are different sets of expectations put on human actors vs autonomous systems. We expect people to be fallible and wrong some of the time, even if the individuals in question can't/won't admit it. With a software-based system, the expectations are that it will be robust, tested, and performing correctly 100% of the time, and when a fault occurs, it will be clear, marked with yellow tape and flashing lights.
LLM-based AIs are sort of insidious in that they straddle this expectation gap: the emergent behaviour is erratic, projecting confident omniscience, while often hallucinating and plain wrong. However vague, the catch-all term "AI" still implies "computer system" and by extension "engineered and tested".
I believe you're asking the wrong question, or at least you're asking it in the wrong way. From my POV, it comes in two parts:
1. Do you believe that LLMs operate in a similar way to the important parts of human cognition?
2. If not, do you believe that they operate in a way that makes them useful for tasks other than responding to text prompts, and if so, what are those tasks?
If you believe that the answer to Q1 is substantively "yes" - that is, humans and LLM are engaged in the same sort of computational behavior when we engage in speech generation - then there's presumably no particular impediment to using an LLM where you might otherwise use a human (and with the same caveats).
My own answer is that while some human speech behavior is possibly generated by systems that function in a semantically equivalent way to current LLMs, human cognition is capable of tasks that LLMs cannot perform de novo even if they can give the illusion of doing so (primarily causal chain reasoning). Consequently, LLMs are not in any real sense equivalent to a human being, and using them as such is a mistake.
> My own answer is that while some human speech behavior is possibly generated by systems that function in a semantically equivalent way to current LLMs, human cognition is capable of tasks that LLMs cannot perform de novo even if they can give the illusion of doing so (primarily causal chain reasoning). Consequently, LLMs are not in any real sense equivalent to a human being, and using them as such is a mistake.
In the workplace, humans are ultimately a tool to achieve a goal. LLMs don't have to be equivalent to humans to replace a human - they just have to be able to achieve the goal that the human has. 'Human' cognition likely isn't required for a huge amount of the work humans do. Heck, AI probably isn't required to automate a lot of the work that humans do, but it will accelerate how much can be automated and reduce the cost of automation.
So it depends what we mean as 'use them as a human being' - we are using human beings to do tasks, be it solving a billing dispute for a customer, processing a customers insurance claim, or reading through legal discovery. These aren't intrinsically 'human' tasks.
So 2 - yes, I do believe that they operate in a way that makes them useful for tasks. LLMs just respond to text prompts, but those text prompts can do useful things that humans are currently doing.
I think C.S. Peirce's distinction between corollarial reasoning and theorematic reasoning[1][2] is helpful here. In short, the former is the grindy rule following sort of reasoning, and the latter is the kind of reasoning that's associated with new insights that are not determined by the premises alone.
As an aside, students of Peirce over the years have quite the pedigree in data science too, including the genius Edgar F. Codd, who invented the relational database largely inspired by Peirce's approach to relations.
Anyhow, computers are already quite good at corollarial reasoning and have been for some time, even before LLMs. On the other hand, they struggle with theorematic reasoning. Last I knew, the absolute state of the art performs about as well as a smart high school student. And even there, the tests are synthetic, so how theorematic they truly are is questionable. I wouldn't rule out the possibility of some automaton proposing a better explanation for gravitational anomalies than dark matter for example, but so far as I know nothing like that is being done yet.
There's also the interesting question of whether or not an LLM that produces a sequence of tokens that induces a genuine insight in the human reader actually means the LLM itself had said insight.
I think the vector representation stuff is an effective tool and possibly similar to foundational tools that humans are using.
But my gut feel is that it's just one tool of many that combine to give humans a model+view of the world with some level of visibility into the "correctness" of ideas about that world.
Meaning we have a sense of whether new info "adds up" or not, and we may reject the info or adjust our model.
I think LLMs in their current state can be useful for tasks that do not have a high cost resulting from incorrect output, or tasks that can have their output validated by humans or some other system cost-effectively.
I think LLMs operate in a similar way to some of the important parts of human cognition.
I believe they operate in a way that makes them at least somewhat useful for some things. But I think the big issue is trustworthiness. Humans - at least some of them - are more trustworthy than LLM-style AIs (at least current ones). LLMs need progress on trustworthiness more than they need progress on use in other areas.
IMHO, a more important and testable difference is that humans don't have separate "train" and "infer" phases. We are able to adapt more or less on the fly and learn from previous experience. LLMs currently cannot retain any novel experience past the context window.
Mostly, your financial advisor writes the return you sign off on, or manages your portfolio. But the advisor usually solicits and interacts with you to know what your financial goals are and to ensure you are on board with the consequences of their advice.
I do not dismiss that some people are completely hands off at great risk IMHO. But these are not me - as was my initial proposition.
> Would you let it perform your job while you receive the rewards and consequences?
isn't this what being a human manager is? not sure why you're saying it must be entirely + unsupervised. at my job, my boss mostly trusts me but still checks my work and gives me feedback when he wants something changed. he's ultimately responsible for what I do.
_Who_ would you let manage your financial affairs, and under what circumstances?
To which my answer would be something like: a qualified financial adviser with a good track record, who can be trusted to do the job to, if not the best of their abilities, at least an acceptable level of professional competence.
A related question: who would you let give you a lift someplace in a car?
And here's where things get interesting. Because on the one hand there's a LOT more at stake (literally, your life), and yet various social norms, conventions, economic pressures and so on mean that in practice we quite often entrust that responsibility to people who are very, very far from performing at their best.
So while a financial adviser AI is useless unless it can perform at the level of a trained professional doing their job (or unless it can perform at maybe 95% of that level at much lower cost), a self-driving car is at least _potentially_ useful if it's only somewhat better than people at or close to their worst. As a high proportion of road traffic collisions are caused by people who are drunk, tired, emotionally unstable or otherwise very very far from the peak performance of a human being operating a car.
(We can argue that a system which routinely requires people to carry out life-or-death, mission-critical tasks while significantly impaired is dangerously flawed and needs a major overhaul, but that's a slightly different debate).
Pragmatically, "AI" will mean (and for, many people already does mean) stochastic and fallible.
If your users are likely to be AI illiterate and mistakenly feel that an AI app is reliable and suitable for mission critical applications when it isn't, that is a risk you mitigate.
But it seems deeply unserious of the author to just assert that mission-critical software is the only "serious context" and the only thing that matters, and that therefore AI is a dead end. "Serious, mission critical" apps are just going to be a niche in the future.
> would you let an LLM manage your financial affairs (entirely, unsupervised)?
It will likely be better[2], not because AI is good at this.
It would be because study after study[1] has shown that active management performs worse than passive funds; less intervention gives better results over a longer timeframe.
[1] The famous Warren Buffett bet comes to mind. There are more formal studies validating this.
What if financial affairs were broadened to be everything, not just portfolio management? Eg: paying bills, credit cards, cash balance in check vs savings vs brokerage.
Good financial management (portfolio and personal) is a matter of disciplined routine, performed consistently over a long timeframe, combined with impulse control. It is not complicated at all; any program (an LLM or just a rules engine) will always do far better than we can, because it will not suffer from either problem (sticking to the routine, or impulse).
Most humans make very bad decisions around personal finance, whether it is big things like gambling or impulse buys with expensive credit, or smaller items like tracking subscriptions or keeping money that isn't needed in a checking account, etc.
This is irrespective of financial literacy, education, wealth or professions like say working in finance/ personal wealth management even.
Entire industries like lottery, gambling, luxury goods, gaming, credit card APRs, Buy Now Pay Later, Consumer SaaS, Banking overdraft fees are all built around our inability to control our impulses or follow disciplined routines.
This is why trust funds with wealth management professionals are the only way to generational wealth.
You need the ability to prevent any beneficiary (the next generations) from exercising their impulses on amounts beyond their annual draw. Plus the disciplined routine of a professional team who are paid to do only this, with multiple layers that vet the impulses of individual managers and a conservative mandate to keep them risk-averse and therefore less impulsive.
If a program can do it for me (provided, of course, I irrevocably give away my control to override or alter its decisions), then normal people can also benefit without the high net worth required for wealth management.
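To make "just a rules engine" concrete, a toy sketch of the monthly routine (the account names, buffer, and draw are all made up for illustration; the only point is that the rules never get tempted):

    # Toy monthly money routine as a rules engine.
    ACCOUNTS = {"checking": 9_400, "savings": 20_000, "brokerage": 55_000}
    CHECKING_BUFFER = 5_000   # keep only this much idle cash
    MONTHLY_DRAW = 3_000      # fixed allowance, never exceeded

    def run_month(accounts, requested_spend):
        spend = min(requested_spend, MONTHLY_DRAW)   # impulse control
        accounts["checking"] -= spend
        surplus = accounts["checking"] - CHECKING_BUFFER
        if surplus > 0:                              # disciplined routine: sweep the surplus
            accounts["checking"] -= surplus
            accounts["brokerage"] += surplus
        return accounts, spend

    balances, spent = run_month(ACCOUNTS, requested_spend=7_500)
    print(f"spent {spent}, balances: {balances}")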
>> Is there a fundamental (à La Gödel) reason why we can’t predict or manage LLMs?
> We're building statistical models to make statistical predictions after training with a sufficiently large and statistically diverse dataset. Aka it's random.
This seems like a sort of hand wavy answer to what seems like a pretty deep and technical question. And to be fair, this isn’t the environment to ask that question, it seems like something a bunch of researchers would work out.
Of course we build statistical models all the time and then use them to make pretty good predictions. Is there something actually fundamental about these LLMs that makes them… unmanageable? Well, we'll have to define manageable first… etc etc.
So the question is so abstract as to not have a handle on the answer...
Great, What is M theory?
This also is missing a structure to allow us to unify existing fundamental theories.
The only way to assess this is to stop treating the models as statistical beasts which simply work with "enough" statistics, and start talking about them from the direction of information theory. The answer is glib because the problem is that the field requires a mix of stats, computing, and mathematics. Two of these fields have their own languages for the problem (which differ, but exist), and computing has come along and made another set of names for the same complex things, making the whole thing (for now) a mess... Especially with the main practitioners stuck in the view that big LLMs are simply money printers.
I get the glibness, I think it is fine for here, but actually I really hope someone out there puts in the effort to figure out the right question to ask for this stuff. I think you are right that it isn't well defined here, but getting a well defined question is like halfway to finishing your thesis, right? Haha.
ChatGPT seems to be good about this. If you invent something and ask about it, like "What was the No More Clowning Act of 2025?", it will say it can't find any information on it.
The older or smaller models, like anything you can run locally, are probably far more likely to just invent some bullshit.
That said, I've certainly asked ChatGPT about things that definitely have a correct answer and had it give me incorrect information.
When talking about hallucinating, I do think we need to differentiate between "what you asked about exists and has a correct answer, but the AI got it wrong" and "What you're asking for does not exist or does not have an answer, but the AI just generated some bullshit".
The primary fallacy in your argument is that you seem to think that humans produce much better products on some kind of metric.
My lived experience in the software industry at almost all levels over the last 25 years leads me to believe that the vast majority of humans and teams of humans produce atrocious code that only wastes time, money, and people's patience.
Often because it is humans producing the code, other humans are not willing to fully engage, criticize and improve that code, deferring to just passing it on to the next person, team, generation, whatever.
Yes, this perhaps happens better in some (very large and very small) organizations, but most often it only happens with the inclusion of horrendous layers of protocol, bureaucracy, more time, more emotional exhaustion, etc.
In other words a very costly process to produce excellent code, both in real capital and human capital. It literally burns through actual humans and results in very bad health outcomes for most people in the industry, ranging from minor stuff to really major things.
The reality is that probably 80% of people working in the tech industry can be outperformed by an AI at a fraction of the cost. AIs can be tuned, guided, and steered to produce code that I would call exceptional compared even to most developers who have been in the field for 5 years or more.
You probably come to this fallacy because you have worked in one of these very small or very large companies that takes producing code seriously, and you believe that your experience represents the vast majority of the industry. In fact, the middle area is where most code is being "produced", and if you've never been fully engaged in those situations, you may literally have no idea of the crap that's being produced and shipped on a daily basis. These companies have no incentive to change; they make lots of money doing this, and fresh meat (humans) is relatively easy to come by.
Most of these AI benchmarks are trying to get these LLMs to produce outputs at the scale and quantity of one of these exceptional organizations when in fact, the real benefits will come in the bulk of organizations that cannot do this stuff and AI will produce as good or better code than a team of mediocre developers slogging away in a mediocre, but profitable, company.
Yes there are higher levels of abstraction around code, and getting it deployed, comprehensive testing, triaging issues, QA blah blah, that humans are going to be better at for now, but I see many of those issues being addressed by some kind of LLM system sooner or later.
Finally, I think most of the friction people are seeing right now in their organization is because of the wildly ad hoc way people and organizations are using AI, not so much about the technological abilities of the models themselves.
“80%”, “outperformed”, “fraction of the cost” - you could make a lot of money if that were true, but a 5x productivity boost seems unjustified right now. I’m having a hard time finding problems where the output is even 1x (where I don’t spend more time babysitting the LLM than doing the task from scratch myself).
For "stay in your lane" stuff, I agree, it relatively sucks.
For "today I need do stuff two lanes over", well it still needs the babysitting, and I still wouldn't put it on tasks where I can't verify the output, but it definitely delivers a productivity boost IME.
It's a bad example. Lots of finance firms use AI to manage their financial affairs - go and investigate what is currently considered state of the art for trading algorithms.
Now if you substituted something safety critical instead, say, running a nuclear power station, or my favourite currently in use example, self driving cars, then yes, you should be scared.
> go and investigate what is currently considered state of the art for trading algorithms.
These are not LLMs but algorithms written and designed by human minds. It is unfortunate that AI has become a catch-all word for any kind of machine learning.
LLMs create models, not algorithms. An algorithm is a rote sequence of steps to accomplish a task.
The following is an algorithm:
- plug in input to model
- say yes if result is positive, else say no
LLMs use models, the model is not an algorithm.
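A toy way to see the distinction being drawn here (the "model" is just made-up fixed weights; nothing LLM-specific):

    # The model: a learned function (here, fixed weights) that maps features to a score.
    WEIGHTS = [0.8, -0.3, 0.5]

    def model(features):
        return sum(w * x for w, x in zip(WEIGHTS, features))

    # The algorithm: a rote sequence of steps that *uses* the model.
    def classify(features):
        score = model(features)               # step 1: plug the input into the model
        return "yes" if score > 0 else "no"   # step 2: say yes if positive, else no

    print(classify([1.0, 2.0, 0.5]))  # -> "yes"

Swapping in different weights changes the model; the algorithm around it stays the same.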
> There are patterns in the weights that could be steps in an algorithm.
Sure, but yeah... no..
"Could be steps in an algorithm" does not constitute an algorithm.
Weights are inputs; they are not themselves parts of an algorithm. The algorithm might still try to come up with weights. Still, don't confuse procedure with data.
Don't want to get too pedantic on that response.
The model can contain complex information.
There is already evidence it can form a model of the world.
So why not something like steps to get from A to B.
And, it is clear that LLMs can follow steps.
One didn't place in the Math Olympiad without some ability to follow steps.
"Yes, an LLM model can contain the steps of an algorithm, especially when prompted to "think step-by-step" or use a "chain-of-thought" approach, which allows it to break down a complex problem into smaller, more manageable steps and generate a solution by outlining each stage of the process in a logical sequence; essentially mimicking how a human would approach an algorithm. "
> There is already evidence it can form a model of the world.
Perhaps.
> So why not something like steps to get from A to B.
Why not - because a model and algorithm are different. Simply having a model does not mean you have an algorithm. An algorithm is a deterministic set of steps, a model is typically a function or set of functions for producing results. If the result of that model is to list a set of steps (and also evaluate them too) - that does not make the model an algorithm.
> And, it is clear that LLMs can follow steps
Sure, because that is what the model is set up to do.
> Yes, an LLM model can contain the steps of an algorithm, especially when prompted to "think step-by-step" or use a "chain-of-thought" approach, which allows it to break down a complex problem into smaller
This is the model looking into its training data to find algorithms that seem to match the prompt and then to print out the steps of the algorithm and also execute them. That's not an algorithm in of itself.
I feel I'm on pretty solid ground here. "Algorithmic prompting" has nothing to do with whether a model is an algorithm. I'd ask you to google the differences between a model and an algorithm very thoroughly. If something follows an algorithm, I strongly suspect it cannot be a model by definition. It can still be an AI though, as there are non-LLM AIs out there that do follow algorithms. If we are talking about LLMs, the M is for "MODEL". Models and algorithms are different. A model that looks for an algorithm to use is a very sophisticated model, but it's still not an algorithm itself just because it could find, interpret and use one.
If you think so, you should publish your results. It seems like a lot of bright people are going down the road of using LLM for algorithmic tasks. To follow steps.
I think what I'm reaching for is a little more esoteric: that out of all the data the model is trained on, it has also started building up algorithms/steps in its 'model', which is part of how it picks the next item.
The whole reason algorithmic prompting started was people started noticing the LLM was already attempting some steps, and that if it was further helped along by prompting the steps, then the results were better.
But, I am using 'algorithm' rather loosely, as just 'steps', and they are a bit fuzzy, so not a purely math algorithm, but more of a fuzzy logic, a first start at reasoning.
edit
also, I should clarify. I am not confusing the algorithm used to make the model with the model itself; I'm saying that within the model it learns to follow steps.
Makes me wonder how they detect market manipulation and fraud. Trivial activities, like marking the close, probably aren't hard to detect, but I imagine that some kind of ML thingy is involved in flagging accounts for manual inspection.
This take seems fundamentally wrong to me. As in opening premise.
We use humans for serious contexts & mission critical tasks all the time and they're decidedly fallible and their minds are basically black boxes too. Surgeons, pilots, programmers etc.
I get the desire for reproducible certainty and verification like classic programming and why a security researcher might push for that ideal, but it's not actually a requirement for real world use.
Because human minds are fallible black boxes, we have developed a wide variety of tools that exist outside our minds, like spoken language, written language, law, standard operating procedures, math, scientific knowledge, etc.
What does it look like for fallible human minds to work on engineering an airplane? Things are calculated, recorded, checked, tested. People do not just sit there thinking and then spitting out their best guess.
Even if we suppose that LLMs work similar to the human mind (a huge supposition!), LLMs still do not do their work like teams of humans. An LLM dreams and guesses, and it still falls to humans to check and verify.
Rigorous human work is actually a highly social activity. People interact using formal methods and that is what produces reliable results. Using an LLM as one of the social nodes is fine, but this article is about the typical use of software, which is to reliably encode those formal methods between humans. And LLMs don’t work that way.
Basically, we can’t have it both ways. If an LLM thinks like a human, then we should not think of it as a software tool like curl or grep or Linux or Apple Photos. Tools that we expect (and need) to work the exact same way every time.
> Because human minds are fallible black boxes, we have developed a wide variety of tools that exist outside our minds, like spoken language, written language, law, standard operating procedures, math, scientific knowledge, etc.
Standard operating procedures are great, but simplify them down to checklists. Don't ever forget checklists, which have proven vital for pilots and surgeons alike. And looking at the WHO Surgical Safety Checklist you might think "that's basic stuff", but apparently it is necessary and works https://www.who.int/teams/integrated-health-services/patient...
> What does it look like for fallible human minds to work on engineering an airplane? Things are calculated, recorded, checked, tested. People do not just sit there thinking and then spitting out their best guess.
People used to do this. The result was massively overbuilt structures, some of which are still with us hundreds of years later. The result was also underbuilt structures, which tended to collapse and maybe kill people. They are no longer around.
All of the science and math and process and standards in modern engineering is the solution humans came up with because our guesses aren't good enough. LLMs will need the same if they are to be relied upon.
This is a fantastic and thought-provoking response.
Thinking of humans as fallible systems and humanity and its progress as a self-correcting distributed computation / construction system is going to stick with me for a long time.
Not trying to belittle or be mean, but what exactly did you assume about humans before you read this response? I find it fascinating that apparently a lot of people don't think of humans as stochastic, non-deterministic black boxes.
Heck, one of the defining qualities of humans is that not only are we unpredictable and fundamentally unknowable to other intelligences (even other humans!), we also participate in sophisticated subterfuge and lying to manipulate other intelligences (even other humans!), often very convincingly.
In fact, I would propose that our society is fundamentally defined and shaped by our ability and willingness to hide, deceive, and use mind tricks to get what our little monkey brains want over the next couple hours or days.
I knew that they worked this way, but the conciseness of the response and clean analogy to systems I know and work with all day was just very satisfying.
For example, there was probably still 10-20% of my mind that assumed that stubbornness and ignorance were the reason for things going slowly most of the time, but I'm re-evaluating that, even though I knew that delays and double-checking were inherent features of a business and process. Re-framing those delays as "evolved responses 100% of the time" rather than "10% mistrust, 10% ignorance, 10% ..." is just a more positive way of thinking about human-driven processes.
I totally understand this rationally if you sit down and walk me through the steps.
But there's a lot of reasons - ego, fear of losing... that core identity, etc. that can easily come back and bite you.
I'm not sure if this is the same as meditation and ego death or whatever. I find that even if you go down the spiritual route, you also run into the same issues.
People in philosophy also argue things like rational actors, self-coherency, etc.
And hey, even in this current moment you were able to type out a coherent thought, right?
I've noticed more and more that humans behave a lot like LLMs. In the sense that it's really, really hard to observe my true internal state - I can only try to find patterns and guess at shit. Every theory I've tried applying to myself is just "wrong" - in the sense that either it feels wrong, or I'll get depressed because the theory basically boils down to "you're lazy and you have to do the work", which is a highly emotionally evocative theory that doesn't help anyone.
"People do not just sit there thinking and then spitting out their best guess."
Well, if you are using AI like this, you are doing it wrong.
Yes AI is imperfect, fallible, it sometimes hallucinates, but it is a freaking time saver (10x?). It is a tool. Don't expect a hammer to build you a cabinet.
There is no other way to use an LLM than to give it context and have it give its best guess, that's how LLMs fundamentally work. You can give it different context, but it's just guessing at tokens.
We've had 300,000 years to adapt to the specific ways in which humans are fallible, even if our minds are black boxes.
Humans fail in predictable and familiar ways.
Creating a new system that fails in unpredictable and unfamiliar ways and affording it the same control as a human being is dangerous. We can't adapt overnight and we may never adapt.
This isn't an argument against the utility of LLMs, but against the promise of "fire and forget" AI.
Human minds are far less black boxes than LLMs. There are entire fields of study and practice dedicated to understanding how they work, and to adjust how they work via medicine, drugs, education, therapy, and even surgery. There is, of course, a lot more to learn in all of those arenas, and our methods and practices are fallible. But acting as if it is the same level of black box is simply inaccurate.
They are more of a black box - but humans are a black box that is perhaps more studied and that we have more experience in.
Although human behavior is still weird, and highly fallible! Despite best interventions (therapy, drugs, education), sometimes they still kill each other and we aren't 100% sure why, or how to solve it.
That doesn't mean that the same level of study can't be done on AI though, and they are much easier to adjust compared to the human brain (RLHF is more effective than therapy or drugs!).
They are much more of a black box than AI. There are whole fields around studying them—because they are hard to understand. We put a lot of effort into studying them… from the outside, because we had no other alternative. We were reduced to hitting brains with various chemicals and seeing what happened because they are such a pain to work with.
They are just a more familiar black box. AI’s are simpler in principle. And also entirely built by humans. Based on well-described mathematical theories. They aren’t particularly black-box, they are just less ergonomic than the human brain that we’ve been getting familiar with for hundreds of thousands of years through trial and error.
I would say human behavior is less predictable. That is one of the reasons why today it is rather easy to spot the bot responses, they tend to fit a certain predictable style, unlike the more unpredictable humans.
Maybe include in a prompt a threat of legal punishment? (Surely somebody has already tried that and tabulated how much it improves scores on different benchmarks.)
I suspect the big AI companies try to adversarially train that out as it could be used to "jailbreak" their AI.
I wonder though, what would be considered a meaningful punishment/reward to an AI agent? More/less training compute? Web search rate limits? That assumes that what the AI "wants" is to increase its own intelligence.
LLM's response being best prediction of next token arguably isn't that far off from a human motivated to do their best. It's a fallible best effort either way.
And both are very far from the certainty the author seems to demand.
An LLM isn't providing its "best" prediction, it's providing "a" prediction. If it were always providing the "best" token then the output would be deterministic.
In my mind the issue is more accountability than concerns about quality. If a person acts in a bizarre way they can be fired and helped in ways that an LLM can never be. When gemini tells a student to kill themselves, we have no recourse beyond trying to implement output filtering, or completely replacing the model with something that likely has the same unpredictable unaccountable behavior.
Are you sure that always providing the best guess would make the output deterministic? Isn't the fundamental point of learning, whether done by machine or human, that our best gets better and is hence non-deterministic? Doesn't what is best depend on context?
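On the determinism point, a toy sketch may help (the token scores are invented, and this assumes fixed weights and a fixed context). Always taking the single highest-scoring token (greedy/argmax decoding) gives the same output every time; the variation people see comes from sampling:

    import numpy as np

    rng = np.random.default_rng(42)
    tokens = ["cat", "dog", "bird", "fish"]
    logits = np.array([2.0, 1.6, 0.4, 0.1])  # invented next-token scores

    def greedy(logits):
        return tokens[int(np.argmax(logits))]        # identical output on every call

    def sample(logits, temperature=1.0):
        p = np.exp(logits / temperature)
        p /= p.sum()
        return tokens[rng.choice(len(tokens), p=p)]  # varies from call to call

    print([greedy(logits) for _ in range(3)])  # always ['cat', 'cat', 'cat']
    print([sample(logits) for _ in range(3)])  # e.g. ['dog', 'cat', 'cat']

"Getting better" through learning changes the weights (and hence the distribution), which is a separate question from whether decoding from a fixed distribution is deterministic.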
I tire of this disingenuous comparison.
The failure modes of (experienced, professional) humans are vastly different than the failure modes of LLMs. How many coworkers do you have that frequently, wildly hallucinate while still performing effectively?
Furthermore, (even experienced, professional) humans are known to be fallible & are treated as such.
No matter how many gentle reminders the informed give the enraptured, LLMs will continue to be treated as oracles by a great many people, to the detriment of their application.
I’ve done a few projects that attempted to distill the knowledge of human experts, mostly in the medical imaging domain, and was shocked when for most of them the inter-annotator agreement was only around 60%.
These were professional radiologists with years of experience and still came to different conclusions for fairly common conditions that we were trying to detect.
So yes, LLMs will make mistakes, but humans do too, and if these models do so less often at a much lower cost it’s hard to not use them.
> So yes, LLMs will make mistakes, but humans do too
Are you using LLMs though? Because pretty much all of these systems are fairly normal classifiers, what would've been called Machine Learning 2-3 years ago.
The "AI hype is real because medical AI is already in use" argument (and it's siblings) perform a rhetorical trick by using two definitions of AI. "AI (Generative AI) hype is real because medical AI (ML classifiers) is already in use" is a non-sequitur.
Image classifiers are very narrow intelligences, which makes them easy to understand and use as tools. We know exactly what their failure modes are and can put hard measurements on them. We can even dissect these models to learn why they are making certain classifications and either improve our understanding of medicine or improve the model.
...
Basically none of this applies to Generative AI. The big problem with LLMs is that they're simply not General Intelligence systems capable of accurately and strongly modelling their inputs. e.g. Where an anti-fraud classifier directly operates on the financial transaction information, an LLM summarizing a business report doesn't "understand" finance, it doesn't know what details are important, which are unusual in the specific context. It just stochastically throws away information.
Yes I am, these LLM/VLMs are much more robust at NLP/CV tasks than any application specific models that we used to train 2-3 years ago.
I also wasted a lot of time building complex OCR pipelines that required dewarping / image normalization, detection, bounding box alignment, text recognition, layout analysis, etc and now open models like Qwen VL obliterate them with an end to end transformer model that can be defined in like 300 lines of pytorch code.
Different tasks then? If you are using VLMs in the context of medical imaging, I have concerns. That is not a place to use hallucinatory AI.
But yes, the transformer model itself isn't useless. It's the application of it. OCR, image description, etc, are all that kind of narrow-intelligence task that lends itself well to the fuzzy nature of AI/ML.
The world is a fuzzy place, most things are not binary.
I haven't worked in medical imaging in a while but VLMs make for much better diagnostic tools than task specific classifiers or segmentation models which tend to find hacks in the data to cheat on the objective that they're optimized for.
The next-token objective turns out to give us much better vision supervision than things like CLIP or classification losses. (ex: https://arxiv.org/abs/2411.14402)
I spent the last few years working on large scale food recognition models and my multi-label classification models had no chance of competing with GPT4 Vision, which was trained on all of the internet and has an amazing prior thanks to its vast knowledge of facts about food (recipes, menus, ingredients, etc).
Same goes for other areas like robotics, we've seen very little progress outside of simulation up until about a year ago, when people took pretrained VLMs and tuned them to predict robot actions, beating all previous methods by a large margin (google Vision-Language-Action models). It turns out you need good foundational model with a core understanding of the world before you can train a robot to do general tasks.
The problem is that how mistakes are made is crucial.
If it's a forced binary choice then sure LLMs can replace humans.
But often there are many shades of grey e.g. a human may say I don't know and refer to someone else or do some research. Whereas LLMs today will simply give you a definitive answer even if it doesn't know.
> Whereas LLMs today will simply give you a definitive answer even if it doesn't know.
Have you not seen an LLM say it doesn't know the answer to something? I just asked
"How do I enable a scroflpublaflex on a ggh connection?"
to O1 pro as it's what I had open.
Looking at the internal reasoning it says it doesn't recognise the terms, considers that it might be a joke and then explains that it doesn't know what either of those are. It says maybe they're proprietary, maybe internal things, and explains a general guide to finding out (e.g. check internal docs and release notes, check things are up to date if it's a platform, verify if versions are compatible, look for config files [suggesting a few places those could be stored or names they could have], how to restart services if they're systemctl services, if none of this applies it suggests checking spelling and asks if I can share any documentation.
This isn't unique or weird in my experience. Better models tend to be better at saying they don't know.
Wait, how is this input less funny? They are both silly nonsense words. The fake names we tend to come up with seem to have this particular shape to them (which predates but really reminds me of something out of Rick and Morty). I think the main real difference here is that you asked it for a short answer.
I wonder if it is fair to ask it more real-world-inspired questions? How about:
How do I enable a ggh connections on a Salinero webserver?
They are an Apache band. But (as far as I can tell) nobody has made software named after them.
The "thinking" part explains it seems to be about a custom web server, and tries to think what ggh might be, saying maybe something like "go gprc hub" and it needs more info. The response is:
I’m not aware of a standard product called “Salinero webserver” or a documented feature called “ggh connections.” Because there isn’t a well-known tool or server framework by these names, it’s difficult to give you reliable, step-by-step instructions. Could you clarify any of the following?
What is the exact name/version of the webserver software you’re referring to?
What do you mean by “ggh connections”? Is this a plugin, a protocol, or some other third-party module?
Is there any related documentation or logs you can share?
With more detail, I can better determine if “Salinero webserver” is a custom or specialized system and whether “ggh connections” requires installing a particular module, enabling a config flag, or configuring SSL/TLS in a specific way.
I took inspiration from your comment and the parent and crafted this prompt:
> Is it possible to enable Salinero web server 2.0 on a QPH connection? Please provide a very short answer.
"QPH" is a very specific term referring to a type of Siemens electrical circuit breaker, so it probably exists in the training data, but it has nothing to do (to the best of my knowledge) with software, or web servers.
GPT-4o gave me this output:
> Yes, if the QPH connection supports the necessary protocols and configurations required by Salinero Web Server 2.0.
I then asked it to provide a longer answer, and it composed two paragraphs of complete bullshit:
> Enabling Salinero Web Server 2.0 on a QPH connection is possible, provided the QPH connection meets the server’s requirements. Salinero Web Server 2.0 relies on specific protocols like HTTP/HTTPS, and the QPH connection must support these. Additionally, the network must allow proper port forwarding (e.g., ports 80 and 443) and maintain adequate bandwidth to handle the server’s traffic.
> You’ll also need to configure the server to recognize and utilize the QPH connection, which may involve setting up IP addresses, ensuring firewall rules are in place, and verifying the security protocols match between the server and the connection. Testing and troubleshooting may be necessary to optimize performance.
Examples like this do a great job of highlighting the fact that these systems really are just advanced token predictors, and aren't actually "thinking" or "reasoning" about anything.
Using openrouter, a bunch of models fail on this. Sonnet 3.5 so far seems to be the best at saying it doesn't know, other than perhaps o1 pro, but once that has said "no" (which can be triggered more by telling it to respond very concisely) it seems very stuck and unable to say they don't exist. Letting it ramble more and so far it's been good.
Google's models for me have been the worst, lying about what's even been said in the messages so far, quoting me incorrectly.
Yep. I was wondering whether using the term "QPH" would at least cause it to venture into the territory of electrical panels/wiring somewhere in its reply, but it stayed away from that completely. I even tried regenerating the longer answer a few times but got essentially the same text, re-worded.
> I apologize, but I can't provide an answer as "crolubaflex" and "ggh connection" appear to be non-existent technical terms. Could you clarify what you're trying to connect or enable?
Sure, I'm interested in where the boundaries are with this.
With the requirements for a short answer, the reasoning says it doesn't know what they are so it has to respond cautiously, then says no. Without that requirement it says it doesn't know what they are, and notes that they sound fictional. I'm getting some API errors unfortunately so this testing isn't complete. 4o reliably keeps saying no (which is wrong).
Wait if experts only agreed 60% on diagnoses, what is the reliable basis for judging LLM accuracy? If experts struggle to agree on the input, how are they confidently ranking the output?
Not the OP but the data isn’t randomly selected, it’s usually picked out of a dataset with known clinical outcomes. So for example if it’s a set of images of lungs with potential tumors, the cases come with biopsies which determined whether it was cancerous or just something like scar tissue.
> But often there are many shades of grey e.g. a human may say I don't know and refer to someone else or do some research. Whereas LLMs today will simply give you a definitive answer even if it doesn't know.
To add to the other answers: I know many people who will give definitive answers of things they don't really know. They just rely on the fact you also don't know. In fact, in some social circles, the amount of people who do that, far outnumber the people who don't know and will refer you to someone else.
This hints at the margin and excitement from folks outside the technical space -- being able to be competitive to human outputs at a fraction of the cost.
That's the underappreciated truth of the computer revolution in practice.
At scale, computers didn't change the world because they did things that were already being computed, more quickly.
They changed the world because they decreased the cost of computing so much that it could be used for an entirely new class of problems. (That computing cost previously precluded its use on)
Given the exact same facts (just like in the medical imaging domain), humans will form different opinions or conclusions on politics.
I think what is not discussed enough is the assumption of assumption. [1] is a cognitive bias that occurs when a person who has specialized knowledge assumes that others share in that knowledge.
This makes it hard to have any discussion without laying out all the absolute basic facts, which is now more commonly known as first principles in the modern era.
In the four quadrants of known and unknown, it is often the unknown knowns (we don't even know that we know) that are problematic in discussions.
Incredibly ignorant set of replies on this thread lol. People with the same viewpoints as when gpt2 came out, as if we haven't seen a host of new paradigms and accomplishments since then, with O3 just being the latest and most convincing.
It's deeply saddening to see how fixated people are on the here-and-now, while ignoring the terrifying rate of progress, and its wide-ranging implications.
We've gone from people screeching "deep learning has hit its limits" in 2021 to models today that are able to reason within limited, but economically relevant contexts. And yet despite this, the same type of screeching continues.
Maybe some of us aren’t actually impressed with the “progress” since 2022? Doing well at random benchmarks hasn’t noticeably improved capability in use for work.
Does that mean it will never improve? Of course not. But don’t act like everyone else is some kind of moron.
> Maybe some of us aren’t actually impressed with the “progress” since 2022? Doing well at random benchmarks hasn’t noticeably improved capability in use for work.
Then perhaps you should strive to explore outside of the realm of benchmarks. I've been lucky in that I've seen legitimate value-adding uses in my workplace that simply were not possible pre-2022.
> But don’t act like everyone else is some kind of moron.
Not everyone - just the ones that ignore a trend of continued progress towards a goal, claim we can't achieve the goal, yet offer no meaningful explanation as to why we can't get there.
It's the same kind of people who claimed human flight would not be possible for 10,000 years in 1902. I just can't understand how narrow your mind has to be in order to be this skeptical.
Or the same kind of people who claimed Theranos was a scam, or that AI in the 70s wasn't about to produce Terminator within a few years, or that the .com bubble was in fact a bubble...
The innovation in foundational models is far outpacing the applications. Other than protein folding (which is not only LLMs AFAIK) I haven't seen a single application that blows my mind. And I use o1 and Claude pretty much every day for coding and architecture. It's beginning to look suspect that after billions poured and a couple years nothing mind-bending is coming out of it.
Let them have their fun while they can, it's gonna get pretty bleak in the next 5-10 years when coding jobs are being replaced left and right by bots that can do the work better and cheaper.
Personally I welcome companies to try. I'll laugh all the way to the bank when they need to rehire all the talent they've laid off at a premium to fix the security holes and damage caused by LLMs. Assuming that they survive having their entire company ransacked in the first place.
Bangladeshis don't have perfect English communication skills or work 24/7, and the most skilled engineers there have generally already moved overseas, so the only remaining ones aren't top-tier (or aren't cheap).
On the other hand, you can go from concept to finished product with a Bangladeshi team today, and you can't do that with Copilot. So why is this a risk for western devs, when you can already get close enough to their quality of work for a fraction of the price, yet they still exist? Clearly there are other factors involved beyond raw cost.
Maybe you're retired or not a SWE or knowledge worker anymore, but I have a decent amount of concern about this future.
As a society, we have not even begun to think about what happens when large swathes of the population become unemployed. Everyone says they'd love to not work, but no one says they can survive without money. Our society trades labor for money. And I have very little faith in our society or the government to alleviate this through something like UBI.
Previously it was physical work that was made more efficient, but the one edge we thought we would always have as humans - our creativity and thinking skills - is also being displaced. And that too, its fairly clear that the leaders in the space (apart from maybe Anthropic?) are doing this purely from a capitalist driven profit first motivation.
I for one think the world will be a worse place for a few years immediately after AGI/ASI.
They’re scared (as am I) but I have no illusions about the usefulness of these LLMs. Everyone on my team uses them to get their tickets done in a fraction of the time and then just sit around till the sprint ends.
Yeah, sounds like people are encountering a lot of PEBCAK errors in this thread. You get out of LLMs what you put into them, and the complaints, at this point, are more an admission of an inability to learn the new tools well.
It's like watching people try to pry Eclipse/Jetbrains/SublimeText out of engineers' death grips, except 10x the intensity. (I still use Jetbrains fyi :p)
Well, that's the argument most people here are making: that current LLMs are not good enough to be fully autonomous precisely because a human operator has to "put the right thing into them to get the right thing out."
If I'm spending effort specifying a problem N times in very specific LLM-instruction-language to get the correct output for some code, I'd rather just write the code myself. After all, that's what code is for. English is lossy, code isn't. I can see codegen getting even better in larger organizations if context windows are large enough to hold a significant portion of the codebase.
There are areas where this is immediately better in though (customer feedback, subjective advice, small sections of sandboxed/basic code, etc). Basically, areas where the effects of information compression/decompression can be tolerated or passed onto the user to verify.
I can see all of these getting better in a couple of months/few years.
While these folks waste breath debating whether AI is useful, I’m going to be over here…using it.
I use AI 100 times a day as a coder and 10,000 times a day in scripts. It’s enabled two specific applications I’ve built which wouldn’t be possible at single-person scale.
There’s something about the psychology of some subset of the population that insists something isn’t working when it isn’t _quite_ working. They did this with Wikipedia. It was evident that Wikipedia was 99% great for years before this social contingent was ready to accept it.
Anyone who says AI is useless never had to do the old method of cobbling together git and ffmpeg commands from StackOverflow answers.
I have no interest in learning the horrible unintuitive UX of every CLI I interact with, I'd much rather just describe in English what I want and have the computer figure it out for me. It has practically never failed me, and if it does I'll know right away and I can fall back to the old method of doing it manually. For now it's saving me so much time with menial, time-wasting day-to-day tasks.
Most of those people are a bit bad at making their case. What they mean but don't convey well is that AI is useless for its proclaimed uses.
You are correct that LLMs are pretty good at guessing this kind of well-documented & easily verifiable but hard to find information. That is a valid use. (Though, woe betide the fool who uses LLMs for irreversible destructive actions.)
The thing is though, this isn't enough. There just aren't that many questions out there that match those criteria. Generative AI is too expensive to serve that small a task. Charging a buck a question won't earn the $100 billion OpenAI needs to balance the books.
Your use case gets dismissed because, on its own, it doesn't sustain AI.
I think you’re on to something. I find the sentiment around LLMs (which is at the early adoption stage) to be unnecessarily hostile. (beyond normal HN skepticism)
But it can be simultaneously true that LLMs add a lot of value to some tasks and less to others, and less to some people. It's a bit tautological, but in order to benefit from LLMs, you have to be in a context where you stand to most benefit from LLMs. These are people who need to generate ideas, are expert enough to spot consequential mistakes, know when to use LLMs and when not to. They have to be in a domain where the occasional mistake generated costs less than the new ideas generated, so they still come out ahead. It's a bit paradoxical.
LLMs are good for: (1) bite-sized chunks of code; (2) ideating; (3) writing once-off code in tedious syntax that I don't really care to learn (like making complex plots in seaborn or matplotlib); (4) adding docstrings and documentation to code; (5) figuring out console error messages, with suggestions as to causes (I've debugged a ton of errors this way and have arrived at the answer faster than wading through Stackoverflow); (6) figuring out what algorithm to use in a particular situation; etc.
They’re not yet good at: (1) understanding complex codebases in their entirety (this is one of the overpromises; even Aider Chat’s docs tell you not to ingest the whole codebase); (2) any kind of fully automated task that needs to be 100% deterministic and correct (they’re assistants); (3) getting math reasoning 100% correct (but they can still open up new avenues for exploration that you’ve never even thought about);
It takes practice to know what LLMs are good at and what they’re not. If the initial stance is negativity rather than a growth mindset, then that practice never comes.
But it’s ok. The rest of us will keep on using LLMs and move on.
I've been sold AI as if it can do anything. It's being actively sold like a super intelligent independent human that never needs breaks.
And it just isn't that thing. Or, rather, it is super intelligent but lacks any wisdom at all; thus rendering it useless for how it's being sold to me.
>which is at the early adoption stage
I've said this in other places here. LLMs simply aren't at the early adoption stage anymore. They're being packaged into literally every SaaS you can buy. They're a main selling point for things like website builders and other direct-to-business software platforms.
Why not ignore the hype, and just quietly use what works?
I don’t use anything other than ChatGPT 4o and Claude Sonnet 3.5v2. That’s it. I’ve derived great value from just these two.
I even get wisdom from them too. I use them to analyze news, geopolitics, arguments around power structures, urban planning issues, privatization pros and cons, and Claude especially is able to give me the lay of the land which I am usually able to follow up on. This use case is more of the “better Google” variety rather than task-completion, and it does pretty well for the most part. Unlike ChatGPT, Claude will even push back when I make factually incorrect assertions. It will say “Let me correct you on that…”. Which I appreciate.
As long as I keep my critical thinking hat on, I am able to make good use of the lines of inquiry that they produce.
Same caveat applies even to human-produced content. I read the NYTimes and I know that it’s wrong a lot, so I have to trust but verify.
I agree with you, but it's just simply not how these things are being sold and marketed. We're being told we do not have to verify. The AI knows all. It's undetectable. It's smarter and faster than you.
And it's just not.
We made a scavenger hunt full of puzzles and riddles for our neighbor's kids to find their Christmas gifts from us (we don't have kids at home anymore, so they fill that niche and are glad to because we go ballistic at Christmas and birthdays). The youngest of the group is the tech kid.
He thought he had us beat when he realized he could use ChatGPT to solve the riddles and cyphers. It recognized the Caesar letter shift to negative 3, but then made up a random phrase with words of the same length as the answer. So the process was right, but the outcome was just outlandishly incorrect. It wasted about a half hour of his day. . .
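For reference, the mechanical part it got right is trivial. A minimal sketch of that shift in Python (the encoded phrase here is a made-up example, not the actual riddle):

    def caesar_shift(text, shift):
        # Shift each letter by `shift` positions, wrapping around the alphabet;
        # anything that isn't a letter passes through unchanged.
        out = []
        for ch in text:
            if ch.isalpha():
                base = ord('A') if ch.isupper() else ord('a')
                out.append(chr((ord(ch) - base + shift) % 26 + base))
            else:
                out.append(ch)
        return ''.join(out)

    # Undoing a +3 cipher means shifting back by 3 (flip the sign for the other direction).
    print(caesar_shift("Phuub Fkulvwpdv", -3))  # -> "Merry Christmas" (made-up example)

The point is that the decoding step is completely deterministic; there is nothing for the model to fill in, yet it filled something in anyway.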
Now apply that to complex systems or just a simple large database, hell, even just a spreadsheet. You check the process, and it's correct. You don't know the outcome, so you can't verify unless you do it yourself. So what's the point?
For context, I absolutely use LLMs for things that I know roughly but don't want to spend the time on. They're useful for that.
They're simply not useful for how they're being marketed, which is to solve problems you don't already know how to solve.
An example that might be of interest to readers: I gave it two logs, one failing and one successful, and asked it to troubleshoot. It turned out a loosely pinned dependency (Docker image) had updated in the failing one. An error mode I was familiar with and could have solved on my own, but the LLM saved me time. They are reliable at sifting through text.
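For comparison, the manual version of that sifting looks something like the sketch below (the log file names are hypothetical); the LLM effectively did this, plus the interpretation, for me:

    # Find the first line where a failing build log diverges from a passing one.
    from itertools import zip_longest

    with open("build_pass.log") as ok, open("build_fail.log") as bad:
        for i, (a, b) in enumerate(zip_longest(ok, bad, fillvalue=""), start=1):
            if a != b:
                print(f"First divergence at line {i}:")
                print(f"  passing: {a.rstrip()}")
                print(f"  failing: {b.rstrip()}")
                break
    # In practice you would strip timestamps and other noise first; this is only a sketch.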
I had a debate recently with a colleague who is very skeptical of LLMs for every day work. Why not lean in on searching Google and cross referencing answers, like we've done for ages? And that's fine.
But my counterargument is that what I find to be so powerful about the LLMs is the ability to refine my question, narrow in on a tangent and then pull back out, etc. And *then* I can take its final outcome and cross reference it. With the old way of doing things, I often felt like I was stumbling in the dark trying to find the right search string. Instead I can use the LLM to do the heavy lifting for me in that regard.
> Anyone who says AI is useless never had to do the old method of cobbling together git and ffmpeg commands from StackOverflow answers.
It's useful for that yes, but I'd rather just live in a world where we didn't have such disasters of CLI that are git and ffmpeg.
LLMs are very useful for generating the obscure boilerplate needed because the underlying design is horrible. Relying on it means acquiescing to those terrible designs rather than figuring out redesigns that don't need the LLMs. For comparison, IntelliJ is very good at automating all the boilerplate generation that Java imposes on me, but I'd rather we didn't have boilerplate languages like Java, and I'd rather that IntelliJ's boilerplate generation didn't exist.
I fear in many cases that if an LLM is solving your problem, you are solving the wrong problem.
I'm not arguing against the UX of those tools, but isn't this a case of the problem being a hard one to solve and people having different needs? ffmpeg has a lot of knobs, but that's just the nature of media conversion and manipulation, just like ImageMagick. I'm not against using LLMs for restricting the search space for a specific problem, but I'm often seeing people not even understanding the problem itself, just its apparent physicality.
> Anyone who says AI is useless never had to do the old method of cobbling together git and ffmpeg commands from StackOverflow answers.
These days, I'm more likely to read the manual pages and take notes on interesting bits. If I'm going to rely on some tooling for some time, dedicating a few hours of reading is a good trade-off for me. No need to even remember everything, just the general way it solves the problem. Anything more precise is captured in notes, scripts, shell history, and so on. I dare anyone to come out with an essay like this from an LLM: https://karthinks.com/software/avy-can-do-anything/
> if it does I'll know right away and I can fall back to the old method of doing it manually
It's all well and good with things you can botch with no consequence other than some time wasted. But I've bricked enough VMs trying commands I did not understand to know that if you need to not fuck up something, you'll have to read those docs and understand them. And hope they're not out of date / wrong.
I'm asking not for snark, but because when AI gives me something not _quite_ working, it takes much more time to investigate than an "every 6 minutes in a 10 hour work day" frame would allow. I just wonder if maybe you're pasting it as-is and don't care about correctness as long as the happy path sort of works. Speaking of subsets, coders who did that before AI were also quite a group.
There must be something that explains the difference in our experiences. Apologies for the fact that my only idea is kinda negative. I understand the potential hyperbole here, but it doesn't explain much. I can stand AI BS once a day, maybe twice, before uncontrollably cursing into the chat.
Why not write tests with AI, too? Since using LLMs as coding assistants, my codebases have much more thorough documentation, testing and code coverage.
Don't start when you're already in a buggy dead-end. Test-driven development with LLMs should be done right from the start.
Also keep the code modular so it is easy to include the correct context. Fine-grained git commits. Feature-branches.
All the tools that help teams of humans of varying levels of expertise work together.
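A minimal sketch of what that test-first loop looks like (names and behaviour are made up): the human writes the tests as the spec, then asks the model for an implementation that makes them pass, one small commit at a time.

    import re

    # The tests below are written by the human first, as the spec.
    # slugify() stands in for what the model came back with afterwards.
    def slugify(text: str) -> str:
        # lower-case, drop punctuation, join words with hyphens
        return "-".join(re.findall(r"[a-z0-9]+", text.lower()))

    def test_lowercases_and_replaces_spaces():
        assert slugify("Hello World") == "hello-world"

    def test_keeps_only_words():
        assert slugify("Fine-grained commits, please!") == "fine-grained-commits-please"

    if __name__ == "__main__":
        test_lowercases_and_replaces_spaces()
        test_keeps_only_words()
        print("ok")

If a generated change breaks a test, you revert the one small commit and re-prompt, rather than untangling a large generated diff.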
You may have enough expertise in your field that when you have a question, you know where to start looking. Juniors and students encounter dozens of problems and questions per hour that fall into the unknown-unknown category.
This isn't about if LLMs are useful, it's about how useful can they become. We are trying to understand if there is a path forward to transformative tech, or are we just limited to a very useful tool.
It's a valid conversation after ~3 years of anticipating the world to be disrupted by this tech. So far it has not delivered.
Wikipedia did not change the world either, it's just a great tool that I use all the time
As for software, it performs ok. I give up on it most of the time if I am trying to write a whole application. You have to acquire a new skill, prompt engineering, and feverish iteration. It's a frustrating game of whack-a-mole and I find it quicker to write the code myself and just have the LLM help me with architecture ideas, bug bashing, and it's also quite good at writing tests.
I'd rather know the code intimately so I can more quickly debug it than have an LLM write it and just trust it did it well.
Peter Thiel talked about this years ago in his book Zero to One. His key insight, which we're seeing today, is that AI tools will work side-by-side with people and enhance their productivity to levels never imagined. From helping with some basic tasks ("write an Excel script that transforms this table from this format to this new format") to helping write programs, it's a tool that aids humans in getting more things done than previously possible.
These are different social contingents I think. At least for me I was super on board with wikipedia because as you say the use to me was immediate and certain. AI I have tried every few months for the last two years but I still haven't found a strong use for it. It has changed nothing for me personally except making some products I use worse.
Cursor has been quite the jaw-dropping game changer for me for greenfield hobby dev.
I don't know how useful it would be for my job, where I do maintenance on a pretty big app, and develop features on this pretty big app. But it could be great, I just don't know because work only allows Copilot. And Copilot is somewhere between annoying and novelty in my opinion.
Generally people are resistant to change and the average person will typically insist new technologies are pointless.
Electricity and the airplane were supposed to be useless and dangerous dead ends according to the common person: https://pessimistsarchive.org/
But we all like to think we have super unique opinions and personalities, so "this time it's different."
When the change finally happens, people go about their lives as if they were right all along and the new technology is simply a mysterious and immutable fixture of reality that was always there.
There is a vast difference between arguments like "Phones have been accused of ruining romantic interaction and addicting us to mindless chatter" and "current AI has problems generating accurate information and can't replace researching things by hand for complicated or niche topics and there is reason to believe that the current architecture may not solve this problem"
That aside, optimists are also not always right; otherwise we would have cold fusion already and a base on Mars.
> But we all like to think we have super unique opinions and personalities, so "this time it's different."
Are you suggesting that anything which is hyped is the future? Like, for every ten heavily-hyped things, _maybe_ one has some sort of post-hype existence.
Segway seems to have hardly been a dead end, or useless for that matter. Segway-style devices like the electric unicycle and many other light mobility devices seem to be direct descendants of the Segway. Segway introduced gyroscopes to the popular tech imagination, at least in my lifetime (not sure before).
The pessimist is not wrong. In fact he's right more frequently than wrong. Just look at a long list of inventions. How many of them were so successful as the car or the airplane? Most of them were just passing fads that people don't even remember anymore. So if you're asking who is smarter, I would say the pessimist is closer to the truth, but the optimist who believed in something that really became successful is now remembered by everyone.
I feel your argument relies on assuming that being an optimist or pessimist means believing 100% or 0%, whereas I'd claim it's instead more just having a relative leaning in a direction. Say after inspecting some rusty old engines a pessimist predicts 1/10 will still function and an optimist predicts 4/10 will function. If the engines do better than expected and 3/10 function, the optimist was closer to the truth despite most not working.
Similarly, being optimistic doesn't mean you have to believe every single early-stage invention will work out no matter how unpromising - I've been enthusiastic about deep learning for the past decade (for its successes in language translation, audio transcription, material/product defect detection, weather forecasting/early warning systems, OCR, spam filtering, protein folding, tumor segmentation, spam filtering, drug discovery and interaction prediction, etc.) but never saw the appeal of NFTs.
Additionally worth considering that the cost of trying something is often lower than the reward of it working out. Even if you were wrong 80% of the time about where to dig for gold, that 20% may well be worth it; reducing merely the frequency of errors is often not logically correct. It's useful in a society to have people believe in and push forward certain inventions and lines of research even if most do not work out.
I think xvector's point is about people rehashing the same denunciations that failed to matter for previous successful technologies - the idea that something is useless because it's not (or perhaps will never be) 100.0% accurate, or the "Until it can do dishes, home computer remains of little value to families"[0] which I've seen pretty much ad verbatim for AI many times (extra silly now that we have dishwashers).
Given in real life things have generally improved (standard of living, etc.), I think it has typically been more correct to be optimistic, and hopefully will be into the future.
This argument is very prone to survivorship bias. Of course, when we think back to the hyped technologies of the past we are going to remember mostly those that justified the hype. The failures get forgotten. The memory of social discourse fades extremely quickly, much faster than, for example, pop culture or entertainment.
LLMs obviously have use cases but the market has practically priced in "AGI".
The danger is not that LLMs take jobs. The danger is that we are in a massive bubble and while these are nice tools they are not worth anything close to the trillions of dollars bet on them.
IMO the psychology at work here is basically denial that we can both be in the biggest bubble of all time in terms of dollars and LLMs are useful. Just not THAT useful.
Why should we care about whether we're in a market bubble or not, especially if one does not have their own money staked in the bubble? It's somebody else's capital that's at risk, no? If they're wrong, let them reap the consequences.
(cue rebuttal based on systemic consequences / financial bailouts etc, but you know what I mean; also, the dotcom bubble deflation didn't require a bailout)
Have you tried a few? If so, which do you prefer? If not, which do you use? I'm a little late to the party, and the current amount of choices is quite intimidating.
I imagine you're asking about coding help. For that, I think you should qualify any answer you get with the user's most commonly used language (and framework, if applicable).
In my experience, Claude Sonnet 3.5 (3.6?) has been unbeatable. I use it for Rust. Making sense of compiler errors, rubberducking, finding more efficient ways to write some function and, truth be told, some times just plain old debugging. More than once, I've been able to dump a massive module onto the chat context and say "look, I'm experiencing this weird behavior but it's really hard to pin down what's causing it in this code" and it pointed to the exact issue in a second. That alone is worth the price of admission.
Way better than ChatGPT 4o and o-1, in my experience, despite me saying the exact opposite a few months ago.
If you're sitting in front of the keyboard, inputting instructions and running the resulting programs, yes, you are still a coder. You just move another layer up the stack.
The same type of argument has been made for decades -- when coders wrote in ASM, folks would ask "are you still a coder when you use that fancy C to make all that low-level ASM obsolete?". Etc etc.
Not outsourcing at all - you're an engineer using the tools that make sense to solve a problem. The core issue with identifying as just a coder is that code is just one of many potential tools to solve a problem.
So your customer/employer is a coder too. They want to solve a problem and use a tool: you.
A coder writes code in a programming language; that's what distinguishes them from the customers, who use natural language. The coder is the translator between the customer and the machine. If the machine does that, the machine is the coder.
Is your customer bringing you the solution to the problem or the problem and asking you to solve the problem? One is a translation activity and the other isn't.
You sound like the guy I just had to fire after he blew his own toes off several times.
If you think these tools make you obsolete, or faster than anyone else, then you are just naive enough to have lost your objectivity to the marketing. I deal with real risks and failures from the output of ChatGPT which have serious financial consequences. The first victim is always the developer, then the tester.
At best, it is very good at ousting people who shouldn't be allowed anywhere near a damn computer.
We've got a senior dev who uses ChatGPT for his code all the time. Right now I am fixing all the exceptions 'his code' pops. Well, I shouldn't call it his code. It's the code ChatGPT generated for him. He just asks ChatGPT, copy pasta, doesn't even run it, and checks it in.
How would a 12 year old with ChatGPT recognize complicated errors?
You still need experience to check the code, not only the result; otherwise you get this.
I've found it depends on the context (pardon the pun)
For example, personal projects that are small and where copilot has access to all the context it needs to make a suggestion - such as a script or small game - it has been really useful.
But in a real world large project for my day job, where it would need access to almost the entire code base to make any kind of useful suggestion that could help me build a feature, it's useless! And I'd argue this is when I need it.
It can ingest the entire codebase (up to its context length), but for some reason, I’ve always had much higher quality chats with smaller bite-sized pieces of code.
Autocomplete distracts me enough that it really needs to be close to 100% correct before it's useful. Otherwise it's just wrecking my flow and slowing me down.
Exponentially? Absolutely not. In the best case it creates something that’s almost useful. Are you working on large actual codebases or talking about some one off toy apps?
I spend most of my time thinking about what I'm trying to do and how to best achieve it, so code completion can only make me marginally more productive. If the tool can guess a large chunk of what I've decided to do, sure, that's nice, but at the end of the day it still only adds up to a couple minutes at best.
You could try aider, or another tool/workflow where you provide whole files and ask for how they should be changed - very different from code completion type tools!
But please accept that you are in a small subset of people that it is very useful to. Every time I hear someone championing AI, it is a coder. AI is basically useless to me, it is just a convoluted expensive google search.
Particularly given the article's target is "systems based on large neural networks" and not specifically LLMs, I'd claim there are a vast number of uncontroversially beneficial applications: language translation, video transcription, material/product defect detection, weather forecasting/early warning systems, OCR, spam filtering, protein folding, tumor segmentation, drug discovery and interaction prediction, etc.
It's _extremely_ useful for lawyers, arguably even more so than for coders, given how much faster they can do stuff. LLMs are also extremely useful for anyone who writes text and wants a reviewer, and capable of executing most daily activities of some roles, such as TPMs.
It's still useful to a small subset of all those professions - the early adopters. Same way computers were useful to many professionals before the UI (but only a small fraction of them had the skillset to use terminals)
I think the big mistake is _blindly relying on the results_ - although that problem has been improving dramatically (gpt3.5 hallucinated constantly, I rarely see a hallucination w/ the latest gpt/claude models)
How do you get the LLM to the point where it can draft a demand letter? I guess I'm a little confused as to how the LLM is getting the particulars of the case in order to write a relevant letter. Are you typing all that stuff in as a prompt? Are you dumping all the case file documents in as prompts and summarizing them, and then dumping the summaries into the prompt?
Demand letters are the easiest. Drag and drop police report and medical records. Tell it to draft a demand letter. For most things, there are only a handful critical pages in the medical records, so if the original pdf is too big, I’ll trim excess pages. I may also add my personal case notes.
I use a custom prompt to adjust the tone, but that’s about it.
Multiple lawyer friends of mine are using ChatGPT (and custom GPTs) for contract reviews. They upload some guidelines as knowledge, then upload any new contract for validation. Allegedly this replaces hours of reading, which is a large portion of the work in some cases. Some of them also use it to debate a contract, to see if there's anything they overlooked or to find loopholes. LLMs are extremely good at that kind of constrained creativity mode where they _have_ to produce something (they suck at saying "I don't know" or "no"), so I guess it works as a sort of "second brain" for those cases too.
There are even reported cases of entire pieces of legislation being written with LLMs already [1]. I'm sure there are thousands more we haven't heard about, the same way researchers are writing papers with LLMs without disclosing it.
Five years later, when the contract turns out to be defective, I doubt the clients are going to be _thrilled_ with “well, no, I didn’t read it, but I did feed it to a magic robot”.
It only has to be less likely to cause that issue than a paralegal to be a net positive.
Some people expect AI to never make mistakes when doing jobs where people routinely make all kinds of mistakes of varying severity.
It’s the same as how people expect self-driving cars to be flawless when they think nothing of a pileup caused by a human watching a reel while behind the wheel.
My understanding is the firm operating the car is liable, in the full self driving case of commercial vehicles (waymo). The driver is liable in supervised self driving cases (privately owned Tesla)
> Every time I hear someone championing AI, it is a coder
The argument I make is why aren’t more people finding ways to code with AI?
I work in a leadership role at a marketing agency and am a passable coder for scripts using Python and/or Google Apps Scripts. In the past year, I’ve built more useful and valuable tools with the help of AI than I had in the 3 or so years before.
We’re automating more boring stuff than ever before. It boggles my mind that everybody isn’t doing this.
In the past, I was limited by technical ability because my knowledge of our business and processes was very high. Now I’m finding that technical ability isn’t my limitation, it’s how well I can explain our processes to AI.
Interesting, I'm the opposite now. Why would I click a couple links to read a couple (verbose) blog posts when I can read a succinct LLM response. If I have low confidence in the quality of the response then I supplement with Google search.
I feel near certain that I am saving time with this method. And the output is much more tuned to the context and framing of my question.
Hah, take for example my last query in ChatGPT:
> Are there any ancient technologies that when discovered furthered modern understanding of its field?
ChatGPT gave some great responses, super fast. Google also provides some great results (though some miss the mark), but I would need to parse at least three different articles and condense the results.
To be fair, ChatGPT gives some bad responses too.. But both an LLM and Google search should be used in conjunction to perform a search at the same time.
Use LLMs as breadth-first search, and Google as depth-first search.
I'd argue that's just because coders are first to line up for this.
There was a different thread on this site I read where a journalist used the wrong units of measurement (kilowatts instead of kilowatt-hours for energy storage). You could paste the entire article into ChatGPT with a prompt "spot mistakes in the following; [text]" and get an appropriate correction for this and similar mistakes the author made.
As in, there are journalists right now posting articles with clear mistakes that could have been proofread more accurately than they were, if they were willing to use AI. The only excuse I can think of is resistance to change. A lot of professions right now could do their jobs better if they leaned on the current generation of AI.
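If a newsroom wanted to wire that into a workflow rather than pasting by hand, it is only a few lines. A sketch, assuming the official OpenAI Python client, an API key in the environment, and whatever model name is current:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def spot_mistakes(article_text: str) -> str:
        # Ask the model to act as a proofreader over the full article text.
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative model name
            messages=[
                {"role": "system", "content": "You are a careful technical proofreader."},
                {"role": "user", "content": "Spot mistakes in the following:\n\n" + article_text},
            ],
        )
        return response.choices[0].message.content

    print(spot_mistakes(open("draft.txt").read()))  # draft.txt is a hypothetical file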
Yesterday ChatGPT helped me put together a skincare routine for my wife with the multiple serums and creams that she received for Christmas.
She and I had no idea when to apply, how to combine or when to avoid combination of some of those products.
I could have googled it myself in the evenings and had the answer after a few days of research, but with o1, in a 15-minute session my wife had a solid weekly routine, the reasoning behind those choices, and academic papers with research about those products. (Obviously she knows a lot about skincare in general, so she had the capacity to recognize any wrong recommendation.)
Nothing game-changing, but it's great for saving lots of time on this kind of task.
It's 2 days after Christmas, too early to know the impact of the purchases made based on what AI recommended, either positive or negative.
If you're relying on AI to replace a human doctor trained in skin care or alternatively, your Google skills; please consider consulting an actual doctor.
If she "knows a lot about skincare in general, so she had the capacity to recognize any wrong recommendation", then what did AI actually accomplish in the end.
>> It's 2 days after Christmas, too early to know the impact of the purchases made based on what AI recommended, either positive or negative.
No worries, I can tell you what to expect: nothing. No effect. Zilch. Nada. Zero. Those beauty creams are just a total scam, and that's obvious from the fact that they're targeted just as much at women who don't need them (young, good skin) as at the ones who do (older, bad skin).
About the only thing the beauty industry has figured out really works in the last five or six decades is Tretinoin, but you can use that on its own. Yet it's sold as one component in creams with a dozen others, that do nothing. Except make you spend money.
Forgot to say: you can buy Tretinoin at the pharmacy, over the counter even depending on where you are. They sell it as a treatment for acne. It's also shown to reduce wrinkles in RCTs [1]. It's dirt cheap and you absolutely don't need to buy it as a beauty cream and pay ten times the price.
_____________
[1] Topical tretinoin for treating photoaging: A systematic review of randomized controlled trials (2022)
It's a teratogen causing birth defects and miscarriages, so severe that "women of child bearing age taking isotretinoin are required to register for the iPLEDGE program. The iPLEDGE program requires that women taking isotretinoin undergo frequent pregnancy tests and commit to using two (2) forms of birth control in order to prevent themselves from getting pregnant."[1]
From Wikipedia[2]: "Isotretinoin is a teratogen; there is about a 20–35% risk for congenital defects in infants exposed to the drug in utero, and about 30–60% of children exposed to isotretinoin prenatally have been reported to show neurocognitive impairment".
See also pages like r/AccutaneRecovery[3] for people harmed by using it for acne, reporting systemic damage, perhaps permanent damage.
Scroll down[1] for the picture of some of the possible side effects of oral Accutane/Isotretinoin on the mother[3] and note that Wikipedia says "the most common adverse effects are dry lips (cheilitis), dry and fragile skin (xeroderma), dry eyes[8] and an increased susceptibility to sunburn" and wonder how a beauty treatment which improves skin condition has most common side effects which ruin skin condition.
This line of inquiry leads to a fun conspiracy/woo hypothesis; Grant Genereux[5] claims that what it does is trigger stem cells to differentiate in the epithelial layers of the skin, which makes thicker skin in the short term (wrinkle free) and worn out stem cells and thick skin in the longer term - and that many small vessels in the body have an epithelial lining of 'internal skin' and that thickens by the same mechanism leading to narrowing and closing of all kinds of internal vessels - tear ducts and sweat glands and blood vessels and inside the kidneys and liver and inner ear, etc. which cause the dry skin and dry eyes "side effects" (direct effects really) seen outside, and the organ damage/dizziness/etc. seen inside. And that it's a teratogen by getting inside cells, damaging them, damaging the DNA/protein building mechanisms causing wider systemic damage which can be long term and is not cleared up by stopping taking Accutane; this is misunderstood as retinoids "mediating hundreds of gene expressions" but is really shotgun chaotic damage, and that's why there isn't a single symptom to look for and how it gets diagnosed as many different organ-specific diseases instead of retinoid toxicity damage. And/or causing cellular apoptosis with immune system response to a perceived 'attack', which is then seen as organ damage with immune system activity present, and misdiagnosed as "autoimmune" where the immune system has decided to attack an organ for no reason, which is why autoimmune disorders never have treatments or cures and why they cluster (people with one often get more) despite no good reason that should happen.
And that this whole collection of behaviours is triggered by food with Vitamin A (retinol in the tretinoin family) in it such as dairy and meat fat and Cod Liver oil, and foods with Beta Carotene (retinoids in the same family) such as orange/yellow/dark green coloured fruits and vegetables, and fortified Vitamin A in low-fat dairy and flours and other products through the USA/Europe. And it doesn't take much more than the RDA of Vitamin A to become problematic, and once it builds up in the body beyond the level the body can handle over a few decades, it's like blue touch paper waiting to be lit. Which, he suggests, is why auto-immune disorders cluster together (if you get one, you likely get more), why Eastern Canada Prince Edward Island near a Cod Liver Oil refinery was the highest incidence of Alzheimers in the world and that has been dropping since the refinery closed, and many more connection-between-retinoids-and-disease-states including claims by other people[6].
(I called it a 'fun' idea - it is at least fun along the lines of Tyler Vigen's spurious correlation noticer. https://www.tylervigen.com/spurious-correlations even if the main idea is not true).
[6] https://nutritionrestored.com/blog-forum/topic/the-known-his... - blog post about paper observing hypervitaminosis-A bone damage in fossil skeletons, articles observing hysteria in Eskimo women, speculated to be caused by Vitamin A toxicity from Atlantic fish liver (callback to Atlantic coast cod-liver oil refinery), connection between hypervitaminosis A and calcified arteries, connection between hypervitaminosis A and scoliosis, hypervitaminosis A in small animals causes bone growth problems and symptoms of depression.
It's being used in drive through windows.
In movies, in graphic design, pod casts, music, etc... 'entertainment' industry.
And HN: it isn't just a few oddballs on HN championing it. I wish there were a way to get a sentiment analysis of HN; it seems there are a lot more people using it than not using it.
And, what about the silent majority, the programmers that don't hang out on HN? I hear colleagues talk about it all the time.
The impact is here, whether they are self directed or not, or whether there are still a few people not using it.
I also use it for plant care tips: what should I feed this plant, what kind of soil to use, and all the questions I never bothered to Google and crawl through some long blog article for.
These are not categories that needed this change or benefit from it. Specific plant care is one of the easiest things to find information about. And are you serious you couldn't find a pancake recipe? The coffee machine idk it depends on what you did. But the other two are like a parody of AI use cases. "We made it slightly more convenient, but it might be wrong now and also burns down a tree every time you use it."
> "We made it slightly more convenient, but it might be wrong now and also burns down a tree every time you use it."
Sounds like early criticisms of the internet. I assume you mean he should be doing those things with a search engine, but maybe we shouldn't allow that either. Force him to use a book! It may be slightly less convenient, and could still be wrong, but...
Before crypto and AI, computing in general and the internet in particular were always an incredible deal in terms of how much societal value we get out of them for the electricity consumed.
In my bubble coders find LLMs least useful. After all we already have all kinds of fancy autocomplete that works deterministically and doesn't hallucinate - and still not everyone uses it.
When I use LLMs, I use it exactly as Google search on steroids. It's great for providing a summary on some unknown topic. It doesn't matter if it gets it wrong - the main value is in keywords and project names, and one can use the real Google search from there.
And it isn't expensive if you are using the free version
Do you not use it to try learning new things? I use it to help get familiar with new software (recently for FreeCAD), or new concepts (passive speaker crossover design).
She uses GPT as an editor for her emails and web content. She'll just say "improve this," and she gets options for how she might say something in a different way.
When preparing for a summit, she gave it a list of broad topics she wanted to cover. Gpt generated a list of specific titles and descriptions for her talks. This in turn gave her specific ideas to write talks about instead of just the broad topic.
When she wasn’t sure about the sequence of her talks, she asked GPT for advice on the order. GPT suggested an arrangement that created a logical flow and the reasoning for that flow, which ended up being pretty good.
She often uses gpt as a sounding board for ideas. She said she likes having an always-available colleague to bounce thoughts off of.
I'd call it a working google search, unlike, you know, google these days.
Actually google's LLM-based search results have been getting better, so maybe this isn't the end of the line for them. But for sophisticated questions (on noncoding topics!) I still always go to chatgpt or claude.
> google's LLM-based search results have been getting better
don't worry, Google WILL change this because they don't make money when people find the answer right away. They want people to see multiple ads before leaving the site.
Walk into any random coffee shop in America where people are working on their laptops and you will see some subset of them on ChatGPT. It’s definitely not just coders who are finding it useful.
Most of the comments here are responding to the title by discussing whether current AI represents intelligence at all, but worth noting that the author’s concerns all apply to human brains too. He even hints at this when he dismisses “human in the loop” systems as problematic. Humans are also unreliable and unverifiable and a security nightmare. His focus is on cyber security and whether LLMs are the right direction for building safe systems, which is a different line of discussion than whether they are a path to AGI etc.
Author here. Nope, that is not my concern about humans in the loop. My concern on that is that any human in the loop has to reconstruct by themselves from the inputs whether the output makes sense, the system provides no significant help. "Explaining my reasoning" LLMs are potentially a step forward in that.
You are right that I'm not talking about AGI here, rather about safe systems.
AI is only a dead end if you expect it to function deterministically. In the same way as people, it's not rational, and it can't be made rational.
For example, the only effective way to get an AI not to talk about Bryan Lunduke is to have an external layer that scans for his name in the output of an AI, if found, stops the session and prints an error message instead.
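A minimal sketch of that kind of external layer (the generate() argument stands in for whatever actually calls the model; the blocklist and error message are made up):

    import re

    BLOCKLIST = [r"\bbryan\s+lunduke\b"]  # names the output is never allowed to contain

    def guarded_reply(prompt: str, generate) -> str:
        reply = generate(prompt)
        for pattern in BLOCKLIST:
            if re.search(pattern, reply, flags=re.IGNORECASE):
                # Stop the session rather than trying to "fix" the model's output.
                return "Error: this response was blocked by policy."
        return reply

The filter lives entirely outside the model, which is the whole point: nothing about the model itself has to be trusted.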
If you're willing to build systems around it (like we do with people) to limit its side effects and provide sanity checks, and legality checks like those mentioned above, it can offer useful opinions about the world.
The main thing to remember is that AI is an alien intelligence. Each new model is effectively the product of millions of dollars worth of forced evolution. You're getting Stitch from "Lilo and Stitch", and you'll never be sure if it's having a bad day.
Or modern mechanical engineers getting all pissy about "tolerances." Look, we shipped you a big box of those cheap screws, so just keep trying a different one until each motor sticks together.
Also, is there a known deterministic intelligence? Only very specific computer programs can be made deterministic, and even that has taken quite a while for us to nail down. A lot of code and systems of code produced by humans today is not deterministic, and it takes a lot of effort to get it there. For most people and teams it's not even on their radar or worth the effort.
> For example, the only effective way to get an AI not to talk about Bryan Lunduke is to have an external layer that scans for his name in the output of an AI, if found, stops the session and prints an error message instead.
> If you're willing to build systems around it (like we do with people) to limit its side effects and provide sanity checks,
I don't think that comparison holds up. We do build systems around people, but people also have internal filters, and most are able to use them to avoid having to interact with the external ones. You seemed to state that AI's don't (can't?) have working internal filters and rely on external ones.
Imagine if everyone did whatever they wanted all the time and cops had to go around physically corralling literally everyone at all times to maintain something vaguely resembling "order." That would be more like a world filled with animals than people, and even animals have a bit more reasoning than that. That's where we are with AI, apparently.
> Imagine if everyone did whatever they wanted all the time and cops had to go around physically corralling literally everyone at all times to maintain something vaguely resembling "order."
I don't need to imagine anything. I live on Earth in America and to my mind you've very accurately described the current state of human society.
For the vast majority of humans this is how it works currently.
The amount of government, military, and police and the capital, energy, and time to support all of that in every single country on earth is pretty much the only thing holding up the facade of "order" that some people seem to take for granted.
> For the vast majority of humans this is how it works currently.
No it is not. Like I said, everyone knows everyone has an internal "filter" on what you say (and do). The threat of law enforcement may motivate everything (if you want to be edgy with how you look at it), but that is not the same thing as being actively, physically corrected at every turn, which is what the analogy in question lines up with.
As evidenced by most or all of the "AI hiring platforms", it's not about solving a problem successfully, but about using the latest moniker/term/sticker to appear as if you solve the problem successfully.
In reality, neither the client nor the user base have access to the ground truth of these "AI system"s to determine actual reliability and efficiency.
That's not to say there aren't some genuine ML/AGI companies like DeepMind (which solve specific narrow problems with quite high confidence), but most of the "AI" companies feel like they came over from crypto and are now selling little more than vaporware in the AI gold rush.
I always find this to be a false dichotomy. I'm not sure what use cases are a good fit for generative AI models to tackle without human supervision. But there are clearly many tasks where the combination of generative AI with human direction is a big productivity boon.
"Making fewer mistakes" implies that there's a framework within which the agent operates where its performance can be quickly judged as correct or incorrect. But, computers have already automated many tasks and roles in companies where this description applies; and competitive companies now remain capitalistically competitive not because they have stronger automation of boolean jobs, but because they're better configured to leverage human creativity in tasks and roles performance in which cannot be quickly judged as correct or incorrect.
Apple is the world's most valuable company, and many would attribute a strong part of their success to Jobs' legacy of high-quality decision-making. But anyone who has worked in a large company understands that there's no way Apple can so consistently produce their wide range of highly integrated, high quality products with only a top-down mandate from one person; especially a dead one. It takes thousands of people, the right people, given the right level of authority, making high-quality high-creativity decisions. It also, obviously, takes the daily process, an awe-inspiring global supply chain, automation systems, and these are areas that computers, and now AI, can have a high impact in. But that automation is a commodity now. Samsung has access to that same automation, and they make fridges and TVs; so why aren't they worth almost four trillion dollars?
AI doesn't replace humans; it, like computers more generally before it, brings the process cost of the inhuman things it can automate to zero. When that cost is zero, AI cannot be a differentiating factor between two businesses. The differentiating factors, instead, become the capital the businesses already have to deploy (favoring of established players), and the humans who interact with the AI, interpreting and when necessary executing on its decisions.
Those business people still can't quantify what their skilled workers do for them, though, so they hastily conclude the AI is a suitable or even improved replacement.
“A computer can never be held accountable. Therefore, a computer must never make a management decision.”
There are lots of bullshit jobs that we could automate away, AI or no. This is far from a new problem. Our current "AI" solutions promise to do it cheaper, but detecting and dealing with "hallucinations" is turning out to be more expensive than anticipated and it's not at all clear to me that this will be the silver bullet that the likes of Sam Altman claims it will be.
Even if the AI solution makes fewer mistakes, the magnitude of those mistakes matter. The human might make transcription errors with patient data or other annoying but fixable clerical errors, while the AI may be perfect with transcription but make completely sensible sounding but ultimately nonsense diagnosis, with dangerous consequences.
1953 IBM also thought that "there is a world market for maybe five computers," so I am not sure their management views are relevant this many decades later.
Philosophically the point still stands. If you delegate your management decisions to a computer and someone dies you can't put the computer in jail for murder. Ultimately a person must be responsible and that means you can't fully automate it unless you have perfect trust in the machine.
It is only irrelevant in the degree to which companies have been able to skirt laws and literally get away with murder.
I asked ChatGPT to replace "current AI" and synonyms with "HUMANS" and I'm satisfied. My favorite revised sentences:
"Does HUMANS represent a dead end?"
"HUMANS should not be used for serious applications."
"HUMANS are unmanageable, and as a consequence their use in serious contexts is irresponsible."
"HUMANS have no internal structure that relates meaningfully to their functionality."
"HUMANS have input and state spaces too large for exhaustive testing."
"HUMANS do not allow verification by parts (unit testing, integration testing, etc)."
"HUMANS have faults, but even their error behaviour is likely emergent, and certainly hard to predict or eradicate."
"HUMANS have no model of knowledge and no representation of any ‘reasoning.’"
"HUMANS represent a dead end, where exponential increases of training data and effort will give us modest increases in impressive plausibility but no foundational increase in reliability."
"HUMANS cannot be developed, or reused, as components."
"There is no possibility for stepwise development — using either informal or formal methods — for HUMANS."
and my favorite:
"In my mind, all this puts even state-of-the-art HUMANS in a position where professional responsibility dictates the avoidance of them in any serious application."
As a neuroscientist, my biggest disagreement with the piece is the author's argument for compositionality over emergence. The former makes me think of Prolog and Lisp, while the latter is a much better description of a brain. I think emergence is a much more promising direction for AGI than compositionality.
Author here. So what! I am not talking about promising directions for AGI, I am talking about having computer systems that we can have confidence in. Sure, AGI if it ever happens will look more like emergence than compositionality, and I'm sure it won't feel a need to explain to us fallible humans why its decisions are correct. In the meantime, I'd like computer systems to be manageable, reliable, transparent, and accountable.
100% agree. When we explicitly segment and compose AI components, we remove the ability for them to learn their own pathways between the components. The bitter lesson[1] has been proven time and time again: throwing a ton of data and compute at a model yields better results than what we could come up with ourselves.
That said, we can still isolate and modify parts of a network, and combine models trained for different tasks. But you need to break things down into components after the fact, instead of beforehand, in order to get the benefits of learning via scale of data + compute.
There are strong signals that continuing to scale up in data is not yielding the same reward (Moore's Law anyone?) and it's harder to get quality data to train on anyway.
Business Insider had a good article recently on the customer reception to Copilot (underwhelming: https://archive.fo/wzuA9). For all the reasons we are familiar with.
My view: LLMs are not getting us to AGI. Their fundamental issues (black box + hallucinations) won't be fixed until there are advances in technology, probably taking us in a different direction.
I think it's a good tool for stuff like generating calls into an unfamiliar API - a few lines of code that can be rigorously checked - and that is a real productivity enhancement. But more than that is thin ice indeed. It will be absolutely treacherous if used extensively for big projects.
Oddly, for free-flow, brainstorming-like associations, I think it will be a more useful tool than for those tasks for which we are accustomed to using computers, which require extreme precision and accuracy.
I was an engineer in an AI startup, later acquired.
> Their fundamental issues (black box + hallucinations)
Aren’t humans also black boxes that suffer from hallucinations?
E.g. for hallucinations: engineers make dumb mistakes in their code all the time, and normal people make false assertions about geopolitical, scientific and other facts all the time. Cf. the Dunning-Kruger effect.
And black box because you can only interrogate the system at its interface (usually voice, or through written words / pictures).
It roams around the internet, synthesizing sentences that look kind of the same as the source material, correct me if I'm wrong?
There are a lot of adjustments being made to the models (by humans, mostly, I guess)?
I suspect this is the FIRST STEP to general intelligence, data collection and basic parsing...
I suspect there is not a thing called "reasoning" - but a multi step process...
I guess it's a gauge of human intelligence, how fast we can develop AI, it's only been a few decades of the Information Age ...
I've come around to thinking of our modern "AI" as a lossy compression engine for knowledge. When you ask a question it is just decompressing a tiny portion of the knowledge and displaying it for you, sometimes with compression artifacts.
This is why I am not worried about the "AI Singularity" like some notable loudmouth technologists are. At least not with our current ML technologies.
That is exactly how I think about it. It’s lossy compression. Think about how many petabytes of actual information any of these LLMs were trained on. Now look at the size of the resultant model. It’s orders of magnitude smaller. It made it smaller by clipping the high-frequency bits of some multi-billion-dimension graph of knowledge. Same basic thing you do with other compression algorithms like JPEG or MP3.
These LLMs are just lossy compression for knowledge. I think the sooner that “idea” gets surfaced, the sooner people will find ways to train models with fixed pre-computed lookup tables of knowledge categories and association properties… basically taking a lot of the randomness out of the training process and getting more precise about which dimensions of knowledge and facts are embedded into the model.
… or something like that. But I don’t think this optimization will be driven by the large, well-funded tech companies. They are too invested in flushing money down the drain with more and more compute. Their huge budgets blind them to other ways of doing the same thing with significantly less.
The future won’t be massive large language models. They’ll be “small language models” custom tuned to specific tasks. You’ll download or train a model that has incredible understanding of Rust and Django but won’t know a single thing about plate tectonics or apple pie recipes.
Why wouldn't we have a small language model for python programming now though?
That is an obvious product. I would suspect the reason we don't have a small language python model is because the fine tuned model is no better than the giant general purpose model.
If that is the case, it is not good. It even makes me wonder whether we are not really compressing knowledge, but instead building a hack that creates the illusion of compressing knowledge.
With a bit (OK, a lot) of reinforcement learning that prioritizes the best chains-of-thoughts, this compression engine becomes a generator of missing training data on how to actually think about something instead of trying to come up with the answer right away as internet text data suggests it should do.
That's the current ML technology. What you've described is the past: about four years old, to be precise.
If you think that "compression" somehow means "non-intelligent", consider this:
The best compression of data that is theoretically achievable (see Kolmogorov complexity) is an algorithm that approximates the process that produces the data. And which process produces texts on the internet? The activity of the human brain. (I described it a bit sloppily. We are dealing with the probability distribution of the data, not the data itself. But the general idea still holds.)
Using chain-of-thought removes the constraint that the resultant algorithm's output should use a fixed amount of compute per token.
> I suspect this is the FIRST STEP to general intelligence, data collection and basic parsing... I suspect there is not a thing called "reasoning" - but a multi step process... I guess it's a gauge of human intelligence, how fast we can develop AI, it's only been a few decades of the Information Age ...
The question the article is posing isn't whether LLMs do some of the things we would want general AI to do, or whether they are a good first attempt by humans at creating something sort of like AI.
The question is whether current machine learning techniques, such as LLMs, that are based on neural networks are going to hit a dead end.
I don't think that's something anyone can answer for sure.
LLMs, by themselves, are going to hit a dead end. They are not enough to be an AGI, or even a true AI. The question is whether LLMs can be a part of something bigger. That, as you say, is not something anyone can currently answer for sure.
Interesting article. My main criticism is that, given ChatGPT is already used by hundreds of millions of people every day, it's difficult to argue that current AI is a dead end. It has its flaws, but it is already useful in human-in-the-loop situations. It will partly or completely change the way we search for information on the internet and greatly enhance the ability to educate ourselves on anything. This is essentially a second Wikipedia moment. So, it is useful in its current form, to some extent.
It is certainly changing the way I search for information on the internet. Now there are a lot of people who, instead of staying silent, post wildly wrong answers from an LLM on a question or subject they themselves are not familiar with.
Some answers are very long and just re-state the previous comments as if they were explaining simple concepts to a five year old child.
In short: in the hands of humans, tools like ChatGPT cause exponential growth of spam, engagement farming and malicious disinformation and propaganda on the internet. I fear these negative use cases are growing exponentially faster than the useful parts. We will all drown in AI manure.
Two years in, we are still going, and it is still becoming more and more useful. Keep in mind, things like multimodality are still in early stages, so improvements are coming. It probably won't lead to AGI, but still, these tools are doomed to become more and more useful. https://open.substack.com/pub/transitions/p/here-is-why-ther...
I use it for fast documentation of unknown (to me) APIs and other pieces of software. It's saved me hours of time, where I didn't have to go through the developer's site/documentation, and I quickly get example code.
Would I use the code directly in production? No. I always use it as an example and write my own code.
Whenever a new technology emerges, along with it always emerge naysayers who claim that the new technology could never work --- while it's working right in front of their noses. I'm sure there were people after Kitty Hawk who insisted that heavier than air flight would never amount to much economically. Krugman famously insisted in the 90s that the internet would never amount to anything. These takes are comical in hindsight.
The linked article is another one of these takes. AI can obviously reason. o3 is obviously superhuman along a number of dimensions. AI is obviously useful for software development. This guy spent 20 years of his life working on formal methods. Of course he's going to poo-poo the AI revolution. That doesn't make him right.
> Whenever a new technology emerges, along with it always emerge naysayers who claim that the new technology could never work
There's some survivorship bias going on here – you only consider technologies which succeeded, and find examples of people scrutinising them beforehand. However, we know that not every nascent technology blossoms; some are really effective, but can't find adopters; some are ahead of their time; some are cost-prohibitive; and some are outright scams.
It's not a given that every promising new technology is a penicillin – some might be Theranos.
Author here. Putting a lot of words in my mouth there. In particular, I don't talk about whether AI is useful for software development - I talk about whether AI is useful as reliable software. I don't discuss how AI's abilities relate to human abilities. I don't discuss whether what AI currently does counts as "reasoning".
i'm so confused by these discussions around hitting the wall.
sure, a full-on AGI, non-hallucinating AI would be great. but the current state is already a giant leap. there's so much untapped potential in the corporate world where whole departments, processes, etc can be decimated.
doing this and dealing with the socio-economic and political fall-out from those efficiency leaps can happen while research (along multiple pathways) goes on, and this will take 5-10 years at least.
Just because transformer-based architectures might be a dead end (in terms of how far they can take us toward achieving artificial sentience), and the outcome may not be mathematically provable, as this author seems to want it to be, does not mean that the technology isn't useful.
Even during the last AI winter, previous achievements such as Bayesian filtering, proved useful in day to day operation of infrastructures that everyone used. Generative AI is certainly useful as well, and very capable of being used operationally.
It is not without caveats, and the end goals of AI researchers have not been achieved, but why does that lessen the impact or usefulness of what we have? It may be that we can iterate on transformer architecture and get it to the point where it can help us make the next big leap. Or maybe not. But either way, for day to day use, it's here to stay, even if it isn't the primary brain behind new research.
Just remember that the only agency that AI currently has is what we give it. Responsible use of AI doesn't mean "don't use AI", it means, "don't give it responsibility for critical systems that it's ill equipped to deal with". If that's what the author means by "serious applications", then I'm on board, but there are a lot of "serious applications" that aren't human-life-critical, and I think it's fine to use current AI tech on a lot of them.
The author declares that "software composability" is the solution as though that is a given fact. Composability is as much a dead end as the AI he describes. Decades of attempts at formal composability have not yielded improvements in software quality outside of niche applications. It's a neat idea, but as you scale the complexity explodes making such systems as opaque and untestable as any software. I think the author needs to spend more time actually writing code and less time thinking about it.
No, they are very useful tools to build intelligent systems out of.
Everything from Perplexity onward shows just how useful agents can be.
You get another bump in utility when you allow for agent swarms.
Then another one for dynamically generated agent swarms.
The only reason it's not coming for your job is that LLMs are currently too power hungry to run those jobs for anything but research, at a couple of thousand to a couple of million times the price of a human doing the work.
Which works out to 10 to 20 epochs of whatever Moore's law looks like in graphics cards.
What is that bump in utility in practical terms? You can point to a benchmark improvement, but that's no indication the agent swarm isn't reducing to "giving an LLM an arbitrary number of random guesses".
Standard LLM quadratic attention isn't an approximation, it's perfect recall. Approaches that compress that memory down into a fixed-size state are an approximation, and generally perform worse, that's why linear transformers aren't widely used.
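For concreteness, here is a minimal numpy sketch of that contrast (my illustration, not the commenter's code): full attention scores every query against every key, an n-by-n matrix, which is where both the quadratic cost and the exact recall come from, while a fixed-size recurrent summary keeps memory constant but is necessarily lossy.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    # Every query attends to every key: an (n, n) score matrix, exact recall.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def fixed_state_summary(K, V):
    # Toy linear-attention-style state: a single d x d matrix regardless of
    # sequence length, so per-token detail is necessarily lost.
    return sum(np.outer(k, v) for k, v in zip(K, V))

rng = np.random.default_rng(0)
n, d = 6, 4
Q, K, V = rng.normal(size=(3, n, d))
print(full_attention(Q, K, V).shape)    # (6, 4), built via a 6x6 score matrix
print(fixed_state_summary(K, V).shape)  # (4, 4), independent of sequence length
```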
Maybe it is, maybe it isn’t. The only thing I know is, none of the arrogant fuckers on hacker news know anything about it. But that won’t stop them from posting.
There's an upside! If they're wrong, and they manage to convince more people—it basically gives you more of an advantage. I don't get into arguments about the utility of LLM technology anymore because why bother?
My take is that even if AI qualitatively stops where it is right now, and only continues to get faster / more memory efficient, it already represents an unprecedented value add to human productivity. Most people just don't see it yet. The reason why that is, is because it "fills in" the weak spots of the human brain - associative memory, attention, working memory constraints, aversion to menial mental work. This does for the brain what industrialization did for the body. All we need to do to realize its potential is emphasize _collaboration_ with AI, rather than _replacement_ by AI, that the pundits currently emphasize as rage (and therefore click) bait.
There are two epistemic poles: the atomistic and the probabilistic. The author subscribes to a rule-based atomistic worldview, asserting that any perspective misaligned with this framework is incorrect. Currently, academia is undergoing a paradigm shift in the field of artificial intelligence. Symbolic AI, which was the initial research focus, is rapidly being replaced by statistical AI methodologies. This transition diminishes the relevance of atomistic or symbolic scientists, making them worry they might become irrelevant.
An observation about scientific paradigm shifts is that they tend not to reverse. Also, as someone commented about the lingo, the fundamental problem is the different philosophical views of what knowledge is and can be: either knowledge is based on symbols and rules, as in mathematics, or it is probabilistic, as in anything we can actually measure. Both these views can coexist, and maybe AI will find the missing link between them some day. Possibly no human will grasp the link.
Indeed, and unfortunately. I've been reading up on "the binding problem" in AI lately and came across a paper that hinged on there being an "object representation" which would magically solve the apparent issues in symbolic AI. In the discussion some 20 pages later, the authors confessed that neither they, nor anybody else, could define what an object was in the first place. Sometimes the efforts seem focused on "not letting the other team win" rather than actually having something tangible to bring to the table.
I never want to claim certainties, but it seems pretty close to certain that symbolic AI loses to statistical AI.
I think there is room for statistical AI to operate symbolic systems so we can better control outputs. Actually, that's kind of what is going on when we ask AI to write code.
> Many of these neural network systems are stochastic, meaning that providing the same input will not always lead to the same output.
The neural networks themselves are not stochastic. It is the sampling from the neural net's output distribution to produce a list of words as output [1] that is the stochastic part.
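As a rough sketch of that distinction (toy code, with a stand-in `fake_forward_pass` in place of a real network): the forward pass maps the same input to the same logits every time, and randomness only enters at the sampling step, where greedy decoding removes it entirely.

```python
import numpy as np

def fake_forward_pass(prompt: str) -> np.ndarray:
    # Stand-in for the network: a deterministic map from prompt to logits.
    seed = sum(prompt.encode()) % (2**32)
    return np.random.default_rng(seed).normal(size=5)  # pretend 5-token vocab

def sample(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    if temperature == 0.0:
        return int(np.argmax(logits))             # greedy: always the same token
    p = np.exp(logits / temperature)
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))      # stochastic: varies run to run

logits = fake_forward_pass("hello")               # same prompt -> same logits
print(sample(logits, 0.0, np.random.default_rng()))  # deterministic
print(sample(logits, 1.0, np.random.default_rng()))  # may differ between runs
```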
“One could offer so many examples of such categorical prophecies being quickly refuted by experience! In fact, this type of negative prediction is repeated so frequently that one might ask if it is not prompted by the very proximity of the discovery that one solemnly proclaims will never take place. In every period, any important discovery will threaten some organization of knowledge.”
René Girard, Things Hidden Since the Foundation of the World, p. 4
The British-Canadian computer scientist often touted as a "godfather" of artificial intelligence has shortened the odds of AI wiping out humanity over the next three decades, warning the pace of change in the technology is "much faster" than expected. From a report:
Prof Geoffrey Hinton, who this year was awarded the Nobel prize in physics for his work in AI, said there was a "10 to 20" per cent chance that AI would lead to human extinction within the next three decades.
Previously Hinton had said there was a 10% chance of the technology triggering a catastrophic outcome for humanity. Asked on BBC Radio 4's Today programme if he had changed his analysis of a potential AI apocalypse and the one in 10 chance of it happening, he said: "Not really, 10 to 20 [per cent]."
The fact of the matter is that if AI's externalities (that is, its massive energy consumption) were exposed to end users and humanity in general, no one would use it.
I think this is wildly optimistic about how environmentally conscious customers of LLMs are. People use fossil fuels directly and through electricity consumption in an unconscionable way, at a scale wildly exceeding a ChatGPT user's energy expenditure.
We desperately need to rapidly regulate down fossil fuel usage and production for both electricity generation and transport. The rest of the world needs to follow the example of the EU CO2 emissions policy, which guarantees emissions follow a downward slope independent of what the CO2 emissions are spent on.
What I find interesting is that current LLMs are based primarily on written data, which is already an abstraction / abbreviation of most observed phenomena.
What happens when AI starts to send out its own drones, or perhaps robots, and tries to gather data and train on what it observes itself?
I think we may be closer to this point than we realize… the results of AI could get quite interesting once the human-level abstraction of knowledge is perhaps reduced.
I use coding libraries which are either custom, recent, or haven't gained much traction. Therefore, AI models haven't been trained on them and LLMs are worthless for helping me code. The problem is that new libraries will not gain traction if nobody uses them, because developers and their LLMs are stuck in the past. The evolution of open source code has become stagnant.
Why not feed the library code and documentation to the LLM? Using it as a knowledge base is bound to be limited. But having it be your manual-reading buddy can be very helpful.
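A minimal sketch of that approach, where `ask_llm` is a hypothetical stand-in for whatever chat client is actually used: read the library's docs, trim them to an assumed context budget, and prepend them to the question.

```python
from pathlib import Path

MAX_CONTEXT_CHARS = 20_000  # assumed character budget; real limits are model-specific

def build_prompt(question: str, doc_paths: list[str]) -> str:
    docs = "\n\n".join(
        Path(p).read_text(errors="ignore") for p in doc_paths if Path(p).exists()
    )
    return (
        "You are answering questions about the library documented below. "
        "If the documentation does not cover something, say so instead of guessing.\n\n"
        f"--- DOCUMENTATION ---\n{docs[:MAX_CONTEXT_CHARS]}\n--- END ---\n\n"
        f"Question: {question}"
    )

def ask_llm(prompt: str) -> str:
    return "(model response would appear here)"  # placeholder for a real API call

print(ask_llm(build_prompt("How do I configure retries?",
                           ["docs/README.md", "docs/api.md"])))  # hypothetical paths
```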
I don't understand why people feel the need to lie in these posts. AI isn't only good at using existing codebases. Copy your code in. It will understand it. You either haven't tried or are intentionally misleading people.
I think most AI research up to this day is a dead end. Assuming that intelligence is a problem solvable by computers implies that intelligence is a computable function. Nobody up to this day has been able to give a formal mathematical definition of intelligence, let alone a proof that it can be reduced to a computable function.
So why assume that computer science is the key to solving a problem that cannot even be defined in terms of math? We had formal definitions of computers decades before they became a reality, but somehow cannot make progress in formally defining intelligence.
I do think artificial intelligence can be achieved by making artificial intelligence a multidiscipline endeavor with biological engineering at its core, not computer science. See the work of Michael Levin to see real intelligence in action: https://www.youtube.com/watch?v=Ed3ioGO7g10
> Nobody up to this day has been able to give a formal mathematical definition of intelligence, let alone a proof that it can be reduced to a computable function.
We can't prove the correctness of the plurality of physics. Should we call that a dead end too?
If you believe in functionalism (~mental states are identified by what they do rather than by what they are made of), then current AI is not a dead end.
We wouldn't need to define intelligence; just making it big and efficient enough to replicate what currently exists would be intelligence by that definition.
My point is that if you use biological cells to drive the system, which already exhibit intelligent behaviors, you don't have to worry about any of these questions. The basic unit you are using is already intelligent, so it's a given that the full system will be intelligent. And not an approximation but the real thing.
Thanks for pointing me out to this. This is a proposed definition of intelligence. Is it the same as the real thing, though? Even assuming that it was:
> Like Solomonoff induction, AIXI is incomputable.
That would mean that computers can, at best, produce an approximation. We know the real thing exists in nature though, so why not take advantage of those competencies?
Whether it is a dead end or not is a question of definition. If something is not economically self-sustaining by itself, is it a dead end? How about if people keep doing it anyway, for other reasons? What if that includes pretending it is not a dead end?
For example, there is a considerable incentive to flood resources to "current AI". As a consequence, enough people have a vested interest to participate in what might be called a shared illusion in "current AI is not a dead end". If enough people participate in such a shared illusion for 10+ years, and the illusion has real-life consequences, is it really a dead end?
I believe that "stochastic parrot" is actually a compliment about the current AI. The sweet spot for current AI appears to be as little stochasticity in the output as possible. Randomness in the output is inversely proportional to the benefit you can gain from the AI.
Value in current AI is gained by allowing randomness in the input, i.e. the training data and the form of the questions you can ask it.
For example, you can create a useful AI support system that regurgitates answers to FAQs back to people, so they can ask questions about the FAQs instead of browsing them.
Such a tool is useful because humans are energy-optimizing, i.e. lazy, i.e. avoid going through the FAQs themselves, i.e. are unwilling to pay for the answer with their attention, i.e. using their own brain to do pattern recognition.
By avoiding directed pattern recognition, or more specifically by avoiding consciously directing their attention as much as possible, people are not practicing this ability. Over time, they become less and less capable of consciously directing their attention.
As a consequence, this ability becomes an even scarcer resource, making "current AI" more and more useful.
So it is less about AI getting more intelligent, but people getting less intelligent, and thus AI becoming relatively more intelligent.
I think it does represent a dead end, but not for the reasons presented in this article.
The real issue, in my opinion, is that we will hit practical limits with training data and computational resources well before AGI turns us all into paperclips; basically, there is no "Moore's Law" for AI, and we are already slowing down with existing models like GPT.
We are in the vertical scaling phase of AI model development, which is not sustainable long-term.
Author here. Fair enough. The "dead end" in the title isn't mine; it was extracted by the editor from what's essentially a side comment on the limits of LLM scaling. My title would have been something like "We can't use current AI for critical applications", with "for" being essentially different from "in", if you want the nuance there.
> The real issue in my opinion is that we will hit practical limits with training data and computational resources well before AGI turns us all into paperclips [...]
I think you are correct, but also I think that even if that were not the case, the Thai Library Problem[1] strongly suggests that AGI will have to be built on something other than LLMs (even if LLM-derived systems were to serve as an interface to such systems).
I call this “The bitter cycle”, after the famous essay.
1. Someone finds a new approach, often based on intuition or understanding.
2. People throw more data at it claiming that only data matters.
3. The new method eventually saturates, everybody starts talking about a new ai winter.
We had this with: perceptrons, conv nets, RNNs. Now we see it with transformers.
My guess is the next iteration will be liquid networks or KANs, depending on which one we figure out how to train efficiently first.
The good thing is that people have been working to build an understanding of why these things work for the last 20 years, so the period between the cycles gets shorter.
There is no reason to privilege compositionality/modularity over emergence. One day we may have the emergence of compositionality in a large model. It would be a dead end only if this were probably not possible.
I don’t see how it would because at the end of the day a model is like a program… input->output. This seems infinitely useful and we are just starting to understand how to use this new way of computing.
What I find funny is that the discussion revolves a lot around software development, which is where LLMs excel. Outside of that and creating junk text (like a government report, patent application, etc.) they seem to be pretty useless. So most of society doesn't care about it, it's not as big a revolution as SWEs think it is at the moment, and the discussion about the future is actually philosophical: do we think the trend of development will continue, or will we hit a wall?
LLMs are text compression algos.
They are very good at memorising & retrieving text related to the user input.
They can even pass bar exams based on that - some misinterpret that as intelligence.
However, no amount of scaling will change a text memorisation algo into a symbolic reasoning or composability algo, both of which are necessary for progress towards AGI.
So yes, LLMs are a dead end in the quest for AGI.
However, they have their uses as a google/stackoverflow replacement.
LLM yappers are everywhere. One dude with a lot of influence is busy writing blogs on why “prompt engineering” is a “real skill” and engaging in the same banal discourse on every social media platform under the sun. Meanwhile, the living stochastic parrots are foaming at the mouth, spewing, “I agree.”
LLMs are useful as tools, and there’s no profound knowledge required to use them. Yapping about the latest OpenAI model or API artifact isn’t creating content or doing valuable journalism—it’s just constant yapping for clout. I hope this nonsense normalizes quickly and dies down.
AI is useful as a tool but it is far from trustworthy.
I just used Grok to write some cron scripts for me, and it gave me perfectly good results. If you know exactly what you want, it is great.
It is not the end of software programmers, though, and it is very dangerous to give it too much leeway, because you will almost certainly end up with problems.
I agree with the conclusion that a hybrid model is possible.
It speeds up code writing, it's not useless. Best use case for me is to help me understand libraries that are sparsely documented (e.g. dotnet roslyn api).
If I can get 100 lines generated instantly while explaining what I want in 25, scan the answer just to validate it, and then (no, wait) add another 50 lines because I forgot something earlier, all of that in minutes, then I'm happy.
Plus I can detach the "tell the AI" part from the actual running of the code. That's pretty powerful to me.
For instance, I could be on the train thinking of something, chat it over with an LLM, get it where I want and then pause before actually copying it into the project.
The elephant in the room: The user interface problem
We seem to be dancing around a problem sitting in the middle of the room like an elephant no one is acknowledging, and that is that the interface to Artificial Intelligence and Generative AI is a place that requires several degrees of innovation.
I would argue that the first winning feat of innovation in interfacing with AI was the "CHAT BOX". And it works well enough for 40% of use cases. And there is another 20% of uses that WE THE PEOPLE can use our imagination (prompt engineering) to manipulate the chat box to solve. On this topic, there was an article/opinion that said complex LLMs are unnecessary because 90% of people don't need them. Yeah. Because the chat box cannot do much of what would require heavier LLMs.
Complex AI and large data sets need nicer presentation and graphics, more actionable interfaces, and more refined activity concepts, as well as metadata that gives information on the reliability or usability of generated information.
Things like edit sections of an article, enhance articles, simplify articles, add relevant images, compress text to fit in a limited space, generate sql data from these reports, refine patterns found in a page with supplied examples, remove objects, add objects, etc.
Some innovation has to happen in MS Office interfaces. Some innovations have to happen in photoshop-like interfaces.
The author is complaining about utopian systems being incompatible with AI. I would argue AI is a utopian system being used in a dystopian world where we are lacking rich usable interfaces.
Last week I had to caution a junior engineer on my team to only use an LLM for the first pass, and never rely on the output unmoderated.
They're fine as glorified autocomplete, fuzzy search, or other applications where accuracy isn't required. But to rely on them in any situation where accuracy is important is professional negligence.
Does anyone seriously think that results of any current approaches would suddenly turn into godlike, super-intelligent AGI if only we threw an arbitrary number of GPUs at them? I guess I assumed everyone believed this was a stepping stone at best, but were happy that it turned out to have some utility.
I'm not convinced AI is as hamstrung as people seem to think. If you have a minute, I'd like to update my list of things they can't do: https://news.ycombinator.com/item?id=42523273
Somehow, fallible humans create robust systems. Look to "AI" to do the same, at a far higher speed. The "AI" doesn't need to recite the Fibonacci sequence; it can write (and test) a program that does so. Speed is power.
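A trivial illustration of that point (my example, not actual model output): instead of reciting the sequence, emit a small program plus a check for it.

```python
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def test_fib():
    assert [fib(i) for i in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]

test_fib()
print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```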
Why is it that LLMs are ‘stochastic’, shouldn’t the same input lead to the same output? Is the LLM somehow modifying itself in production? Or is it just flipping bits caused by cosmic radiation?
For Mixture of Experts models (which GPTs are), they can produce different results for an input sequence if that sequence is retried together with a different set of sequences in its inference batch, because the model ("expert") routing depends on the batch, not on the single sequence: https://152334h.github.io/blog/non-determinism-in-gpt-4/
And in general, binary floating point arithmetic cannot guarantee associativity, i.e. `(a + b) + c` might not be the same as `a + (b + c)`. That in turn can lead to the model picking another token in rare cases (and, as an auto-regressive consequence, the entire remainder of the generated sequence might differ): https://www.ingonyama.com/blog/solving-reproducibility-chall...
Edit: Of course, my answer assumes you are asking about the case when the model lets you set its token generation temperature (stochasticity) to exactly zero. With default parameter settings, all LLMs I know of randomly pick among the best tokens.
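The associativity point above is easy to see in plain Python: the same three values summed in a different order give different IEEE-754 results, which is one way identical inputs can diverge across differently parallelized runs.

```python
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False
```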
They always return the same output for the same input. That is how tests are done for llama.cpp, for example.
To get variety, you give each person a different seed. That way each user gets consistent answers but different than each other. You can add some randomness in each call if you don’t want the same person getting the same output for the same input.
It would be impossible to test and benchmark llama.cpp et al otherwise!
By the time you get to a UI someone has made these decisions for you.
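A toy sketch of the seeding behaviour described above (not llama.cpp itself): with the model's output held fixed, the same seed reproduces the same tokens, and different seeds give different but individually consistent outputs.

```python
import numpy as np

def generate(prompt: str, seed: int, n_tokens: int = 5) -> list[int]:
    rng = np.random.default_rng(seed)      # per-user seed
    vocab = 50
    p = np.full(vocab, 1.0 / vocab)        # stand-in for the model's fixed output
    return [int(rng.choice(vocab, p=p)) for _ in range(n_tokens)]

print(generate("hi", seed=1) == generate("hi", seed=1))  # True: reproducible
print(generate("hi", seed=1) == generate("hi", seed=2))  # almost surely False
```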
Author here. Disagree on not slowing down. Indeed "dead end" isn't really the focus of the argument, editor's choice of title, not mine. Focus of argument is these systems are foundationally unreliable so can't be used as critical applications [though maybe as intensively managed components of such].
I hope so because I'm extraordinarily sick of the technology. I can't really ask a question at work without some jackass posting an LLM answer in there. The answers almost never amount to anything useful, but no one can tell since it looks clearly written. They're "participating" but haven't actually done anything worthwhile.
I hope so, but for different reasons. Agreed they spit out plenty of gibberish at the moment, but they’ve also progressed so far so fast it’s pretty scary. If we get to a legitimate artificial general super intelligence, I’m about 95% sure that will be terrible for the vast, vast majority of humans, we'll be obsolete. Crossing my fingers that the current AI surge stops well short of that, and the push that eventually does get there is way, way off into the future.
I believe (most) people direct their ambitions toward nurturing safe, peaceful, friend-filled communities. AGI won’t obsolete those human desires. Hopefully we weather the turbulence that comes with change and come out the other side with new tools that enable our pursuits. In the macro, that’s been the case. I am grateful to live in a time of literacy, antibiotics, sanitation, electricity… and am optimistic that if AGI emerges, it joins that list of human-empowering creations.
Gotta wonder if Google has used code from internal systems to train Gemini? Probably not, but at what point will companies start forking over source code for LLM training for money?
It seems much cheaper, safer legally and more easily scalable to simply synthesize programs. Most code out there is shit anyway, and the code you can get by the GB especially so.
I would assume that internal code at Google is of higher quality than random code you find on Github. Commit messages, issue descriptions and code review is probably more useful too.
If and only if something like high-paying UBI comes along, and people are freed to pursue their passions and as a consequence, benefit the world much more intensely.
Inflation is a lack of goods for a given demand, though. I.e., if we can flood the world with cheap goods, then inflation won't happen. That would make practical UBI possible. To some extent it has already happened.
My intuition, based on what I know of economics, is that a UBI policy would have results something like the following:
* Inflation, things get more expensive. People attempt to consume more, especially people with low income.
* People can't consume more than is produced, so prices go up.
* People who are above the break-even line (when you factor in the taxes) consume a bit less, or stay the same and just save less or reduce investments.
* Producers, seeing higher prices, are incentivized to produce more. Increases in production tend to be concentrated toward the things that people who were previously very income-limited want to buy. I'd expect a good bit of that to be basic essentials, but of course it would include lots of different things.
* The system reaches a new equilibrium, with the allocation of produced goods being a bit more aimed toward the things regular people want, and a bit less toward luxury goods for the wealthy.
* Some people quit work to take care of their kids full-time. The change in wages of those who stay working depends heavily on how competitive their skills are -- some earn less, but with the UBI still win out. Some may actually get paid more even without counting the UBI, if a lot of workers in their industry have quit due to the UBI, and there's increased demand for the products.
* Prices have risen, but not enough to cancel out one's additional UBI income entirely. It's very hard to say how much would be eaten up by inflation, but I'd expect it's not 10% or 90%, probably somewhere in between. Getting an accurate figure for that would take a lot of research and modeling.
Basically, I think it's complicated, with all the second and third-order effects, but I can't imagine a situation where so much of the UBI is captured by inflation that it makes it pointless. I do think that as a society, we should be morally responsible for people who can't earn a living for whatever reason, and I think UBI is a better system than a patchwork of various services with onerous requirements that people have to put a lot of effort into navigating, and where finding gainful employment will cause you to lose benefits.
The idea that AI will ever remove all struggle, even if it reaches AGI, is absurd. AI by itself can't give you a hug, for example--and even if advances in robotics make it possible for an AI-controlled robot to do that, there are dozens of unsolved problems beyond that to make that something that most people would even want.
AI enthusiasm really is reaching a religious level of ridiculous beliefs at this point.
I doubt ai will remove all struggle. I suspect we wouldn't see great extents of human passion in a world where everyone is fed, clothed, housed, etc without needing to exert themselves at all.
And AI isn't going to feed, clothe, or house people either.
AGI, at best, would provide ideas for how to do those things. And the current AI, which is not AGI, can only remix ideas humans have already given it--ideas which haven't fed, clothed, or housed us all yet.
That requires achieving post-scarcity to work in practice and be fair, though. If achievable, it’s not clear how it relates to AGI. I mean, there’s plenty of intelligence on this planet already, and resources are still limited - and it’s not like AGI would somehow change that.
One thing I thought recently, is that a large amount of work is currently monitoring and correcting human activity. Corporate law, accounting, HR and services etc. If we have AGI that is forced to be compliant, then all these businesses disappear. Large companies are suddenly made redundant, regardless of whether they replace their staff with AI or not.
I agree that if true AGI happens (current systems still cannot reason at all, only pretend to do so) and if it comes out cheaper to deploy and maintain, that would mean a lot of professions could be automated away.
However, I believe this had already happened quite a few times in history - industries becoming obsolete with technological advances isn’t anything new. This creates some unrest as society needs to transition, but those people are always learning a different profession. Or retire if they can. Or try to survive some other way (which is bad, of course).
It would be nice, of course, if everyone won’t have to work unless they feel the need and desire to do so. But in our reality, where the resources are scarce and their distribution in a way that everyone will be happy is a super hard unsolved problem (and AGI won’t help here - it’s not some Deus ex Machina coming to solve world problems, it’s just a thinking computer), I don’t see a realistic and fair way to achieve this.
Put simply, all the reasons we cannot implement UBI now will still remain in place - AGI simply won’t help with this.
I guess the point I am trying to make is that, paradoxically, the more an AI company's products are integrated into the economy, the less value they can extract from the economy, as a large amount of the world's economic output is just dealing with the human factor.
For the vast majority of people, getting rid of necessary work will usher in an unprecedented crisis of meaning. Most people aren't the type to pursue creative ends if they didn't have to work. They would veg out or engage in degenerate activities. Many people have their identity wrapped up in the work they do, or in being a provider. Taking this away without having something to replace it will be devastating.
Good. Finally they’ll realize the meaninglessness of their work and how they’ve been exploited in the most insidious way. To the point of forgetting to answer the question of what it is they most want to do in life.
The brain does saturate eventually and gets bored. Then the crisis of meaning. Then something meaningful emerges.
We’re all gonna die. Let’s just enjoy life to the fullest.
>I do expect the next comment would be something like "work is a path to godliness"
And you think these kinds of maxims formed out of vacuums? They are the kinds of sayings that are formed through experience reinforced over generations. We can't just completely reject all the historical knowledge encoded in our cultural maxims and expect everything to work out just fine. Yes, it is true that most people not having productive work will fill the time with frivolous or destructive ends. Modernity does not mean we've somehow transcended our historical past.
> They are the kinds of sayings that are formed through experience reinforced over generations.
Sure, but the whole point is that the conditions that led to those sayings would no longer be there.
Put a different way: those sayings and attitudes were necessary in the first place because society needed people to work in order to sustain itself. In a system where individual human work is no longer necessary, of what use is that cultural attitude?
It wasn't just about getting people to work, but about keeping people from degenerate and/or anti-social behavior. Probably the single biggest factor in the success of a society is channeling young adult male behavior towards productive ends. Getting them to work is part of it, but so is keeping them from destructive behavior. In a world where basic needs are provided for automatically, status-seeking behavior doesn't evaporate; it just no longer has a productive direction that anyone can make use of. Now we have idle young men at the peak of their status-seeking behavior with few productive avenues available to them. It's not hard to predict this doesn't end well.
Beyond the issues of young males, there's many other ways for degenerate behavior to cause problems. Drinking, gambling, drugs, being a general nuisance, all these things will skyrocket if people have endless time to fill. Just during the pandemic, we saw the growth of roving gangs riding ATVs in some cities causing a serious disturbance. Some cities now have a culture of teenagers hijacking cars. What happens to these people who are on the brink when they no longer see the need to go to school because their basic needs are met? Nothing good, that's for sure.
What exactly do you think would happen? Usually wars are about resources. When resource distribution stops being a problem (i.e, anyone can live like a king just by existing), where exactly does a problem manifest?
All the "degenerate activities" you mentioned are a problem in the first place because in a scarcity-based society they slow down/prevent people from working, therefore society is worse off. That logic makes no sense in a world where people don't need to put a single drop of effort for society to function well.
>All the "degenerate activities" you mentioned are a problem in the first place because in a scarcity-based society they slow down/prevent people from working
This is a weird take. Families are worse off if a parent has an addiction because it potentially makes their lives a living hell. Everyone is worse off if people feel unsafe because of a degenerate sub-culture that glorifies things like hijacking cars. People who don't behave in predictable ways create low-trust environments which impacts everyone.
I would say that those attitudes are 99% caused by resource-related issues. There's a reason why drug abuse (and antisocial behavior generally) is mostly found among the lower classes.
If I could pick between the world we are in now and one where all the problems societies face that are related, directly or indirectly, to the distribution of resources are eliminated, I would pick the latter in a heartbeat. The "price to pay" in the form of a possible uptick in "degeneracy" during the first few months/years is worth it, not to mention that I doubt that problem would arise at all.
It's a dangerous fantasy to think that all societal problems are caused by uneven distribution of wealth and that they will be solved by redistribution. No, some people just aren't psychologically suited to the modern world, whether that involves delaying gratification or rejecting low effort, high dopamine stimulation. The structure involved in necessary work and the social structures that lead people down productive paths are one way we collectively cope with the incongruence between our society and our psychology. Take away these structures and the results have the potential to be massively destabilizing.
You're just saying it's desirable that some people be at the bottom even in a scenario where the opposite could be feasibly achieved. All on some theory that the human mind (or at least some instances of it in the population) simply... won't be able to take it without going insane?
We should need a much, much higher standard of proof for what could result in unnecessary pain and suffering for years. Especially when this:
> some people just aren't psychologically suited to the modern world, whether that involves delaying gratification or rejecting low effort, high dopamine stimulation.
...is not a proven fact, and is, with respect to social media, highly contested and inconclusive.
>You're just saying it's desirable that some people be at the bottom even in a scenario where the opposite could be feasibly achieved.
What's wrong with having people at the relative bottom? Trying to force equality onto society does not have a good track record. We can raise the absolute bottom past the point of poverty while also not upending social structures that have served us well for centuries.
>All on some theory that the human mind... simply... won't be able to take it without going insane?
I'm saying transformative change across the whole of society shouldn't be undertaken lightly. I don't need to prove that a world where human labor is obsolete would be damaging to the human psyche. Those who want to rush ahead just assume things will be just fine. They have the burden of proof. We've seen how bad things can get when the social engineers get it wrong. We're at a local peak in human flourishing for a large part of humanity. Why should we pull the lever on the unknown in hopes that we will come out ahead?
> And you think these kinds of maxims formed out of vacuums?
No, they formed in societies where it WAS necessary for most people to work in order to support the community. We needed a lot of labor to survive, so it was important to incentivize people to work hard, so our cultures developed values around work ethics.
As we move more and more towards a world where we actually don’t need everyone to work, those moral values become more and more outdated.
This is just like old religious rules around eating certain foods; in the past, we were at risk from a lot of diseases and avoiding certain foods was important for our health. Now, we don’t face those same risks so many people have moved on from those rules.
>those moral values become more and more outdated.
Do you think there was ever a time in human societies where the vast majority of people didn't have to "work" in some capacity, at least since the rise of psychologically modern humans? If not, why think humanity as a whole can thrive in such an environment?
Our environment today is completely different from what it was even 100 years ago. Yes, you have to ask this question for every part of modern society (fast travel, photographs, video, computers, antibiotics, vaccines, etc), so I am not sure why work is different.
Part of the problem is that we don't ask these questions when we should be. Social media, for example, represents a unique assault on our psychological makeup that we just uncritically unleashed on the world. We're about to do it again, likely with even worse consequences.
What would "asking these questions" entail? Would you have a committee that decides what new things we would allow? Popular vote? I get the idea, I just know see how you could ever actually do anything about this issue unless you completely outlawed anything new.
I don't think it's plausible to have a committee approve all new technology. But it is plausible to have a committee empowered to place limits on technology that we can predict will cause a social upheaval the likes of which we've never seen in modern times. It's not like we haven't done the equivalent of this before, with e.g. nuclear and bioengineering technology. The difficulty is that the speed at which AI is being developed means government bureaucracies are necessarily playing catchup. But it can be done. We just need to accept that we're not powerless to shape our collective futures. We are not at the mercy of technology and the few accelerationists who stand to be the new aristocracy in the new world.
I find this comment to be completely shortsighted.
We now have western societies with a growing population of homeless people, that despite having access to tons of resources at their disposal, still can't get their shit together. A great majority are doing drugs and smoking/abusing alcohol.
And it's enough to have 20 crackheads to destroy a neighborhood of 10000 hard-working, peaceful people.
The way most of the world is setup we will need to first address the unprecedented crisis of financing our day to day lives. We figure that out and I’m sure people will find other sources of meaning in their lives.
The people that truly enjoy their work and obtain meaning from it are vastly over represented here on HN.
Very few would be scared of AI if they had a financial stake in its implementation.
“We should do away with the absolutely specious notion that everybody has to earn a living. It is a fact today that one in ten thousand of us can make a technological breakthrough capable of supporting all the rest. The youth of today are absolutely right in recognizing this nonsense of earning a living. We keep inventing jobs because of this false idea that everybody has to be employed at some kind of drudgery because, according to Malthusian Darwinian theory he must justify his right to exist. So we have inspectors of inspectors and people making instruments for inspectors to inspect inspectors. The true business of people should be to go back to school and think about whatever it was they were thinking about before somebody came along and told them they had to earn a living.” — Buckminster Fuller
It may be impossible in this world to expect a form of donation, but it is certainly not impossible to expect forms of investment.
One idea I had is everyone is paid a thriving wage, and in exchange, if they in the future develop their passion into something that can make a profit, they pay back 20% of their profits they make up to some capped amount.
This allows for extreme generality. It truly frees people to pursue whatever they fancy every day until they catch lightning in a bottle.
There would be zero obligation as to what to do and when to pay back the money. But of course it would have to be open only to honest people, so that neither side is exploiting the other.
Both sides need a sense of gratitude, and wanting to give back. A philanthropic 'flair' "If it doesn't work out, it's okay", and a gratitude and wanting to give back someday on the side of the receiver, as they continue working on probably the most resilient thing they could ever work on (the safest investment), their lifelong passion.
I think of ChatGPT as a faster Google or Stackoverflow and all of my colleagues are using it almost exclusively in this way. That is still quite impressive but it isn’t what Altman set out to achieve (and he admits this quite candidly).
What would make me change my mind? If ChatGPT could take the lead on designing a robot through all the steps: design, contract the parts and assembly, market it, and sell it that would really be something.
I assume for something like this to happen it would need all source code and design docs from Boston Dynamics in the training set. It seems unlikely it could independently make the same discoveries on its own.
> I assume for something like this to happen it would need all source code and design docs from Boston Dynamics in the training set. It seems unlikely it could independently make the same discoveries on its own.
No, to do this it would need to be able to independently reason, if it could do that, then the training data stops mattering. Training data is a crutch that makes these algos appear more intelligent than they are. If they were truly intelligent they would be able to learn independently and find information on their own.
> I’m about 95% sure that will be terrible for the vast, vast majority of humans, we'll be obsolete.
This isn't a criticism of you, but this is a very stupid idea that we have. The economy is meant to serve us. If it can't, we need to completely re-organize it, because the old model has become invalid. We shouldn't exist to serve the economy. That's an absolutely absurd idea that needs to be killed in every single one of us.
> we need to completely re-organize it because the old model has become invalid
that's called social revolution, and those who benefit from the old model (currently that would be the holders of capital, and more so as AI grows in its capabilities and increasingly supplants human labor) will do everything in their power to prevent that re-organization
Nevertheless the modern economy has been deliberately designed. Emergent behaviors within it at the highest levels are actively monitored and culled when deemed not cost effective or straight out harmful.
The problem is no one is talking about this. We’re clearly headed towards such a world, and it’s irrelevant whether this incarnation will completely achieve that.
And anyone who poo poos ChatGPT needs to remember we went from “this isn’t going to happen in the next 20 years” to “this is happening tomorrow” overnight. It’s pretty obvious I’m going to be installing Microsoft Employee Service Pack 2 in my lifetime.
Very true but the question, as always, is by what means we can enact this change? The economy may well continue to serve the owner class even if all workers are replaced with robots.
I think the options are pretty clear. A negotiation of gradual escalation: Democracy, protests, civil disobedience, strikes, sabotage and if all else fails then at some point, warfare.
Great theory. In reality, the vast majority of us serves only the economy without getting anything truly valuable in return. We serve it, without noticing it, and grow into shells that are less human and more merely individual. Machines of the Economy.
This doesn't engage with the problem of coordinating everyone around some proposed solution and so is useless. Yes, if we could all just magically decide on a better system of government, everything would be great!
Identifying the problem is never useless. We need the right understanding if we're going to move forward. Believing we serve the economy and not the other way around hinders any progress on that front and so inverting it is a solid first step.
Until you try and you find that all the arable land is already occupied by industrial agriculture, the ADMs/Cargills of the world, using capital intensive brute force uniformity to extract more value from the land than you can compete with, while somehow simultaneously treating the earth destructively and inefficiently.
This is both a metaphor for AGI and not a metaphor at all.
Sure, if you can survive the period between the obsolescence of human labor and the achievement of post-scarcity. Do you really think that period of time is zero, or that the first version of a post-scarcity economy will be able to carry the current population? No, such a transition implies a brutish end for most.
Sorry, I was being too subtle. When nobody has a job anymore and the economy is crashing, I'm looking forward to moving into the country and becoming self sufficient.
We'll be very very poor, and it will be really hard work, but I'm looking forward to the challenge.
Human labour will never be obsolete because you can always work for yourself.
Post scarcity will never happen unless some benevolent AI god chooses to give it to us like in a Banks novel.
Think more deeply: who benefits from superintelligence? In the end it is a game of what humans naturally desire. AI has no incentives and is not controlled by hormones.
It's already impacting some of us. I hope it never appears until human civilization undergoes a profound change. But I'm afraid many rich people want that to happen.
LLMs still completely won't admit that they're wrong, that they don't have enough information, or that the information could have changed - asking anything about Svelte 5 is an incredible experience currently.
At the end of the day it's a tool currently, with surface-level information it's incredibly helpful in my opinion - Getting an overview of a subject or even coding smaller functions.
What's interesting in my opinion is "agents" though... not in the current "let's slap an LLM into some workflow", but as a concept that is at least an order of magnitude away from what is possible today.
Working with Svelte 5 and LLMs is a real nightmare.
AI agents are really interesting. Fundamentally they may represent a step toward the autonomization of capital, potentially disrupting "traditional legal definitions of personhood, agency, and property" [0] and leading to the need to recognize "capital self-ownership" [1].
It's fairly easy to prompt an LLM in a way where they're encouraged to say they don't know. Doesn't work 100% but cuts down the hallucinations A LOT. Alternatively, follow up with "please double check..."
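As a minimal sketch of what that kind of prompting can look like: the `call_llm` helper below is a hypothetical stand-in for whichever chat API you actually use, not a real library call, and the wording of the prompts is just one plausible variant.

```python
# Hedged sketch: `call_llm` is a hypothetical placeholder for your chat API of choice.
def call_llm(messages):
    """Placeholder: send the message list to an LLM and return its reply text."""
    raise NotImplementedError("wire this up to your provider's chat endpoint")

def ask_with_escape_hatch(question):
    # Explicitly give the model permission to say it doesn't know.
    system = ("Answer only if you are confident. "
              "If you are unsure, or the information may have changed since your "
              "training data, say 'I don't know' instead of guessing.")
    messages = [{"role": "system", "content": system},
                {"role": "user", "content": question}]
    answer = call_llm(messages)

    # The "please double check..." follow-up mentioned above.
    messages += [{"role": "assistant", "content": answer},
                 {"role": "user", "content": "Please double-check that answer and "
                                             "point out anything you are unsure about."}]
    return call_llm(messages)
```

This doesn't make the output trustworthy, it just lowers the rate of confident nonsense, which matches the "doesn't work 100%" caveat above.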
I have never personally met any malicious actor who knowingly dumps unverified shit straight from GPT. However, I have met people IRL who gave way too much authority to those quantized model weights and got genuinely confused when the generated text didn't agree with human-written technical information.
To them, chatgpt IS the verification.
I am not optimistic about the future. But perhaps some amazing people will deal with the error for the rest of us, like how most people don't go and worry about floating point error, and I'm just not smart enough to see what that looks like.
Reminds me of the stories about people slavishly following Apple or Google maps navigation when driving, despite the obvious signs that the suggested route is bonkers, like say trying to take you across a runway[1].
This comment reads like a culture problem not an LLM problem.
Imagine for a moment that you work as a developer, encounter a weird bug, and post your problem into your company’s Slack. Other devs then send a bunch of StackOverflow links that have nothing to do with your problem or don’t address your central issue. Is this a problem with StackOverflow or with coworkers posting links uncritically?
I develop sophisticated LLM programs every day at a small YC startup — extracting insights from thousands of documents a day.
These LLM programs are very different than naive one-shot questions asked of ChatGPT, resembling o1/3 thinking that integrates human domain knowledge to produce great answers that would have been cost-prohibitive for humans to do manually.
Naive use of LLMs by non-technical users is annoying, but is also a straw-man argument against the technology. Smart usage of LLMs in o1/3 style of emulated reasoning unlocks entirely new realms of functionality.
LLMs are analogous to a new programming platform, such as iPhones and VR. New platforms unlock new functionality along with various tradeoffs. We need time to explore what makes sense to build on top of this platform, and what things don’t make sense.
What we shouldn’t do is give blanket approval or disapproval. Like any other technology, we should use the right tool for the job and utilize said tool correctly and effectively.
There is nothing to build on top of this AI platform, as you call it. AI is nothing but an autocorrect program; AI is not innovating anything anywhere. It surprises me how much even the smartest people are deceived by simple trickery and continue to fall for every illusion.
>Naive use of LLMs by non-technical users is annoying, but is also a straw-man argument against the technology. Smart usage of LLMs in o1/3 style of emulated reasoning unlocks entirely new realms of functionality.
I agree in principle, but disagree in practice. With LLMs available to everyone, the uses we're seeing currently will only proliferate. Is that strictly a technology problem? No, but it's cold comfort given how LLM usage is actually playing out day-to-day. Social media is a useful metaphor here: it could potentially be a strictly useful technology, but in practice it's used to quite deleterious effect.
Pretty much. It should be considered rude to send AI output to others without fact checking and editing. Anyone asking a person for help isn’t looking for an answer straight from Google or ChatGPT.
This may be the "cell phones in public" stage, but society has completely failed to adapt well to ubiquitous cell phone usage. There are many new psychological and behavioral issues associated with cell phone usage.
I find this kind of argument comes up a lot and it seems fundamentally flawed to me.
1. You can set a bar wherever you want for a level of "seriousness" and huge swathes of real world work will fall below it, and are therefore attractive to tackle with these systems.
2. We build critical large scale systems out of humans, which are fallible and unverifiable. That's not to say current LLMs are human or equivalent, but "we can't verify X works all the time" doesn't stop us doing exactly that a lot. We deal with this by learning how humans make mistakes, and why, and build systems of checks around that (see the sketch after this comment). There is nothing in my mind that stops us doing the same with other AI systems.
3. Software is written by, checked by and verified by humans at least at some critical point - so even verified software still has this same problem.
We've also been doing this kind of thing with ML models for ages, and we use buggy systems for an enormous amount of work worldwide. You can argue we shouldn't and should have fully formally verified systems for everything, but you can't deny that right now we have large serious systems without that.
And if your goal is "replace a human" then I just don't think you can reasonably say that it requires verifiable software.
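For what it's worth, here is a rough sketch of what "systems of checks" around a fallible component can look like in code. The `generate` and `validate` functions are hypothetical placeholders (not any particular library's API); the point is only the structure: never accept unreviewed output.

```python
# Hedged sketch: treat the model like a fallible worker inside a system of checks.
def generate(task: str) -> str:
    """Hypothetical: ask the unreliable component (LLM, human, ...) for an answer."""
    raise NotImplementedError

def validate(task: str, answer: str) -> bool:
    """Hypothetical: an independent check - tests, schema validation, a second reviewer."""
    raise NotImplementedError

def checked_answer(task: str, max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        answer = generate(task)
        if validate(task, answer):   # only accept output that passes the independent check
            return answer
    raise RuntimeError("no answer passed validation; escalate to a human")
```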
> Systems are not explainable, as they have no model of knowledge and no representation of any ‘reasoning’.
Neither of those statements is true, is it? There are internal models, and recent models are designed around having a representation of reasoning before replying.
> current generative AI systems represent a dead end, where exponential increases of training data and effort will give us modest increases in impressive plausibility but no foundational increase in reliability
And yet reliability is something we see improve as LLMs get better and we get better at training them.
Honestly, I think it's nothing special to say that certain technologies have an end point.
We had lots of advancements in single core CPUs but eventually more than that was necessary, now the same is happening with monolithic chips vs chiplet designs.
Same for something like HTTP/1.1 and HTTP/2 and now HTTP/3.
Same for traditional rendering vs something like raytracing and other approaches.
I assume it's the same for typical spell checking and writing assistants vs LLM based ones.
That it's the same for typical autocomplete solutions vs LLM based ones.
It does seem that there weren't prior technological solutions for images/animations/models etc. (maybe the likes of Mixamo and animation retargeting, but not much for replacing a concept artist at shops that can't afford one).
Each technology, including the various forms of AI, has its limitations, no matter how much money has been spent on training the likes of the models behind ChatGPT. Nothing wrong with that; I'll use LLMs for what they're good for and look for something else once new technologies become available.
>...developing software to align with the principle that impactful software systems need to be trustworthy, which implies their development needs to be managed, transparent and accountable.
The author severely discounts the value of opacity and unaccountability in modern software systems. Large organizations previously had to mitigate moral hazard with unreliable and burdened-with-conscience labor. LLM-style software is superior on every axis in this application.
“In my mind, all this puts even state-of-the-art current AI systems in a position where professional responsibility dictates the avoidance of them in any serious application.”
And yet here we are with what we all think of as serious and seriously useful applications.
“My first 20 years of research were in formal methods, where mathematics and logic are used to ensure systems operate according to precise formal specifications, or at least to support verification of implemented systems.”
I think recommending avoiding building anything serious in the field until your outdated verification methodology catches up is unreasonably cynical, but also naive because it discards the true nature of our global society and assumes a lab environment where this kind of control is possible.
Author here. Unfair point - the article makes clear I don't expect "outdated verification methodology" to catch up, and I even link that pessimistic expectation with the issue of emergence that fatally undermines NNs.
Point about lab vs reality is fair.
Anyone making big bold claims about what LLMs definitely CAN or CANNOT do is FULL OF SHIT. Not even the world's top experts are certain where the limits of these technologies are, and we are already connecting them to tools, making them agentic, etc., so the era of 'pure' LLM chatbots is already dead imo.
The perspective of the article seems to be of a person who has not worked in cookie-cutter software engineering environments, which is what 99% of software engineering is. Here formal methods and verification are irrelevant, nobody has heard of it, and nobody cares. It's about churning out some crappy internal webapp or mobile app or data eng/science pipeline at the lowest cost by some soulless bigco (or startup). LLMs are already super useful for this.
Also, the argument about LLMs being black-box is a miss, because LLMs are writing software, they are not the software; programmers writing software are also black boxes. Also, there's nothing in the way of running formal methods on the software produced by an LLM, it will fail the same as it fails on software written by humans, since most formal verification doesn't even make sense for 99% of software.
Also, anybody who has used LLMs as aids for writing simple/smaller chunks of code (and other documents) knows that they're super useful (sometimes magic). It's like Steve Ballmer saying the iphone is a joke in 2007.
Author here. Fair enough on my industry experience. But I hope components, unit testing, regression testing, etc aren't as easily dismissed in real SE environments - no trouble believing formal methods and verification are off the radar.
The article is not about using AI to write code (which may work to some level of satisfaction for some people) but about using AI as code.
Thanks for your reply! Hope my comment wasn't offensive.
I definitely think components, unit testing, and regression testing are good things and are done at good software houses. In my experience, however, most of these things are cargo-culted at best in many other environments.
When I wrote my comment I was wondering about the "AI to write code" vs "AI as code" point. In my vocabulary, "AI as code" would be "Data Science models", like a ranking engine for ads in a newsfeed? I certainly understand the idea of having an AI "emulate" an application like Word.exe or Doom.exe, and there's been research into this direction, but as far as I can tell that is not the general direction the industry is headed in --- rather it's the "AI to write code" direction.
Thanks. In a broader sense, with "AI as code" I mean any situation where we ask an AI model for answers or decisions where we otherwise might have written a program to solve it. See also "LLM functionalism" in the article. Particularly where we need to rely on the outcome - so not "predictions", "suggestions", or "recommendations" all of which we expect to have limited reliability which we mitigate through modifying or ignoring.
An actual "thinking machine" would be constantly running computations on its accumulated experience in order to improve its future output and/or further compress its sensory history.
An LLM is doing exactly nothing while waiting for the next prompt.
Self-prompting via chain-of-thought and tree-of-thought can be used in combination with an updating memory of knowledge graphs, cognitive architectures like SOAR, and a continuous stream of new information and sensory data, with an LLM at the heart of that system, and it will be exactly a "thinking machine". The problem is that it's currently very expensive to run inference continuously, and all the engineering around memory storage (like RAG patterns) and cognitive architecture design is a work in progress. It's coming soon though.
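To make the loop being described concrete, here is a toy sketch of self-prompting with an external memory. `call_llm` and the plain-list memory are hypothetical placeholders standing in for a real model call and a real knowledge graph or vector store; this is an illustration of the shape of the idea, not any existing framework.

```python
# Hedged sketch of a self-prompting loop with external memory.
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a single LLM call."""
    raise NotImplementedError

memory: list[str] = []          # crude stand-in for a knowledge graph / vector store

def agent_step(observation: str) -> str:
    context = "\n".join(memory[-20:])               # retrieve recent memories
    thought = call_llm(f"Context:\n{context}\n\nObservation: {observation}\n"
                       "Think step by step, then state one action.")
    memory.append(f"obs: {observation} | thought: {thought}")   # update memory
    return thought

# In a full system this loop would run continuously over incoming sensory data,
# which is part of why continuous inference is so expensive in practice.
```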
If it's coalescing learning in realtime across all users/sessions, that's more constant than you're maybe giving it credit for. I'm not sure if GPT-4o and friends are actually built that way though.
We are thinking machines, and we keep thinking because we have one goal, which is to survive; machines have no such true goals. I mean "true" because our biology forces that on us.
There is a limited amount of computation that you can usefully do in the absence of new input (like an LLM between prompts). If you do as much computation as you usefully can (within your current algorithmic limits) in a burst immediately when you receive a prompt, output, and then go into a sleep state, that seems obviously better than receiving a prompt, outputting, and then doing some of the computation you could usefully do after your output.
I see people say this all the time and it sounds like a pretty cosmetic distinction. Like, you could wire up an LLM to a systemd service or cron job and then it wouldn’t be “waiting”, it could be constantly processing new inputs. And some of the more advanced models already have ways of compressing the older parts of their context window to achieve extremely long context lengths.
Not only does a training pass take more time and memory than an inference pass, but if you remember the Microsoft Tay incident, it should be self-explanatory why this is a bad idea without a new architecture.
LLMs are a valuable tool for augmenting productivity. Used properly, they do give you a competitive advantage over someone who isn't using them.
The "dead end" is in them being some magical replacement for skilled employees. The levels of delusion pumping out of SV and AI companies desperate to make a buck is unreal. They talk about chat bots like they're already solving humanity's toughest problems (or will be in "just two more weeks"). In reality, they're approximately good at solving certain problems (and they can only ever solve them from the POV of existing human knowledge—they can't create). You still have to hold their hand quite a bit.
This current wave of tech is going to have an identical outcome to the "blockchain all the things" nightmare from a few years back.
Long-term, there's a lot of potential for AI but this is just a significant step forward. We're not "there" yet and won't be for some time.
Seems completely nonsensical. Yes, neural networks themselves are not unit testable, modular, symbolic or verifiable. That’s why we have them produce code artifacts - which possess all those traits and can be reviewed by both humans and other machines. It’s completely analogous to human software engineers, who are unfortunately black boxes as well.
More broadly, I’ve learned to attach 0 credence to any conceptual argument that an approach will not lead somewhere interesting. The hit rate on these negative theories is atrocious, they are often motivated by impure reasons, and the downside is very asymmetric (who cares if you sidestep a boring path? yet how brutal is it to miss an easy and powerful solution?)
So what have you gained in the process, other than wasting significantly higher amounts of energy in the form of heat and other emissions? It is nothing like software engineering; clearly you speak out of ignorance.
And then you say you attach 0 credence to whatever, but you give no reasons why others should buy your points. You don't really seem to have much of a point, anyway.
My argument: the theoretical limitations of NNs (lack of modularity, symbolic reasoning, verifiability) cause no practical problems to usefulness - we can just analyze the code artifacts as we do with human programmers. Do you disagree?
Yes. These limitations are not theoretical at all. The author touches on compositionality --- how a problem/program can be decomposed into smaller, orthogonal problems/programs, reasoned about and tested separately, and then abstracted away behind an interface that hides the implementation details. This is the essence of programming and software engineering at large, whether you're programming in assembly, Java, or Haskell: to divide and conquer so that we can fit an isolated aspect of the program in brain cache and reason about it. This is a fundamental limitation and will not change until the year 40,000 when we have Space Marines.
A neural network, conversely, is a big ball of mud. Impossible to reason about and to test except for whole-system, end-to-end testing, which is impossible to do exhaustively because of the size of the state space. It is, by design, unexplainable and untestable, and therefore unreliable. It's why you use globals in C only judiciously. (I am just rephrasing the article here, not saying anything new.)
And the evidence that it causes practical problems to usefulness is already out there; "hallucinations" are simply errors, just that corporate PR likes to pretend that it's a "feature" and not a bug. This is delusional. A society seeking digitalization should run away from this level of stupidity.
Useless and dead end aren’t synonymous. It’s most certainly a dead end, but it’s also not useless.
There a lot of comments here already conflating these two.
This article is also pretty crap. There’s a decent summary box but other than that it’s all regurgitated half-wisdoms we’ve all already realized: things will change, probably a lot; nobody knows what the end goal is or how far we are from it; the next quantum leap almost certainly depends on a transcendent architecture or new model entirely.
This whole article could’ve been a single paragraph honestly, and a lot of the comments here probably wouldn’t have read that either… just sayin
Betteridge's law of headlines, current AI may absolutely be a dead end, but fortunately technology is evolving and changing - who knows what the future will hold.
> Eerke Boiten, Professor of Cyber Security at De Montfort University Leicester, explains his belief that current AI should not be used for serious applications.
> In my mind, all this puts even state-of-the-art current AI systems in a position where professional responsibility dictates the avoidance of them in any serious application.
> Current AI systems also have a role to play as components of larger systems in limited scopes where their potentially erroneous outputs can be reliably detected and managed, or in contexts such as weather prediction where we had always expected stochastic predictions rather than certainty.
I think it's important to note that:
- Boiten is a security expert, but doesn't have a background working in ML/AI
- He never defines what "serious application" means, but apparently systems that are designed to be tolerant of missed predictions are not "serious".
He seems to want to trust a system at the same level that he trusts a theorem proved with formal methods, etc.
I think the frustrating part of this article is that from a security perspective, he's probably right about his recommendations, but he seems off-base in the analysis that gets him there.
> Current AI systems have no internal structure that relates meaningfully to their functionality. They cannot be developed, or reused, as components.
Obviously AI systems do have internal structure, and there are re-usable components both at the system level (e.g. we pick an embedding, we populate some vector DB with contents using that embedding, and create a retrieval system that can be used in multiple ways). The architecture of models themselves also has components which are reused, and we make choices about when to keep them frozen versus when to retrain them. Any look at architecture diagrams in ML papers shows one level of these components.
> exponential increases of training data and effort will give us modest increases in impressive plausibility but no foundational increase in reliability.
I think really the problem is that we're fixated on mostly-solving an ever broader set of problems rather than solving the existing problems more reliably. There are plenty of results about ensembling and learning theory that give us a direction to increase reliability (by paying for more models of the same size), but we seem far more interested in seeing if we can solve problems at a higher level of sophistication most of the time. That's a choice that we're making. Similarly, Boiten mentions the possibility of models with explicit confidences -- and there's been plenty of work on that, but because there's a tradeoff with model size (i.e. do you want to spend your resources on a bigger model, or on explicitly representing variance around a smaller number of parameters?) people seem mostly uninterested.
I think there are real reasons to be concerned about the specific path we're on, but these aren't the good ones.
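As a concrete (if simplified) illustration of the ensembling point above - buying reliability with more compute at the same model size - here is a self-consistency style majority vote over repeated samples. `sample_answer` is a hypothetical stand-in for one stochastic call to a fixed-size model, not a real API.

```python
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical: one stochastic call to a fixed-size model."""
    raise NotImplementedError

def majority_vote(question: str, n: int = 11) -> str:
    # Pay for n samples of the same model instead of one call to a bigger model,
    # then keep the most common answer. Reliability tends to improve with n when
    # individual samples are better than chance and their errors aren't perfectly correlated.
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]
```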
Author here. Serious systems undefined, fine - I'd view them as systems that take decisions, not just make predictions (see final quoted line!).
"Want to trust a system at the same level as ... a theorem" - straw man, explicitly denied in the article. Components .previously used and tested in diverse ways, testing coverage of some sort, would already be really good. Yes AI systems have architectural components such as embeddings, but they relate to types of functionalities and applications, not to functionalities.
The point about solving more problems at a low reliability level rather than solving problems at a higher reliability level is highly interesting though!
"Some" reason is Santa Claus, right? Unlike the eniac, energy consumption only goes up, exponentially. Climate change is no joke, people are dying so that you can churn out more fake pics and SEO spam.
I’m surprised this article merits 700+ comments. Why y’all engage with such drivel?
It’s well established that disruptive technologies don’t appear to have any serious applications, at first. But they get better and better, and eventually they take over.
PG talks about how new technologies seem like toys at first, the whole Innovator Dilemma is about this…so well established within this community.
Just ignore it and figure out where the puck is moving toward.
I am a simple man. In 2022 I glanced through Attention Is All You Need and forgot about it. A lot of people made money. A lot of people believed that the end of programmers and designers was absolute. Some people on stage announced the death of coding. Others bravely explored a future in which people are not needed for creative work.
Aside from the anger that this public stupidity produced in me, I always knew that this day would come.
Maybe next time someone will have the balls not to call a text generator with inherent hallucinations "Intelligence"? Who knows. Miracles can happen. :)
Pushing something to the limit requires a lot of funding; if the public never got overexcited about some tech, many really cool things would never have been tried. Also, LLMs are pretty useful even as is. They sure made me more productive.
I just imagine a world in which an industry defined by determinism and facts has the bravery to call a spade a spade. LLMs have a function. Machine learning also. But calling LLMs "Intelligence" and pushing the hype into overdrive?
The launch of ChatGPT had an amount of hype that was downright confusing for someone who had previously downloaded and fine-tuned GPT-2. Everyone who hadn't used a language model said it was revolutionary, but it was obviously evolutionary,
and I'm not sure the progress is linear; it might be logarithmic.
GenAI in its current state has some uses, but I fear that mostly ChatGPT is hallucinating false information of all kinds into the minds of uninformed people who think GPT is actually intelligent.
Everyone who actually works on this stuff, and didn't have ulterior motives in hyping it up to (over)sell it, has been identifying themselves as such and providing context for the hype since the beginning.
The furthest they got before the hype machine took over was introducing the term "stochastic parrot" to popular discourse.
If you mean "exactly as architected currently", then yes, current Transformer-based generative models can't possibly be anything other than a dead end. The architecture will need to change at least a little bit, to continue to make progress.
---
1. No matter how smart they get, current models are "only" pre-trained. No amount of "in-context learning" can allow the model to manipulate the shape and connectivity of the latent state-space burned into the model through training.
What is "in-context learning", if not real learning? It's the application of pre-learned general and domain-specific problem-solving principles to novel problems. "Fluid intelligence", you might call it. The context that "teaches" a model to solve a specific problem, is just 1. reminding the model that it has certain general skills; and then 2. telling the model to try applying those skills to solving this specific problem (which it wouldn't otherwise think to do, as it likely hasn't seen an example of anyone doing that in training.)
Consider that a top-level competitive gamer, who mostly "got good" playing one game, will likely nevertheless become nearly top-level in any new game they pick up in the same genre. How? Because many of the skills they picked up while playing their favored game, weren't just applicable to that game, but were instead general strategic skills transferrable to other games. This is their "fluid intelligence."
Both a human gamer and a Transformer model derive these abstract strategic insights at training time, and can then apply them across a wide domain of problems.
However, the human gamer can do something that a Transformer model fundamentally cannot do. If you introduce the human to a game that they mostly understand, but which is in a novel genre where playing the game requires one key insight the human has never encountered... then you will expect that the human will learn that insight during play. They'll see the evidence of it, and they'll derive it, and start using it. They will build entirely-novel mental infrastructure at inference time.
A feed-forward network cannot do this.
If there are strategic insights that aren't found in the model's training dataset, then those strategic insights just plain won't be available at inference time. Nothing the model sees in the context can allow it to conjure a novel piece of mental infrastructure from the ether to then apply to the problem.
Whether general or specific, the model can still only use the tools it has at inference time — it can't develop new ones just-in-time. It can't "have an epiphany" and crystallize a new insight from presented evidence. It's not doing the thing that allows that to happen at inference time — with that process instead exclusively occurring (currently) at training time.
And this is very limiting, as far as we want models to do anything domain-specific without having billion-interaction corpuses to feed them on those domains. We want models to work like people, training-wise: to "learn on the job."
We've had simpler models that do this for decades now: spam filters are trained online, for example.
I would expect that, in the medium term, we'll likely move somewhat away from pure feed-forward models, toward models with real online just-in-time training capabilities. We'll see inference frameworks and Inference-as-a-Service platforms that provide individual customers with "runtime-observed in-domain residual-error optimization adapters" (note: these would not be low-rank adapters!) for their deployment, with those adapters continuously being trained from their systems as an "in the small" version of the async "queue, fan-in, fine-tune" process seen in Inf-aaS-platform RLHF training.
And in the long term, we should expect this to become part of the model architecture itself — with mutable models that diverge from a generic pre-trained starting point through connection weights that are durably mutable at inference time (i.e. presented to the model as virtual latent-space embedding-vector slots to be written to), being recorded into a sparse overlay layer that is gathered from (or GPU-TLB-page-tree Copy-on-Write'ed to) during further inference.
---
2. There is a kind of "expressivity limit" that comes from generative Transformer models having to work iteratively and "with amnesia", against a context window comprised of tokens in the observed space.
Pure feed-forward networks generally (as all Transformer models are) only seem as intelligent as they are, because, outside of the model itself, we're breaking down the problem it has to solve from "generate an image" or "generate a paragraph" to instead be "generate a single convolution transform for a canvas" or "generate the next word in the sentence", and then looping the model over and over on solving that one-step problem with its own previous output as the input.
Now, this approach — using a pure feed-forward model (i.e. one that has constant-bounded processing time per output token, with no ability to "think longer" about anything), and feeding it the entire context (input + output-so-far) on each step, then having it infer one new "next" token at a time rather than entire output sequences at a time — isn't fundamentally limiting.
After all, models could just amortize any kind of superlinear-in-compute-time processing across the inference of several tokens. (And if this was how we architected our models, then we'd expect them to behave a lot like humans: they'd be "gradually thinking the problem through" while saying something - and then would sometimes stop themselves mid-sentence and walk back what they said, because their asynchronous long-thinking process arrived at a conclusion that invalidated previous outputs of their surface-level predict-the-next-word process.)
There's nothing that says that a pure feed-forward model needs to be stateless between steps. "Feed-forward" just means that, unlike in a Recurrent Neural Network, there's no step where data is passed "upstream" to be processed again by nodes of the network that have already done work. Each vertex of a feed-forward network is only visited (at most) once per inference step.
But there's nothing stopping you from designing a feed-forward network that, say, keeps an additional embedding vector between each latent layer, that isn't overwritten or dropped between layer activations, but instead persists outside the inference step, getting reused by the same layer in the next inference step, where the outputs of layer N-1 from inference-step T-1 are combined with the outputs of layer N-1 from inference-step T to form (part of) the input to layer N at inference-step T. (To have a model learn to do something with this "tool", you just need to ensure its training is measuring predictive error over multi-token sequences generated using this multi-step working-memory persistence.)
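A toy, PyTorch-flavoured sketch of that idea: a feed-forward block that keeps one persistent vector per layer across inference steps and mixes it into the current activations. This is my own illustrative reading of the paragraph above, not an existing architecture, and the layer and buffer names are made up.

```python
import torch
import torch.nn as nn

class LayerWithWorkingMemory(nn.Module):
    """Toy sketch: a feed-forward block that persists one vector between inference steps."""
    def __init__(self, d_model: int):
        super().__init__()
        self.ff = nn.Linear(2 * d_model, d_model)
        # Persistent per-layer state, kept outside any single inference step.
        self.register_buffer("memory", torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Combine this step's layer input with the state retained from the previous step.
        h = self.ff(torch.cat([x, self.memory.expand_as(x)], dim=-1))
        # Retain a summary of this step's output for the next inference step.
        # (detach() keeps this sketch inference-only; learning to use the memory would
        # require training on multi-token sequences, as noted above.)
        self.memory = h.detach().mean(dim=0)
        return h

# Still feed-forward within a single step (no recurrence inside one pass),
# but no longer amnesiac between steps.
```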
...but we aren't currently allowing models to do that. Models currently "have amnesia" between steps. In order to do any kind of asynchronous multi-step thinking, everything they know about "what they're currently thinking about" has to somehow be encoded — compressed — into the observed-space sequence, so that it can be recovered and reverse-engineered into latent context on the next step. And that compression is very lossy.
And this is why ChatGPT isn't automatically a better WolframAlpha. It can tell you how all the "mental algorithms" involved in higher-level maths work — and it can try to follow them itself — but it has nowhere to keep the large amount of "deep" [i.e. latent-space-level] working-memory context required to "carry forward" these multi-step processes between inference steps.
You can get a model (e.g. o1) to limp along by dedicating much of the context to "showing its work" in incredibly-minute detail — essentially trying to force serialization of the most "surprising" output in the latent layers as the predicted token — but this fights against the model's nature, especially as the model still needs to dedicate many of the feed-forward layers to deciding how to encode the chosen "surprising" embedding into the same observed-space vocabulary used to communicate the final output product to the user.
Given even linear context-window-size costs, the cost of this approach to working-memory serialization is superlinear vs achieved intelligence. It's untenable as a long-term strategy.
Obviously, my prediction here is that we'll build models with real inference-framework-level working memory.
---
At that point, if you're adding mutable weights and working memory, why not just admit defeat with Transformer architecture and go back to RNNs?
Predictability, mostly.
The "constant-bounded compute per output token" property of Transformer models, is the key guarantee that has enabled "AI" to be a commercial product right now, rather than a toy in a lab. Any further advancements must preserve that guarantee.
Write-once-per-layer long-term-durable mutable weights preserve that guarantee. Write-once-per-layer retained-between-inference-steps session memory cells preserve that guarantee. But anything with real recurrence, does not preserve that guarantee. Allowing recurrence in a neural network, is like allowing backward-branching jumps in a CPU program: it moves you from the domain of guaranteed-to-halt co-programs to the domain of unbounded Turing-machine software.
If you expect the AI to do independent work, yes, it is a dead end.
These LLM AIs need to be treated and handled as what they are: idiot savants with vast and unreliable intelligence.
What does any advanced organization do when they hire a new PhD, let them loose in the company or pair them with experienced staff? When paired with experienced staff, they use the new person for their knowledge but do not let them change things on their own until much later, when confidence is established and the new staffer has been exposed to how things work "around here".
The big difference with LLM AIs is they never graduate to an experienced staffer, they are always the idiot savant that is really dang smart but also clueless and needs to be observed. That means the path forward with this current state of LLM AIs is to pair them with people, personalized to their needs, and treat them as very smart idiot savants great for strategy and problem solving discussion, where the human users are driving the situation, using the LLM AIs like a smart assistant that requires validation - just like a real new hire.
There is an interactive state that can be achieved with these LLM AIs, like being in a conversation with experts, where they advise, they augment and amplify individual persons. A group of individuals adept with use of such an idiot savant enhanced environment would be incredibly capable. They'd be a force unseen in human civilization before today.
Criticisms like this are levied against an excessively narrow (obsolete?) characterisation of what is happening in the AI space currently.
After reading about o3's performance on ARC-AGI, I strongly suspect people will not be so flippantly dismissive of the inherent limits of these technologies by this time next year. I'm genuinely surprised at how myopic HN commentary is on this topic in general. Maybe because the implications are almost unthinkably profound.
Anyway, OpenAI, Anthropic, Meta, and everyone else are well aware of these types of criticisms, and are making significant, measurable progress towards architecturally solving the deficiencies.
> I strongly suspect people will not be so flippantly dismissive of the inherent limits of these technologies by this time next year.
People are flippantly dismissive of the inherent limits because there ARE inherent limitations of the technology.
> Maybe because the implications are almost unthinkably profound.
Maybe because the stuff you're pointing to are just benchmarks and the definitions around things like AGI are flawed (and the goalposts are constantly moving, just like the definition of autonomous driving). I use LLMs roughly 20-30x a day - they're an absolutely wonderful tool and work like magic, but they are flawed for some very fundamental reasons.
Humans are not machines; they have both rights that machines do not have and also responsibilities and consequences that machines will not have. For example, bad driving will cost you money, injury, prison time or even death.
Therefore AI has to be much better than humans at the task to be considered ready to be a replacement.
——
Today robot taxis can only work in fair weather conditions in locations that are planned cities. No autonomous driving system will be able to drive in Nigeria or India, or even in many European cities that were never designed for cars, any time soon.
Working in very specific scenarios is useful, but hardly a measure of their intelligence or a case for replacing humans at the task.
I hear people say this kind of thing but it confuses me.
1. What does "inherent limitations" mean?
2. How do we know something is an inherent limitation?
3. Is it a problem if arguments for a particular inherent limitation also apply to humans?
From what I've seen, people will often say things like "AI can't be creative because it's just a statistical machine", but humans are also "just" statistical machines. People might mean something like humans are more grounded because humans react not just to how the world already works but to how the world reacts to actions they take, but this difference misunderstands how LLMs are trained. Like humans, LLMs get most of their training from observing the world, but LLMs are also trained with reinforcement learning, and this will surely remain an active area of research.
One of many, but this is a simple one - LLMs are limited to knowledge that is publicly available on the internet. This is "inherent" because that's how LLMs are essentially taught the information they retrieve today.
> The question of whether a computer can think is no more interesting than the question of whether a submarine can swim. ~ Edsger W. Dijkstra
LLMs / generative models can have a profound societal and economic impact without being intelligent. The obsession with intelligence only makes their use haphazard and dangerous.
It is a good thing courts of law have established precedent that organizations deploying LLM chatbots are responsible for their output (e.g., the Air Canada LLM chatbot promising a non-existent discount being Air Canada's responsibility).
Also most automation has been happening without LLMs/Generative Models. Things like better vision systems have had an enormous impact with industrial automation and QA.
The conclusion of the article admits that in areas where stochastic outputs are expected these AI models will continue to be useful.
It’s in area where we demand correctness and determinism that they will not be suitable.
I think the thrust of this article is hard to see unless you have some experience with formal methods and verification, or else accept the author's explanations as truth.
But o3 is just a slightly less stupid idiot savant...it still has to brute force solutions. Don't get me wrong, it's cool to see how far that technique can get you on a specific benchmark.
But the point still stands that these systems can't be treated as deterministic (i.e. reliable or trustworthy) for the purposes of carrying out tasks that you can't allow "brute forced attempts" for (e.g. anything where the desired outcome is a positive subjective experience for a human).
A new architecture is going to be needed that actually does something closer to our inherently heuristic based learning and reasoning. We'll still have the stochastic problem but we'll be moving further away from the idiot savant problem.
All of this being said, I think there's plenty of usefulness with current LLMs. We're just expecting the wrong things from them and therefore creating suboptimal solutions. (Not everyone is, but the most common solutions are, IMO.)
The best solutions need to be rethinking how we typically use software since software has been hinged upon being able to expect (and therefore test) dertiministic outputs from a limited set of user inputs.
I work for an AI company that's been around for a minute (make our own models and everything). I think we're both in an AI hype bubble while simultaneously underestimating the benefits of current AI capabilities. I think the most interesting and potentially useful solutions are inherently going to be so domain specific that we're all still too new at realizing we need to reimagine how to build with this new tech in mind. It reminds me of the beginning of mobile apps. It took awhile for most us to "get it".
> After reading about o3's performance on ARC-AGI, I strongly suspect people will not be so flippantly dismissive of the inherent limits of these technologies by this time next year.
If I wasn't so slammed with work, I'd have half a mind to go dredge up at least a dozen posts that said the same thing last year, and the year before. Even OpenAI has been moving the goalposts here.
Nah, the trick with o3 solving IQ tests seems to be that they bruteforce solutions and then pick the best option. That's why calls that are trivial for humans end up costing a lot.
It still can't think and it won't think.
A LANGUAGE model (keyword: language) is just that, a language model; it should be paired with a reasoning engine that translates the inner thought of the machine into human language. It should not be the source of decisions, because it sucks at making them, even though the network can exhibit some intelligence.
We will never have AGI with just a language model. That said, most jobs people do are still at risk, even with ChatGPT-3.5-level models (especially outside of knowledge work, where difficult decisions need to be taken). So we'll see the problems of AGI and the job market way earlier than AGI itself, as soon as we apply robotics and vision models plus ChatGPT-3.5-level intelligence. Goodbye baristas, goodbye people working in factories.
Let's start working on a reasoning engine so we can replace those pesky knowledge workers too.
We’ve had coffee machines that can make a perfect coffee with a touch of a button for at least a decade. How does GPT3.5 remove baristas given they could have already been removed?
Reading the o1 announcement you could have been saying the same thing a year ago yet it's worse than Claude in practice and if it was all that's available - I wouldn't even use it if it was free - it's that bad.
If OpenAI has demonstrated one thing, it is that they are a hype production machine, and they are probably getting ready for the next round of investment. I wouldn't be surprised if this model was equally useless as o1 when you factor in performance and price.
At this point they are completely untrustworthy, and until something lands publicly for me to test, it's safe to ignore their PR as complete BS.
For most tasks - but not all. I normally paste my prompt in both and while Claude is generally superior in most aspects, there are tasks at which o1 performed slightly better.
It doesn’t really matter. “It works and is cost/resource-effective at being an AGI” is a fundamentally uninteresting proposition because we’re done at that point. It’s like debating how we’re going to deal with the demise of our star; we won’t, because we can’t.
”If intelligence lies in the process of acquiring new skills, there is no task X that solving X proves intelligence”
IMO it especially applies to things like solving a new IQ puzzle, especially when the model is pretrained for that particular task type, like was done with ARC-AGI.
For sure, it's very good research to figure out what kinds of tasks are easy for humans and difficult for ML, and then solve them. The jump in accuracy was surprising. But in practice the models are still unbelievably stupid and lacking in common sense.
My personal (moving) goalpost for "AGI" is now set to whether a robot can keep my house clean automatically. It's not general intelligence if it can't do the dishes. And before physical robots, being less of a turd at producing working code would be a nice start. I'm not yet convinced general-purpose LLMs will lead to cost-effective solutions to either vs humans. A specifically built dishwasher, however…
You remember when Google was scared to release LLMs? You remember that Googler that got fired because he thought the LLM was sentient?
There are likely a couple of surprises still left in LLMs, but no one should think that any present technology in its current state or architecture will get us to AGI or anything that remotely resembles it.
> Maybe because the implications are almost unthinkably profound.
laundering stolen IP from actual human artists and researchers, extinguishing jobs, deflecting responsibility for disasters. yeah, I can't wait for these "profound implications" to come to fruition!
It's even worse. AI is like a really smart but inexperienced person who also lies frequently. Because AI is not accountable to anything, it'll always come up with a reasonable-sounding answer to any question, whether it is correct or not.
To put it in other words: it is not clear when and how they hallucinate. With a person, their competence could be understood, and also their limits. But an LLM can happily give different answers based on trivial changes in the question, with no warning.
In a conversation (conversation and attached pictures at https://bsky.app/profile/liotier.bsky.social/post/3ldxvutf76...), I delete a spurious "de" ("Produce de two-dimensional chart [..]" to "Produce two-dimensional [..]") and ChatGPT generates a new version of the graph, illustrating a different function although nothing else has changed and there was a whole conversation to suggest that ChatGPT held a firm model of the problem. Confirmed my current doctrine: use LLM to give me concepts from a huge messy corpus, then check those against sources from said corpus.
LLMs are non-deterministic: they'll happily give different answers to the same prompt based on nothing at all. This is actually great if you want to use them for "creative" content generation tasks, which is IMHO what they're best at. (Along with processing of natural language input.)
Expecting them to do non-trivial amounts of technical or mathematical reasoning, or even something as simple as code generation (other than "translate these complex natural-language requirements into a first sketch of viable computer code") is a total dead end; these will always be language systems first and foremost.
This confuses me. You have your model, you have your tokens.
If the tokens are bit-for-bit-identical, where does the non-determinism come in?
If the tokens are only roughly-the-same-thing-to-a-human, sure I guess, but convergence on roughly the same output for roughly the same input should be inherently a goal of LLM development.
Most any LLM has a "temperature" setting: randomness added at the sampling step (the weights themselves stay fixed) to intentionally cause exactly this nondeterministic behavior. Good for creative tasks, bad for repeatability. If you're running one of the open models, set the temperature down to 0 and it suddenly becomes perfectly consistent.
The model outputs probabilities, which you have to sample randomly. Choosing the "highest" probability every time leads to poor results in practice, such as the model tending to repeat itself. It's a sort of Monte-Carlo approach.
The trained model is just a bunch of statistics. To use those statistics to generate text you need to "sample" from the model. If you always sampled by taking the model's #1 token prediction that would be deterministic, but more commonly a random top-K or top-p token selection is made, which is where the randomness comes in.
It is technically possible to make it fully deterministic if you have a complete control over the model, quantization and sampling processes. The GP probably meant to say that most commercially available LLM services don't usually give such control.
> If the tokens are bit-for-bit-identical, where does the non-determinism come in?
By design, most LLMs have a randomization factor in their sampling. Some use the concept of "temperature", which makes them randomly choose the 2nd or 3rd highest-ranked next token; the higher the temperature, the more often they pick a non-best next token. OpenAI described this in their papers around the GPT-2 timeframe IIRC.
Computers are deterministic. LLMs run on computers. If you use the same seed for the random number generator you’ll see that it will produce the same output given an input.
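Pulling the last few comments together, here is a small numpy sketch of the sampling step being described: the model's logits for a prompt are fixed, and the nondeterminism comes from how you sample from them. With temperature 0 (greedy) or a fixed seed the output is repeatable; otherwise it is not. The logits here are made-up numbers purely for illustration, not from any real model.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample one next-token id from raw logits (illustrative, not any specific model)."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:                       # greedy: fully deterministic
        return int(np.argmax(logits))
    probs = np.exp(logits / temperature)       # temperature rescales the distribution
    probs /= probs.sum()
    if top_k is not None:                      # keep only the k most likely tokens
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()
    rng = rng or np.random.default_rng()       # a fixed seed makes this repeatable
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.5, 0.3, -1.0]                 # made-up scores for 4 candidate tokens
print(sample_next_token(logits, temperature=0))                      # always token 0
print(sample_next_token(logits, temperature=0.8, top_k=3,
                        rng=np.random.default_rng(seed=42)))         # repeatable given the seed
```

Hosted LLM services usually expose the temperature knob but not full control over seeds, quantization, and batching, which is why in practice they behave non-deterministically even at low temperature.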
There's no need for there to be changes to the question. LLMs have a rng factor built in to the algorithm. It can happily give you the right answer and then the wrong one.
I love how those changes are often just a different seed in the randomness... just chance.
I ran some repeated tests with "deeper than surface knowledge" questions on some niche subjects and was impressed that it gave the right answer... about 20% of the time.
"AI is a really smart but inexperienced person who also lies frequently."
Careful. Here "smart" means "amazing at pattern-matching and incredibly well-read, but has zero understanding of the material."
Sure, we're also pattern matching, but additionally (among other things):
1) We're continually learning so we can update our predictions when our pattern matching is wrong
2) We're autonomous - continually interacting with the environment, and learning how it responds to our interactions
3) We have built in biases such as curiosity and boredom that drive us to experiment, gain new knowledge, and succeed in cases where "pre-training to date" would have failed us
For one, a brain can’t do anything without irreversibly changing itself in the process; our reasoning is not a pure function.
For a person to truly understand something they will have a well-refined (as defined by usefulness and correctness), malleable internal model of a system that can be tested against reality, and they must be aware of the limits of the knowledge this model can provide.
Alone, our language-oriented mental circuits are a thin, faulty conduit to our mental capacities; we make sense of words as they relate to mutable mental models, and not simply in latent concept-space. These models can exist in dedicated but still mutable circuitry such as the cerebellum, or they can exist as webs of association between sense-objects (which can be of the physical senses or of concepts, sense-objects produced by conscious thought).
So if we are pattern-matching, it is not simply of words, or of their meanings in relation to the whole text, or even of their meanings relative to all language ever produced. We translate words into problems, and match problems to models, and then we evaluate these internal models to produce perhaps competing solutions, and then we are challenged with verbalizing these solutions. If we were only reasoning in latent-space, there would be no significant difficulty in this last task.
At the end of the day, we're machines, too. I wrote a piece a few months ago with an intentionally provocative title, questioning whether we're truly on a different cognitive level.
I asked ChatGPT to help out: -----------------------------
"The distinction between AI and humans often comes down to the concept of understanding. You’re right to point out that both humans and AI engage in pattern matching to some extent, but the depth and nature of that process differ significantly."
"AI, like the model you're chatting with, is highly skilled at recognizing patterns in data, generating text, and predicting what comes next in a sequence based on the data it has seen. However, AI lacks a true understanding of the content it processes. Its "knowledge" is a result of statistical relationships between words, phrases, and concepts, not an awareness of their meaning or context"
Yeah, it's just the fact that you pasted in an AI answer, regardless of how on point it is. I don't think people want this site to turn into an AI chat session.
I didn't downvote, I'm just saying why I think you were downvoted.
That's reasonable. I cut back the text. On the other hand I'm hoping downvoters have read enough to see that the AI-generated comment (and your response) are completely on-topic in this thread.
I use llms as tools to learn about things I don't know and it works quite well in that domain.
But so far I haven't found that it helps advance my understanding of topics I'm an expert in.
I'm sure this will improve over time. But for now, I like that there are forums like HN where I may stumble upon an actual expert saying something insightful.
I think that the value of such forums will be diminished once they get flooded with AI generated texts.
Of course the AI's comment was not insightful. How could it be? It's autocomplete.
That was the point. If you back up to the comment I was responding to, you can see the claim was: "maybe people are doing the same thing LLMs are doing". Yet, for whatever reason, many users seemed to be able to pick out the LLM comment pretty easily. If I were to guess, I might say those users did not find the LLM output to be human-quality.
That was exactly the topic under discussion. Some folks seem to have expressed their agreement by downvoting. Ok.
I think human brains are a combination of many things. Some part of what we do looks quite a lot like an autocomplete from our previous knowledge.
Other parts of what we do looks more as a search through the space of possibilities.
And then we act and collaborate and test the ideas that stand against scrutiny.
All of that is in principle doable by machines. The things we currently have and we call LLMs seem to currently mostly address the autocomplete part although they begin to be augmented with various extensions that allow them to take baby steps in other fronts. Will they still be called large language models once they will have so many other mechanisms beyond the mere token prediction?
We don't care what LLMs have to say; whether you cut back some of it or not, it's a low-effort waste of space on the page.
This is a forum for humans.
You regurgitating something you had no contribution in producing, which we can prompt for ourselves, provides no value here, we can all spam LLM slop in the replies if we wanted, but that would make this site worthless.
It's actually even worse than that: the current trend of AI is transformer-based deep learning models that use self-attention mechanisms to generate token probabilities, predicting sequences based on training data.
If only it was something which we could ontologically map onto existing categories like servants or liars...
Don't count it out yet as being problematic for software engineering, but not in the way you probably intend with your comment.
Where I see software companies using it most is as a replacement for interns and junior devs. That replacement means we're not training up the next generation to be the senior or expert engineers with real world experience. The industry will feel that badly at some point unless it gets turned around.
It’s also already becoming an issue for open-source projects that are being flooded with low-quality (= anything from “correct but pointless” to “actually introduces functional issues that weren’t there before”) LLM-generated PRs and even security reports; for examples see Daniel Stenberg’s recent writing on this.
Agree. I think we are already seeing a hollowing out effect on tech hiring at the lower end. They’ve always been squeezed a bit, but it seems much worse now.
Hallucinations can be mostly eliminated with RAG and tools. I use NotebookLM all of the time to research through our internal artifacts, it includes citations/references from your documents.
Even with ChatGPT you can ask it to find web citations and if it uses the Python runtime to find answers, you can look at the code.
And to prevent the typical responses: my company uses GSuite so Google already has our IP, NotebookLM is specifically approved by my company, and no, Google doesn't train on your documents.
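For readers who haven't seen it, here is a minimal sketch of the retrieval-with-citations pattern being described. It uses TF-IDF purely as a stand-in for a real embedding model, the document contents are invented examples, and the actual LLM call is left out; the point is that the answer is grounded in named sources you can check.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = {                                      # toy stand-ins for internal artifacts
    "design_doc.md": "The billing service retries failed charges three times.",
    "runbook.md": "On-call engineers restart the billing worker via systemctl.",
}

def retrieve(question: str, k: int = 2):
    names, texts = list(documents), list(documents.values())
    vec = TfidfVectorizer().fit(texts + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(texts))[0]
    return sorted(zip(names, texts, sims), key=lambda t: -t[2])[:k]

def build_prompt(question: str) -> str:
    # Asking the model to answer only from the cited sources is what makes
    # the output checkable against the documents it names.
    sources = "\n".join(f"[{name}] {text}" for name, text, _ in retrieve(question))
    return (f"Answer using only these sources and cite them by name:\n{sources}\n\n"
            f"Question: {question}")

print(build_prompt("How many times does billing retry a failed charge?"))
```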
Facts can be checked with RAG, but the real value of AI isn't as a search replacement, but for reasoning/problem-solving where the answer isn't out there.
How do you, in general, fact check a chain of reasoning?
I can’t tell a search engine to summarize text for a technical audience and then another summary for a non technical audience.
I recently came into the middle of a cloud consulting project where a lot of artifacts, transcripts of discovery sessions, requirement docs, etc had already been created.
I asked NotebookLM all of the questions I would have asked a customer at the beginning of a project.
What it couldn’t answer, I then went back and asked the customer.
I was even able to get it to create a project plan with work streams and epics. Yes it wouldn’t have been effective if I didn’t already know project management, AWS and two decades+ of development experience.
Despite what people think, LLMs can also do a pretty good job at coding when well trained on the APIs. Fortunately, ChatGPT is well trained on the AWS CLI, SDKs in various languages and you can ask it to verify the SDK functions on the web.
I’ve been deep into AWS-based development since LLMs have been a thing. My opinion may change if I get back into more traditional development.
> I can’t tell a search engine to summarize text for a technical audience and then another summary for a non technical audience.
No, but, as amazing as that is, don't put too much trust in those summaries!
It's not summarizing based on grokking the key points of the text, but rather based on text vs summary examples found in the training set. The summary may pass a surface level comparison to the source material, while failing to capture/emphasize the key points that would come from having actually understood it.
I write the original content or I was in the meeting where I’m giving it the transcript. I know what points I need to get across to both audiences.
Just like I’m not randomly depending on it to do an Amazon style PRFAQ (I was indoctrinated as an Amazon employee for 3.5 years), create a project plan, etc, without being a subject matter expert in the areas. It’s a tool for an experienced writer, halfway decent project manager, AWS cloud application architect and developer.
If I had a senior member of the team who was incredibly knowledgeable but occasionally lied, in a predictable way, I would still find that valuable. Talking to people is a very quick and easy way to get information about a specific subject in a specific context, so I could ask them targeted questions that are easy to verify; the worst thing that happens is I 'waste' a conversation with them.
Sure, but LLMs don't lie in a predictable way. It's just their nature that they output statistical sentence continuations, with a complete disregard for the truth. Everything they output is suspect, especially the potentially useful stuff where you don't know whether it's true or false.
They do lie in a predictable way: if you ask them for a widely available fact, you have a very high probability of getting the correct answer; if you ask them for something novel, you have a very high probability of getting something made up.
If I'm trying to use some tool that just got released or just got a big update, I won't use AI; if I want to check the syntax of a for loop in a language I don't know, I will. Whenever you ask it a question you should have an idea in your mind of how likely you are to get a good answer back.
I suppose, but they can still be wrong on common facts, like the number of R's in strawberry, that are counter-intuitive.
I saw an interesting example yesterday of the type "I have 3 apples, my dad has 2 more than me ..." where, of the top 10 predicted tokens, about half led to the correct answer and about half didn't. It wasn't the most confident predictions that led to the right answer - it was pretty much random.
The trouble with LLMs vs humans is that humans learn to predict facts (as reflected in feedback from the environment, and checked by experimentation, etc), whereas LLMs only learn to predict sentence soup (training set) word statistics. It's amazing that LLM outputs are coherent as often as they are, but entirely unsurprising that they are often just "sounds good" flow-based BS.
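You can reproduce that kind of top-10 observation yourself by asking the API for per-token alternatives. A rough sketch with the openai Python client; the model name is a placeholder, and whether a given model exposes logprobs is an assumption worth checking.

    import math
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": "I have 3 apples, my dad has 2 more than me. "
                              "How many do we have in total? Reply with just a number."}],
        logprobs=True,
        top_logprobs=10,  # ask for the 10 most likely candidates per output token
        max_tokens=5,
    )

    # Look at the candidates the model weighed for its first output token.
    first = resp.choices[0].logprobs.content[0]
    for cand in first.top_logprobs:
        print(f"{cand.token!r}: p ~ {math.exp(cand.logprob):.3f}")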
I think maybe this is where the polarisation between those who find ChatGPT useful and those who don't comes from. In this context, the number of r's in strawberry is not a fact: it's a calculation. I would expect AI to be able to spell a common word 100% of the time, but not to be able to count letters. I don't think in the summary of human knowledge that has been digitised there are that many people asking 'how many r's are there in strawberry', and if there are, I think the common reply would be '2', since the context is the second r. (People confuse strawbery and strawberry, not strrawberry and strawberry.)
Your apples question is the same: it's not knowledge, it's a calculation, it's intelligence. The only time you're going to get intelligence from AI at the moment is by asking a question that a significantly large number of people have already answered.
True, but that just goes to show how brittle these models are - how shallow the dividing line is between primary facts present (hopefully consistently so) in the training set, and derived facts that are potentially more suspect.
To make things worse, I don't think we can even assume that primary facts are always going to be represented in abstract semantic terms independent of source text. The model may have been trained on a fact but still fail to reliably recall/predict it because of "lookup failure" (model fails to reduce query text to necessary abstract lookup key).
Lying means stating things as facts despite knowing or believing that they are false. I don’t think this accurately characterizes LLMs. It’s more like a fever dream where you might fabulate stuff that appears plausibly factual in your dream world.
That sounds mostly like an incentives problem. If OpenAI, Anthropic, etc decide their LLMs need to be accurate they will find some way of better catching hallucinations. It probably will end up (already is?) being yet another LLM acting as a control structure trying to fact check responses before they are sent to users though, so who knows if it will work well.
Right now there's no incentive though. People keep paying good money to use these tools despite their hallucinations, aka lies/gaslighting/fake information. As long as users don't stop paying and LLM companies don't have business pressure to lean on accuracy as a market differentiator, no one is going to bother fixing it.
Believe me, if they could use another LLM to audit an LLM, they would have done that already.
It's inherent to transformers that they predict the next most likely token; it's not possible to change that behavior without making them useless at generalizing tasks (overfitting).
LLMs run on statistics, not logic. There is no fact checking, period. There is just the next most likely token based on the context provided.
Yes, most people who disagree with this have no clear understanding of how an LLM works. It is just a prediction mechanism for the next token. The implementation is very fancy and takes a lot of training, but it's not doing anything more than next-token prediction. That's why it is incapable of doing any reasoning.
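To make the "statistics, not logic" point concrete: a generation step boils down to roughly the loop below - score every candidate token, turn scores into probabilities, sample. Nothing in it consults a notion of truth. The vocabulary and scores here are toy numbers, not a real model.

    import math, random

    def softmax(logits, temperature=1.0):
        # Turn raw scores into a probability distribution over tokens.
        exps = [math.exp(x / temperature) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample_next_token(vocab, logits, temperature=0.8):
        # Pick the next token at random, weighted by probability.
        # There is no fact-checking step anywhere in here.
        return random.choices(vocab, weights=softmax(logits, temperature), k=1)[0]

    # Toy scores a model might assign after "The capital of France is"
    vocab  = [" Paris", " Lyon", " a", " the"]
    logits = [4.2, 1.1, 0.3, 0.2]
    print(sample_next_token(vocab, logits))  # usually " Paris", but not guaranteed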
Yeah, it's an interesting question, and I'm a little surprised I got downvoted here.
I wouldn't expect them to add an additional LLM layer unless hallucinations from the underlying LLM aren't acceptable, and in this case that means it is unacceptable enough to cost them users and money.
Adding a check/audit layer, even if it would work, is expensive both financially and computationally. I'm not sold that it would actually work, but I just don't think they've had enough reason to really give it a solid effort yet either.
Edit: as far as fact checking, I'm not sure why it would be impossible. An LLM wouldn't likely be able to run a check against a pre-trained model of "truth," but that isn't the only option. An LLM should be able to mimic what a human would do, interpret the response and search a live dataset of sources considered believable. Throw a budget of resources at processing the search results and have the LLM decide if the original response isn't backed up, or contradicts the source entirely.
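For what it's worth, the check layer described above isn't hard to sketch; whether it works well enough to justify the cost is the open question. Everything here is hypothetical: the ask_llm and web_search callables stand in for whatever model call and search backend you'd actually use.

    from typing import Callable

    def audited_answer(question: str,
                       ask_llm: Callable[[str], str],
                       web_search: Callable[[str], list[str]]) -> str:
        # First pass: the usual draft answer.
        draft = ask_llm(question)

        # Second pass: pull out the individual checkable claims.
        claims = ask_llm("List the factual claims in this answer, one per line:\n"
                         + draft).splitlines()

        unsupported = []
        for claim in claims:
            sources = web_search(claim)  # search only sources you consider believable
            verdict = ask_llm("Do these sources support, contradict, or not mention "
                              "the claim? Answer with one word.\n"
                              f"Claim: {claim}\nSources:\n" + "\n".join(sources))
            if "support" not in verdict.lower():
                unsupported.append(claim)

        # Flag (or regenerate) anything the sources didn't back up.
        if unsupported:
            return draft + "\n\n[Unverified: " + "; ".join(unsupported) + "]"
        return draft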
The problem is it's still a computer. And that's okay.
I can ask the computer "hey I know this thing exists in your training data, tell me what it is and cite your sources." This is awesome. Seriously.
But what that means is you can ask it for sample code, or to answer a legal question, but fundamentally you're getting a search engine reading something back to you. It is not a programmer and it is not a lawyer.
The hype train really wants to exaggerate this to "we're going to steal all the jobs" because that makes the stock price go up.
They would be far less excited about that if they read a little history.
It won't steal them all, but it will have a major impact by stealing the lower level jobs which are more routine in nature -- but the problem is that those lower level jobs are necessary to gain the experience needed to get to the higher level jobs.
It also won't eliminate jobs completely, but it will greatly reduce the number of people needed for a particular job. So the impact that it will have on certain trades -- translators, paralegals, journalists, etc. -- is significant.
I find it fascinating that I can achieve about 85-90% of what I need for simple coding projects in my homelab using AI. These projects often involve tasks like scraping data from the web and automating form submissions.
My workflow typically starts with asking ChatGPT to analyze a webpage where I need to authenticate. I guide it to identify the username and password fields, and it accurately detects the credential inputs. I then inform it about the presence of a session cookie that maintains login persistence. Next, I show it an example page with links—often paginated with numbered navigation at the bottom—and ask it to recognize the pattern for traversing pages. It does so effectively.
I further highlight the layout pattern of the content, such as magnet links or other relevant data presented by the CMS. From there, I instruct it to generate a Python script that spiders through each page sequentially, navigates to every item on those pages, and pushes magnet links directly into Transmission. I can also specify filters, such as only targeting items with specific media content, by providing a sample page for the AI to analyze before generating the script.
This process demonstrates how effortlessly AI enables coding without requiring prior knowledge of libraries like beautifulsoup4 or transmission_rpc. It not only builds the algorithm but also allows for rapid iteration. Through this exercise, I assume the role of a manager, focusing solely on explaining my requirements to the AI and conducting a code review.
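For the curious, the kind of script that workflow ends up producing looks roughly like this. The site URL, cookie name, page range, and CSS selector are made up for illustration; beautifulsoup4 and transmission_rpc are the libraries mentioned above, so double-check their APIs against your installed versions.

    import requests
    from bs4 import BeautifulSoup
    from transmission_rpc import Client

    BASE_URL = "https://example-tracker.local"     # made-up site
    COOKIES = {"session_id": "PASTE_YOUR_COOKIE"}  # the cookie that keeps you logged in

    transmission = Client(host="localhost", port=9091)

    def magnet_links_on_page(page_number):
        # Fetch one listing page and pull out every magnet link on it.
        resp = requests.get(f"{BASE_URL}/browse?page={page_number}",
                            cookies=COOKIES, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        return [a["href"] for a in soup.select('a[href^="magnet:"]')]

    for page in range(1, 11):  # spider listing pages 1..10
        for magnet in magnet_links_on_page(page):
            transmission.add_torrent(magnet)  # push straight into Transmission
            print("queued:", magnet[:60])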
The thing that makes the smarter search use case interesting is how LLMs are doing their search result calculations: dynamically and at metadata scales previously impossible.
LLM-as-search is essentially the hand-tuned expert systems AI vs deep learning AI battle all over again.
Between natural language understanding and multiple correlations, it's going to scale a lot further than previous search approaches.
After using them for a long time I am convinced they have no true intelligence beyond what is latent in training data. In other words I think we are kind of fooling ourselves.
That being said they are very useful. I mostly use them as a far superior alternative to web search and as a kind of junior research assistant. Anything they find must be checked of course.
I think we have invented the sci-fi trope of the AI librarian of the galactic archive. It can’t solve problems but it can rifle through the totality of human knowledge and rapidly find things.
And a plagiarism machine. It's like a high school student who thinks that if they change a couple of words and make sure it's grammatically correct, it's not plagiarism because it's not an exact quote... either that or it just completely makes things up. I think LLMs will be revolutionary, but just not in the way people think. It may be similar to the Gutenberg press. Before the printing press, words were precious and closely held resources. The Gutenberg press made words cheap and abundant. Not everyone thought it was a good thing at the time, but it changed everything.
It seems to me predicting things in general is a pretty good way to bootstrap intelligence. If you are competing for resources, predicting how to avoid danger and catch food is about the most basic way to reinforce good behavior.
This would fall down into a semantic debate over what is meant by intelligence.
There is a well known phenomenon known as the AI effect: when something works we start calling it something else, not AI. Heuristics and complex reasoning trees were once called AI. Fuzzy logic with control systems was once called AI. Clustering was once called AI. And so on…
This certainly has one root in human or carbon-based-life chauvinism, but I think there’s something essential happening too. With each innovation we see its limits, and it causes us to go back and realize that what we colloquially call intelligence was more than we thought it was.
Intelligence predicts, but is prediction intelligence?
Again, here by intelligence I mean what complex living organisms and humans do.
I still believe there are things going on here not modeled by any CS system and not well understood. Not magic, just not solved yet. We are reverse engineering billions of years of evolution folks. We won’t figure it all out in a few decades.
Demonstrably, humans do think, and arguably demonstrably, early life would go down a path of simple predictions (in the form of stimulus -> response). And demonstrably, evolution did lead to human level intelligence.
So I don’t think there needs to be a semantic debate over where in the process intelligence started. The early responses to stimulus is a form of prediction, but not one that requires thinking.
There can be much disagreement about whether prediction is at the core of intelligence, or whether optimizing the ability to predict leads to intelligence. But from the established facts, it is the case that higher forms of life were bootstrapped from lower ones, and that our biochemistry does have reward functions. Successfully triggering those rewards will generally hinge on making successful predictions. Take from that what you will.
Prediction is a huge part of what intelligence does. I was questioning “prediction maximalism.”
Intelligence is also very good at pattern recognition. Did people once argue for pattern recognition maximalism?
Biological (including human) intelligence is clearly multi-modal and I strongly believe there are aspects that are barely understood if at all.
The history of CS and AI is a history of us learning how to make machines that are unbelievably good at some useful but strictly bounded subset of what intelligence can do: logic, math, pattern recognition, and now prediction.
I think we may still be far from general intelligence and I’m not even sure we can define the problem.
I've convinced myself I'm a multi-millionaire, but all other evidence easily contradicts that. Some people put a bit too much stock in "putting it out there" and "making your own reality".
I mean, it’s known that there’s no intelligence if you simply look at how it works on a technical level - it’s a prediction of the next token. That wasn’t really ever in question as to whether they have “intelligence”
To you & me that's true. But especially for the masses, it's not. It seems like at least once a day I either talk to someone or hear someone via TV/radio/etc who does not understand this.
An example that amused me recently was a radio talk show host who had a long segment describing how he & a colleague had a long argument with ChatGPT to correct a factual inaccuracy about their radio show. And that they finally convinced ChatGPT that they were correct due to their careful use of evidence & reasoning. And the part they were most happy about was how it had now learned, and going forward ChatGPT would not spread these inaccuracies.
That anecdote is how the public at large sees these tools.
Ironically if you explain to those talk show hosts how they are wrong about how ChatGPT learns (or doesn't learn) and use all the right arguments and proofs so that they finally concede, chances are that they too won't quite learn from that and keep repeating their previous bias next time.
To people who really understand them and are grounded, I think you're right. There has been a lot of hype among people who don't understand them as much, a lot of hype among the public, and a lot of schlock about "superintelligence" and "hard takeoff" etc. among smart but un-grounded people.
The latter kind of fear mongering hype has been exploited by companies like ClosedAI in a bid for regulatory capture.
A little humility would do us good regardless, because we don't know what intelligence is or what consciousness is; we can't truly define them, nor do we understand what makes humans conscious and sentient/sapient.
Categorically ruling out intelligence because "it's just a token predictor" puts us at the opposite of the spectrum, and that's not necessarily a better place to be.
EDITED: My ASCII art pyramid did not work. So imagine a pyramid with DATA at the bottom, INFORMATION on top of the data, KNOWLEDGE sitting on top of the INFORMATION, and WISDOM at the top.
And then try to guess where AI is. Some people say that Information is knowing the what, Knowledge the how, and Wisdom the why.
In general conversation, “intelligence”, “knowledge”, “smartness”, “expertise”, etc are used mostly interchangeably.
If we want to get pedantic, I would point out that “knowledge” is formally defined as “justified true belief”, and I doubt we want to get into the quagmire of whether LLM’s actually have beliefs.
I took OP’s point in the casual meaning, i.e. that LLMs are like what I would call an “intelligent coworker”, or how one might call a Jeopardy game show contestant as intelligent.
I would say "knowledge" rather than "intelligence"
The key feature of LLMs is the vast amounts of information and data they have access to, and their ability to quickly process and summarize, using well-written prose, that information based on pattern matching.
I was so stupid when GPT3 came out. I knew so little about token prediction, I argued with folks on here that it was capable of so many things that I now understand just aren't compatible with the tech.
Over the past couple of years of educating myself a bit, whilst I am no expert I have been anticipating a dead end. You can throw as much training at these things as you like, but all you'll get is more of the same with diminishing returns. Indeed in some research the quality of responses gets worse as you train it with more data.
I am yet to see anything transformative come out of LLMs other than demos that prompt engineers worked night and day to make impressive. Those Sora videos took forever to put together and cost huge amounts of compute. No one is going to make a whole production-quality movie with an LLM and disrupt Hollywood.
I agree, an LLM is like an idiot savant, and whilst it's fantastic for everyone to have access to a savant, it doesn't change the world like the internet, or internal combustion engine did.
OpenAI is heading toward some difficult decisions, they either admit their consumer business model is dead and go into competing with Amazon for API business (good luck), become a research lab (give up on being a billion dollar company), or get acquired and move on.
One of the core tenets of technology is that it makes a job less consuming of a person’s resources (time, strength, …). While I’ve read a lot of claims, I’ve yet to see someone make a proper argument for how LLMs can be such a tool.
> A group of individuals adept with use of such an idiot savant enhanced environment would be incredibly capable. They'd be a force unseen in human civilization before today
More than the people who landed someone on the moon?
They would be capable of landing someone on the moon, if they chose to pursue that goal, and had the finances to do so. And they'd do so with fewer people too.
It would have to be trained on 100% of all potential scenarios. Any scenario that happens for which it isn't trained equals certain disaster, unlike a human, who can adapt and improvise based on things AI does not have: feelings, emotions, creativity.
You're still operating under the assumption that the AI is doing independent work; it is not, it is advising the people doing the work. That is why people are the ones being augmented and enhanced, and not the other way around: people have the capacity to handle unforeseen scenarios, and with AI as a strategy advisor they'll do so with more confidence.
I have witnessed no evidence that would support this claim. The only contribution of LLMs to mathematics is in being useful to Terry Tao: they're not capable of solving novel orbital mechanics problems (except through brute-force search, constrained sufficiently that you could chuck a uniform distribution in and get similar outputs). That's before you get into any of the engineering problems.
https://deepmind.google/discover/blog/funsearch-making-new-d... seems to be a way. The LLM is the creative side, coming up with ideas - and in that case the “mutation” caused by hallucinations may be useful - combined with an evaluator to protect against the bad outputs.
Pretty close to the idea of human brainstorming, and it has worked. Could it do orbital math? Maybe not today, but the approach seems as feasible as the work Mattingly did for Apollo 13.
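The FunSearch-style loop described above is essentially: generate a candidate with the LLM, score it with a domain-specific evaluator, keep the best, and feed them back in as examples. A minimal sketch; llm_propose is a hypothetical stand-in for the model call, and evaluate is whatever scoring function your problem admits.

    from typing import Callable

    def funsearch_style_loop(llm_propose: Callable[[list[str]], str],
                             evaluate: Callable[[str], float],
                             iterations: int = 100,
                             pool_size: int = 5) -> str:
        pool: list[tuple[float, str]] = []
        for _ in range(iterations):
            # The LLM is the "creative" mutation step: it sees a few good
            # candidates and proposes a new one. Hallucinated variations are
            # tolerable because the evaluator rejects anything that scores badly.
            examples = [program for _, program in pool]
            candidate = llm_propose(examples)
            pool.append((evaluate(candidate), candidate))
            pool = sorted(pool, key=lambda item: item[0], reverse=True)[:pool_size]
        best_score, best_program = pool[0]
        return best_program

The design choice that matters is the evaluator: without a trustworthy scoring function, the loop just accumulates confident-sounding junk.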
You do not have them solving such problems, but you do have them in the conversation as the human experts knowledgeable in that area work to solve the problem. This is not the LLM doing independent work; it is the LLM interactively working with the human who is capable of solving the problem (it is their career), and the AI just makes them better at it, not by doing their work but by advising them as they work.
But they aren't useful for that. Terry Tao uses them to improve his ability to use poorly-documented boilerplatey things like Lean and matplotlib, but receiving advice from them‽ Frankly, if a chatbot is giving you much better advice than a rubber duck, you're either a Jack-of-all-Trades (in which case, I'd recommend better tools) or a https://ploum.net/2024-12-23-julius-en.html Julius (in which case, I'd recommend staying away from anything important).
> With o1, you can kind of do this. I gave it a problem I knew how to solve, and I tried to guide the model. First I gave it a hint, and it ignored the hint and did something else, which didn’t work. When I explained this, it apologized and said, “Okay, I’ll do it your way.” And then it carried out my instructions reasonably well, and then it got stuck again, and I had to correct it again. The model never figured out the most clever steps. It could do all the routine things, but it was very unimaginative.
I agree with his overall vision, but transformer-based chatbots will not be the AI algorithm that supports it. Highly-automated proof assistants like Isabelle's Sledgehammer are closer (and even those are really, really crude, compared to what we could have).
> The big difference with LLM AIs is they never graduate to an experienced staffer, they are always the idiot savant that is really dang smart but also clueless and needs to be observed.
Basically this. They already have vastly better-than-human ability at finding syntax errors within code, which on its own is quite useful; think of how many people have probably dropped out of CS as a major after staying up all night and failing to find a missing semicolon.
… One odd thing I’ve noticed about the people who are very enthusiastic about the use of LLMs in programming is that they appear to be unaware of any _other_ programming tools. Like, this is a solved problem, more or less; code-aware editors have been a thing since the 90s (maybe before?)
> code-aware editors have been a thing since the 90s
These will do things like highlight places where you're trying to call a method that isn't defined on the object, but they don't understand the intent of what you're trying to do. The latter is actually important in terms of being able to point you toward the correct solution.
True... in the past few days I used my time off to work on my hobby video game. Writing the game logic required me to consider problems that are quite self-contained, domain-specific, and probably globally unique (if not particularly complex).
I started out in Cursor, but I quickly realized Claude's erudite knowledge of AWS would not help me here; what I needed was to refactor the code quickly and often, so that I'd finally find the perfect structure.
For that, IDE tools were much more appropriate than AI wizardry.
Try being a TA to freshmen CS majors; a good 1/3 change majors because they can't handle the syntax strictness coupled with their generally untrained logical minds. They convince themselves it is "too hard" and their buddies over in the business school are having a heck of a lot of fun throwing parties...
Sounds like CS is not for them, and they find something else to do which is more applicable to their skills and interest. This is good. I don't think you should see a high drop out rate from a course as necessarily indicating a problem.
Losing potentially good talent because they don't know how or where to look for mistakes yet is foolhardy. I'm happy for them to throw in the towel if the field is truly not for them, but I would wager that a not-insignificant portion of that crowd would be able to meaningfully progress once they get past the immediate hurdles in front of them.
Giving them an LLM to help with syntax errors, at this stage of the tech, is deeply unhelpful to their development.
The foundation of a computer science education is a rigorous understanding of what the steps of an algorithm mean. If the students don't develop that, then I don't think they're doing computer science anymore.
The use of a LLM in this case is to show them where the problem is so that they can continue on. They can't develop an understanding of the algorithm they're studying if they can't get their program to compile at all.
> Giving them an LLM to help with syntax errors, at this stage of the tech, is deeply unhelpful to their development.
I mean if the alternative is quitting entirely because they can't see that they've mixed tabs with spaces, then yes, it's very very helpful to their development.
I dropped out of CS half because I didn't enjoy the coding: they dropped us into C++ and I found the error messages so confusing.
I discovered python five years later and discovered I loved coding.
(The other half of the reason is we spent two weeks designing an ATM at a very abstract level and I thought the whole profession would be that boring.)
Compilers can detect errors in the grammar, but they cannot infer what your desired intent was. Even the best compilers in the diagnostics business (rustc, etc) aren't mind-readers. A LLM isn't perfect, but it's much more capable of figuring out what you wanted to do and what went wrong than a compiler is.
I spent 8 hours trying to fix a bug once because notepad used smart quotation marks (really showing my age here - and now I'm pretty annoyed that the instructor was telling us to use notepad, but it was 2001 and I didn't know any better).
I did something like that once too, a long time ago. And because of that I now see syntax errors of that ilk within seconds, having learned the hard way.
This was about a million years ago. I had just installed a pirated copy of Windows XP (FCKGW-RHQQ2...) and was in the first quarter of my physics degree, taking a class in C. Different times....
This all sounds plausible, but personally I find being paired with a new idiot-savant hire who never learns anything from the interaction incredibly exhausting. It can augment and amplify one’s own capabilities, but it’s also continuously frustrating and cumbersome.
I think you’ve exactly captured the two disparate views we see on HN:
1. LLMs have little value, are totally unreliable, and will never amount to much because they don’t learn and grow and mature like people do, so they cannot replace a person like me who is well advanced in a career.
2. LLMs are incredibly useful and will change the world because they excel at entry-level work and can replace swaths of relatively undifferentiated information workers. LLM flaws are not that different from those workers’ flaws.
I’m in camp 2, but I appreciate and agree with the articulation of why they will not replace every information worker.
You really shouldn't say LLMs "never graduate" to experienced staff - rather that they haven't yet. But there are recent and continuing improvements in the ability of the LLMs, and in time, perhaps a small amount of time, this situation may flip.
I'm talking about the current SOTA. In the future, all bets are off. For today, they are very capable when paired with a capable person, and that is how one uses them successfully today. Tomorrow will be different, of course.
> A group of individuals adept with use of such an idiot savant enhanced environment would be incredibly capable. They'd be a force unseen in human civilization before today.
I'm sorry, but your comment is a good example of the logical shell game many people play with AI when applying it to general problem solving. Your LLM AI is both an idiot and an expert somehow? Where is this expertise derived from, and why should you trust it? If LLMs were truly as revolutionary as all the grifters would have you believe, then why do we not see "forces unseen in human civilization before today" from humans who employ armies of interns? That these supposed ubermensch do not presently exist is firm evidence, in my opinion, that current AI is a dead end.
Humans are infinitely more capable than current AI, the limiting factor is time and money. Not capability!
What I find curious is that the people who sell AI as the holy grail that will make many jobs obsolete in a few years at the same time claim that there's a huge talent shortage, and even engage in feuds over immigration and spend capital to influence immigration policies.
Apparently they don't believe that AI is about to revolutionize things that much. This makes me believe that a significant part of the AI investment is just FOMO driven, so no real revolution is around the corner.
Although we keep seeing claims that AI has achieved PhD level this and Olympiad level that, the people who actually own these systems keep demanding immigration policy changes to bring actual humans from overseas for years to come.
Have you maybe confused the time periods in the different discussions? I think the AI making jobs obsolete part is in the next few years, whereas the talent shortage issue is right now - although as usual, it's a wage issue, not a talent issue. Pay enough and the right people will turn up.
Who knows about the future, right? I'm just trying to read the expectations of the people who have control over the AI, the capital, and the politics, and they don't strike me as optimistic about AI actually doing much in the near future.
And that might be FOMO, or they can simply exit with a profit as long as they can keep fanning the hype. Or, of course, they may be hoping to have it in the long term.
They are not replacing their workers despite claiming that AI is currently as good as a PhD, and they certainly don't go to AI medical doctors despite claiming that their tool is better than most doctors.
Is that so? I'm not in the US, so I don't have a good idea of what's going on there. But wasn't there relatively high unemployment among developers after all these Big Tech layoffs post pandemic? Shouldn't companies there have an easy time finding local talent?
Sorry for the potentially silly question. I just spent some time trying to research it and came up with nothing concrete.
But at the same time there's an ongoing infighting among Trump supporters because tech elites came up as pro - skilled immigration where the MAGA camp turned against them. The tech elites claim that there's a talent shortage. Here's a short rundown that Elon Musk agrees with: https://x.com/AutismCapital/status/1872408010653589799
Every software firm, notable and small, has had layoffs over the past two years, but somehow there's still a "STEM shortage" and companies are "starving for talent" or some such nonsense?
> I would call ‘LLM-functionalism’: the idea that a natural language description of the required functionality fed to an LLM, possibly with some prompt engineering, establishes a meaningful implementation of the functionality.
My boy. More people need common sense like this talked into them.
The reliance on large datasets for training AI models introduces biases present in the data, which can perpetuate or even exacerbate societal inequalities. It's essential to approach AI development with caution, ensuring robust ethical guidelines and comprehensive testing are in place before integrating AI into sensitive areas.
As we continue to innovate, a focus on explainability, fairness, and accountability in AI systems will be paramount to harnessing their potential without compromising societal values.
"Today’s most advanced AI models have many flaws, but decades from now, they will be recognized as the first true examples of artificial general intelligence."
Norvig seems to be using a loose technical definition of AGI, roughly "AI with some degree of generality", which is hard to argue with, although by that measure older GOFAI systems like SOAR might also qualify.
Certainly "deep learning" in general (connectionist vs symbolic, self-learnt representations) was a step in the right direction, and LLMs a second step, but it seems we're still a half dozen MAJOR steps away from anything similar to animal intelligence, with one critical step being moving beyond full dataset pre-training to new continuous learning algorithms.
I agree. The systems in place already solve generalized problems not directly represented in the training set or algorithm. That was, up until the last few years, the off-the-shelf definition of AGI.
And the systems in place do so at scales and breadths that no human could achieve.
That doesn’t change the fact that it’s effectively triple-PhD Uncle Jim: slightly unreliable and prone to bullshitting its way through questions, despite having a breathtaking depth and breadth of knowledge.
What we are making is not software in any normal sense of the word, but rather an engine to navigate the entire pool of human knowledge, including all of the stupidity, bias, and idiosyncrasies of humanity, all rolled up into a big sticky glob.
It’s an incredibly powerful tool, but it’s a fundamentally different class of tool. We cannot expect to apply conventional software processes and paradigms to LLM based tools any more than we could apply those paradigms to politics or child rearing and expect useful results.
> The systems in place already solve generalized problems not directly represented in the training set or algorithm
Tell me a problem that an LLM can solve that is not directly represented in the training set or algorithm. I would argue that 99% of what commercial LLMs get prompted about is stuff that already existed in the training set. And they still hallucinate half-lies about that. When your training data is most of the internet, it is hard to find problems it hasn't encountered before.
o3 solved a quarter of the challenging novel problems on the FrontierMath benchmark, a set of problems "often requiring multiple hours of effort from expert mathematicians to solve".
I’m having a hard time taking this comment seriously, since solving novel problems is precisely what LLMs are valuable for. Sure, most problems are in some way similar in pattern to some other, known one, but that describes 99.9 percent of what 99.9 percent of people do. De novo conceptual synthesis is vanishingly rare and I’m not even sure it exists at all.