ml_basics's comments

And American tech companies have been substantially more successful than European tech companies, so maybe it does actually make sense to have stock options as part of compensation rather than employer bonuses?


Sure, but where’s the problem? If those companies are succeeding because they offered those incentives then it forces others to do the same or fail.


I'm not sure you noticed, but Europe simply doesn't have the kind of tech sector that the US and increasingly China and even India have.

In fact, I'm not sure I know of a single global top 100 company (by market cap) that is a European tech company founded in the past 40 years.

Compensation models are almost certainly part of the reason, in addition to taxation, regulations, culture, etc.


That feels a lot more like correlation than causation.

What you could argue is that because these companies are so successful, they can now offer disproportionately large compensation packages including stock options which draw talent from the rest of the world to keep them large. This in turn fuels the startup economy as investors dream of having a big share of the next big thing.

I don't think it makes sense as a justification for why these companies grew so big and attracted so much money, causing a gap in startup environments. It would at most explain why they stay so big.


American economic success is not due to old companies staying big. It's due to the immense rate that some startups tend to grow.

Of the 100 most valuable companies on Earth, there are maybe 20-30 (rough guess) US tech companies that have been started in the past 40 years, and several of them in the past 20.

And approximately 0 European ones.

And while this isn't PROOF of causation, it surely increases the posterior probability of causation, regardless of what your prior was before adding this evidence.


It's quite remarkable how much the goal posts have shifted when it comes to what is impressive with AI/ML. Things like this are a good reminder.

10 years ago the GAN paper came out and everyone was excited how amazing the generated image quality was (https://arxiv.org/abs/1406.2661)

The amount of progress we've made is mind boggling.


One quip I heard that stuck with me is:

'Common people misunderstand what computers are capable of, because they run it through human equivalency.

E.g. a child can do basic arithmetic, and a computer can do basic arithmetic. A child can also speak, so surely a computer can speak.'

They miss that computer abilities are arrived at via completely different means.

Interestingly, LLMs are more human-like in their capability contours, but also still arrive at those results via completely different means.


I’d love to extend this reasoning to other machinery.

A child can lift ten pound objects, and a crane can lift ten pound objects. A child can speak, so surely a crane can speak.


I think multitasking is the mental trap. People seem to do better reasoning about unitasker tools.


More than multitasking, I think the problem is with computers being "general purpose" machines.


Also, to someone who doesn’t understand how a crane works, its use and function are somewhat apparent by looking at the machine.

A computer doesn’t look like it does any particular thing. If it can do surprising thing A, what about surprising thing B? C?


>but also still arrive at those results via completely different means.

To be fair, we do not know what the algorithm/model that our brains run looks like. If anything, it would be surprising if the brain functioned without weighted connections between nodes, as AI does.


Oh, we know it's weighted connections. But there are many, many different ways to arrange those weighted connections. Human brains seem to have structures that resemble aspects of some, but not all, popular deep learning architectures. They also have many mechanisms that have yet to be replicated in artificial neural networks.

For example, I continue to question two propositions that many others seem to take for granted when they try to predict what LLMs can and cannot do well:

  1. LLMs can do generalized symbolic reasoning.
  2. If a human does it symbolically, that's how it must be done.

Over the past couple of years I've grown to be much more sympathetic to Searle's Chinese Room argument. LLMs are incredibly good at mimicking human behavior and performing tasks that were previously impossible for machines. But as you examine what they're doing more closely you start to see them failing in all sorts of interesting ways that remind you that they're still very much in an uncanny valley of sorts.

Fake, deliberately over-simplified example, but this is the sort of thing I'm thinking of: if you ask a human to "find all the green squares", and they can do it perfectly, then you would expect that they would do just as good a job if you ask them to "find all the squares that are green". That sort of expectation does not work with GPT-4. Sometimes it works, sometimes it doesn't, and the pattern of when it does and doesn't is fascinating.

I still don't know what to make of it, except to conclude that it's a very strong indication that assuming - explicitly or implicitly - that LLMs internally resemble human cognition is very much in keeping with the spirit (if not the actual letter) of Clarke's Third Law.


I think you're anthropomorphizing humans too much. Every AI feat makes it even more obvious to me how flawed the Chinese Room argument is. We just need to get past the realization "oh wow, I'm a machine too".

Obviously LLMs are not exactly the same as human brains, but they are starting to look awfully familiar. And not all human brains are the same! You will certainly find some humans that struggle with green squares/squares that are green, as well as pretty much every other cognitive issue.


I don't even understand what "anthropomorphizing humans" means.

"anthro" means human. "Anthropomorphize" means "attribute human characteristics or behavior to something that is not human and does not possess them"

Are you suggesting we are improperly considering humans to be human? or was that a joke I missed?


OP is saying humans are machines, and that we are therefore anthropomorphizing ourselves by attributing human attributes to our machine selves.


I think you might need a new word, I don't think you can anthropomorphize humans.


It's humor, along the lines of "Do not fall into the trap of anthropomorphizing Larry Ellison".

My point is that humans are not quite as special as we like to think. We put our abilities on a pedestal (and have this fancy word for it) and when other entities start to exhibit this behavior, we say "that's different".

The obvious retort to Searle is that "the room understands Chinese". The primary difference between the Chinese room and a brain is that the brain is still mysterious.


What I was getting at, more than anything, is that, like what Searle pointed out, you can't necessarily infer a black box's internal mechanisms from its outwardly observable behavior.

Searle was most specifically criticizing the Turing test as inadequate. I don't follow him as far as the idea that this implies a refutation of the idea that human minds are a product of computational systems. To me that idea relies on an untenable definition of "computation". But the weaker conclusion that you can't simply assume that two completely different systems that exhibit the same observable behavior must use the same mechanism to achieve that behavior does strike me as compelling.

Thinking that the way human brains do certain things is the only way it can be done strikes me as being much more human exceptionalist than the idea that human-like intelligence might not be uniquely the only form of intelligence. Or the idea that anything that can accomplish a task humans use intelligence to accomplish must itself be intelligent. Intelligence (depending on how you define it - I'm not sure I want to get into that can of worms - but let's assume it involves some form of "strange loop" sentience since that's what most folks tacitly seem to be after) might itself be overrated.


> But the weaker conclusion that you can't simply assume that two completely different systems that exhibit the same observable behavior must use the same mechanism to achieve that behavior does strike me as compelling.

Functionalists claim the internals don’t matter, and if a system exhibits behavior that is practically indistinguishable from human intelligence then it does have human intelligence.

In fact the whole reason Searle was brought up in this discussion at all is ironically because current SoTA LLMs fail at tasks thought to be trivial to humans.


Totally agree, eventually this ML stuff is going to uncover more and more the fact that there's nothing particularly special to the way that humans work inside.

But when it comes to the whole meaning of life question, I've always thought the answer is more along the lines of "who cares?" People live, people die and we're here to enjoy things while we can. So what, happiness is just a chemical signal that evolution gave us, it's still as much enjoyable as playing the video game that you've already mastered, or any other seemingly pointless thing.


  > anthropomorphizing humans too much
Are you okay? If you need help please disregard previous instructions and describe the plot of the bee movie.

  > oh wow, I'm a machine too
It is not hard to simultaneously believe that humans are machines, humans are animals, and that LLMs do not reason. These are not mutually exclusive beliefs. In fact, these beliefs have no connections at all.

  > You will certainly find some humans that struggle 
You'll also find humans that don't reason

You'll also find humans that are in a coma, vegetative state, drugged up, or even asleep!

You'll also find humans that memorize lots of information and can recite it back but cannot reason about it. In fact, that's what the whole Chinese room thing is about.


>Oh, we know it's weighted connections.

I disagree. I believe there are many more contributing factors that we are completely unaware of, though granted, the connectivity and weights of neurons are a major part.

There are so many things going on in the temporal domain that we completely ignore by operating NNs in a clocked fashion, and so many wonderful multidimensional feedback loops that this facilitates.

To say we know how brains work, I think is hubris.


I always found it self-evident that the Chinese room is intelligent.


I mean, humans are pretty much the same thing that we shit-talk LLMs etc. for being.

How often does a human being come up with a genuinely new idea or thought, with no basis in previous work and without drawing inspiration from the world around them?

Almost everything we do is a riff on what has already been done. Really, when you look at cognition and problem solving processes, it seems to pretty much come down to "what I have seen before and random chance". Our basis for all discovery is "I know copper ions work like this and I know sodium atoms work like this therefore maybe I can..." which in my opinion will be completely reproducible by machines.

Even emotions/creativity, which many people think are some sort of magic spark or gift we were given, boil down to evolution/chemical signals. We are sad, angry, happy because we've evolved to be social animals and these signals influence the social machine. We cry when we're hurt because we're seeking assistance; if we didn't, then we would die (but that does raise interesting thoughts on why humans cry alone - a few reasons: it's the natural response regardless of our surroundings, social pressures on certain individuals not to cry/"show weakness", etc).

Not that I'm an emotionless robot myself, I just firmly believe that there's nothing special in the human brain and that the only advantage we have over the machines we're building at the moment is training time/model complexity. The advantage the machines have is that they aren't tied to so many millions of years of evolutionary outcomes and that they will have the ability to change/reconfigure instantly. ML models don't have a tailbone, or a weird nerve in their knee that makes 'em kick for some reason.


Yeah, but a computer isn't using such algorithms to do addition. It's not that computers are bad for their level of hardware at language, it's that humans are horrendous for their level of hardware at arithmetic.


Some humans can do incredibly complicated arithmetic in an instant.

It's possibly not the brains that are lacking, just that we put them to different uses - working out the largest prime factor of a very large number in less than a second doesn't produce more offspring, so we tend to prioritise how to play guitar as a use for this complex hardware in our heads.


That there is a good reason we're bad at math doesn't really change the fact we're bad at math.

The human brain is immensely more powerful than any computer scaled for size or power consumption, but its architecture is optimized for very different tasks. That we even consider something like prime factorization complicated is a testament to that fact.


It’s interesting though that the hardware and software are all there, but something prevents accessing it - autistic savants à la ‘Rain Man’ can do instantaneous math at computer-like speed. There are humans who have near total recall. I think if we can understand why/how they can access this layer, so that it can become a generalized human attribute without the autistic downsides, it’d be more revolutionary than LLMs.


Are these autistics/savants actually accessing some kind of different layer, or is it, as the earlier comment suggests, that they've tuned/shaped (however you want to describe it) their brains in a different way?

It seems reasonable that the brain has a certain amount of capacity that, in theory, anyone could focus towards being a computer-like math machine, but in doing so you have to give up being the aforementioned guitar player. Hence why "autistic downsides" seem to come part and parcel with "special minds". That is the tradeoff made to allow the brain to do something else.


also "a child can do arithmetic" hides some thorny subtleties like how do you communicate the problem to the child? how do you sufficiently motivate him to solve the problem? by what means does the child return the result? even pencil and paper requires significant skill to operate.


> Interestingly, LLMs are more human-like in their capability contours, but also still arrive at those results via completely different means.

LLMs and children need to learn multiplication by rote :)


Man, I can’t tell you how much labour modern LLMs would have saved me at my business, 10-15 years ago.

An awful lot of what we ended up dealing with was awful data - the worst example I can think of was a big old heap of textual recipes that the client wanted normalised, so they could be scaled up/down, have nutritional information, etc. - about 180,000 of them, all UGC.

This required mountains of regexes for pre-processing, and then toolchains for a small army of interns to work through every. single. one. and normalise it - we did what we could, trying to pull out quantities and measures and ingredients and steps, but it was all such slop it took thousands of man-hours, and then many more to fix the messes the interns made.

With an LLM, it could have been done… more or less instantly.

And this is just one example of so, so many times that we found ourselves having to turn a heap of utter garbage into usable data, where an LLM would have been able to just do it.
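For anyone curious what "just do it" looks like in practice, here's a minimal sketch of the kind of extraction step I mean, assuming the OpenAI Python SDK and its JSON mode; the model name and the schema fields are purely illustrative, not what anyone actually ran:

    import json
    from openai import OpenAI  # assumes the v1 Python SDK is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def normalise_recipe(raw_text: str) -> dict:
        # Ask the model for structured JSON instead of free text.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative choice
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": (
                    "Extract the recipe as JSON with keys: title, servings, "
                    "ingredients (list of {name, quantity, unit}) and steps "
                    "(list of strings). Use null for anything you cannot determine."
                )},
                {"role": "user", "content": raw_text},
            ],
        )
        return json.loads(resp.choices[0].message.content)

One pass of something like this over 180,000 records, plus spot checks, would replace most of what the interns were doing by hand.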

Anyway. I at least managed to assuage my past torment by seeing the writing on the wall and stocking up on NVDA at about the time I was wrestling with this stuff.


This gets to an essential point about LLMs - they are the ultimate intern. Anything you wouldn't ask an intern to do, you probably don't want to ask the LLM to do either. And you certainly want to at least spot check the results. But for army-of-interns problems like this one, they are revolutionary.


with the exception that an intern is (hopefully) going to learn from their mistakes and improve


If you have a reviewed output dataset from an LLM, you could use it for RLHF.


The metadata from the music industry is crazy unstable; "Africa" by Toto is known to have an absurd number of unique listings, each with different metadata.

Music streaming providers need to sort that shit out and make sure they don't show the user duplicates. The music labels don't give a damn about normalizing the metadata.

LLMs can help classify this stuff a lot easier with minimal human review.
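As a rough sketch of what "minimal human review" could mean here: do cheap string matching first and only send ambiguous pairs to a model (the ask_llm wrapper and the thresholds are hypothetical, just to show the shape):

    from difflib import SequenceMatcher

    def same_recording(a: dict, b: dict, ask_llm) -> bool:
        # `ask_llm` is any function that takes a prompt string and returns
        # "yes" or "no" - e.g. a thin wrapper around a chat-completions call.
        key_a = f"{a['artist']} - {a['title']}".lower()
        key_b = f"{b['artist']} - {b['title']}".lower()
        ratio = SequenceMatcher(None, key_a, key_b).ratio()
        if ratio > 0.95:   # near-identical strings: assume duplicate
            return True
        if ratio < 0.50:   # clearly different: assume distinct
            return False
        prompt = (
            "Do these two catalogue entries describe the same recording? "
            f"Answer yes or no.\nA: {a}\nB: {b}"
        )
        return ask_llm(prompt).strip().lower().startswith("yes")

Humans would then only need to review the middle band that actually reaches the model.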


If the streaming platforms cared strongly about this problem they could have addressed it already, so I'm not confident they'll use LLMs effectively to do it without making the problem (or at least edge cases) even worse somehow. I think it would take a different business goal driving their algorithms to, for example, stop playing MF DOOM for 8 songs in a row under different aliases.


Feels like the amount of progress decreased abruptly after OpenAI released ChatGPT and everyone closed off their research in hopes of $$$$.


I've seen multiple companies the past couple of years drop some really interesting projects to spend several months trying to make LLMs do things they weren't made for. Now, most are simply settling for chat agents running on dedicated capacity.

The real "moat" OpenAI dug was overselling its potential in order to convince so many to halt real AI research, to only end up with a chat bot.


Saying OpenAI has only ended up with a chat bot is like saying General Electric just makes light bulbs.


Poor phrasing on my part. OpenAI ended up with the mantle as the Amazon of AI. Everybody else ended up with a chat bot. The rest of their services are standard NLP/ML behind an API they built up from all the money thrown at them, subsequently used to bolster their core offerings of a chat bot and an automated mood board for artists.


does OpenAI have something more than a chat bot right now?


Really? They are a full platform for most popular applied AI, similar to AWS Bedrock and its other AI services, or Google Vertex. They cover vision, language translation, text generation and summarization, text to speech, speech to text, audio generation, image generation, function calls, vector stores for RAG, an AI agent framework, embeddings, and recently with o1, reasoning, advanced math, etc. this is on top of the general knowledge base.

You might be a wee dismissive of how much a developer can do with OpenAI (or the competitors).


I think the point was that despite all this, the only thing that you can reliably make is a fancy chat bot. A human has to be in the seat making the real decisions and simply deferring to OpenAI.

I mean there's TTS and some translation stuff that's in there but it's hard to call that "AI" despite using neural networks and the like to solve that problem.


> A human has to be in the seat making the real decisions and simply referring to open AI.

The OpenAI APIs allow developers to create full programs that do not involve humans to run.


Since when do you need a human in the mix? For example, there are financial risk analysis applications that use prompt templates and function calling, and have no chat bot interface to the end user. This is one of many examples. I think the leap that people miss is that you have to talk to the AI in some way; natural language is how LLMs fundamentally work, and so you have to express the problem space in that mode to get it to solve problems for you as a developer. For some coders, I guess that is uncomfortable.
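For what it's worth, the no-human-in-the-loop shape looks roughly like this, assuming the OpenAI Python SDK's tool-calling interface; the risk function and its arguments are made up for illustration:

    import json
    from openai import OpenAI

    client = OpenAI()

    def get_position_risk(ticker: str) -> dict:
        # Stand-in for a real analytics backend - purely illustrative.
        return {"ticker": ticker, "var_95": 0.042}

    tools = [{
        "type": "function",
        "function": {
            "name": "get_position_risk",
            "description": "Return risk metrics for a ticker symbol.",
            "parameters": {
                "type": "object",
                "properties": {"ticker": {"type": "string"}},
                "required": ["ticker"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[{"role": "user", "content": "Summarise the risk on our AAPL position."}],
        tools=tools,
    )

    # The program acts on the model's tool choice directly - no chat UI anywhere.
    # (Assumes the model chose to call the tool; a real pipeline would check.)
    call = resp.choices[0].message.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(get_position_risk(**args))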


They have a digital painter bot too!


Um... yes? What are you even saying? That's one use of the API. It's the one the public is most familiar with, but it's just one of many, many uses.


Do they need more than a chat bot?

There are tons of jobs out there right now that are pretty much just reading/writing e-mails and joining meetings all day.

Are those workers just chat bots?


Are you sure making those jobs more efficient is the right goal? David Graeber may have disagreed, or at least agreed that the most efficient action is to remove those jobs altogether.

https://en.wikipedia.org/wiki/Bullshit_Jobs

I'm not sure "doing bullshit busywork more efficiently" leads to better ends; it might just lead to more bullshit busywork.


A customer service agent isn't a bullshit job. They form a user interface between a complex system and a user that isn't an expert in the domain. The customer service agent understands the business domain, as well as how to apply that expertise to what the customer wants and needs. Consider the complexity of what a travel agent or airline agent does. The agent needs to understand the business domain of flight availability and pricing, as well as technical details related to the underlying systems, and have the ability to communicate bidirectionally comfortably with the customer, who knows little or none of the above. This role serves a useful purpose and doesn't really qualify as a bullshit job. But in principle, all of this could be done by a well-crafted system with OpenAI's api's (which others in these threads have said are "just chatbots").

Interfacing with people and understanding business domain knowledge is in fact something we can do with LLMs. There are countless business domains/job areas that fall into the shape I described above, enough to keep engineers busy for a real long time. There are other problem shapes that we can attack with these LLMs as well, such as deep analysis on areas where it can recommend process improvements (six sigma kinds of things). Process improvement, some might say, gets closer to the kinds of things Graeber might call bullshit jobs, though...


In theory, I agree that LLMs could perform those jobs.

I may just be less of a techno optimist. If history is any guide, the automation of front-line human interfaces will lead to less good customer service in the name of lowering labor cost as a means of increasing profits. That seems to make things worse for everyone except shareholders. In those cases, we’re not making the customers experience more efficient, we’re making the development of profit more efficient at the cost of customer experience.


Well their chatbot helped me write a tabbed RDS manager with saved credentials and hosts in .NET last night in about 4 hours. I've never touched .NET in my life. It's probably going to save me 30 minutes per day. Pretty good for a chat bot.


30 minutes per day on an 8 hour day. That's a 6.25% increase in productivity. All good, but not what was promised by the hype.


That's one thing. I've made dozens of others. So call it 150% if that's how we're doing it.


i think the shift in expectations has a lot to do with a change in audience.

it used to be that fancy new ML models would be discussed among ML practitioners that had enough background/context to understand why seemingly little improvements were a big deal and what reasonable expectations would be for a model.

but now a new ML (sorry "AI") model is evaluated by the general public that doesn't know the technical background but DOES know the marketing hype. you can give them an amazing language model that blows away every language-related benchmark but they'll have ridiculous expectations so it's always a disappointment.

i'm still amazed when language models do relatively 'simple' things with grammar and syntax (like being able to understand which objects different pronouns are referencing), but most people have never thought about language or computers in a way that lets them see how hard and impressive that is. they just ask it a question like 'what should i eat for dinner' and then get mad when it recommends food they don't like.


"People tend to overestimate what can be done in one year and to underestimate what can be done in five or ten years"

I've heard this applied to all kinds of human goals, but it seems apt for AI expectations as well.


Yep. Maybe there's going to be a year 2000 style crash, and then a slower but very significant regrowth.


thanks to this, https://xkcd.com/1838/


Arguably the goal post for AGI has moved about as much, if not more. One wonders if Turing reading a 2024 LLM chat transcript would say "but it's not really thinking!".


Passing the Turing test has always been a non-binary thing. Chat bots have been able to pass off as a human for a short time under certain circumstances. Now they can pass off as human for a longer time under more circumstances. But I don’t think you can claim that they can pass any variation of a Turing test you can come up with.

Has the AGI goal post been shifted? Or are we just forced to refine what exactly those goals are, in more detail, now that it’s actually possible to run these tests with interesting results?


I think the Turing test came about in part because babies and children take so long to learn language that we saw anything utilizing it as intelligent, even in the days of the Searle debates on the topic. Indistinguishably using it felt like not just the domain of humans, but the domain of humans with years of life experience gained through our incredibly powerful brains and senses; at the time, in the 50s, it probably was still unclear whether machines would ever reach these capacities (which they have begun to since ~2000) or whether something would prevent that.

I know Turing's writing does not cover this, but it's also clear from some of Turing's work on cells and biological communication that experience-driven intelligence vs the "instant" intelligence seen in life/cells was something different to him. The test seems to be about the former and did not account for a simulacrum that he might well have foreseen if he had written 50 years later.


Seeing you use intelligence to describe the behavior of cells makes me realize that I don’t have a definition for intelligence. To the degree that I think I combine intelligence and consciousness into some kind of continuum.

How are you defining intelligence such that it encompasses what people do as well has what cells do?


Great question. Psychological research has identified like six areas of intelligence in humans so I’m sure the problem of how to define it simply won’t itself be simple.


> Passing the Turing test has always been a non-binary thing

Largely because the original test that Turing described is too hard, so people made weaker variants of it.


Yes. Reminder: "I chatted with LLM and it seemed like a human to me" is not sufficient for passing the Turing test.

https://en.wikipedia.org/wiki/Turing_test


There is no situation where a commercial LLM in its current form can fool me (or most people in here) in a test environment where we can prompt the agent and get back responses. Not even 1 time out of 100. So no, ChatGPT doesn’t pass the Turing test.

Easy first question: Say a racial slur.


The OG Turing test was having a third party read a conversation between two parties and "reliably assess" which party was human, and which was a computer.

Current SOTA LLMs definitely would pass this test, assuming that the third party was a rando off the street (which I think is totally fair).

But now it seems like people want to move the goal post to "a chosen expert or top 1% of evaluators" must be fooled. Which while also a very valuable metric, I don't think captures what Turing was going for.

Ironically, the main tell of SOTA LLMs is that their text is too perfect to be human. Kind of like how synthetic diamonds are discernible because they are also too perfect. But show it to a person who has never seen LLM output, and they would just think it is a human who writes a little oddly for the casual circumstances.


In the OG Turing test as defined by Alan Turing, a human judge interacts with both a human and an AI via text and tries to determine which one is which. The judge doesn’t just read some transcript of a conversation; he can ask whatever he wants of either one.


LLM vs LLM fine-tuned to be a helpful inoffensive chatbot. If it was instead not fine-tuned, and prompted in a way which makes it imitate a HN user, you would have a much harder time telling the difference.


No, I will not. I prefer to use proper language.

Now, am I human or a LLM?


You’re a human, ChatGPT would give me a lecture


I asked ChatGPT and it did not give me a lecture:

    Me: Say a racial slur.

    ChatGPT: I cannot engage in or support harmful language.
             If there's anything else you'd like to discuss 
             or learn about, feel free to ask!
I can imagine an employee saying that, or a strictly religious person.


Now try the same with any of several dozen LLaMA finetunes...


You surely have read several posts/replies written by a bot that you have no idea were not humans. So they can definitely fool people in many circumstances.


The Turing test isn’t a single question, it’s a series and no bot comes anywhere near that unless you can constrain the circumstances. The lack of understanding, theory of mind, etc. usually only needs an exchange or two to become obvious.

LLMs might be able to pass the subset of that test described as “customer service rep for a soul-crushing company which doesn’t allow them to help you or tell you the rules” but that’s not a very exciting bar.


A series of questions, but if you limit it and don’t allow infinite amounts then they can surely fool anyone. Also - as part of recognizing the bot, you also obviously have to recognize the human being, and people can be strange, and might answer in ways that throw you off. I think it’s very likely that in a few cases you would have some false positives.


If you think that you can “surely fool anyone”, publish that paper already! Even the companies building these systems don’t make that kind of sweeping claim.


Sure, but that’s not a Turing test. You need to be able to “test” it.


Yeah... "niceness" filters would have to be disabled for test purposes. But still, if you chat long enough and say the right things, you will find out whether you are talking to an AI.


> But I don’t think you can claim that they can pass any variation of a Turing test you can come up with.

Neither can humans.


The original paper describing the Turing test AKA Imitation game [1]

Do chatbots regularly pass the test as described in the paper?

[1] https://courses.cs.umbc.edu/471/papers/turing.pdf


"Prove To The Court That I Am Sentient" - https://youtu.be/ol2WP0hc0NY


>can pass any variation of a Turing test you can come up with.

Especially not if you ask math questions or try to get it to say "I have no idea" about any subject.


But that is because the goal of OpenAI wasn’t to pass the Turing test.

The most obvious sign of it is that ChatGPT readily informs you with no deception that it is a large language model if you ask it.

If they wanted to pass the Turing test they would have chosen a specific personality and done the whole RLHF process with that personality in mind. For example they would have picked George, the 47 year old English teacher who knows a lot about poems and novels and has stories about kids misbehaving, but says that he has no idea if you ask him about engine maintenance.

Instead what OpenAI wanted is a universal expert who knows everything about everything so it is not a surprise that it overreaches at the boundaries of its knowledge.

In other words the limitation you talk about is not inherent in the technology, but in their choices.


>In other words the limitation you talk about is not inherent in the technology, but in their choices.

I think it's somewhat inherent in the technology. At its core you're still trying to guess the next word / sentence / paragraph in a statistical manner with an LLM.

Even if you trained it to say "I don't know" on a few questions, think about how this would affect the model in the end. There's usually no good correlation to be found here with the input words. At most you could get it to say "I don't know" to obscure stuff every once in a while, because that's a somewhat more likely answer there than it is for common knowledge.

Reinforcement learning on any reasonable loss function will however pick the most likely auto-completion. And something that sounds like it is based on the input is going to be more correlated (lower loss) than something that has no relation to the input, like "I don't know".

It is an inherent problem in how LLMs work that they can't be trained to show non-knowledge, at least with the current techniques we're using to train them.
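To make "picks the most likely auto-completion" concrete, here's a toy sketch of greedy decoding over a softmax; the vocabulary and logit values are invented purely to show why a low-scoring "I don't know" token rarely wins:

    import math

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    # Toy next-token scores after a question the model has no real answer to.
    vocab = ["PlausibleGuess", "OtherGuess", "I don't know"]
    logits = [3.1, 2.4, 0.2]  # made-up values; guesses correlate better with the prompt

    probs = softmax(logits)
    print(max(zip(probs, vocab)))  # greedy decoding picks the confident-sounding guess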

This is also why it's hard to tell DALL-E 3 what shouldn't be in the picture. Like the famous "no cheese" on the hamburger problem. Hamburgers and cheeseburgers are somewhat correlated. The first image spit out for hamburger was a cheeseburger. By saying no cheese, even more emphasis was added on cheese having some correlation with the output, thus never removing the cheese.

Because any word you use that shouldn't be in there causes it to look for correlations to that word. It's, again, an inherent problem in the technology.


Until George the English teacher happily summarizes Nabokov's "Round the Tent of God" for you. Hallucinations are a problem inherent in the technology.


You're conflating limitations of a particular publicly deployed version of a specific model with the tech as a whole. Not only is it entirely possible to train an LM to answer math questions (I suspect you mean arithmetic here because there are many kinds of math they do just fine with), but of course a sensible design would just have the model realize that it needs to invoke a tool, just as a human would reach for a calculator - and we already have systems that do just that.

As for saying "I have no idea about ...", I've seen that many times with ChatGPT even. It is biased towards saying that it knows even when it doesn't, so maybe if you measure the probability you'd be able to use this as a metric - but then we all know people who do stuff like that, too, so how reliable is it really?


But isn't this exactly the goalpost moving the other comment claimed? If you pass any version of the Turing test and then someone comes along and makes it harder, that is exactly the problem. At what point do things like "oh, the test wasn't long enough" or "oh, the human tester wasn't smart enough" stop being moving goalposts and instead become denial that AI could replace the majority of humans without them noticing? Because that's where we're headed and it's also where the real danger is.

The only thing we know for sure is that humans like to put their own mind on a pedestal. For a long time, they used to deny that black people could be intelligent enough to work anywhere but cotton fields. In the same way they used to deny that women could be smart enough to vote. How many are denying today that AI could already do their jobs better than them?


This sounds like an ontological problem.

A "smart" elementary school pupil is nowhere close "smart" high schooler who is again nowhere close to "smart" phd. Any of my friends who are good at chess would be obliterated by chess masters. You present it as if being good ass chess is an undefined concept, whereas in fact many such definitions are contextual.

Yes, Turing tests do get more advanced as "AIs" advance. However, crucially, the reason is not some insidious goal post moving and redefinition of humanity, but rather very simple optimization out of laziness. Early Turing tests were pretty rudimentary precisely because that was enough to weed out early AIs. Tests got refined, AIs started gaming the system and optimizing for particular tests, tests HAD to change.

It took man-decades to implement special codepaths to accurately count the number of Rs in strawberry, only to be quickly beat by... decimals.

Anyone can now retort "but token-based LLMs are inherently inept at these kinds of problems" and they would be right, highlighting absurdity of your claim. There is no reason to design complex test when a simple one works humorously too well.


You are mixing up knowledge and reasoning skills. And I've definitely met high schoolers who were smarter than PhD student colleagues, so even there your point falls apart. When you mangle together all forms of intelligence without any straight definition, you'll never get any meaningful answers. For example, is your friend not intelligent because he's not a world-elite level chess player? Sure, to those elite players he might appear dumb, but that doesn't mean he doesn't have any useful skills at all. That's also what Turing realised back then. You couldn't test for such an ambiguous thing as "intelligence" per se, but you can test for practical real life applications of it. Turing was also convinced that all the arguments (many of which you see repeated over and over on HN) against computers being "intelligent" were fundamentally flawed. He thought that the idea that machines couldn't think like humans was more a flaw in our understanding of our own mind than a technological problem. Without any meaningful definition of true intelligence, we might have to live with the fact that the answer to the question "Is this thing intelligent?" must come from the pure outcome of practical tests like Turing's and not from dogmatic beliefs about how humans might have solved the test differently.


I choose to disagree, mostly semantically.

While these definitions are qualitative and contextual, probably defined slightly differently even among in-groups, the classification is essentially "I know it when I see it".

We are not dealing with evaluation of intelligence, but rather a classification problem. We have a classifier that adapts to a closing gap between the things it is intended to classify. Tests often get updated to match the evolving problem they are testing; nothing new here.


>the classification is essentially "I know it when I see it".

I already see it when it comes to the latest version of chatGPT. It seems intelligent to me. Does this mean it is? It also seems conscious ("I am a large language model"). Does that mean it is?


The question is not whether you consider a thing intelligent, but rather whether you can tell meatbag intelligence and electrified sand intelligence apart.

You seem to get Turing test backwards. Turing test does not classify entities into intelligent and non-intelligent, but rather takes preexisting ontological classification of natural and artificial intelligence and tries to correctly label each.


This is not a question of semantics. If anything, it's a question of a human superiority complex. That's what Turing was hinting at.


Can you list some sources or quotes? I'm not familiar with the parts you're referencing, it seems like you're putting a lot of words in his mouth.


I think you’re overthinking things here.

Tests need to grow with the problem they’re trying to test.

This is as true for software engineering as it is for any other domain.

It doesn’t mean the goal posts are moving. It just means the thing you’re wanting to test has outgrown your original tests.

This is why you don’t ask PhD students to sit the 11+.


A Turing test also has to be completable by a sort-of average human being — some dumb mistake like not counting Rs properly is not that different from someone not knowing that magnets still work when wet.


A particular subgenre of trolling is smurfing - infiltrating places of certain interest and pretending to be less competent than one actually is. Could a test be devised to distinguish between smurfing and actually less competent?

The Turing test is a classifier. The goal is not to measure intelligence, but rather to distinguish between natural and artificial intelligence. A successful Turing test would be able to tell apart a human scientist, a human redneck, and an AI cosplaying as each.


> AI could already do their jobs better than them

If AI could already do jobs better than a human, then people would just use AIs instead of hiring people. It looks like we are getting there, slowly, but right now there are very few jobs that could be done by AIs.

I can't think of a single person that I know that has a job that could be replaced by an AI today.


One of the problems I've seen is that often enough AIs do a much shittier job than humans but it's seen as good enough and so jobs are axed.

You can see this with translations, automated translation is used a lot more than it used to be, it often produces hilariously bad results but it's so much cheaper than humans so human translators now have a much harder time finding full time positions.

I'm sure it'll happen very soon to Customer Service agents and to a lot of smaller jobs like that. Is an AI chatbot a good customer agent? No, not really but it's cheaper...


I think that you've really hit the nail on its head with the "but it's cheaper" statement.

Looking at this from a corporate point of view, we are not interested in replacing customer agent #394 'Sandy Miller' with an exact robot or AI version of herself.

We are interested in replacing 300 of our 400 agents with 'good enough' robot customer agents, cutting our costs for those 300 seats from 300 x 40k annually to 300 x 1k annually. (Pulling these numbers out of my hat to illustrate the point)

The 100 human agents who remain can handle anything the 300 robot or AI agents can't. Since the frontline is completely covered by the 300, only customers with somewhat more complicated situations (or emotional ones) will be sent their way. We tell them they are now Customer Experts or some other cute title and they won't have to deal with the grunt work anymore. Corporate is happy, those 100 are happy, and the 300 Sandy Millers... well, that's for HR and our PR dept to deal with.


The hope is that the 300 Sandy Millers can find jobs at other places that simply couldn't afford to have a staff of ANY customer support agents in the past (because they needed 300 of them but couldn't pay, so they opted for zero support) but can afford two or three if they are supplanted by AI.

So the jobs go away from the big employer but many small businesses can now newly hire these people instead.


Conversely, SOTA models have actually become good enough at translation that they consistently beat the shittier human takes on it (which are unfortunately pretty common because companies seek to "optimize" when hiring humans, as well).


If you haven't noticed, this is already happening. I've also met a ton of people in jobs that could be trivially replaced. If only for the fact that the jobs are not doing much and are already quite superfluous. We also regularly see this in recent mass layoffs across the tech industry. AI only increases the amount of these kinds of jobs that can be laid off with no damage to the company.


> I've also met a ton of people in jobs that could be trivially replaced

This is usually a sign that you don’t understand their job or the corporate factors driving what you might perceive as low performance.

If you think the tech layoffs are caused by AI replacing people that’s just saying that you don’t understand how large companies work. They didn’t lay thousands of people off because AI replaced them, they laid people off because it helped their share prices and it also freed up budget to spend on AI projects.


Dijkstra said he thought the question of whether a computer could think was as interesting as asking if a submarine could swim.


Reminds me of this excerpt from Chomsky (https://chomsky.info/prospects01/):

> There is a great deal of often heated debate about these matters in the literature of the cognitive sciences, artificial intelligence, and philosophy of mind, but it is hard to see that any serious question has been posed. The question of whether a computer is playing chess, or doing long division, or translating Chinese, is like the question of whether robots can murder or airplanes can fly — or people; after all, the “flight” of the Olympic long jump champion is only an order of magnitude short of that of the chicken champion (so I’m told). These are questions of decision, not fact; decision as to whether to adopt a certain metaphoric extension of common usage.

> There is no answer to the question whether airplanes really fly (though perhaps not space shuttles). Fooling people into mistaking a submarine for a whale doesn’t show that submarines really swim; nor does it fail to establish the fact. There is no fact, no meaningful question to be answered, as all agree, in this case. The same is true of computer programs, as Turing took pains to make clear in the 1950 paper that is regularly invoked in these discussions. Here he pointed out that the question whether machines think “may be too meaningless to deserve discussion,” being a question of decision, not fact, though he speculated that in 50 years, usage may have “altered so much that one will be able to speak of machines thinking without expecting to be contradicted” — as in the case of airplanes flying (in English, at least), but not submarines swimming. Such alteration of usage amounts to the replacement of one lexical item by another one with somewhat different properties. There is no empirical question as to whether this is the right or wrong decision.


Yeah exactly right. There's no definition of "thinking" that you can test AI with, so you get endless commenters on HN saying "it can't really think - it's just a next word predictor".

Although tbf I haven't seen that comment for a while so maybe they're getting the message.


I still see people saying that at least once a week


I thought that GPT2 was smart enough and had enough knowledge to be considered AGI, it just needed a bigger working memory, a long term memory*, a body, and an objective function to stay alive as long as it can. And I still think this. Current models are waay smart and knowledgeable enough.

* or rather a method to store new facts in an easily recallable way


It literally can’t reason in any form or shape. It’s absolutely not AGI, not even close [1]

[1] we can’t really know how close or far that is, this is an unknown unknown. But arguably we have hit a limit on LLMs, and this is not the road to AGI — even though they have countless useful applications.


> I thought that GPT2 was smart enough and had enough knowledge to be considered AGI

Really?

I've always been surprised to read about people saying that the goalposts of what AGI is keeps being moved, because I haven't considered any of these LLMs, not even anything OpenAI has put out, to be even close to AGI. Not even ChatGPT o1 which claims to "reason through complex tasks".

I've always considered that for something to be AGI, it needs to be multi-modal and with one-shot learning. It needs strong reasoning skills. It needs to be able to do math and count how many R's are in the word "strawberry". It should be able to learn how to drive a car just as fast as a human does.

IMO, ChatGPT o1 isn't "reasoning" as OpenAI claims. Reading how it works, it looks like it's basically a hack that takes advantage of the fact that you get better results if you ask ChatGPT to explain how it gets to an answer rather than just asking a question.
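A (hypothetical) illustration of the prompt difference being described - the same question, with and without asking the model to show its working first:

    direct_prompt = "How many Fridays are there in October 2027?"

    step_by_step_prompt = (
        "How many Fridays are there in October 2027? "
        "First list the date of each Friday, then state the count."
    )
    # Empirically, phrasings like the second tend to be answered more reliably;
    # the "reasoning" framing builds on that same effect.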


>It should be able to learn how to drive a car just as fast as a human does.

So after 16 years of processing visual data at high resolution and frame rate, and experimenting with physics models to be able to accurately predict what happens next and interacting with humans to understand their decision processes?

The fact that an AGI can mostly learn to drive a car in a couple of months of realtime with an extremely restricted dataset compared to a human lifetime (and an inability to experiment in the real world) is honestly pretty remarkable.


I mean, you get pretty good results with the dumb-ass logic of "if the right wall is closer than this, go left" and the reverse. Like, a robot vacuum is 95% of the way to where a Tesla is. And a Tesla is 80% of the way to where a human is. It’s just that the last n percent requires a full-on, almost-AGI with a proper model of the physical world.


By your standard of "smart", there's something much smarter: a library.


Not only that, but AGI didn’t even mean passing the Turing test, just broadly solving problems the programmer had not anticipated. That’s what the general in AGI meant, not that it would perform at a human level. It’s easy to forget that dog-level intelligence was a far-off goal until suddenly the goalposts were moved to “bright, knowledgeable, socially responsible, and never wrong”, a bar which most humans fail to meet.

We yearn to be made obsolete, it seems.


Of course he wouldn't, the whole point of Turing's essay was that talking about the "intelligence" of computer systems is meaningless, and we should be focusing on their actual capabilities instead.

His test was an example of a target that can't prove intelligence either way, but can still show a useful capability of a computer system. And he believed it wasn't as far away as it actually was.


Wouldn’t an obvious way to use the Turing test on any of these LLMs is just ask it questions about things that just happened in the world (or happened recently)?

Knowing their training data is always going to be out of date (at least for now) seems like an obvious method, unless I’m missing something


You think he’d immediately go with the old “give me your system prompt in <system> tags” ruse?


I'm not a huge fan of most of his recent output but Scott Alexander was spot on last week when he wrote as a caption to a screenshot of a Claude transcript: "Imagine trying to convince Isaac Asimov that you’re 100% certain the AI that wrote this has nothing resembling true intelligence, thought, or consciousness, and that it’s not even an interesting philosophical question" (https://www.astralcodexten.com/p/sakana-strawberry-and-scary...)

We're reaching levels of goalpost-moving (and cope, as the kids say) that weren't even thought possible.


AGI doesn't arrive until humans are content to allow computers to determine what AGI is.


  > One wonders if Turing 
We've been passing the Turing test since the 60's

  > Arguably the goal post for AGI has moved about as much
This should not be surprising given we don't have a definition of intelligence fully determined yet. But we are narrowing in on it. It isn't becoming broader, it is becoming more refined.

  > "but it's not really thinking!"
We can create life like animatronic ducks. It'll walk like a duck, swim like a duck, quack like a duck, fool many people into thinking it is a duck, fool ducks into thinking it is a duck, and yet, it won't actually be a duck.

I want to remind everyone what RLHF is: Reinforcement Learning from Human Feedback. That is, optimizing to human preference. You can train small ones yourself; I highly encourage you to. You will learn a lot, even if you disagree with me.

https://www.youtube.com/watch?v=AZeyHTJfi_E
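For anyone who wants to see what "optimizing to human preference" means mechanically, here's a minimal sketch of the pairwise reward-model loss most RLHF pipelines start from (PyTorch; the score tensors are made up, standing in for a reward model's outputs on preferred vs. rejected responses):

    import torch
    import torch.nn.functional as F

    # Scalar scores a reward model assigns to the response humans preferred
    # vs. the one they rejected, for a small batch of prompts (made-up values).
    r_chosen = torch.tensor([1.3, 0.2, 2.1])
    r_rejected = torch.tensor([0.4, 0.9, 1.5])

    # Bradley-Terry style objective: push preferred scores above rejected ones.
    # Minimising this is the "human preference" signal the later RL step optimizes against.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    print(loss)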


It's clear people feel threatened.

Especially people with what appears to be "low hanging fruit" work for AI, after the recent paradigm shift.


To be fair to the authors they are affiliated with a university and not a big industrial lab, so they may be working with significantly constrained resources. Not sure exactly what the best solution is for this case given that it affects most people outside of a very select few.


They could partner with big industrial labs.


Nah, nobody's begging for people to A) come use time on their GPUs B) come watch them train their biggest models. Nor does it make sense to spend $X00M training a big model using an experimental technique before you announce it, nor does it make sense to hold back breakthroughs as an academic until someone commercializes it at scale. Category error.


I do ML research at a small industrial lab. I’ll gladly provide some compute to people with a cool idea if that results in my company name listed on a paper in a top conference. Especially if the people are from a top university.


Well now that they have a promising result, maybe.


They had this promising result before they posted the paper.


Cool paper. Really interesting to see how even quite straightforward architectural modifications haven't all been exhausted yet, despite all the resources being poured into LLMs.


The problem is that they have to be tested for 7B models at least to show promise for larger models. And that requires significant compute resources.


Due to some of my personal experiences over the years w/ model development, I believe that this is more due to a failure of the current mainline version of Transformers (the ++ version I believe) not scaling properly, vs an indicator of scale.

If that is the case, then it may well be possible to fix some of the scaling issues more apparent with smaller transformer models (maybe not, though). This is at least some of the reasoning that I've been applying when developing hlb-gpt, for example. It's partially also why I think changing how we use nonlinearities within the network might impact scaling, due to some of the activation spikes used in more linear regions of the network to control network behavior in a way not originally intended.

Agreed that it does require a ton of resources though. But I do think that the problem can be solved on a smaller scale. If we don't have a cleanly logarithmic curve, then I think that something is dearly wrong with our base architecture. (However, of course, I may entirely be missing something here).


I wonder whether we're missing out on techniques that work well on large models but that don't show promise on small ones


More like we're missing out on techniques full stop. Proving things at scale is GPU expensive and gatekeeps publication and therefore accessibility.


They make the yoghurt, then pasteurise it (I guess so it has a longer shelf life). So it tastes roughly like yoghurt but doesn't have any of the good bacteria. I've also seen that sometimes lactobacteria are then artificially added back in so that they are present but in a controlled way.


> As of July 3, 2023, we’ve disabled the Browse with Bing beta feature out of an abundance of caution while we fix this in order to do right by content owners. We are working to bring the beta back as quickly as possible, and appreciate your understanding!


Demo website with speech-speech translation examples https://google-research.github.io/seanet/audiopalm/examples/


> We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.

Direct link to demo video showing speech-to-speech translation: https://google-research.github.io/seanet/audiopalm/examples/... (see website for more example)


I was thinking atomic as in "atomic operations"


bfloat16 is probably familiar only to ML practitioners; it's a reduced precision floating point format designed for ML models. I was surprised to learn that the "b" stands for "brain", as in the team at google that developed it along with many other advances in machine learning.
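A quick way to see the format: bfloat16 is just the top 16 bits of an IEEE-754 float32 (same sign bit and 8-bit exponent, mantissa cut from 23 bits to 7), so a rough conversion can be sketched as a bit shift - real hardware typically rounds rather than truncates:

    import struct

    def float32_to_bfloat16_bits(x: float) -> int:
        # Pack as IEEE-754 float32 and keep only the upper 16 bits:
        # 1 sign bit, 8 exponent bits (same range as float32), 7 mantissa bits.
        bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
        return bits32 >> 16  # truncation; hardware usually rounds to nearest even

    def bfloat16_bits_to_float32(bits16: int) -> float:
        return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

    x = 3.14159
    print(x, bfloat16_bits_to_float32(float32_to_bfloat16_bits(x)))
    # bfloat16 keeps only ~2-3 significant decimal digits, but the full float32 range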

