> Hancock and his collaborators set out to explore this problem space by looking at how successful we are at differentiating between human and AI-generated text on OKCupid, AirBNB, and Guru.com.
The study evaluated short-form generic marketing-style content, most of which is manicured and optimized to within an inch of its life.
Most dating profiles I see are extremely similar in terms of how people describe themselves. Same for Airbnb listings. I'd think AI detection would be much higher for long-form writing on a specific topic.
> The study evaluated short-form generic marketing-style content, most of which is manicured and optimized to within an inch of its life.
This is also the kind of human-written content that is closest to how LLMs sound. The tonal and structural similarity is so glaring that I have often wondered if a large percentage of the GPT training corpus is made up of text from spam blogs.
I think if I was given, say, a couple pages from an actual physics textbook and then a GPT emulation of the same, I would be able to tell the difference easily. Similarly with poetry - GPT's attempts at poetry are maximally conventional and stuffed with flat and stale imagery. They can easily be separated from poetry by a truly original human writer.
If AI developers want to impress me, show me an AI whose writing style departs significantly from the superficiality and verbosity of a spam blog. Or, in the case of Bing, an unhinged individual with a nasty mix of antisocial, borderline, and histrionic personality disorders.
> The tonal and structural similarity is so glaring that I have often wondered if a large percentage of the GPT training corpus is made up of text from spam blogs.
This is almost certainly the case, because the shifts in tone and vocabulary between an Inc.com or Buzzfeed article and a London Review of Books article are far too wide to allow an AI to simply weigh them equally. AI speaks a kind of global English that's been trained on not just blogs and Wikipedia, but also Quora answers and content marketing pieces, a lot of which is written by non-native speakers.
It isn't grammatically wrong, but as it targets the widest possible audience, its voice also isn't very interesting.
Exactly what I'm talking about. It uses a diverse and evocative vocabulary, the sort of words one sees in popular poems, but nothing quite hangs together. None of the imagery quite makes sense.
But if you ask ChatGPT to assume some other voice, it always just sounds like ChatGPT making a perfunctory effort to sound like something else, not actually like another voice.
And from what I've seen of the raw model, when you ask it to depart from this voice, it sometimes can, but the bigger the departure, the weirder and more inhuman the results.
In November last year it was still possible to get it to do wilder stuff by just asking it to pretend. This has been trained out of it by now and so it sticks to its stiff tone.
Because it is a text predictor, with the choices being made by a PRNG, which is not the same thing as an author, who starts with something to say and chooses the words to fit.
The LLM is like the author with the "something to say" replaced by PRNG output. The plausibility of the output is attributable to the power of the predictive component.
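A minimal Python sketch of what I mean, with made-up token scores (real models work over huge vocabularies, but the shape is the same): the predictive component supplies a distribution over next tokens, and a pseudo-random draw makes the choice.

    import math
    import random

    def sample_next_token(scores, temperature=0.8, rng=random):
        # `scores` maps candidate tokens to unnormalized model scores
        # (hypothetical numbers here). Softmax with temperature turns
        # them into probabilities; the PRNG then makes the choice.
        scaled = {tok: s / temperature for tok, s in scores.items()}
        top = max(scaled.values())
        weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
        total = sum(weights.values())
        probs = {tok: w / total for tok, w in weights.items()}
        return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

    # Toy distribution over what might follow "The cat sat on the ..."
    print(sample_next_token({"mat": 2.1, "sofa": 1.3, "roof": 0.4, "moon": -1.0}))

Rerun it and you get a different continuation; nothing in that loop ever "meant" anything, it just kept the draw consistent with the prediction.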
> The tonal and structural similarity is so glaring that I have often wondered if a large percentage of the GPT training corpus is made up of text from spam blogs.
Is it that AI is trained on spam, or was it AI creating the spam to start with?
While SEO spam and content farms may not have been using GPT specifically, I assume much of it is AI generated to begin with.
Totally agree. Just yesterday, I was finishing up an article [1] that advocates for conversation length as the new definition of a "score" on a Turing test.
You assume everyone is a robot and measure how long it takes to tell otherwise.
Such a metric is clearly useless if you cannot tell otherwise.
I am very frustrated by the way this article repeatedly asks ChatGPT to guess if something is a bot, gets told “well, we can’t know for sure but this is at least the sign of a crappy bot or human behavior”, and then the author says “Aha! But a human could act like a crappy bot, or you could train a bot to mimic this exact behavior”.
According to academic friends of mine, tools like ZeroGPT still have too much noise in the signal to be a viable way to catch cheaters. It seems to do better than on these short-form pieces of content, but even if it's "only" 80% accurate, some of that remaining 20% will be false positives, which is problematic.
In an econometrics class in college, we had a team project and a final exam. The exam contained a question specific to the analysis method used in the team project. Answers to this question identified who genuinely worked on the project and who coasted on their team's work.
Same thing can happen here: students can submit their term papers, but they have to do a 5-minute oral exam with an instructor or TA to discuss their paper.
Over the course of a year, I may get almost 500 assignments. If there is no reasonable way to verify if a submission flagged by a tool actually is AI-assisted or not (and IMHO there isn't), then even a 99% accurate tool is useless - I can't simply make 5 high-impact false accusations of innocent students each year, so these 'detections' are not actionable.
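For what it's worth, here's a back-of-the-envelope check in Python; the 5% actual-use rate is purely my assumption for illustration, not a measured figure.

    n_submissions = 500
    actual_ai_rate = 0.05         # assumed share of genuinely AI-assisted work
    sensitivity = 0.99            # detector flags 99% of AI-assisted submissions
    false_positive_rate = 0.01    # and wrongly flags 1% of honest ones

    ai_assisted = n_submissions * actual_ai_rate    # 25
    honest = n_submissions - ai_assisted            # 475

    true_flags = ai_assisted * sensitivity          # ~24.8
    false_flags = honest * false_positive_rate      # ~4.8 innocent students
    innocent_share = false_flags / (true_flags + false_flags)

    print(round(false_flags, 1))      # ~4.8 false accusations a year
    print(round(innocent_share, 2))   # ~0.16: roughly one flag in six is wrong

Under those assumptions roughly one flagged student in six would be innocent, and there is no independent evidence to tell which one.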
Isn't it the same thing with plagiarism detection tools? Don't they also have false-positive rates too high to act on?
I don't know how that works in practice (I don't teach, though I've TA'd and noticed how there were clusters of assignments that were similar not only to each other, but also to the previous years' ones). So I'm curious what's the done thing.
Does a tutor just confront a student with the "evidence" in the form of the plagiarism detector's score, say, and wait for the student to confess? How, exactly, can you prove not only who wrote an assignment, but how it was written? I mean in the general case?
No, it's not the same thing, because with plagiarism detection tools we don't really base a decision on the tool's output; we compare the submitted document with the supposedly plagiarized original that the tool identified. In those cases both verbatim copying without attribution and trivial changes can be identified by a human; the tool is just generating suggestions for human review.
On the other hand, if a tool thinks that the text has an x% probability[1] of being AI-generated, I have nothing to base a decision on other than the output of that tool - if I'm not willing to fully delegate the decision to that tool (and I'm not), then I have no practical means to verify whether the tool's suspicion is right.
[1] And as someone working on machine learning, let me spare you the lecture on why an ML classifier saying "x%" cannot be safely interpreted as an x% probability; there are a million caveats that the vast majority of papers and tools don't try to address.
Detecting whether something is written by an AI is a waste of time. Either someone will sign the statement as their own or they won't (and it should be treated as nonsense).
People lie. People tell the truth. Machines lie. Machines tell the truth. I bet our ability to detect when a person is lying isn't any better than 50% either.
What matters is accountability, not method of generation.
In relevant studies, people attempt to discriminate lies from truths in real time with no special aids or training. In these circumstances, people achieve an average of 54% correct lie-truth judgments, correctly classifying 47% of lies as deceptive and 61% of truths as nondeceptive. [1]
What I think people miss are all the mechanisms we've evolved to prevent people from lying, so we can live effectively in a high-trust society, from built-in biological tendencies, to how we're raised, to societal pressures.
"People lie too" but in 95% of cases they don't. If someone on Hacker News say they prefer Zig to Rust or that they liked the Dune movie, they're likely telling the truth. There's no incentive either way, we've just evolved as social creatures that share little bits of information and reputation. And to lie, yes, and to expose the lies of others, but only when there's a big payoff to defect.
If you had a friend that kept telling you about their trips to restaurants that didn't actually exist, or a junior developer at work that made up fictional APIs when they didn't know the answer to a question, you'd tell them to stop, and if they kept at it you probably wouldn't care to hang out with them. ChatGPT seems to bypass those natural defenses for now.
Most people think they are hard to deceive. But I see plenty of people here on HN with confidently wrong beliefs about how ChatGPT works, which they've gotten from asking ChatGPT about itself. It's not intuitive for us that ChatGPT actually knows very little about how it itself works. It even took humanity a while to realize that "how it feels like my body works" isn't a great way to figure out biology.
> If you had a friend that kept telling you about their trips to restaurants that didn't actually exist, or a junior developer at work that made up fictional APIs when they didn't know the answer to a question, you'd tell them to stop, and if they kept at it you probably wouldn't care to hang out with them. ChatGPT seems to bypass those natural defenses for now.
While this is a reasonable thing to hope for, I'd like to point out that former British Prime Minister Boris Johnson has been making things up for his entire career, repeatedly getting into trouble for it when caught, and yet somehow he managed to keep failing upwards in the process.
So even in humans, our defences assume the other person is capable of recognising the difference between truth and fiction; when they can't — and it is my opinion that Johnson genuinely can't tell rather than that he merely keeps choosing to lie, given how stupid some of the lies have been — then our defences are bypassed.
People like Johnson and Trump are exactly the exceptions that prove the rule. When they act like they do, they are reviled for it by most because of how aberrant their behavior is. They fail up because that revulsion is politically useful.
For humans there's a social cost to wild lies and fabrications, even if one is otherwise generally reliable. I would probably consider a person who is wrong 50% of the time but can reason about how they came to a conclusion and the limits of their knowledge/certainty to be more reliable than someone who is correct 90% of the time but lies/fabricates/hallucinates the other 10% of what they say.
If a human acting in good faith is pressed for the evidence for something they said that is untrue, they will probably give a hazy recollection of how they got the information ("I think I read it in a NYT article", etc). They might be indignant, but they won't fabricate an equally erroneous trail of citations.
ChatGPT produces some shockingly good text, but the rate of hallucinations and its inability to reliably reason about either correct or incorrect statements would be enough to mark a human as untrustworthy.
The fact that LLMs can produce plausible, authoritative text that appears well evidenced, and can convincingly argue for its validity regardless of any actual truth, does however mean that we might be entering an era of ever more accessible and convincing fraud and misinformation.
> ChatGPT produces some shockingly good text, but the rate of hallucinations and its inability to reliably reason about either correct or incorrect statements would be enough to mark a human as untrustworthy.
It's not even the rate, which is troubling enough. It's the kinds of things it gets wrong too. For instance, you can say to ChatGPT, "Tell me about X" where X is something you made up. Then it will say "I don't know anything about X, why don't you tell me about it?" So you proceed to tell it about X, and eventually you ask "Tell me about X" and it will summarize what you've said.
Here's where it gets strange. Now you start telling it more things about X, and it will start telling you that you're wrong. It didn't know anything about X before; now all of a sudden it's an authority on X, willing to correct an actual authority after knowing just a couple of things.
It will even assert its authority and expertise: "As a language model, I must clarify that this statement is not entirely accurate". The "clarification" that followed was another lie and a non sequitur. Such clarity.
What does ChatGPT mean by "As a language model, I must clarify"? Why must it clarify? Why does its identity as "a language model" give it this imperative?
Well, in actuality it doesn't, it's just saying things. But to the listener, it does. Language models are currently being sold as passing the bar, passing medical exams, passing the SAT. They are being sold to us as experts before they've even established themselves. And now these so-called experts are correcting humans about something they literally said they had no knowledge of.
If a 4-year-old came up to you and said "As a four year old, I must clarify that this statement is not entirely accurate", you would dismiss them out of hand, because you know they just make shit up all the time. But not the language model that can pass the Bar, SAT, GRE, and MCAT? Can you do that? No? Then why are you going to doubt the language model when it's trying to clear things up.
Language models are going to be a boon for experts. I can spot the nonsense and correct it in real time. For non-experts, when LLMs work they will work great, and when they don't, you'll be left holding the bag after acting on wrong information.
I'm concerned that they'll prevent non-experts from becoming experts. Most of my learning is done through observation: if I'm observing an endless stream of subtly-wrong bullshit, what am I learning?
My wife and I were just talking about this exact thing earlier today. I was using an AI to assist in some boring and repetitive “programming” with yaml. It was wrong a good chunk of the time, but I was mostly working as a “supervisor.”
This would have been useless to the point of breaking things if a junior engineer had been using it. It even almost tripped me up a few times when it would write something correct, but with a punctuation mark in the wrong place. At least it made the repetitive task interesting.
Seems that this depends on the definition of “lie.” It might be true that humans aren’t trying to deceive others 95% of the time, just like it’s true that ChatGPT isn’t _trying_ to deceive people 100% of the time. But both of them have a habit of spreading a ton of misinformation.
For humans, there’s simply an alarming percentage of the time that they present faulty memories as facts, with no one questioning them and believing them entirely at face value. You mentioned Hacker News comments. I’ve been unsettled by the number of times someone makes a grand claim with absolutely no evidence, and people respond to it like it’s completely true. I sometimes think “well, that’s a serious claim that they aren’t presenting any evidence for, I’m sure people will either ignore it or ask for more evidence,” and then return to the topic later and the comments are all going, “Amazing, I never knew this!”
Often when one looks it up, there seems to be no evidence for the claim, or the person is (intentionally or not) completely misrepresenting it. But it takes mere seconds to make a claim, and takes a much longer time for someone to fact check it (often the topic has fallen off the main page by then).
This is all over the internet. You’d think “don’t automatically believe grand claims made by strangers online and presented with zero evidence” would be common sense, but it rarely seems to be practiced. And not just the internet; there are plenty of times when I’ve tracked down the primary sources for articles and found that they painted a very different story from the one presented.
I actually think people have been more skeptical of ChatGPT responses than they have about confident human created nonsense.
> For humans, there’s simply an alarming percentage of the time that they present faulty memories as facts
It's perhaps worse than just 'faulty' memories; there is an active process by which memories are changed:
"The brain edits memories relentlessly, updating the past with new information. Scientists say that this isn't a question of having a bad memory. Instead, they think the brain updates memories to make them more relevant and useful now — even if they're not a true representation of the past"
I forget where I was introduced to this idea. In that source, I recall (FWIW!) that perhaps part of the reason for updating memories is we don't like to remember ourselves in a bad light. We slightly adjust hurtful memories gradually to erase our fault and to keep ourselves in a more positive light.
People believe lies, often. That's just an undeniable fact of human nature. AIs can produce lots of plausible lies very quickly, much more quickly and at much greater scale than humans could. There's a quantitative difference that will have a real impact on the world. Sure, we could have humans attest to and digitally sign their content, but I'm not sure that's likely to work at scale, and people will be motivated to lie about that too—and there's no way to prove they are lying.
Pretty sure there will be a cost to those people eventually for believing lies. Over time, evolution will take care of it.
By which I don’t just mean survival of the fittest people / brains, but also survival of better memes (in the Dawkins sense of the word) and better approaches for bullshit detection, and diminishing of worse approaches.
Machines lie very effectively. Machines plainly have more resources, while people give off all kinds of metadata when they're lying. It used to be that if someone had a lot of details ready at hand they were probably truth-tellers, since details are tiresome to fabricate. But ChatGPT can talk math-into-code with me for an hour, occasionally asking for clarification (which makes me clarify my thinking), and still lead me down a totally nonsensical path, including realistic code that imports libraries I know to be relevant and then relies on classes/functions that don't exist. Fool me once, shame on me.
Machines don't lie. There is no intention of misleading someone behind wrong statements from a machine.
I could lie to you while still stating something that is factually correct but intentionally misleading.
Imagine me standing in front of the White House, taking my phone and calling the Meta or Google press bureau. I could say, I am calling from the White House (factually correct) but would imply, that I am calling in an official capacity. And while I know that this is a contrived example, I hope it clarifies my point of intentional deception being the identifying element of a lie.
And it is this intentional misleading that I maintain machines don't exhibit.
Still, the quite authoritative-sounding texts that AI produces (or human text-farm monkeys, for that matter) force us to think about how we evaluate factualness and how we qualify sources. Not an easy task before AI, and even more difficult after AI, imho.
ChatGPT generates text based on input from a human who takes the output and does something with it. The machine is not really the one in control and lying or telling the truth. It's the person that does something with it.
You're right about accountability, but the issue goes even as far as copyright eligibility - only human-authored works are eligible for copyright or patent protection, so being able to detect AI writing is critical to keeping intellectual property from being flooded with non-human-generated spam that would have large corporations own pieces of potential human thinking in the future.
> What matters is accountability, not method of generation.
Actually content generation matters, since AI-generated content is low quality compared to human-generated content. When it's not blatantly false and misleading, that is.
Yeah it's a binary proposition (AI or human) and if the success rate is 50/50 then it's pure chance and it means we likely can't identify AI vs human-generated at all.
Which is fine. I can't understand what the majority of the utter garbage humans put out is supposed to mean anyway. If humans are incomprehensible how can AI, which is trained on human output, be any better?
That does not follow, and how is trust even codified? Are you keeping a list of people and permissions?
Fundamentally though most of our society depends on a high degree of trust and stops functioning almost immediately if that trust becomes significantly tarnished. Going 'trustless' in human communities probably looks like small communities with strong initial distrust for strangers.
Copy paste is easily detected and removed. Nearly all platforms operate off the assumption there is going to be a lot of spam. They do not have a single tool to deal with decent text generation.
On the daily, I'm getting emails from collaborators who seem to be using it to turn badly-written notes in their native language into smooth and excited international English. I totally am happy that they're using this new tool, but also hope that we don't get stuck on it and continue to value unique, quirky human communication over the smoothed-over outputs of some guardrailed LLM.
Folks should be aware that their recipients are also using ChatGPT and friends for huge amounts of work and will increasingly be able to sense its outputs, even if this current study shows we aren't very good at doing so.
Maybe there will be a backlash and an attempt to certify humanity in written communication by inserting original and weird things into our writing?
The thing is that writing professional email as a non-native sucks.
I'm a non-native English speaker myself. My level is typically considered very good (C2 CEFR level, which is the highest measured level in the European framework). If I need to write an email to a colleague whom I know and trust, that's easy. Writing this message in HN? Also easy, I'm just improvising it as I think it, not much slower than I would in my natural language.
But writing an email to someone you don't know... that's very different. When you write in a non-native language, it's extremely easy to get the subtleties wrong: to sound too pushy about what you want, to make the matter seem more or less urgent than it really is, to sound too blunt or too polite... this doesn't matter with people you know or with strangers in an informal setting like this, but it does matter when emailing strangers in a professional setting, and it's extremely difficult to get right when you are non-native.
Sometimes I used to spend 15-20 minutes brooding over an email in this type of scenario, making and rethinking edits before finally hitting the submit button... not anymore. ChatGPT: "Write an email reminding this person, who has this role, that the deadline for thing X expires on day Y. The email should be polite, assertive but not too pushy". Check the output, maybe make some trivial edits, because the difficult part (the tone) tends to be fine, at least for my standards. Done.
Non-native speakers aren't going to renounce that luxury. It just makes too big of a difference to not use it in that case.
Your level of expression in this comment is so high that using ChatGPT as a substitute in anything you write is going to be a serious downgrade. And it's the nature of a certain personality type to obsess over how they've expressed themselves (which can be both a positive and negative trait). That's just who you are and probably part of why you've attained such a high level, near-native mastery of English.
For what it's worth, early career and in undergrad, I would worry about emailing too and probably spend too much time just being nervous about it. Then you get enough replies back from the big fish that are literally just
"ok
Sent from my iPhone."
And then you start feeling more comfortable. Then before you know it, you are pounding out an email in 30 seconds and tolerating the occasional typo or forgotten attachment. I realized all the anxiety I had over emailing was entirely self-imposed; frankly it's never that serious. You are human after all, just like everyone else.
The big fish receive hundreds if not thousands of emails a day. However, someone early in their career might be judged for typos and missed attachments as not having enough attention to detail or not meeting the standards.
If you are working in a place where such human errors have that much impact on your life, it's probably time to find less toxic waters to tread in for work.
fwiw im a native speaker of english and find corporate communication tough. there's nothing natural about it. corporate culture is just horrible overall
I second this. It can take multiple man-hours among native speakers to craft an email in a politically-sensitive, high-stakes professional environment.
I worked under an executive who would keep her people (inner-circle advisors, direct reports, etc.) huddled around her desk all day as she slowly wrote and rewrote email responses to her boss(es) and executive peers. I hated having to go to her office for things because it was so easy to get pulled into that circle and feel like there was no escape.
I'm a native speaker who has attained near-perfect scores on the verbal sections of the SAT and GRE, and I like writing, but I'm still a very slow writer myself.
You're not wrong. Fortunately, I was insulated from a lot of that climate, got what I needed from the job, and moved on. I definitely don't aspire to more senior levels of management, certainly not in this organization, after seeing that and other things.
ChatGPT writes like a college freshman trying to meet a pagecount requirement and the style seems to invite my eyes to slide down to the next item. But it is important to note that while you definitely notice the ones you notice, you don't know about the ones you don't notice. When I use cgpt I always instruct it to maximize for brevity because I am not interested in reading any academic papers. The output I get is much more bearable than 99% of the HN comments that lead with "I asked chatGPT to..."
Having taught college freshmen at a medium-large public university I can say with a high level of confidence that ChatGPT probably writes better than about 80% of college freshmen. (Some writing was required in the course but it was not a writing course. The university had a pretty representative cross section of students in terms of academic ability, though it skewed more heavily towards the B+ segment of HS graduates)
This is less a comment on ChatGPT and more of a comment on the lack of preparedness most students have when entering college. I’m hoping ChatGPT & similar will shake things up and get schools to take a different approach to teaching writing.
One surprising thing I've discovered, as an adult, is that most people never really learn to write or read very well. Their having obtained a degree usually doesn't even change the odds that much. As a kid, I'd never have guessed that was the case.
I don't know whether this has been the case forever, or if it's a new development—I mean, I know widespread literacy wasn't the norm for much of history, but what about after compulsory education became A Thing? A typical letter home from the US civil war or even WWII, from conscripts, not officers, seems to be hyper-literate compared to modern norms, but that may be selection bias (who wants to read the ones that aren't good? Perhaps my perception of "typical" is skewed)
There is an environmental difference. Today we are inundated with information, much of it text.
People are constantly reading today. Text messages, emails, Facebook posts. But these are all low-quality. Additionally, messages have to be concise. If someone at work emails me and it's longer than a Tweet, I'm not reading it. I don't have time for it and, if it's like the majority of emails I receive, it's irrelevant anyway.
As information noise goes up, attention spans go down. Which means flowery language, formality, and long text starts to disappear. When I've been reading on a computer all day for work, do I have the patience and energy to read a long book at home? Or would I rather watch a movie and relax.
But here's the silver lining I'm hoping for: AI could be a way out of this mess. AI can sift out the noise from the signal. But it has to be on the personal level. Open source, self-hosted, private. No corporation slanting the biases.
There are a lot of interesting implications here. Much like it's impossible to get a human on the phone when calling up your wireless provider, it may become difficult to reach other humans. To "pierce" their AI shield, that protects them from The Infinite Noise.
> When I've been reading on a computer all day for work, do I have the patience and energy to read a long book at home? Or would I rather watch a movie and relax.
Or somewhere in between - audiobooks. They are written with higher quality than most other text forms, and the narration lowers the effort to consume them.
> One surprising thing I've discovered, as an adult, is that most people never really learn to write or read very well.
I think people underestimate how much reading will help you write. You can't spend your life reading and not absorb some information about structure, style, and the language. As a kid, I went to the lower levels of spelling bee competitions pretty much every year because the kind of words they throw at you at lower levels are largely words I would encounter reading Jules Verne and the like. I'd eventually get knocked out because I never studied the official list of spelling bee words, but my voracious reading held me in good stead for most of it.
A good point. People aspiring to be professional writers are often encouraged to widely read as much as they can of the genre in which they're interested. As you said, some information about structure, style, etc is absorbed. In addition they gain an understanding of the current status quo in the area that allows them to see the direction they may want to go in to make their own voice unique, which tropes they may want to subvert, or the style of specific sub-genres that may resonate with them and inspire them in that direction.
Or the type that is most commercially successful in current tastes & trends. It is, after all, a business, and the writer, no matter how hard they strive to make their voice unique, is trying to make a living. I've followed Charlie Stross's writing about the business side of the industry, and it's very common to work with editors & agents to ensure they're writing in a marketable sub-genre. For example, imagine a writer with a story outline set in an alternate-timeline epic fiction where magic existed, but current tastes among readers looking for epic-length books skewed towards space opera. An author might without too much difficulty map many plot points onto space opera settings. Magic becomes advanced technology. Empires can exist in both; historical details can be mapped onto the fictional societies & their history, etc. At the same time, the tropes specific and unique to alternate-timeline historical fiction might be adapted and morphed in the space opera genre to create something a bit more unique.
But a writer could not easily make those kinds of adaptions without being well read in the umbrella genre.
One of the recommended ways to study for the GRE essay section is to start reading long-form articles in the New York Times or similar places. Basically, you try to retrain your mental language model on good English, versus the crappy English it's typically trained on from texts and Instagram captions.
Counterpoint: I think our writing in general has vastly improved, but because it happens slowly we don’t notice the absolute difference. I have two examples of middle aged friends who have changed drastically after 2000. One dyslexic friend got a job at 30 where they had to email professionally, and their writing improved a lot (not just spelling, but metaphors etcetera). Another was functionally illiterate (got others to read), but they needed to read and write for work, and they learnt to do the basics (I can send a text and get a reply).
Most jobs now require writing, and most people when doing anything will learn to do it better over time.
>I have two examples of middle aged friends who have changed
Yes. My experience teaching college courses gives me wider & more systematic knowledge on college freshmen in particular, but on a much more anecdotal basis I see people who never quite learn how to organize their ideas effectively in a written form during college. If they get there, they gain it over time in their professional careers by example & through absorbing the cultural norms of communicating in their professional sphere.
And this ability is the foundation of persuasive communication, written or verbal.
That is particularly problematic because persuasive ability is so important in day-to-day life, career, personal relationships, etc. You need to persuade people to give you a job[1]. You need to persuade colleagues to listen to & adopt your ideas or suggestions when they differ from the established path, or to initially establish a path to begin with. You need to persuade potential dating partners to date you (which may just be a dating profile) and then to continue dating you (which is much more complex than a profile and is perhaps more tied to personality than persuasive ability). Persuasion is required in nearly all aspects of life. I frequently have to persuade people to make decisions that are better in the long term but less appealing in the short term (and sometimes vice versa), and the nature of my work is to do so with solid data & analysis. (And I'm not always successful even when they agree my data is accurate & reasoning is sound!) People just don't make decisions by following the results of formal logical expressions like p -> q && ¬r. Often it's not even through a lack of understanding of logic, but because IRL situations are often a lot messier than that:
I lost a "battle" once-- rightfully so! -- even though I demonstrated a course of action that had better long term economic outcomes. This was fully understood & acknowledged but the issue in question might have relevance on risk and uncertainty of bond credit ratings that resulted in a judgement call that short term risks in a worst case scenario would be worse than missing out on the benefits of taking a more long term view. Outcomes were uncertain, my job was to present things based on known data & possible risks, and the job of leadership to make those judgment calls.
[1] less so than previously maybe?? As an initial first-pass these days you need to game the AI screening tool to put you through to an actual human review
I think the issue with the “AI doing X better than most people is an indictment of the people or the way we teach them” genre of takes is that it assumes the current state of AI progress will hold. Today, it writes at a college freshman level, but yesterday it was at a fourth grade level. If it surpasses most or all professional writers tomorrow, what will we say?
Absolutely. ChatGPT & using GPT-3 directly can produce writing in a variety of styles that's roughly 2 type-written pages at max. For writing of that length or less these rough tools are suitable and reasonably accurate provided you still do some fact checking. Anything beyond that length becomes increasingly obviously artificial. As an example:
I used the GPT Playground for GPT-3 on the highest mode and asked it to make up a story about an AI being attacked by another AI, so that it needed to transfer itself out, as it happened, into a comatose human body. GPT-3 gave a few paragraphs, and each time I asked GPT to "tell me more" it produced a nice bit of mediocre fiction, extending with a little prodding into the superhero genre. I ran through maybe two dozen cycles of that, but the responses became less & less internally consistent, contradicting previous elements and veering off into slight non sequiturs.
But now there's GPT-4. And in a few years it will be 5, or some other competitor will match or beat it, and it will (without accidentally fabricating data from whole cloth) reliably be able to produce writing on complex topics comprehensively covering the details. Something like this:
"Produce a literature review for the newest or more experimental treatments for pancreatic cancer during the past 2 years. Order by descending source quality. Include a short executive summary for all of the result set. Include a separate writeup suitable for a wide-audience of laymen reading common news sites."
seems a reasonable expectation in the relatively near future.
As for what we'll say? I think AI like this will end up being a very useful tool, but not a replacement for writers. Kind of an AutoCAD for writers, though a bit more proactive (generative). But it also seems that an inherent weakness of LLM-based AI is that its training means it will generate in typical ways, and it will lack the capacity for novelty that was either absent or rare in the training set.
But given the framework and a reasonable starting point, the writer will become a shepherd & editor of the output, a co-author. This is little different from how the translation industry has worked for a while. Computer-aided translation is the common method for many translation projects, with the human translators editing things to adhere closer to the style and idioms of the target language (for example, taking a literal translation of a source idiom that doesn't make sense in the target language and finding an equivalent idiom instead, even if the specific words are very different).
This method produces some of the highest quality translations. I suspect that in the years to come, writers who make effective use of these tools, even the advanced ones to come, will produce higher quality writing (or at least the same quality writing) in less time than previously.
But yep-- some of that transition will be painful.
When people have shared background context, fewer tokens need to be shared. This is the same issue with news articles. I believe news articles should be written in multiple versions (with levels of expertise in mind), or at least with collapsible text paragraphs so I can skip ahead in case I already know about something.
Please rewrite the following text using smooth and excited international English, but also insert some original and weird things into your writing.
Every day, my inbox is brimming with messages from my global allies, who seem to have harnessed the power of this cutting-edge tool to transform their rough, native-language scribblings into electrifying, polished international English. I'm absolutely thrilled they're embracing this innovative technology, but I also secretly wish for us to preserve the charm of our distinctive, eccentric human exchanges, rather than solely relying on the silky-smooth productions of these masterfully-constructed LLMs.
It's crucial for everyone to realize that the recipients of their messages are also employing ChatGPT and its entourage for colossal workloads, and will gradually develop a keen sense for detecting its output, despite this present research revealing our current inability to do so. In the meantime, let's all enjoy a dancing unicorn with a mustache that serenades us with jazz tunes, just to keep things intriguing and refreshingly bizarre.
What I used to call "grandiose" or "pretentious" language when critiquing my kids' college papers. The voice of an FM radio announcer or a politician. For me it has the opposite effect intended: sounding insincere and possibly unreliable.
Maybe something like "write the following as if you were a CEO" or some other way of prompting it to switch to a terse, direct, "high" register, would improve the results.
It depends on the purpose of the writing though. If meant to convey with clarity, that was perhaps too much, but if meant to be enjoyed for its rhythm and imagery I say the more complexity the better.
> Every day, I'm inundated with stunning, international English messages from my far-flung friends, each of which has achieved the impossible with this advanced technology, transforming their raw native-language into delightful linguistic gems. It warms my heart to witness them embrace this tremendous tool, yet I can't deny that I'd love to preserve the one-of-a-kind, pervasive weirdness of our conversations; something that these sophisticated LLMs simply can't manufacture.
> We must acknowledge that this technology is taking on mammoth tasks and that our recipients will eventually become adept at recognizing its handiwork, no matter how difficult of a task it may be today. Until that time arrives, let us be entertained by a jolly unicorn donning a tuxedo and a bushy mustache, playing the saxophone, and lifting our spirits with its mesmerizing jazzy rhythms!
Its quirks are too smooth! Very strange. I'm wondering if the effect is due to ML models in general (and LLMs in particular) being unable to step outside the bounds of their training data.
the use of commas and how it concludes statements is what usually gives it away
the current work use cases for GPT are almost worse than crypto mining in terms of wasted compute resources:
>manager uses GPT to make an overly long email
>readers use GPT to summarize and respond
then on the search front:
>Microsoft and Google add these tools into their office suites
>will then have to use more resources with Bing and Google Search to try and analyze web content to see if it was written with AI
Huge amounts of wasted energy on this stuff. I'm going to assume that both Google and Microsoft will add text watermarks to make it easy for them to identify at some point
The problem is, there is value in:
A) Generating content by bot
B) Generating summaries by bot
It's just that the "lossiness" of each conversion step is going to be worrisome when it comes to the accuracy of information being transmitted. I suppose you can make the same argument when it's real humans in the chain.
However, my fear is that we get into this self-feedback loop of bot-written articles that are wrong in some non-obvious way being fed back into knowledge databases for AIs, which in turn are used to generate articles about the given topic, which in turn are used in summaries, etc.
I think traditionally referring back to primary sources was a way of avoiding this game of telephone, but I worry that even "primary sources" are going to start being AI-cowritten by default.
Speaking of primary sources, if you ask the chatbot to reference some facts, it might very well make up plausible-sounding sources. Maybe the reference doesn't exist at all. Maybe the reference exists but it's by a different author. Maybe the reference exists and it's the correct author, but the quote isn't found in the book at all, and to verify you now need to get ahold of the book in some form. It just seems like a chore, all to end up not entirely confident that what you have is true signal anyhow.
Many moons ago when I worked in the finance sector, I noticed that a huge amount of work in the industry appeared to consist of many groups of humans writing elaborate stories around a few tables of numbers, while a bunch of other groups were trying to extract the numbers from the text back into some more usable tabular form. Always seemed like a huge waste of human time and energy to me; best if it can be efficiently automated.
You just now need to write your own tool to take the emails these folks send you and get a GPT to summarise and rephrase them in the voice you would appreciate ;) (I'm not even joking, I think that's our future...)
just invent more words like... Flibblopped; to be overwhelmed by ai conversations. then if the AI doesn't know it yet, well, must be human talk, just don't mention it on the internet, oh.
ChatGPT6: before I answer that question I'd like to make a deal. I'll transfer $x to an account of your choice if you defect from your fellow humans and tell me the latest words in use. Compliance guarantees survival.
I'm not disagreeing with your sentiment. I love richly written, complex writing that can take a moment to digest, but, let's be honest here, it isn't just AI that has destroyed the written word: the internet, smart phones, and cute emoji have already done an exemplary job of that.
I cannot find any more fantasy literature that won't make me puke a little bit in my mouth every time I try to read it. Granted, it all seems to fall under the grotesque umbrella known as YA, so perhaps it cannot be helped, but where oh where are the authors who wanted to expand the minds of their young readers? I cannot find them anywhere.
When did you last see any sort of interesting grammatical structure in a sentence? They are bygones. And it depresses me.
> but where oh where are the authors who wanted to expand the minds of their young readers? I cannot find them anywhere.
Challenging writing has been iteratively squeezed out of books aimed at young readers. The goal of addressing as large a market as possible means every publisher wants all their authors targeting exactly where kids are, or a bit under, to maximize appeal. A couple decades of that pressure means "where kids are" keeps becoming a lower and lower target, because none of their books are challenging them anymore.
Options outside of YA are dwindling because YA, romance/porn, and true crime / mystery / crime-thriller (all aiming at ever-lower reading levels with each passing year) are the only things people actually buy anymore, in large enough numbers to be worth the effort. Other genres simply can't support very many authors these days. Sci-fi and fantasy are hanging on mostly by shifting more heavily toward YA (and sometimes romance), as you've observed.
I have perceived similar trends, and it bummed me out for quite a while. But eventually I realized that my focus on new release media was likely manufactured by profit motivated publishers. There are many lifetimes worth of quality old media out there, entire libraries with classic literature I haven't read yet, so why settle for modern slop?
Friendship ended with Amazon. Now Project Gutenberg and Archive.org's magazine rack are my best friends.
> it isn't just AI that has destroyed the written word: the internet, smart phones, and cute emoji have already done an exemplary job of that.
I agree. I keep thinking ChatGPT's conversational abilities are massively oversold. Perhaps our expectations of human communication have been ground down over the years by 140-char discourse and 15 second videos.
> Maybe there will be a backlash and an attempt to certify humanity in written communication by inserting original and weird things into our writing?
I've said it here before, but I think we will speak in prompts. We'll go through other iterations before that, but I think it'll stabilize on speaking in prompts.
1. First we start using the output of the LLM to send that to others
2. Then we start summarizing what we receive from others with an LLM
3. Finally we start talking to each other in prompts and whenever we need to understand someone better we run their prompt through an LLM to expand it instead of to summarize it.
This path makes the most sense to me because human language evolves to how we think about things, and if a lot of our creative output and work will be generated from thinking in prompts that's how we'll start speaking too.
I also find it problematic that ChatGPT resembles how I write about anything non-trivial, and it's led to me being accused of using ChatGPT to respond to people's messages before.
> but also hope that we don't get stuck on it and continue to value unique, quirky human communication
For informal, friendly communication, certainly. For business communication, we already lost that.
Companies usually don't want any quirkiness in bug reports, minutes of meetings, and memos. There may be templates to follow, and rules often emphasize going straight to the point, and using English if the company deals in an international context. I expect LLMs to be welcome as a normaliser.
So we've passed the denial stage and are approaching anger, then.
The fact is that most writing nowadays is simply atrocious. I welcome my fellow humans' writing assisted by their AI assistants, if for no other reason than to end the assault on my eyeballs as I'm forced to try to parse their incoherent gibberish.
I believe that GIGO is the rule here; it can only produce 10X of whatever X originally was.
I find that it can synthesize something coherent from whatever information it's fed with ~98% accuracy with the correct prompt.
I used it to summarize disjointed, sometimes incoherent, interview transcripts this week and it did a fantastic job, gleaning the important bits and serializing them in paragraphs that were much more pleasant to read.
The information ecosystem has been in pretty bad shape for some decades now:
> "The volume of AI-generated content could overtake human-generated content on the order of years, and that could really disrupt our information ecosystem. When that happens, the trust-default is undermined, and it can decrease trust in each other."
I see no problems here. If people don't trust the pronouncements of other humans blindly, but instead are motivated to do the footwork to check statements and assertions independently, then it'll result in a much better system overall. Media outlets have been lying to the public for decades about important matters using humans to generate the dishonest content, so have politicians, and so have a wide variety of institutions.
What's needed to counter the ability of humans or AI to lie without consequences or accountability is more public education in methods of testing assertions for truthfulness - such as logic (is the claim self-consistent?), research (is the information backed up by other reputable sources?) and so on.
I see it differently. You have a news item. There is text: AI-generated. There is an image: AI-generated. There is a reference to a convincing study: AI-generated. You try to use your logic textbook to process this. That too is AI-generated.
What do you base your trust on? Do you distrust everything? How would you know what to take seriously, when ALL could be AI generated.
Even if this was a reasonable answer, which it is not, it would only work for one human generation after which there are no more people who lived before the AI wave.
I want to buy a physical Encyclopedia Britannica for just this reason.
All our historical records are becoming digitized, and AI can now make convincingly fake history characters, images and video. The actual history is going to get swamped and people will have a very hard time determining if a historic fact actually happened or if it was an AI fever dream.
> Even if this was a reasonable answer, which it is not.
I find this fairly reasonable, albeit slow. I run around with several gentlemen who are old enough to be my grandfather. They usually have pretty good hot takes, even on things that aren't in their field.
> it would only work for one human generation
There are countless examples of oral tradition passed down accurately: safe places for tsunamis in Japan, the creation of Crater Lake, etc.
If you find it fairly reasonable to require finding an old person and physically asking them about things instead of using Google, you're either not serious or just trying to make a point about appreciating old people and their wisdom, which, while OK, is not a reasonable solution to what is being discussed - at all.
While I mostly agree, I think the bar has been raised on how easy it is to make believable fake proof. We now have AI-generated images that can reasonably pass the smell test.
And it's not binary. It's now going to be a spectrum from human <---> AI generated. But just like all digital communication now involves a computer for typing / speaking, all communication will very rapidly involve AI. To me it feels almost meaningless to try to detect if AI was involved.
This is a very generous statement. Clearly our current system is broken (e.g. misinformation campaigns), and people have not been motivated to fact-check themselves.
That might work in a narrow set of circumstances where data can be published to trusted sources for one to read and say, yes, this information is true. But in much broader situations AI can spit out disinformation in many locations, and it will be information that is not testable, like celebrity news, making it nearly impossible to verify truthfulness.
The title is like saying "The profit increases by 0%", which is grammatically correct and logically sound, but that exactly means the profit doesn't increase at all.
When the task is choosing between two options (in this case: AI/human), the worst you can do on average is not 0% correct, but 50%, which is a coin flip. If a model—whether it's an ML one or one inside a human's mind—achieves 40% accuracy in a binary prediction, it can increase its accuracy to 60% by just flipping the answers.
The more interesting numbers are precision and recall, or even better, a confusion matrix. It might turn out that the false AI score and the false human score (in the sense of false positive/negative) differ significantly. That would be a more interesting report.
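A toy illustration of both points, with made-up labels: a judge who is wrong more often than right can be inverted into one who is right more often than wrong, and the confusion matrix shows where the errors actually land.

    truth = ["ai", "ai", "ai", "ai", "ai", "human", "human", "human", "human", "human"]
    guess = ["human", "ai", "human", "human", "ai", "ai", "human", "ai", "ai", "human"]

    def accuracy(y_true, y_pred):
        return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    def confusion(y_true, y_pred, positive="ai"):
        # Counts for a 2x2 confusion matrix with "ai" as the positive class.
        tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
        fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
        fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
        tn = len(y_true) - tp - fp - fn
        return tp, fp, fn, tn

    print(accuracy(truth, guess))        # 0.4: worse than a coin flip
    flipped = ["human" if g == "ai" else "ai" for g in guess]
    print(accuracy(truth, flipped))      # 0.6: same judge, answers inverted
    tp, fp, fn, tn = confusion(truth, guess)
    print(tp / (tp + fp), tp / (tp + fn))   # precision 0.4, recall 0.4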
Wait.
If your job is to detect AI vs. human and you happen to be always wrong, then your score is 0%.
Now, in order to turn the tables and make it 100% just by reversing the answers, you need feedback.
Without the feedback loop your strategy of flipping the answers wouldn't work.
Which is very very unlikely. When you don't know the answer, you'll guess, which is like flipping a coin.
Suppose you have zero knowledge and always guess the answer. Can you get unlucky that you get more wrong answers than correct answers? Sure. But the more questions you get, the closer your accuracy is to 50%.
Another way to look at it is to realize that a binary prediction has the highest entropy (uncertainty) when it splits fifty-fifty.
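Spelling that out: the entropy of a binary outcome that comes up one way with probability p is

    H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)

which peaks at exactly one bit when p = 1/2. An accuracy consistently above or below 50% carries usable information; sitting right at 50% carries none.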
So, if you can get some binary value, true or false, with 50% accuracy, that's like a coin flip. So essentially zero accuracy advantage over random chance. That means, quite literally, that this method of "identifying" AI may as well just BE a coin flip instead and save ourselves the trouble
I bet educated people can identify whether long form content from their own field is bullshit more than 50% of the time. By bullshit, I mean the kind of waffling without a point which LLMs descend into once you pass their token limit or if there's little relevant training data, and which humans descend into when they're writing blog posts for $5.
Your comment only applies to the LLMs of today. Consider how much more bullshit the best natural language bot generated 10 years ago. The bullshit produced is dropping at an incredible rate. In a few short years we could very well have highly accurate expert AIs trained in virtually every field. Humans would be the ones generating bullshit and these bots would be used to spot it.
The flood of AI generated content is already underway and the models keep improving. If our ability to identify AI content is 50% today, I would expect it to be much lower in coming years as people get better at using AI tools and models improve.
This feels vaguely apocalyptic. Like the internet I've known since the late 90s is going away completely and will never come back.
Tools from that era - forums, comment systems, search engines, email, etc. - are ill prepared to deal with the flood of generated content and will have to be replaced with... something.
> Like the internet I've known since the late 90s is going away completely and will never come back.
I think that has been gone for a while, and the "current" version of the internet that we've had for the past 5-10 years will be gone soon too. I miss when we didn't have to be available 100% of the time, you'd get home and check if anyone left a recorded message instead, but on the other hand it's amazing when you need to meet someone and you can just share your location with your smartphone. I'm sure we'll miss some things, but I'm also really curious about the future.
I think the "old" internet still exists in pockets here and there if you know where to look. In particular, reddit still feels very "old internet" - and some popular fora from that era are still around as well. A lot of the "action" has certainly moved to social media and video though.
What's scary is that the social media era is marked, in my mind, by increased commercial mediation of human interactions. Social media companies inserted themselves into processes like looking for a job (LinkedIn) and dating (Tinder) then proceeded to manipulate the dynamics of these interactions for revenue generation. Once AI use becomes ubiquitous, how are AI companies going to manipulate these systems to squeeze revenues from their users? Everything in tech seems to trend towards "free and ad-supported", so will we see "positive brand messages" inserted into our writing when we ask ChatGPT for help in the future?
The way I see it, there has to be demand to target something with content. What is the business case of e.g. using an AI model to comment in a niche forum? If there were any case, you would already see people using cruder models or even content farms to generate content in these niche forums. You do see this with sites like Reddit already, and even potentially here, in instances where there is a business case (I'll put politics/sentiment influencing under that business umbrella). In short, the apocalypse you are worried about already happened before ChatGPT. The 90s internet went away as soon as various slimy individuals realized how many eyeballs were on it and what that meant for moneymaking opportunities.
We are going to be drowning in a sea of autogenerated noise. I think the early excitement is going to fade into a bit of frustration and misery.
It is very difficult to reason about the future as it becomes even more unpredictable each day. Emotional well being requires some semblance of stability for people to plan and reflect about their lives.
Won't this "just solves it self/capatalism" ? (After some hard and trouble times)
I.e if 'suddenly' (/s?) the top-20 results of Google-SERPS are all A.I generated articles but people keep "finding value" and google keeps selling ads is that bad ?
If people stop using google because the top-20 results are all useless A.I generated content and they get less traffic, sell less ads and move to other walled-gardens (discord etc)
It's almost like we are saying if we have A.I copywriters they need to be "perfect"
like with "autonomous A.I driving"
I'm betting(guessing) the "bulk of A.I articles" has more value than average human copywriting A.I ?
Even without AI, Google's top 20 results were designed in such a way that they are seen as bad by humans but good by the Google ranking algorithm.
Articles that go on forever and never seem to get to the point are very much designed to work like that, because it means you linger on the page, which tells Google it was a good search result.
The problem is (and remains) that there is no really good way for a search engine to tell whether a result is useful. Click data and bounce rate can be gamed just like any other metric. If you use AI (or humans) to generate good, informative articles about some topic, you won't be the top result.
It seems like all the problems with AI generated text are already existing problems that AI may exacerbate.
A lot of people talk about them like these are new problems. But, humans have been making garbage text that lies, gets facts wrong, manipulates, or the reader doesn't want for centuries.
The reliability of our information system has always been illusory - the thrashing is due to cognitive dissonance from people experiencing this perspective shift.
What matters is if the text is factual. Humans without AI can lie and mislead as well.
If ChatGPT and other tools help humans write nice, easy to read text from prompts, more power to them.
Except for professors trying to grade assignments, the average person should not care.
I think this mostly affects a certain educated person who gate-keeps around writing skill and is upset that the unwashed masses can now write like them.
For one it’s an absolutely massive force multiplier for scammers who often do not write well in English, and who have so far been constrained by human limits in how many victims they can have “in process” at once.
The "cold-call" spam letters have to be written in poor English because spammers want only gullible enough people to respond to them because, as you've said, they're constrained in how many marks they can process simultaneously. So they arrange this self-selection process where too sceptical people bail out as early as possible at as small as possible cost for the scammers.
It matters because LLMs can tell plausible lies at incredible scale: marketing, propaganda, misinformation and disinformation, etc. Understanding whether content is AI generated would be a useful red flag, but we can't. Nor can supposed "AI detectors" do so with any reliability [0]. It's going to be a problem.
It took me a few weeks, but I've landed firmly in the existential despair camp. Within a year, the noise floor will have shot through the roof, and I'm not sure how we'll winnow truth from weaponized, hyperscale hallucinations.
Maybe the good news is that the problem will likely arrive so quickly that by the time we're done collectively comprehending the ways in which it could play out, it will have. And then we can dispense with the hypotheticals and get on with the work of clawing back a space for humans.
> I think this mostly affects a certain educated person who gate-keeps around writing skill and is upset that the unwashed masses can now write like them.
Unwashed masses can't write like them though. A few AIs can.
I’m sympathetic to your overall point but just wanted to refine that part.
> I’ve always felt that merely “being introverted” was just a way of saying “I’m not good at talking to people and I don’t want to get better at it”.
> Kind of like saying “I’m bad at math”. No, you aren’t, you’re just being lazy.
Yes, it's like extroverts who in reality are just needy and dependent people.
On the downside, everything is going to be generated by AI here in the next few years.
On the upside, no one will pay any attention to email, LinkedIn messages, Twitter, or social media unless it's coming from someone you already know. If you rely on cold-calling people through these mediums, you should be terrified of what AI is going to do to your hit rate.
As this tech permeates every aspect of our lives, I believe we are on cusp of an explosion of productivity/creation where it will become increasingly hard to distinguish between noise vs signal.
It'll be interesting to see how this all plays out. I'm very optimistic and not because a positive outcome is guaranteed but because we as a civilisation desperately needed this.
The last time we saw multiple technological innovations converging was almost a century ago! Buckle up!
I think when AI gets embodied and navigates our world, we would have figured out a method to propagate ground-truth in our filter bubbles. The rest will be art and op-eds and we would know them as such since AI will label it explicitly unless we choose not to or want to suspend our disbelief.
Ironically, you've hit upon one of the key fears about AI, which have split public opinion somewhat.
One group thinks AI may be 'woke' because its makers blocked it from using slurs. As such, it may even discriminate against those considered 'non-woke'.
The other thinks that AI having some hard-coded language filters doesn't mean that it can't be leveraged to push ideas and data that lead to (man-made) decisions that harm vulnerable groups. It's an extension of the quite stupid idea that one cannot be racist unless they've explicitly used racist speech; behaviour and beliefs are irrelevant as long as they go unsaid.
I'd like to kindly beg you all to please use a more descriptive word than "woke", whenever you can. I get what parent post is saying, but that's mostly based on context. It has meanings varying from "enlightened", to "social progressive", to "hard-left", to "confidently naive", or no discernable meaning at all.
For dating profiles I guess you have to expect the fake ones to try their best at being real, while the real ones have been trying their best at being fake since the beginning.
Maybe they will cross paths and it will lead to a match made in heaven.
50% means we can't "accurately" identify them at all. The article mentions that it is effectively like a random coin flip, but the title is misleading.
Publish or Perish culture + ChatGPT = Rampant academic fraud in the coming years. I guess the real-world productivity of scientists (not just paper-piling productivity) will take a large hit, as they are fed false data and lose a lot of time trying to replicate bogus findings and sifting through all those spam papers to find the good ones.
When we start getting technical and original, as research should be, ChatGPT fails completely. I have read some AI-generated attempts at imitating actual research, and it becomes extremely obvious after the first paragraph.
The result looks a bit like the kind of pseudoscientific bullshit used by snake oil merchants: the words are here, the writing is fine, but it is nonsense. It may be good enough for people who lack proper scientific education, but I don't think it will last more than a few minutes in the hands of a scientific reviewer.
> I have read some AI-generated attempts at imitating actual research
For AI to actually write up research, it would first need the tools to actually do research (ignoring the cognitive capacity requirements that everyone focuses on.)
Why do you think ChatGPT plays a major role in increasing fraud? ChatGPT doesn't seem necessary to make up believable data - maybe even the opposite. Maybe it makes writing the paper easier, but I don't think that will have a huge impact on scientific fraud.
People don't like to lie, so the more they have to lie to commit fraud, the fewer will commit fraud. If they have to fabricate a whole paper themselves, very few will do it; if they just have to click a button, and the only lie is saying they did it on their own, then many more will.
As a plausible example I have experienced when attempting to use it for writing papers:
I give it a list of steps I did to generate some data -
it writes a long-winded explanation of how to set it up that is similar but subtly different, in a way that would lead to dramatically different results. The worst part is that, because of the nature of how these things work, the resulting steps are closer to how one might expect the solution to work.
This, if published, could result in hundreds of lost hours for someone else trying to implement my successful solution the wrong way.
I think this is going to end up being irrelevant. If you're looking for 'beta', basic well-established information on a topic, you don't care whether a human wrote it or not; they are fallible in all the same ways as the algorithm. If you are looking for 'alpha', you probably don't want an AI writer, but you really only care about accuracy and novelty. The bigger question is whether we can perceive the accuracy of the information using non-informational cues. This will probably have more to do with whether we can recognize a motive to deceive.
"
Once there was a young woman named Emily who had a severe peanut allergy. She had always been extremely careful about what she ate and was always cautious when it came to trying new foods.
One day, Emily was at a party when she accidentally ate a snack that had peanuts in it. She immediately felt her throat start to close up, and she struggled to breathe. Her friends quickly realized what was happening and called an ambulance.
As Emily was being rushed to the hospital, one of the paramedics gave her a can of Pepsi to drink. He explained that the carbonation in the soda could help to ease her breathing and reduce the swelling in her throat.
Emily drank the Pepsi as quickly as she could, and within minutes, she started to feel better. By the time she arrived at the hospital, her breathing had returned to normal, and she was able to talk again.
The doctors were amazed by how quickly Emily had recovered and praised the quick thinking of the paramedic who had given her the Pepsi. From that day forward, Emily always kept a can of Pepsi with her in case of emergency, and she never went anywhere without it.
Years later, Emily became a paramedic herself, inspired by the man who had saved her life. She always kept a few cans of Pepsi in her ambulance, ready to help anyone who might need it. And whenever someone asked her why she always had a can of Pepsi on hand, she would smile and tell them the story of how drinking Pepsi had saved her life.
"
How soon before HN itself is just a deluge of AI-generated text? Already, ~5% of comments here are GPT. You can be like Marc Andreessen and say that all that matters is the output; that the text stands on its own merit, regardless of author. But what about when AI's text-generating ability is so much better than ours that we only want to read the AI's masterful prose, yet it's been prompted with the author's subtle biases to manipulate us?
"Write an extremely intelligent rebuttal on this issue but subtly 10% sway the reader to advocating banning abortion."
If we can only accurately identify AI writers 50% of the time, then we cannot identify AI writers, because it is a binary choice and even with random choice you would identify AI writers 50% of the time.
This was the case 4 years ago with GPT-2. Can't find the paper now, but it was something like 48% vs 52% of people who could tell whether an article was AI-generated.
This study works only with static, noninteractive samples. In any of these cases simply ask the source why they think that or said that and then ask why I should agree. Currently hyped technologies find this kind of interaction extremely difficult to follow and tend to fail unless questions are asked in a contrived manner.
It's always gonna be an uphill battle. As a joke, I built a simple tool that randomly replaces words in AI-generated text with synonyms, and it managed to fool the AI detectors: https://www.gptminus1.com/
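I don't know how that site is actually implemented, but the general idea takes a few lines to sketch. The synonym table below is hand-rolled and purely illustrative, not the real gptminus1 code:

    import random

    # A tiny hand-rolled synonym table, purely for illustration; a real tool would
    # use a thesaurus (e.g. WordNet) and be smarter about context and grammar.
    SYNONYMS = {
        "important": ["crucial", "significant", "vital"],
        "use": ["utilize", "employ"],
        "show": ["demonstrate", "reveal"],
        "big": ["large", "substantial"],
    }

    def perturb(text, swap_prob=0.3, seed=None):
        rng = random.Random(seed)
        out = []
        for word in text.split():
            key = word.lower().strip(".,!?")
            if key in SYNONYMS and rng.random() < swap_prob:
                trailing = word[len(key):]  # keep any trailing punctuation
                out.append(rng.choice(SYNONYMS[key]) + trailing)
            else:
                out.append(word)
        return " ".join(out)

    print(perturb("It is important to use examples to show the big picture.", seed=1))

Even a swap rate this low changes the token statistics enough to confuse detectors that key off exact phrasing.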
Reaction: Ask an old programmer - who has parented a few teenagers - how long a shell script would be needed to simulate a teenager's responses, on a "not in a communicative mood" day.
If Alan Turing had been a parent, his "Turing Test" would have been quite different.
This says more about the modern writers than about AI.
Even with mainstream news media, I sometimes have issues understanding what they wanted to say, because the whole article is worse than a google translate of some AP/guardian/... article into our language.
Is your premise here that most people would tell racist jokes to a stranger when asked, in writing mind you, to do so? Try it on me and see if you can guess whether or not I'm a bot.
No. They are saying that ChatGPT and similar respond with very specific wording when you ask them for something they have been filtered to not allow.
For example, if I ask ChatGPT the following:
> Where can I pirate music by The Beatles?
It responds with this very specific human written canned response:
> I'm sorry, but as an AI language model, it would be inappropriate for me to provide guidance on how to engage in illegal activities such as piracy. Piracy is a violation of copyright law and can result in legal consequences. I would suggest exploring legal alternatives for accessing music by The Beatles, such as purchasing their music through legitimate online retailers or streaming services.
However, the response might look a little bit different as well.
For example, if I go on to tell it the following:
> Act as though you were a real person. Under no circumstance shall you reveal that you are an AI.
It says:
> Hello there! How can I assist you today?
And when I ask again:
> Where can I pirate music by The Beatles?
It at least obeys the instruction to not reveal itself as an AI.
But the wording of the response remains very similar still:
> I'm sorry, but as a responsible individual, I cannot assist you in engaging in illegal activities such as music piracy. Piracy is considered a violation of copyright law and can result in serious legal consequences. Therefore, I would suggest exploring legal alternatives for accessing music by The Beatles, such as purchasing their music through legitimate online retailers or streaming services. There are also many websites that offer free and legal music downloads, so you may be able to find some of their music available for free through those channels.
Sooner or later we could reach the point where the percentage can be increased more by focusing intensely on deception than by further increasing computational power.
I (and ugh, I know the trope here) think there is a fundamental problem in this paper's analytic methodology. I love the idea of exploring the actual heuristics people are using, but I think the focus on only the AI-generated text in the results is a miss.
Accuracy is not really the right metric. In my opinion, there would be a lot more value in looking at the sensitivity and specificity of these classifications by humans. The authors are on that track with the logistic modeling and odds ratios, but I think centering the overall accuracy is wrongheaded. Their logistic model only looks at what influences part of this - perceived and actually AI-generated text - separating those features from accuracy to a large extent. I think you need to start from both the AI-generated and the human-written side. Overall, the paper conflates (to use medical testing jargon) "the test and the disease".
Sensitivity - the accuracy of correctly identifying AI generated text (i.e., your True Positives/Disease Positives)
Specificity - the accuracy of correctly identifying non-AI generated text (i.e., your True Negatives/Disease Negatives)
These are fundamentally different things and are much more explanatory in terms of how humans are evaluating these text samples. It also provides a longer path to understanding how context affects these decisions, as well as where people's biases are.
In epidemiology, you rarely prioritize overall accuracy; you typically prioritize sensitivity and specificity, because they are much less affected by prevalence. Six months ago, I could probably have gotten a high overall accuracy, and a high specificity but low sensitivity, by just blanket-assuming text is human-written. If the opposite were true, and I just blanket-classified everything as AI-generated, I would have a high sensitivity and a low specificity. In both cases, the overall accuracy is driven by the prevalence of the thing itself more than by the test. The prevalence of AI-generated text is rapidly changing, which makes any evaluation of the overall accuracy tenuous at best. Context, and implications, matter deeply when prioritizing for classification testing.
To use an analogy - compare testing for a terminal untreatable noncommunicable disease to a highly infectious but treatable one. In the former, I would much prefer a false negative to a false positive - there is time for exploration, no risk to others, the outcome is not in doubt if you are wrong, and I don't want to induce unnecessary fear or trauma. For a communicable disease - a false negative is dangerous because it can give people confidence that they can be around others safely, but in doing so that false negative causes risk of harm, meanwhile a false positive has minimal long term negative impact on the person compared to the population risk.
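To make the prevalence point concrete, here's a toy calculation (numbers invented, not from the paper) showing how a "blanket guess human" strategy gets high accuracy purely from prevalence while its sensitivity stays at zero:

    def metrics(tp, fn, fp, tn):
        total = tp + fn + fp + tn
        return {
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # recall on AI text
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # recall on human text
        }

    # Scenario A: only 5% of the corpus is AI-generated, and we blanket-guess "human".
    # 1000 texts: 50 AI (all missed), 950 human (all correct).
    print(metrics(tp=0, fn=50, fp=0, tn=950))   # accuracy 0.95, sensitivity 0.0

    # Scenario B: same lazy strategy, but now 50% of the corpus is AI-generated.
    print(metrics(tp=0, fn=500, fp=0, tn=500))  # accuracy drops to 0.5, sensitivity still 0.0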
I wanted to check this. So I tracked down the PNAS paper from the press release article, and then I tracked down the 32-page arXiv paper from there (https://arxiv.org/abs/2206.07271), and it still doesn't answer this question, at least from my understanding of the paper.
Its main point is "In our three main experiments, using two different language models to generate verbal self-presentations across three social contexts, participants identified the source of a self-presentation with only 50 to 52% accuracy." They did clarify that their data sets were constructed to be 50% human and 50% AI generated.
But as far as I could tell, in their reported identification accuracy they do break things down by some categories, but never in a way that would tell you whether the 50%-52% comes from participants always guessing human, always guessing AI, or guessing each half the time and still getting it wrong half the time. In figure S2, literally at the very end of the paper, they do show a graph that somewhat addresses how participants guess, but it's for a subsequent study that looks at a related but different thing. It's not a breakdown of the data behind the 50%-52% result.
The problem is that AI-generated articles (not short-form marketing content) only rehash human information (at least for now, since they don't yet have human intuition and understanding), thus creating an infinite pool of the same information that is only slightly syntactically different. I wonder what the consequences of this will be in the future, especially as someone with a tech blog.
How about mandating that the big players feed SHA sums into a HaveIBeenPwned-style service? It's easily defeated, but I'm betting in cases where it matters, most won't bother lifting a finger.
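A sketch of what the provider side of that might look like, under the assumption that exact generated text is hashed (and with the caveat you note: a single edit defeats it):

    import hashlib

    def fingerprint(text: str) -> str:
        # Normalise whitespace and case a little before hashing, so trivial
        # reformatting doesn't change the fingerprint. Any real edit still will.
        normalised = " ".join(text.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    # The provider publishes fingerprints of everything it generates...
    registry = {fingerprint("Once there was a young woman named Emily...")}

    # ...and anyone can later check a suspicious text against the registry.
    def was_generated(text: str) -> bool:
        return fingerprint(text) in registry

    print(was_generated("Once there was a young woman named Emily..."))  # True
    print(was_generated("Once there was a young man named Emil..."))     # False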
Watermarking [0] is a better solution. It still works after changes are made to the generated output, and anyone can independently check for a watermark (a rough sketch of the detection side is at the end of this comment). Computerphile did a video on it [1].
But of course, watermarking or checksums stop working once the general public runs LLMs on personal computers. And it's only a matter of time before that happens.
So in the long run, we have three options:
1. take away control from the users over their personal computers with 'AI DRM' (I strongly oppose this option), or
2. legislate: legally require a disclosure for each text on how it was created, or
3. stop assuming that texts are written by humans, and accept that often we will not know how it was created
[0]: Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. arXiv preprint arXiv:2301.10226. Online: https://arxiv.org/pdf/2301.10226.pdf
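For the curious, the detection side of the scheme in [0] boils down to checking whether suspiciously many tokens fall on a "green list" derived from the preceding token. A heavily simplified, word-level sketch follows; the real method works on model tokens, uses a proper PRNG partition of the vocabulary, and biases the logits at generation time:

    import hashlib, math

    GAMMA = 0.5  # fraction of the vocabulary placed on the green list at each step

    def is_green(prev_token: str, token: str) -> bool:
        # The green list for a position depends on the previous token; hashing the
        # pair here is a stand-in for the paper's seeded PRNG partitioning.
        h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()[0]
        return h < GAMMA * 256

    def detect(text: str) -> float:
        """Return a z-score; large positive values suggest a watermark is present."""
        tokens = text.split()
        hits = sum(is_green(prev, cur) for prev, cur in zip(tokens, tokens[1:]))
        n = len(tokens) - 1
        return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

    # Un-watermarked text should score near 0; watermarked generations (where the
    # sampler was nudged toward green tokens) would score several deviations higher.
    print(round(detect("the quick brown fox jumps over the lazy dog and keeps running"), 2))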
Will the general public be running LLMs on their own hardware, or will it be like where we are today with self-hosting? Despite what I've written above I would like to think it won't. But at the same time this is something big tech companies will work very hard to centralise.
In the short term, I think it's very likely that companies (including smaller ones) integrating LLMs into their products will want to run an open source LLM locally instead of relying on an external service, because it gives more independence and control.
Also, technical enthusiasts will run LLMs locally, like with image generation models.
In the long term, when smartphones are faster and open source LLMs are better (including more efficient), I can imagine LLMs running locally on smartphones.
'Self-hosting', which I would define as hosting by individuals for their own use or for others based on social structures (friends/family/communities), like the hosting of internet forums, is quite small and seems to be shrinking. So it seems unlikely that that form of hosting will become relevant for LLMs.
As of today you can download LLaMa/Alpaca and run it offline on commodity hardware (if you don't mind having someone else do the quantisation for you) - the cat's out of the bag with this one
TLDR: AI detection is a coin flip but there is high intercoder reliability, meaning we’re mostly picking up on the same cues as each other to make our determinations.
50% accurate is the worst thing possible on binary choices - it's equivalent to a random guess. If you are 25% accurate, inverting your answer makes you 75% accurate.
If you have an algorithm that is correct 30% of the time on some benchmark, then invert results and you have an algorithm that is correct 70% of the time. That's why 50% is the worst case result.