I used ChatGPT as a reporting assistant. It didn't go well (niemanlab.org)
25 points by giuliomagnifico on March 20, 2024 | 27 comments


The problem is that people like this author are trying to literally treat it like a person instead of an LLM. Like honestly if you look at the linked chat convo early in the article, this person kind of just sucks at prompt engineering, imo.

"At times I was able to get the chat agent to give me what I wanted, but I had to be very specific and I often had to scold it."

You can't just half-ass a paragraph of disjointed system instructions into the user input and expect clean results, in my experience. You need to leverage the custom system instructions, give example responses if possible, and be very, very specific and direct with instructions. You need to explain the type of response you want, and you also need to describe any applicable constraints (or lack thereof) on the response content.

"When you are asked something, it is crucial that you cite your sources, and always use the most authoritative sources (government agencies for example) rather that sites like Wikipedia"

This is not sufficient to achieve what the author intends. It's written in a speech-like, roundabout style (e.g. "it is crucial that"), and there's a typo right in the middle on an important word ("rather that" where "rather than" was meant). LLMs can work around typos in most cases, but here it's vaguely possible, imo, that this is what is causing it to continue citing Wikipedia in responses.
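
For what it's worth, here's a rough sketch of the kind of setup I mean: rules in the system message, stated as blunt constraints, plus a one-shot example of the response format. This uses the OpenAI Python client; the model name, the wording, and the example Q&A are placeholders I made up, not the author's actual setup:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Behavioural rules go in the system message as direct constraints,
    # plus a short example exchange so the model can imitate the format.
    messages = [
        {"role": "system", "content": (
            "You are a research assistant for a reporter. "
            "Cite a source with a working URL for every factual claim. "
            "Prefer primary sources: government agencies, court filings, peer-reviewed journals. "
            "Never cite Wikipedia. "
            "If you cannot find an authoritative source, say so instead of guessing. "
            "No pleasantries; answer directly."
        )},
        # One-shot example of the desired response format (made up for illustration):
        {"role": "user", "content": "How many people live in Vermont?"},
        {"role": "assistant", "content": "Roughly 647,000 (U.S. Census Bureau 2023 estimate, https://www.census.gov/quickfacts/VT)."},
        # The real question goes last:
        {"role": "user", "content": "Summarize the most recent CPI report's findings on food prices."},
    ]

    response = client.chat.completions.create(model="gpt-4", messages=messages)
    print(response.choices[0].message.content)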

"At times, the tool was too eager to please, so I asked it to tone it down a little: “You can skip the chit chat and pleasantries.”"

In my experience playing with ChatGPT, this is just the wrong mental model to have of the tool if you want to get useful results out of it. You have to treat it more like a prose-language programming tool, not like a person with emotions that you are conversing with...


> The problem is that people like this author are trying to literally treat it like a person instead of an LLM.

It's hard to blame them, though. The author and many others are using the LLM in exactly the way that they're told it should be used. You don't learn that this is wrong unless you're an LLM nerd.


I don’t think that’s true. I think a lot of these articles actually create these results intentionally so that they have the conclusion they want to write about: that AI isn’t ready.

I have been watching Devin, and the approach they have implemented really does seem to work well for these kinds of executive-function, step-by-step LLM tasks. It's remarkably smart how they implemented it.

https://the-decoder.com/cognition-unveils-ai-powered-softwar...

If you give vague instructions to a junior-level human you get poor work product, just like with AI. But every journalist wants to write that ringer-dinger traffic bringer about AI not being ready for prime time, or whatever bad headline works…


You seem to be assuming malice or bad faith here, but I haven't seen any reason to suspect that. Not to say that you're wrong, of course. You might be right. I just don't see why I should suspect it.

Particularly considering that I know multiple people who have made a similar error. The distortion that the extreme hype is generating is pretty widespread.


Perhaps. But I generally would expect some level of research and professionalism beyond such simplistic concepts from someone who has a degree in journalism and works in the field.

Hype is precisely what journalists are supposed to cut through. That’s why it’s called ‘reporting.’ So the argument that, because many people with no background in AI are confused, the same confusion is acceptable from journalists writing about it seems rather weak to me.

I do see a broad series of articles that follow the pattern, and do believe it to be clickbait ‘journalism’ of the lowest standard.

Incompetence could be an explanation, but it’s article after article from professional mainstream sources: sources that should know better, or at least consult with someone who does as part of their due diligence. That implies it’s being done because it gets views, not because it’s good journalism, and is therefore intentional. But that is my conjecture; you are right about that.


Well, the press has always been truly awful at reporting on technical topics. It's why you can't, and never could, take any reporting on such things at face value, why all of the scientists and researchers I've worked with have disliked it when their work gets reported on, and why so much of what regular people think "scientists say" is incorrect. Nuance, and therefore accuracy, gets dropped in favor of having a clear, simple story.

I don't see why LLM-related topics would be any different. But the reasons why this happens are pretty clear, and poor intentions or even laziness on the part of the reporters are rarely factors.


In other words, in order to get the "massive productivity boost" often cited as the main selling point of LLMs, you need to have completed a thorough training on prompt engineering, and if you get something slightly wrong (e.g. a simple typo), you get absolutely zero feedback that your intent was misunderstood?


Yeah, these people keep telling me about how cars are a "massive productivity boost" and I get in and nothing happens.


Also, I refused to learn basic traffic rules and now I keep ending up in accidents. I didn't need to know traffic rules when walking! I was told it was just walking but faster, with more capacity to transport cargo >:(


move fast and break things


> The problem is that people like this author are trying to literally treat it like a person instead of an LLM.

This is exactly how consumer-facing LLM-powered interfaces are marketed and promoted, though.


That’s consistent with my experience too.

Huge help for hobby programming, but I'm struggling to use it for my day job (finance). Even for basic memos the reasoning just isn’t coherent or nuanced enough.

I think it’s because programming is quite modular. You can ask it a fragment of a problem easily, e.g. how do I send a message on GCP Pub/Sub. That same modular, self-contained aspect just doesn’t exist in my day job.
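
(To be concrete about what I mean by a self-contained fragment: the Pub/Sub question above boils down to roughly this, assuming the google-cloud-pubsub client library and made-up project/topic names.)

    from google.cloud import pubsub_v1

    # Hypothetical project and topic names, purely for illustration.
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "my-topic")

    # Pub/Sub payloads are raw bytes; publish() returns a future you can block on.
    future = publisher.publish(topic_path, data=b"hello from the day job")
    print(future.result())  # server-assigned message ID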


Just a tangential note: it is useful for programming in the sense that you can use it as a faster Google to look up a snippet. By the same token, asking "what's the Kelly criterion" is faster than googling (for a finance example).
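
(For anyone wondering, the Kelly criterion itself is basically a one-liner, which is exactly why it's the kind of lookup that's faster to ask than to google. This is just the standard formula, nothing ChatGPT-specific, and the example numbers are made up.)

    def kelly_fraction(p: float, b: float) -> float:
        """Kelly criterion: f* = (b*p - q) / b, where p is the probability of
        winning, q = 1 - p, and b is the net odds received on the wager (b-to-1)."""
        q = 1.0 - p
        return (b * p - q) / b

    # e.g. a 60% chance of winning at even odds -> bet 20% of bankroll
    print(kelly_fraction(0.6, 1.0))  # 0.2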

But most programmers (other than sheer juniors) also don't spend most of their day doing "insert snippet here". ChatGPT isn't close to figuring out the proper solution to link up multiple disparate APIs, read through the docs to figure out why the system was written a particular way, spend hours chatting with various people, wade through the political mire, understand vague customer and business requirements, and then finally submit a PR that forms the first in a long series of fraught attempts to cobble together such a system.

Upon PR review, deliver correctly worded responses depending on the personalities of the people who are ripping your code to shreds for pedantic and often meaningless reasons (mostly to seem important), and notice when the feedback is actually legitimate and requires changes, or signals a bigger issue with the entire project.

Do that in a loop, keeping the context of the progress over the last few months/years, so that a steady stream of design documents, political agreements, assignments, knowledge transfers as employees change over, and evolving system requirements comes together to form a project that adjusts subtly over time as the politics and business requirements change as well.

In the end, everyone's "day job" is full of nuance and subtlety, but GPT is really good for automating everyone else's jobs (like those lawyers, it's just a bunch of text rules, right? Or doctors, it's just looking up a matching list of symptoms I think...)


>everyone's "day job" is full of nuance and subtlety, but GPT is really good for automating everyone else's jobs

That's not really where I was going with that, but I can see how it may come across as such.

My point wasn't that one job has nuance & high level skills while the other doesn't but rather that ChatGPT has different usefulness at the low end of each.

Or, put differently: for coding, generating boilerplate is the obvious ChatGPT win at the low end. I haven't found the equivalent low-hanging fruit in my day-to-day despite enthusiastic trying.

>Google to look up a snippet. By the same token, asking "what's the Kelly criterion"

Think a step slightly more complicated than Wikipedia-like info retrieval. For coding it can do a fair bit more thanks to vast amounts of GitHub code... for other jobs there is no equivalent depth of knowledge baked into the models. Maybe a couple of medical journals, some transcripts of law cases? It's just nowhere near code in scale or applicability, though, and that shows. Nor are the other jobs as inherently machine-readable (or modular, as I said - code is usually split nicely into functional procedures etc.)


Just to echo this: I can see that if you are a program manager or something "programming adjacent" you also would have little to no use for ChatGPT (despite being in an area it's mostly very good at). That kind of role removes all the "write me a snippet" low-hanging fruit from the job. E.g. if you are the one running the team of engineers, there's nothing like that; it's all just the nuance / politics and human decision making that requires the full context (not something you can jot down easily for GPT). For finance, probably anyone doing math / quant / research could still have low-hanging fruit (solve this integral, what's the formula, how do I do this in Excel), but anyone who is mostly decision making can't get much value from GPT.


The article lost me at

> The confidence that ChatGPT exudes when providing poorly sourced information (like Wikipedia)

Isn't the data showing that Wikipedia is more peer-reviewed and up to date than most other aggregated sources (such as encyclopedias)? Sure, it can be co-opted, but we've also seen publishing houses co-opted.


Despite warnings everywhere that accuracy isn’t what this generation of GPT-4 is all about, and the fact that sama has said that right now they are optimizing for language capabilities and conversational flow rather than accuracy, every single article on the Internet criticizes accuracy.

LLMs are designed to create “helpful” and “convincing responses” without any overriding guarantees regarding their accuracy or alignment with fact. Right in the chat interface for ChatGPT it says, “ChatGPT can make mistakes. Consider checking important information.”

I’m sure accuracy will improve but it’s not a strong point for the current generation nor was it ever intended to be.

Not coming at you, just trying to support your point.

In my mind, the fact that publishing houses are being co-opted (and I’ve seen it too) by this generation of LLMs is more social commentary about the sad state of modern news and publishing. Imagine what happens when LLM technology hits initial degrees of human equivalence.


> LLMs are designed to create “helpful” and “convincing responses” without any overriding guarantees regarding their accuracy or alignment with fact.

This is an incoherent goal: accuracy and alignment with fact are a big part of being convincing and a bigger part of being helpful.


Being accurate and being helpful are different things. Otherwise there wouldn’t be liars.


Being accurate is generally key to being helpful.

That's the reason liars, while they certainly exist, are, all other things being equal, viewed as less helpful than non-liars.


The way I think of it is, LLMs are sold as HAL but really they're a better MegaHAL: https://github.com/kranzky/megahal

For that reason I don't think LLM, alone, will be capable of human-equivalent intelligence.


It probably is, but one nuance is that the peer reviews come from random people across the internet, not subject experts.

There’s an argument to be made whether random people can achieve the same thing that experts do.

I personally use Wikipedia as an authoritative source, but I do check the sources if something doesn’t make sense to me.


At that point in 2023, didn't ChatGPT have a hard knowledge cutoff? If so, asking it to cite sources relevant to a major news event that had just happened seems kind of ridiculous. I think it was much later in 2023 that the cutoff was removed.


Even then, this displays a remarkable lack of understanding of what LLMs are and how they work. They have a fuzzy memory of things, so when you ask for citations you'll get titles, names, and journals that seem plausible because they're centred around the distribution of names you'd expect for a paper of the kind required to support a specific point. But LLMs don't have the capacity to resolve information to that degree; otherwise we'd have lossless compression of all public works in a few GB of data. That would be as groundbreaking as LLMs' chat capacity, if not more so.


ChatGPT cannot be used for anything that requires precision. It is just confidently wrong too many times.

Use it for what it is good for, easy fuzzy activities.


If you have to «always double-check», it means that you have to do the most important portion of the work yourself, which significantly narrows the potential uses to those where you already possess competence in the task and command of the data to judge the results, and the use of GPT just saves you some typing. And saving on typing is great, but a significant step down from our collective expectations.

It’s like having a personal assistant that is mostly helpful but has a penchant for blatantly sociopathic lying. Not great.

Not arguing, just restating the obvious in my own words.


If it gives you no sources, or wrong ones, you - the journalist - have to check them yourself anyway, yes. Pointless.



