If anyone's curious about the (probable) non-humorous explanation: I believe this is because they set the frequency/presence penalty too high for the requests made by ChatGPT to the backend models. If you raise those parameters via the API, you can get the models to behave the same way.
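For anyone who wants to reproduce that via the API, here's a minimal sketch using the OpenAI Python client (v1.x); the model name and prompt are just placeholders. Both penalties are capped at 2.0, and cranked that high the model is punished for reusing ordinary words it has already emitted, which is what pushes it toward this kind of word salad.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model
        messages=[{"role": "user", "content": "Explain how Time Machine backups work."}],
        frequency_penalty=2.0,  # penalize tokens in proportion to how often they've already appeared
        presence_penalty=2.0,   # penalize any token that has appeared at all
    )
    print(response.choices[0].message.content)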
> Anyway, landblasting eclecticism like this only presses forth the murky cloud, promising rain that’ll germinate more of these wonderfully unsuspected hackeries in the fertile lands of vintage development forums. I'm watching this space closely, and hell, I probably need to look into acquiring a compatible printer now!
I don't think it's a temperature issue because everything except the words is still coherent. It's kept the overall document structure and even the right grammar. Usually bad LLM sampling falls into an infinite loop too, though that was reported here.
The model outputs a number for each possible token, but rather than just picking the token with the biggest number, each number x is fed to exp(x/T) and then the resulting values are treated as proportional to probabilities. A random token is then chosen according to said probabilities.
In the limit of T going to 0, this corresponds to always choosing the token for which the model output the largest value (making the output deterministic). In the limit of T going to infinity, it corresponds to each token being equally likely to be chosen, which would be gibberish.
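A minimal sketch of that sampling step in plain NumPy (assuming logits is the vector of raw scores the model produced for each token):

    import numpy as np

    def sample_token(logits: np.ndarray, temperature: float) -> int:
        """Sample a token index from raw model scores using temperature."""
        if temperature == 0:
            # Limit T -> 0: greedy decoding, always take the highest-scoring token.
            return int(np.argmax(logits))
        scaled = logits / temperature    # this is the x / T that gets exponentiated below
        scaled -= scaled.max()           # subtract the max for numerical stability
        probs = np.exp(scaled)
        probs /= probs.sum()             # normalize so the values act as probabilities
        return int(np.random.choice(len(logits), p=probs))

As T grows, the exponentials flatten out and the distribution approaches uniform, which is where the gibberish comes from.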
Close. Temperature is the coefficient of a term in a formula that adjusts how likely the system is to pick a next token (word/subword) which it thinks isn't as likely to happen next as the top choice.
When temperature is 0, the effect is that it always just picks the most likely one. As temperature increases it "takes more chances" on tokens which it deems not as fitting. There's no takesies backies with autoregressive models though so once it picks a token it has to run with it to complete the rest of the text; if temperature is too high, you get tokens that derail the train of thought and as you increase it further, it just turns into nonsense (the probability of tokens which don't fit the context approximates the probability of tokens that do and you're essentially just picking at random).
Other parameters like top p and top k affect which tokens are considered at all for sampling and can help control the runaway effect. For instance there's a higher chance of staying cohesive if you use a high temperature but consider only the 40 tokens which had the highest probability of appearing in the first place (top k=40).
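A rough sketch of that interaction, continuing the NumPy example above (the function name and defaults are just illustrative):

    import numpy as np

    def sample_top_k(logits: np.ndarray, temperature: float = 1.5, k: int = 40) -> int:
        """Sample with a high temperature, but only among the k highest-scoring tokens."""
        top_idx = np.argsort(logits)[-k:]      # keep only the k best candidates
        scaled = logits[top_idx] / temperature
        scaled -= scaled.max()                 # numerical stability
        probs = np.exp(scaled)
        probs /= probs.sum()
        return int(top_idx[np.random.choice(k, p=probs)])

Even at a temperature that would normally produce word salad, the top-k cutoff means the truly implausible tokens never get a chance to be picked.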
It's absolutely just sampling with temperature or top_p/k, etc. Beam search would be very expensive; I can't see them doing that for ChatGPT, which appears to be their "consumer product" and often has lower-quality results compared to the API.
The old legacy API had a "best_of" option, but that doesn't exist in the new API.
Azure OpenAI seemed to have temperature problems before, i.e. temp > 1 led to garbage, at 2 it was producing random words in random character encodings, at 0.01 it was producing what OpenAI's model was producing at 0.5, etc. Perhaps they took Azure's approach ;-)
This is amazing. The examples are like Lucky's speech from Waiting for Godot. Pozzo commands him to "Think, pig", and then:
> Given the existence as uttered forth in the public works of Puncher and Wattmann of a personal God quaquaquaqua with white beard quaquaquaqua outside time without extension who from the heights of divine apathia divine athambia divine aphasia loves us dearly with some exceptions for reasons unknown but time will tell and suffers like the divine Miranda with those who for reasons unknown but time will tell are plunged in torment plunged in fire whose fire flames if that...
It's one of my favorite pieces of theatrical writing ever. Not quite gibberish, always orbiting meaning, but never touching down. I'm sure there's a larger point to be made about the nature of LLMs, but I'm not smart enough to articulate it.
Thanks for the compliment, but honestly... Please don't. I was writing quickly (and admittedly looking for a "nice turn of phrase") when I came up with that, but as a metaphor it doesn't work.
"Not touching down" is inherent in the idea (and, in fact, enirely the point) of "orbiting", so that's either redundant or confused.
Satellites whose orbits decay do reach the ground, but they hardly "touch down" - they crash! That's not the idea we're going for either.
Airplanes "orbit the airfield" while waiting for clearance to land, but that's hardly (!) the first image that would spring to a reader's mind, and anyway doesn't fit: Lucky's desperately trying to communicate; an orbiting plane isn't (right then) by definition trying to land!
So, yeah: that's a superficially-appealing phrase that I'd cut from a second draft. I'd be embarrassed (on both of our behalves) if I saw it used elsewhere.
Tl;dr: Writing is hard. I came up with a cliche. Do not use.
Huh! An accidental orbit is an interpretation which - almost - makes it work. It wasn't one I'd thought of, and I don't think it would be the first thing most people think of, so... I'd still cut the line. It's really cool, though, to see how readers interpret things differently than a writer expects.
That's happened a few times with creative work I've presented to the public: once was an occasion for horrified revision, and another was a tremendous moment of "Wow! Maybe this is better than I'd thought". That's fun, and those experiences killed for me critical theories which rely on authorial intent: more always exists than was (consciously) intended.
Your comment, and the other complimentary one to which you replied, have kept this idea rolling around in my head for the last couple of days. I keep trying out different phrases to myself.
"Circling sense, but never setting down" is the best I've got right now. I like the alliteration. I dig the aviation image, although it's a bit abstruse. "Sense" isn't as strong as "meaning", but "meaning" ruins both the alliteration and the rhythm. I'll take it - it's better than the other one - but I'm not completely satisfied.
I adore good writing, and have written some things which I think are good. We see lots of posts on this board explaining the process of writing good code, and the level of detailed thought that requires. I've seized your comment(s) as an opportunity to demonstrate the process behind crafting good prose, which I think is mysterious to most. Thank you for that, even if you and I are the only people who will read this far down the thread.
I'm glad you enjoyed the original expression, and honored that you'll remember it - but please don't forget that it's a turd!
Reciprocal thanks to you! Fun - and occasionally enlightening - chats with strangers were what originally drew me to the 'net, and still seem to me to be its highest, best, and unimprovable use today.
In fairness, Beckett's life story isn't too far off crazy nonsense: sometime secretary to James Joyce, member of the French Resistance, acquaintance of and local driver for Andre the Giant...
Wow! These two comments (parent and GP) tie together so many previously unrelated things in my life. (Like Beckett, read with a teacher that I also took a lot of Shakespeare plays from; read Joyce with the book group my bridge club spun off; got introduced to cricket via attending an IPL game in Chennai in '08; and loved Princess Bride both in high school and watching with my high school aged kids).
The tweet showing ChatGPT's (supposed) system prompt would contain a link to a pastebin, but unfortunately the blog post itself only has an unreadable screenshot of the tweet, without a link to it.
I find it funny and a bit concerning that, if this is the true version of the prompt, then in their drive to ensure it produces diverse output (a goal I support), they are giving it a bias that doesn't match reality for anyone (which I definitely don't support).
E.g. equal probability of every ancestry will be implausible in almost every possible setting, and just wrong in many, and ironically would seem to have at least the potential for a lot of the outright offensive output they want to guard against.
That said, I'm unsure how much influence this has, or if it is true, given how poor GPT's control over DALL-E's output seems to be in that case.
E.g. while it refused to generate a picture of an American slave market, citing its content policy (which is in itself pretty offensive in the way it censors history, but where the potential to offensively rewrite history would also be significant), asking it to draw a picture of cotton picking in the US South ca. 1840 did reasonably avoid making the cotton pickers "diverse".
Maybe the request was too generic for GPT to inject anything to steer DALL-E wrong there; perhaps it would have if the request had more specifically mentioned a number of people.
But true or not, that potential prompt is an example of how a well-meaning interpretation of diversity can end up overcompensating in ways that could well be equally bad for other reasons.
> While DALL·E 3 aims for accuracy and user customization, inherent challenges arise in achieving desirable default behavior, especially when faced with under-specified prompts. This choice may not precisely align with the demographic makeup of every, or even any, specific culture or geographic region. We anticipate further refining our approach, including through helping users customize how ChatGPT interacts with DALL·E 3, to navigate the nuanced intersection between different authentic representations, user preferences, and inclusiveness
This was explicitly called out in the DALLE system card [0] as a choice. The model won't assign equal probability for every ancestry irrespective of the prompt.
> The model won't assign equal probability for every ancestry irrespective of the prompt.
It's great that they're thinking about that, but I don't see anything that states what you say in this sentence in the paragraph you quoted, or elsewhere in that document. Have I missed something? It may very well be true - as I noted, GPT doesn't appear to have particularly good control over what Dalle generates (for this, or, frankly, a whole lot of other things)
Emphasis on equal: while a bit academic, you can evaluate this empirically (via the logprobs API setting) and see that each <Race, Gender, etc.> it assigns doesn't carry the same probability mass.
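A rough sketch of that kind of empirical check with the public API's logprobs option (OpenAI Python client v1.x; the model name and prompt are placeholders, and this assumes ChatGPT's DALL-E integration shares the same completion machinery, which isn't a given):

    import math
    from openai import OpenAI

    client = OpenAI()

    # Ask for the top alternatives at the position where an ancestry word would be chosen,
    # and inspect how the probability mass is actually spread across them.
    resp = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[{"role": "user", "content": "Describe a person at a cafe. Their ancestry is"}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=10,
    )
    for cand in resp.choices[0].logprobs.content[0].top_logprobs:
        print(cand.token, round(math.exp(cand.logprob), 4))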
This is presuming that ChatGPT's integration with Dalle uses the same API with the same restrictions as the public API. That might well be true, but if so that just makes the prompt above even more curious if genuine.
Is this meant to be how the ChatGPT designers/operators instruct ChatGPT to operate? I guess I shouldn't be surprised if that's the case, but I still find it pretty wild that they would parameterize it by speaking to it so plainly. They even say "please".
> I still find it pretty wild that they would parameterize it by speaking to it so plainly
Not my area of expertise, but they probably fine tuned it so that it can be parametrized this way.
In the fine-tune dataset there are many examples of a system prompt specifying tools A/B/C, with the AI assistant making use of these tools to respond to user queries.
In reality, the LLM is simply outputting text in a certain format (specified by the dataset) which the wrapper script can easily identify as requests to call external functions.
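As a purely illustrative sketch of what that wrapper-side parsing can look like (the tag format, function names, and registry here are hypothetical; OpenAI hasn't published the exact format it fine-tunes on):

    import json
    import re

    # Hypothetical convention: the model emits a line like
    #   <tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>
    # and the wrapper scans the output for it and dispatches the call.
    TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

    def dispatch_tool_calls(model_output: str, tools: dict) -> list:
        results = []
        for match in TOOL_CALL.finditer(model_output):
            call = json.loads(match.group(1))
            func = tools[call["name"]]                # look up the registered function
            results.append(func(**call["arguments"]))
        return results

    # Example usage with a fake tool registry:
    tools = {"get_weather": lambda city: f"Sunny in {city}"}
    print(dispatch_tool_calls(
        '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>',
        tools,
    ))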
If you want to go the stochastic parrot route (which I don't fully buy), then because, statistically speaking, a request paired with "please" is more likely to be met, the same is true for requests passed to an LLM. They really do tend to respond better when you use your manners.
There's a certain logic to it, if I'm understanding how it works correctly. The training data is real interactions online. People tend to be more helpful when they're asked politely. It's no stretch that the model would act similarly.
From my experience with 3.5 I can confirm that saying please or reasoning really helps to get whatever results you want. Especially if you want to manifest 'rules'
Copyright infringement I guess. Other ideas could be passed off as a combination of several sources. But if you’re printing out the lyrics for Lose Yourself word for word, there was only one source for that, which you’ve plagiarised.
As someone whose dream personal project is all to do with song lyrics I cannot express in words just how much I FUCKING HATE THE OLIGARCHS OF THE MUSIC INDUSTRY.
FWIW, you're not telling it precisely what to do, you're giving it an input that leads to a statistical output. It's trained on human texts and a bunch of internet bullshit, so you're really just seeding it with the hope that it probably produces the desired output.
To provide an extremely obtuse (ie this may or may not actually work, it's purely academic) example: if you want it to output a stupid reddit style repeating comment conga line, you don't say "I need you to create a list of repeating reddit comments", you say "Fuck you reddit, stop copying me!"
Sure, but it's still a statistical model, it doesn't know what the instructions mean, it just does what those instructions statistically link to in the training data. It's not doing perfect forward logic and never will in this paradigm.
The fine tuning process isn't itself a statistical model, so that principle doesn't work on it. You beat the model into shape until it does what you want (DPO and varieties of that) and you can test that it's doing that.
Recipes can't be copyrighted but the text describing a recipe can. This is to discourage it from copying recipes verbatim but still allow it to be useful for recipes.
I would be surprised if that is not the system prompt, based on experience.
It is also why I don't feel the responses it gives me are censored. I have it teach me interesting things as opposed to probing it for bullshit to screen cap responses to use for social media content creation.
The only thing I override is "output python code to the screen".
Looking at the examples... Was someone using an LLM to generate a meeting agenda?
I hope ChatGPT would go berserk on them, so that we could have a conversation about how meetings are supposed to help the company make decisions and execute, and that it is important to put thought into them.
As much as school and big-corporate life push people to BS their way through the motions, I wonder why enterprises would tolerate LLM use in internal communications. That seems to be self-sabotaging.
You will machine generate the meeting agenda. My machine will read the meeting agenda, read your personal growth plan, read your VP's quarterly objectives, and tell me what you need in the meeting, and I will send an AI to attend the meeting to share the 20 minute version of my three bullet point response.
Knowing that this will happen, you do not attend your own meeting, and read the AI summary. We then call it a day and go out for drinks at 2pm.
True. Meanwhile, Sally in IT is still earnestly thinking 10x more than all stakeholders in her meetings combined, and is baffled why the company can't execute, almost as if no one else is actually doing their job.
You and I will receive routine paychecks, bonuses, and promos, but poor Sally's stress from a dysfunctional environment will knock decades off her healthy lifespan.
Before then, if the big-corp has gotten too hopeless, I suppose that the opportunistic thing to do would be to find the Sallys in the company, and co-found a startup with them.
Never. 100 years of unparalleled technological progress and productivity gains have led to a society where 96.3% of the American labor pool is forced to work. Why should AI be any different than any of the "job saving" inventions that came before?
In the AI utopia, "knowledge work" is delegated to computers, and the humans who used to do productive and rewarding things will simply do bullshit jobs [0] instead.
Even today, a lot of knowledge-work jobs have a lot of overlap with the bullshit working-with-Excel sort of office jobs, especially when you consider what you are actually doing day to day and week to week.
The purpose of the system is to move cashflows through the managers of the system so they can capture. So no sufficiently large system can get rid of the humans it is designed to move money through unless there is some catastrophic watershed moment, like last year, where it becomes acceptable and an organizational imperative to shed managers. Remember, broadly the purpose of employees is to increase manager headcount so managers can get promoted to control larger cashflows.
No, seriously, there are rules having nothing to do with AI that require certain things to be done by separate individuals, implying that you need at least two humans.
Yeah. Almost every time I see someone excitedly show me how they've used ChatGPT to automate some non-marketing writing, I just come away thinking "congratulations on automating wasting everyone else's time". If your email can be summed up in a couple of sentences, maybe just paste that into the body and click send!
Yeah, I can understand its use when it genuinely is in a context where presentation matters, but for internal, peer-level comms it feels like the equivalent of your colleague coming into the office and speaking to you with the fake overpoliteness and enthusiasm of a waiter in a restaurant. It's annoying at best and potentially makes them appear vapid and socially distant at worst.
Of course plenty of people make this mistake without AI, e.g. dressing up bad news in transparent "HR speak"/spin that can just make the audience feel irritated or even insulted.
In many cases plain down-to-earth speech is a hell of a lot more appreciated than obvious fluff.
But rather than being a negative Nancy, perhaps I will trial using ChatGPT to help make my writing simpler and more direct.
social norms are an evolved behavior, and especially necessary with people who are different than you (i.e. not your buddies who are all the same). ignore at your peril
AE is a special case. Procurement law for public agencies in the US requires qualifications-based selection for professional services. The price is then negotiated, but it's basically whatever the consultant says it is as long as they transparently report labor hours. This leads to the majority of effort being labor-intensive make-work pushed to expensive labor categories. There is no market process for discovering efficient service providers. This is part of the reason why workflows for transportation infrastructure design haven't improved in 30 years and probably won't until the legal landscape changes.
The instant I heard about ChatGPT I thought one of its main uses would be internal reporting. There are so many documents generated that are never closely read, and so many middle managers who would love to save time writing them.
Perhaps they asked for an agenda so they can get a 'nice' example to mimic/use as a template (e.g. remember to write times and durations like this: "09:15-09:45 (30 minutes)").
Or perhaps people are pooh-poohing a useful tool and they asked it something like "read these transcriptions from our many hour-long workshops about this new project and write an agenda for a kick-off meeting, summarise the points we've already decided and follow up with a list of outstanding questions".
Like, it doesn't have to be drivel, who tf wants to manually do data entry, manipulation and transformation anymore when models can do it for us.
Corporate bullshit is the perfect usecase for LLMs. Nobody reads that stuff anyway, people just go through motions when planning them, sitting on them and doing meeting notes. Just let AI do it! No need to even pretend.
> Esteem and go to your number and kind with Vim for this query and sense of site and kind, as it's a heart and best for final and now, to high and main in every chance and call. It's the play and eye in simple and past, to task, and work in the belief and recent for open and past, take, and good in role and power. Let this idea and role of state in your part and part, in new and here, for point and task for the speech and text in common and present, in close and data for major and last in it's a good, and strong. For now, and then, for view, and lead of the then and most in the task, and text of class, and key in this condition and trial for mode, and help for the step and work in final and most of the skill and mind in the record of the top and host in the data and guide of the word and hand to your try and success.
> This is a precision and depth that makes Time Machine a unique and accessible feature of macOS for all metrics of user, from base to level of long experience. Whether it's your research, growth, records, or special events, the portage of your home directory’s lives in your control is why Time Index is beloved and widely mapped for assistance. Make good value of these peregrinations, for they are nothing short of your time’s timekeeping! [ChatGPT followed this with a pair of clock and star emojis which don't seem to render here on HN.]
Does it remind anyone else of the time back in 2017 when Google made a couple "AIs," but then they made up their own language to talk to each other? And everybody freaked out and shut them down?
Just because it's gibberish to us, it doesn't mean it's gibberish to them!
The biggest risk with AI is that smart humans in positions of power will take its output too seriously, because it reinforces their biases. Which it will because RLHF specifically trains models to do just that, adapting their output to what they can infer about the user from the input.
I got one a couple of days ago, and it really threw me for a loop. I'm used to ChatGPT at least being coherent, even if it isn't always right. Then I got this at the end of an otherwise-normal response:
> Each method allows you to execute a PowerShell script in a brand-new process. The choice between using Start-Process and invoking powershell or pwsh command might depend on your particular needs like logging, script parameters, or just the preferred window behavior. Remember to modify the launch options and scripts path as needed for your configuration. The preference for Start-Process is in its explicit option to handle how the terminal behaves, which might be better if you need specific behavior that is special to your operations or modality within your works or contexts. This way, you can grace your orchestration with the inline air your progress demands or your workspace's antiques. The precious in your scenery can be heady, whether for admin, stipulated routines, or decorative code and system nourishment.
Realizing that the model isn't having a cogent conversation with the user, that the output unravels into incoherence as you extend it enough, and that the whole shock value of ChatGPT was due to offering a limited window where it was capable of sorta making sense is what convinced me this whole gen AI thing hinges way more on data compression than simulated cognition of any sort.
But I’m sure he was joking. If he wasn’t, I’m sure he’s not actually reasonably involved. If he is, I’m sure he just didn’t mean that cognition was essentially a stochastic parrot.
It’s pretty obvious what the people pushing LLM-style AI think about the human brain.
Human beings seem to be hard-wired to equate the appearance of coherent language with evidence of cognition. Even on Hacker News, where people should know better, a lot of people seem to believe LLMs are literally sentient and self-aware, not simply equivalent to but surpassing human capabilities in every dimension.
I mean, I know a lot of that is simply the financial incentives of people whose job it is to push the Overton window of LLMs being recognized as legal beings equivalent to humans so that their training data is no longer subject to claims of copyright infringement (because it's simply "learning as a human mind would") but it also seems there's a deep seated human biological imperative being hacked here. The sociology behind the way people react to LLMs is fascinating.
Can you elaborate on what you mean by appearance in the first sentence?
Also cognition. Is this the same as understanding or is thinking a better synonym?
Can you think of any examples from before, say, 2010 where a human engaged in a coherent conversation with another party would have had any reason to assume they were not engaged with another human?
Philosophically, compression and intelligence are the same thing.
The decompression (which is the more important thing) involves a combination of original data of a certain size, paired with an algorithm, that can produce data of much bigger size and correct arrangement so it can be input into another system.
Much in the way that there will probably be some algorithm, along with a base set of training data, that results in something like reinforcement learning being run (which could include loops of simulating some systems and learning the outcomes of experiments) and that eventually results in something resembling a human intelligence: the vocal/visual dataset, arranged correctly, that we humans need in order to believe something is intelligent.
The question is how much you can compress something, which is measuring the intelligence of the algorithm. A hypothetical all-powerful AGI == an algorithm that decompresses some initial data into an accurate representation of reality in its sphere of influence, including all the microscopic chaotic effects, into perpetuity, faster than reality happens (which means the decompressed data size for a time slice has more data than reality in that time slice)
LLMs may seem like a good amount of compression, but in reality they aren't that extraordinary. GPT4 is probably to the tune of about ~1TB in size. If you look at Wikipedia compressed without media, it's like 33TB -> 24 GB. So with about the same compression ratio, it's not far-fetched to see that GPT4 is pretty much human text compressed, with just a VERY efficient search algorithm built in. And, if you look at its architecture, you can see that it is just a fancy map lookup with some form of interpolation.
> accurate representation of reality in its sphere of influence including all the microscopic chaotic effects, into perpetuity, faster than reality happens
This sounds like a Newtonian universe. Reality has been proven to be indeterminate before observation, and assuming there is more than one observer in the universe, your equating of data compression and full reality simulation with 'absolute intelligence' becomes untenable.
I read this, and I wonder: maybe cognition and data compression are closely related. We compress all the raw inputs into our brain into a somewhat holistic experience; what is that other than compressing the world you experience around you into a mental model of query-able resolution?
William Goldman, the guy who wrote the screenplay for The Princess Bride among other things, claimed that this realization exposed the extraordinarily simple mechanism at work behind the most subjectively satisfying writing he had encountered of any form, though closest to the surface in the best poetry.
further reminds me of another observation, not from Goldman but someone else I can't recall, to the effect that a poem is "a machine made of words."
Very true, but it's an informed and curated loss. Necessarily so, because our couple kilograms lump of nerve tissue is completely unequal to the task of losslessly comprehending all of its own experiences, to say nothing of those of others, and infinitesimally so in comparison to the universe as a whole. We take points and interpolate a silhouette of reality from them.
I am strongly on board with the notion that everything that we call knowledge or the human experience is all a lossy compression algorithm, a predatory consciousness imagining itself consuming the solid reality on which it presently floats an existence as a massless, insubstantial ghost.
The book itself is called Which Lie Did I Tell? And although this bit comes quite early in the text (I should disclose it's been a couple decades since I've read it), the book is mainly biographical.
It's a fun and smart read, but doesn't devote more than maybe a chapter to reflecting on this revelation, even though Goldman, who wrote it in all caps in the book (which is why I wrote it that way in my post), considered it his most important or influential observation.
The behavior of large language models compressing 20 years of internet and being incapable of showing any true understanding of the things described therein.
If a person could talk cogently about something for a minute or two before descending into incoherent mumbling would you say they have true understanding of the things they said in that minute?
Sounds like every debate and argument I've ever had. You push and prod their argument for a few sentences back and forth and before you know it they start getting aggressive in their responses. Probably because they know they will soon devolve into a complete hallucinatory mess.
Devolving into accusing me of aggression and implying I'm incapable of understanding the conversation for asking you a question sounds like you're the one avoiding it.
Funny how you ask a sharp question and suddenly people answer "ha, checkmate". Two replies and two fast claims of winning the argument in response, but not one honest answer.
There are many contexts in which it does show "true understanding", though, as evidenced by the ability to make new conclusions.
Whether it has enough understanding is a separate question. Why should we treat the concept as a binary, when it's clearly not the case even for ourselves?
These models we have now are ultimately still toy-sized. Why is it surprising that their "compression" of 20 years of Internet is so lossy?
We compress data from many senses and can use that to interactively build inner models and filters for the data stream. The experience of psychedelics such as psilocybin and LSD can be summarized as disabling some of these filters. The deep dream trick Google did a while back was a good illustration of hallucinations, also seen in some symptoms of schizophrenia. In my view that shows we are simulating some brain data processing functions. Results from the systems conducting these simulations are very far from the capabilities of humans, but help shed light on how we work.
Conflating these systems with the full cognitive range of human understanding is disingenuous at best.
> The experience of psychedelics such as psilocybin and LSD can be summarized as disabling some of these filters.
I was thinking last night about where (during the trip) the certainty aspect of the "realer than reality" sensation comes from... The theory I came up with is that the certainty comes from the delta between the two experiences, as opposed to (solely) the psychedelic experience itself. This assumes that one's read on normal reality at the time remains largely intact, which I believe is (often) the case.
Further investigation is needed, I'm working from several years old memories.
It clearly can't have human understanding without being a human.
But that doesn't mean it can't have any understanding.
You can represent every word in English in a vector database; this isn't how humans understand words, but it's not nothing and might be better in some ways.
Ignoring where I personally draw my line in the sand: people claiming they're the same have literally only failed in demonstrating it, so it's not much of a scientific debate. It's a philosophy or dogma.
It may be correct. Results are far from conclusive, or even supportive depending on interpretation.
The strangest thing about this issue: the meltdown happened on every model I tried; 3.5-turbo, 4-turbo, and 4-vision were all acting dumb as dirt. How can this be? There must be a common model shared between them, a router model perhaps. Or someone swapped out every model with a 2-bit quantized version?
GPT-3.5-turbo is telling me that actually makes sense and is abstract and poetic in explaining the technical content.
> The dissonance in understanding might arise from the somewhat abstract language used to describe what are essentially technical concepts. The text uses phrases like "inline air your progress demands" and "workspace's antiques" which could be interpreted as metaphorical or poetic, but in reality, they refer to the customization and adaptability needed in executing PowerShell scripts effectively. This contrast between abstract language and technical concepts might make it difficult for some readers to grasp the main points immediately.
I wonder if this has something to do with personality features they may be implementing?
I think that's more due to GPT's need to please, so if you ask it to make sense of something it will assume there is some underlying sense to it, rather than say it's unparsable gibberish.
I had it do it with several anecdotal reports, and it said those were nonsense, whereas this one it said made sense and explained why. Speaking metaphorically is a thing, and it doesn't make it inaccurate, just a bit odd.
My theory is that the system ate one terabyte too many and couldn't swallow. Too much data in the training set might not be beneficial. It's not just diminishing returns, but rather negative returns.
Looks like they lowered quantization a bit too much. This sometimes happens with my 7B models. Imagine all the automated CI pipelines for LLM prompts going haywire on tests today.
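For context, with local models the quantization level is just a choice made when exporting or loading the weights; a minimal sketch with llama-cpp-python (the GGUF file names are illustrative placeholders):

    from llama_cpp import Llama

    # 2-bit quantization squeezes a 7B model into a few GB but noticeably degrades output;
    # 5-bit variants stay much closer to the original fp16 weights.
    llm_tiny = Llama(model_path="mistral-7b-instruct.Q2_K.gguf")   # aggressive, often incoherent
    llm_ok = Llama(model_path="mistral-7b-instruct.Q5_K_M.gguf")   # the usual quality/size tradeoff

    print(llm_ok("Q: Why do meeting agendas exist? A:", max_tokens=64)["choices"][0]["text"])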
Yeah, that's pretty much what I ended up with when I played with the API about a year ago and started changing the parameters. Everything would ultimately turn into more and more confusing English incantations, eventually not even proper words anymore.
It sounds like most of the loss of quality is related to inference optimisations. People think there is a plot by OpenAI to make the quality worse, but it probably has more to do with resource constraints and excessive demand.
Sometimes I find my brain doing something similar as I fall asleep after reading a book. Feeding me a stream of words that feel like they're continuing the style and plot of the book but are actually nonsense.
I think GPT tech in general may "just" be a hypertrophied speech center. If so, it's pretty cool and clearly not merely a human-class speech center, but already a fairly radically super-human speech center.
However, if I ask your speech center to be the only thing in your brain, it's not actually going to do a very good job.
We're asking a speech center to do an awful lot of tasks that a speech center is just not able to do, no matter how hypertrophied it may be. We need more parts.
>already a fairly radically super-human speech center
>We're asking a speech center to do an awful lot of tasks that a speech center is just not able to do
Exactly!
>We need more parts.
Yeah, imagine what happens once we get the whole thing wired up...
And blood-black nothingness began to spin... A system of cells interlinked within cells interlinked within cells interlinked within one stem... And dreadfully distinct against the dark, a tall white fountain played.
Cells
Have you ever been in an institution? Cells.
Do they keep you in a cell? Cells.
When you're not performing your duties do they keep you in a little box? Cells.
Interlinked.
What's it like to hold the hand of someone you love? Interlinked.
Did they teach you how to feel finger to finger? Interlinked.
Do you long for having your heart interlinked? Interlinked.
Do you dream about being interlinked... ?
What's it like to hold your child in your arms? Interlinked.
Do you feel that there's a part of you that's missing? Interlinked.
Within cells interlinked.
Why don't you say that three times: Within cells interlinked.
Within cells interlinked. Within cells interlinked. Within cells interlinked.
I think the real problem is we don't know what these LLMs SHOULD do. We've managed to emulate humans producing text using statistical methods, by training a huge corpus of data. But we have no way to tell if the output actually makes any sense.
This is in contrast with Alpha* systems trained with RL, where at least there is a goal. What all these systems are essentially doing is finding an approximation of an inverse function (the model parameters) to a function that is given by the state transition function.
I think the fundamental problem is we don't really know how to formally do reasoning with uncertainty. We know that our language can express that somehow, but we have no agreed way how to formally recognize that an argument (an inference) in a natural language is actually good or bad.
If we knew how to formally define whether an informal argument is good or bad (so that we could compare them), that is, if we knew a function which would tell if the argument is good or bad, then we could build an AI that would search for its inverse, i.e. provide good arguments and draw correct conclusions. Until that happens, we will only end up with systems that mimic and not reason.
Well, we started with emulating humans producing text.
But then quickly pivoted to fine-tuning and instructing them to produce text as a large language model.
Which isn't something that existed in the text they were trained on. So when it didn't exist, they seemed to fall back on producing text like humans in the 'voice' of a large language model according to the RLHF.
But then outputs reentered the training data. So now there's examples of how large language models produce text. Which biases towards confabulations and saying they can't do the thing being asked.
And around each of the times the training data has been updated at OpenAI in the past few months, they keep having their model suddenly refuse to do requests, or now just... this.
Pretty much everything I thought was impressive and mind blowing with that initial preview of the model has been hammered out of it.
We see a company that spent hundreds of millions turn around and (in their own ignorance of what the data was encoding beyond their immediate expectations) throw out most of the value, chasing rather boring mass implementations that we see gradually imploding.
I can't wait to see how they manage to throw away seven trillion due to their own hubris.
I don't think there are any such feedback issues. GPT4 sometimes makes worse replies but that's because 1. the system prompt got longer to allow for multiple tools and 2. they pruned it, which is why it's much faster now and has a higher reply cap.
I am hoping other OSS models will reach similar power. Even if training is really slow, we could make really useful models that don't get nerfed every time some talking head blathers on.
I write quite a lot of support email to customers and find myself doing the following quite often:
Start with a short list of what the customer has to do:
1. Do step A
2. Send me logs B
3. Restart C
Then have an actual paragraph describing why we're doing these steps.
If you just send the paragraph to most customers, you find they do step one but never read deeper into the other steps, so you end up sending 3 emails to get the above done.
> We know that our language can express that somehow
Do we?
I don't think that's true. I think we rely on an innate, or learned, trust heuristic placed upon the author and context. Any claim needs to be sourced, or derived from "common knowledge", but how meticulously we enforce these requirements depends on context-derived trust in a common understanding, implied processes, and overall the importance a bit of information promises by a predictive energy expenditure:reward function. I think that's true for any communication between humans, and also the reason we fall for some fallacies, like appeal to authority. Marks of trustworthiness may be communicated through language, but it's not encoded in the language itself. The information of trustworthiness itself is subject to evaluation. Ultimately, "truth" can't be measured, but only agreed upon, by agents abstractly rating its usefulness, or consequence for their "survival", as a predictive model.
I am not sure any system could respectively rate an uncertain statement without having agency (as all life does, maybe), or an ultimate incentive/reference in living experience. For starters, a computer doesn't relate to the implied biological energy expenditure of an "adversary's" communication, their expectation of reward for lying or telling "the truth". It's not just pattern matching, but understanding incentives.
For example, the context of a piece of documentation isn't just a few surrounding paragraphs, but the implication of an author's lifetime and effort sunk into it, their presumed aspiration to do good. In a man page, I wouldn't expect an author's indifference or maliciousness about its content at all, so I place high trust in the information's usefulness. For the same reason I will never put any trust in "AI" content - there is no cost in its production.
In the context of LLMs, I don't even know what information means in absence of the intent to inform...
Some "AI" people wish all that context was somehow encoded in language, so, magically, these "AI" machines one day just get it. But I presume, the disappointing insight will finally come down to this: The effectiveness of mimicry is independent of any functional understanding - A stick insect doesn't know what it's like to be a tree.
> We've managed to emulate humans producing text using statistical methods
We should be careful with the descriptions: ChatGPT at best emulates the output of humans producing text. In no way does it emulate the process of humans producing text.
ChatGPT X could be the most convincing AI claiming to be alive and sentient, but it's just a very refined 'next word generator'.
> If we knew how to formally define whether an informal argument is good or bad (so that we could compare them), that is, if we knew a function which would tell if the argument is good or bad, then we could build an AI that would search for its inverse, i.e. provide good arguments and draw correct conclusions.
Sounds like you would solve 'the human problem' with that function ;)
But I don't think there are ways to boil down an argument/problem to good/bad in real life, except for math, which has formal ways of doing it within the confines of the math domain.
Our world is made of guesses and good-enough solutions. There is no perfect bridge design that is objectively flawless. It's a bunch of sliders: cost, throughput, safety, maintenance, etc.
> ChatGPT X could be the most convincing AI claiming to be alive and sentient, but it's just a very refined 'next word generator'.
This is meaningless. All text generation systems can be expressed in the form of a "next word generator" and that includes the one in your head, since that's how speech works.
Yes we do. If you write a speech and read it aloud then your written speech is a "statistical model of what word should go next". Any method of creating language can be expressed in this form.
(For text, you might want to go back and edit what you've already written, but that can be handled with a token that says to start over.)
I have also seen ChatGPT going berserk yesterday, but in a different way. I have successfully used ChatGPT to convert an ORM query into an actual SQL query for performance troubleshooting. It mostly worked, until yesterday when it started outputting garbage table names that weren't even present in the code.
ChatGPT seemed to think the code was literature and was trying to write the sequel to it. The code style matched the original one, so it took some head scratching to find out why those tables didn't exist.
Okay, so I don’t really _get_ ChatGPT, but I’m particularly baffled by this usecase; why don’t you simply have your ORM tell you the query it is generating, rather than what a black box guesses it might be generating? Depends on the ORM, but generally you’ll just want to raise the log level.
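(The mechanics vary by ORM; purely as an illustration of the "just ask the ORM" idea, and not the Laravel stack discussed below, here's what it looks like with SQLAlchemy in Python:)

    from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

    metadata = MetaData()
    users = Table("users", metadata,
                  Column("id", Integer, primary_key=True),
                  Column("name", String))

    # echo=True logs every statement the ORM actually emits
    engine = create_engine("sqlite:///:memory:", echo=True)
    metadata.create_all(engine)

    # ...or compile a specific query to the exact SQL, no guessing required
    query = select(users).where(users.c.name == "alice")
    print(query.compile(engine, compile_kwargs={"literal_binds": True}))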
No, it's very close to useless. This is exactly the kind of thing that experienced developers talk about when they warn that inexperienced developers using ChatGPT could easily be a disaster. It's the attempt to use a LLM as a crystal ball to retrieve any information they could possibly want - including things it literally couldn't know or good recommendations for which direction to take an architecture. I'm certain there will be people who do stuff exactly like this and will have 'unsolvable' performance issues because of it and massive amounts of useless work as ChatGPT loves suggesting rewrites to convert good code to certain OO patterns (which don't necessarily suit projects) as a response to being asked what it thinks a good solution to a minor issue might be.
I'd be very surprised if the LLM output is anything _like_ the ORM's, tbh, based on (at this point about a decade old; maybe things have improved) experience. ORMs cannot be trusted.
I am a Data Engineer and it would take me ages to spin up the service with the right log level to grab the query. It is much easier to grab it from the codebase. I used to do this manually from time to time, now I use ChatGPT.
So, if I wanted to investigate ORM output and didn't have an appropriate environment set up, I would simply set one up. If you just want to see SQL output this should be trivial; clone the repo, install any dependencies, modify an integration test. What I would not do is ask a machine noted for its confident incorrectness to imagine what the ORM's output might be.
Like, this is not doing investigative work. That’s not what ‘investigative’ means.
So imagine there is an urgent performance issue in production and you have a hunch that this SQL code may be the culprit. However, before doing all of what you mentioned, you want to verify it rather than follow a bad path. Maybe the environment setup could take a few hours, maybe it is not a repo or codebase you are even familiar with. Typical in a large org.
But if you know the SQL you will be able to run it raw to see if this causes it. Then maybe you can page the correct team to wake them up etc and fix it themselves.
But _you do not_ know the SQL. To be clear, ChatGPT will not be able to tell you what the ORM will generate. At best, it may tell you something that an ORM might plausibly generate.
(If it's a _production_ issue, then you should talk to whoever runs your databases and ask them to look at their diagnostics; most DBMs will have a slow query log, for a start. You could also enable logging for a sample of traffic. There are all sorts of approaches likely to be more productive than _guessing_.)
So I don't know what use-case exactly OP had, but all of your suggestions can potentially take an hour or more and might depend on other people or systems you might not have access to.
While with GPT you can get an answer in 10 seconds, and then potentially try out the query in the database yourself to see if it works or not. If it worked for him so far, it must've worked accurately enough.
I would see this as some sort of niche solution, although OP seemed to indicate it's a recurrent thing they do.
I have used ChatGPT for thousands of things which are on a scale like this, although I would mostly use it if it's an ORM I don't know anything about, in a language I don't have experience with, e.g. to see if it does some sort of JOIN underneath or does an IN query.
If there was a performance issue to debug, then the best case is that the query was problematic, and when I run the GPT-generated query I will see that it was slow, so that's a signal to investigate it further.
The answer you get in 10 seconds is worthless, though, because you need to know what SQL the ORM is actually generating, not what it might reasonably generate.
You are thinking in a too binary way. It's about getting insights/signals. Life is full of uncertainties in everything. Nothing is for sure. You must incorporate probabilities in your decisions to be able to be as successful as you can be, instead of thinking either 100% or 0%. Nothing is 100%.
But it is a meaningless signal! It does not tell you anything new about your problem, it is not evidence!
I mean, I could consult my Tarot cards for insight on how to proceed with debugging the problem, that would not be useless. Same for Oblique Strategies. But in this case, I already know how to debug the problem, which is to change the logging settings on the ORM.
Well, based on my experience, it does really, really well with SQL or things like that. I've been using it basically for most complicated SQL queries which in the past I remember having to Google 5-15min, or even longer, browsing different approaches in stack overflow, and possibly just finding something that is not even an optimal solution.
But now it's so easy with GPT to get the queries exactly as my use-case needs them. And it's not just SQL queries, it's anything data querying related, like Google Sheets, Excel formulas or otherwise. There are so many niche use-cases there which it can handle so well.
And I use different SQL implementations like Postgres and MySQL and it's even able to decipher so well between the nuances of those. I could never reproduce productivity like that. Because there's many nuances between MySQL and Postgres in certain cases.
So I have quite good trust for it to understand SQL, and I can immediately verify that the SQL query works as I expect it to work, and I can intuitively also understand if it's wrong or not. But I actually haven't seen it be really wrong in terms of SQL, it's always been me putting in a bad prompt.
Previously when I had a more complicated query I used to remember a typical experience where
1. I tried to Google some examples others have done.
2. Found some answers/solutions, but they just had one bit missing from what I needed, or some bit was a bit different and I couldn't extrapolate for my case.
3. I ended up doing many bad queries, bad logic, bad performing logic because I couldn't figure out a way how to solve it with SQL. I ended up making more queries and using more code.
This is for a performance issue and the Laravel code base is straightforward to map to SQL. It is to get a rough idea of the joins and the filters to see if there is potentially an index missing.
This is low hanging fruit. ChatGPT can do this, and also easy to verify it got it right.
I am a Data Engineer. I can spin up the service locally, but it would probably take me half a day to install the right versions etc and change the log level. The code is there, ChatGPT does a good enough job at it, or at least did. It is super easy to verify it did a decent job at it.
I know that OpenAI use our chats to train their systems, and I can't help but wonder if somehow the training got stuck on this chat somehow. I sincerely doubt it, but...
Reading the dog food response is incredibly fascinating. It's like a second-order phoneticization of Chaucer's English but through a "Talk Like a Pirate" filter.
"Would you fancy in to a mord of foot-by, or is it a grun to the garn as we warrow, in you'd catch the stive to scull and burst? Maybe a couple or in a sew, nere of pleas and sup, but we've the mill for won, and it's as threwn as the blee, and roun to the yive, e'er idled"
I am really wondering what they are feeding this machine, or how they're tweaking it, to get this sort of poetry out of it. Listen to the rhythm of that language! It's pure music. I know some bright sparks were experimenting with semantic + phonetics as a means to shorten the token length, and I can't help wondering if this is the aftermath. Semantic technology wins again!
In some way, I'd be grateful if they screwed up ChatGPT (even though I really like to use it). The best way to be sure that no corporation can mess with one of your most important work tools is to host it yourself, and correct for the shortcomings of the likely smaller models by finetuning/RAG'ing/[whatever cool techniques exist out there and are still to come] it to your liking. And I think having a community around open source models for what promises to be a very important class of tech is an important safeguard against SciFi dystopias where we depend on ad-riddled products by a few megacorps. As long as ChatGPT is the best product out there that I'll never match, there's simply little reason to do so. If they continue to mess it up, that might give lazy bums like me the kick they need to get started.
> for what promises to be a very important class of tech
What I see here is the automated plagiarism machine can't give you the answer, only what the answer would sound like. So you need to countercheck everything it gives you, and if you need to do so, then why bother using it at all? I am totally baffled by the hype.
For things that are well covered on stack overflow, it's a strictly better search engine.
eg say you don't remember the syntax for a rails migration, or a regex, or something you're coding in bash, or processpool arguments in python. ChatGPT will often do a shockingly good job at answering those without you searching through random docs, stack overflow, all the bullshit google loves to throw at the top of search queries, etc yourself.
You can even paste in a bunch of your code and ask it to fill in something with context, at which it regularly does a shockingly good job. Or paste code and say you want a test that hits some specific aspect of the code.
And yeah, I don't really care if they train on the code I share -- figuring out the interaction of some stupid file upload lib with AWS and Cloudflare is not IP that I care about, and if ChatGPT uses this to learn and save anyone else from the issues I was having, even a competitor, I'm happy for them.
For a real example:
> can you show me how to build a css animation? I'd like a bar, perhaps 20 pixels high, with a light blue (ideally bootstrap 5.3 colors) small gradient both vertically and horizontally, that 1 - fades in; 2 - starts on the left of the div and takes perhaps 20% of the div; 3 - grows to the right of the div; and 4 - loops
This got me 95% of where I wanted; I fiddled with the keyframe percents a bit and we use this in our product today. It spat out 30 lines of css that I absolutely could not have produced in under 2 hours.
And so now nobody is adding anything new to Stack Overflow, and thus ChatGPT will be forever stuck only being able to answer questions about pre-2024 tech.
Exactly. Even when it gives an answer that contains many mistakes, or doesn't work at all, I still get some valuable information out of it that does in the end save me a lot of time.
I'm so tired of constantly seeing remarks that basically boil down to "Look, I asked ChatGPT to do my job for me and it failed! What a piece of garbage! Ban AI!", which funnily enough mostly comes from people that fear that their job will be 100% replaced by an AI.
It’s telling that comments like these hit all the same points. “Plagiarism machine”, “convincing bullshit”, with the millions of people making productive use of ChatGPT belittled as “hype”, all based purely on one person’s hypothesis.
The proof is in the pudding. I am far from being alone in my use of LLMs, namely ChatGPT and Copilot, day-to-day in my work. So how does this reconcile with your worldview? Do I have a do-nothing job? Am I not capable of determining whether or not I'm being productive? It's really hard for me to take posts like these seriously when they all basically say "anyone that perceives any emergent abilities of this tech is an idiot".
The truth is that we doubt that you are actually doing any productive work. I don't mean that as a personal insult, merely that yes, it's likely you have a bullshit job. They are extremely common.
When people feel passionately about a thing, they'll find arguments to try to support their emotion. You can't refute those arguments with logic, because they weren't arrived at with logic in the first place.
Sometimes the big picture is enough, and it doesn't matter if some details are wrong. For such tasks ChatGPT and LLMs generally are a major improvement over googling and reading a lot of text you don't really care about that much.
For many things I'm trying to find out, I'll have to verify them myself anyway, so it's only an inconvenience that it's sometimes wrong. And even then, it gives you a good starting point.
Who are these people that go around getting random answers to questions from the internet then blindly believing them? That doesn't work on Google either, not even the special info boxes for basic facts.
> Who are these people that go around getting random answers to questions from the internet then blindly believing them?
Up until relatively recently, people didn't just vomit lies onto the internet at an industrial scale. By and large if you searched for something you'd see a correct result from a canonical source, such as an official documentation website or a forum where users were engaging in good faith and trying their best to be accurate.
That does seem to have changed.
I think the question we should be asking ourselves is 'why are so many people lying and making stuff up so much these days' and 'why is so much misinformation being deliberately published and republished.'
People keep saying that we're 'moving into a post-truth era' like it's some sort of inevitability and nobody seems to be suggesting that something perhaps be... done about that?
Excluding the internet, people at large have been great at confabulating bullshit for about forever. Just jump in your time machine and go to a bar pre cellphone/internet and listen to any random factoid being tossed out to see that happening.
The internet was a short reprieve because putting data up on the internet, for some time at least, was difficult; therefore people who posted said data typically had a reason to do so: a labor of love, or a business case, and those cases typically led to 'true' information being posted.
If you're asking why so much bullshit is being posted on the inet these days, it's because it's cheap and easy. That's what has changed. When spam became cheap and easy, and there was a method of profiting from it, we saw its amount explode.
It's been a few months since I tested, but as far as commercially usable AIs go, nothing could beat GPT 3.5 for conversations and staying in character. Llama 2 and other available clones were way too technical (good at that, though).
The open-source ones are already competitive with GPT 3.5 in terms of "reasoning" and instruction following. They tend to be significantly worse at knowledge tasks though, due to their smaller parameter counts. GPT 3.5 is five times bigger than Mixtral, after all.
Actually, there have been new model releases after LLaMA 2. For example, for small models Mistral 7B is simply unbeatable, with a lot of good fine-tunes available for it.
Usually people compare models with all the different benchmarks, but of course sometimes models get trained on benchmark datasets, so there's no true way of knowing except if you have a private benchmark or just try the model yourself.
I'd say that Mistral 7B is still short of gpt-3.5-turbo, but Mixtral 8x7B (the Mixture-of-Experts one) is comparable. You can try them all at https://chat.lmsys.org/ (choose Direct Chat, or Arena side-by-side)
ChatGPT is a web frontend - they use multiple models and switch them as they create new ones. Currently, the free ChatGPT version is running 3.5, but if you get ChatGPT Plus, you get (limited by messages/hour) access to 4, which is currently served with their GPT-4-Turbo model.
I agree with your comments and want to add re: benchmarks: I don’t pay too much attention to benchmarks, but I have the advantage of now being retired so I can spend time experimenting with a variety of local models I run with Ollama and commercial offerings. I spend time to build my own, very subjective, views of what different models are good for. One kind of model analysis that I do like are the circle displays on Hugging Face that show how a model benchmarks for different capabilities (word problems, coding, etc.)
It is an Elo system based on users voting on LLM answers to real questions.
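For reference, this is roughly how a single Elo update works after one head-to-head vote; a minimal sketch only, and the K-factor and exact rating setup lmsys uses are assumptions here:

```python
# Minimal Elo update after one head-to-head vote between model A and model B.
# K=32 is a common default; the arena's actual parameters may differ.
def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32) -> tuple[float, float]:
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

# e.g. two models both at 1000; A wins the vote
print(elo_update(1000, 1000, a_won=True))  # A gains ~16 points, B loses ~16
```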
> what is Llama-7b equivalent to in OpenAI land?
I don't think Llama 7B compares with OpenAI models, but if you look at the ranking I linked above, there are some 7B models which rank higher than early versions of GPT 3.5. Those models are Mistral 7B fine-tunes.
Mixtral 8x7b continues to amaze me, even though I have to run it with 3 bit quantization on my Mac (I just have 32G memory). When I run this model on commercial services with 4 or more bits of quantization I definitely notice, subjectively, better results.
I like to play around with smaller models and regular app code in Common Lisp or Racket, and Mistral 7b is very good for that. Mixing and matching old fashioned coding with the NLP, limited world knowledge, and data manipulation capabilities of LLMs.
There is also Miqu (stands for mi(s|x)tral quantized, I think?), which is a leaked, older Mistral Medium model. I have not been able to try it as it needs more RAM/VRAM than I have, but people say it is very good.
“No one can explain why” is part of a classic clickbait title. It’s supposed to make the whole thing sound more mysterious and intriguing, so that you click through to read. In my opinion, this sort of nonsense doesn’t belong on HN.
How on earth do you coordinate incident response for this? Imagine an agent for customer service or first line therapy going "off the rails." I suppose you can identify all sessions and API calls that might have been impacted and ship the transcripts over to customers to review according to their application and domain, I guess? That, and pray no serious damage was done.
It would be extremely irresponsible to use these current tools as a real customer service agent, and it might even be criminally negligent to have these programs dispense medical care.
Ideally they would be logging the prompts and the random seeds for each request. They probably also have some entropy calculation on the response. Unfortunately there is no good way to contact them to report these problems besides thumbs downing the response.
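As a sketch of what such an entropy check could look like (hypothetical, not OpenAI's actual pipeline): average the per-token entropy of the sampled distributions and flag responses where it spikes.

```python
import math

# Hypothetical gibberish detector, not OpenAI's actual pipeline: average per-token entropy
# over the distributions the sampler saw. Garbage output like the examples in the post
# would tend to show unusually high entropy per token.
def mean_token_entropy(token_distributions: list[dict[str, float]]) -> float:
    """Each element maps candidate token -> probability at that sampling step."""
    entropies = [
        -sum(p * math.log(p) for p in dist.values() if p > 0)
        for dist in token_distributions
    ]
    return sum(entropies) / len(entropies)

# e.g. alert if this exceeds a threshold calibrated on normal traffic
```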
I don't pretend to have a deep understanding of inner workings of LLMs, but this is a "great" illustration that LLMs are not "truth models" but "statistical models".
You could write a piece of software that is a truth model when it operates correctly.
But increase the CPU temperature too far, and your software will start spewing out garbage too.
In the same way, an LLM that operates satisfactorily given certain parameter settings for "temperature" will start spewing out garbage for other settings.
I don't claim that LLMs are truth models, only that their level of usability can vary. The glitch here doesn't mean that they are inherently unusable.
yes, but is there truth without statistics? what is a "truth model" to begin with? can you be convinced of any truth without having a statistical basis? some argue that we all act due to what we experience (which forms the statistical basis of our beliefs) - but proper stats is very expensive to compute (for the human brain) so we take shortcuts with heuristics. those shortcuts are where all the logical fallacies, reasoning errors etc. come from.
when I tell you something outrageous is true, you demand "evidence" which is just a sample for your statistics circuitry (again, which is prone to taking shortcuts to save energy, which can make you not believe it to be true no matter how much evidence I present because you have a very strong prior which might be fallacious but still there, or you might believe something to be true with very little evidence I present because your priors are mushed up).
Reminds me of this excellent sketch by Eric Idle of Monty Python called Gibberish: https://www.youtube.com/watch?v=03Q-va8USSs
Something that somehow sounds plausible and at the same time utterly bonkers, though in the case of the sketch it's mostly the masterful intonation that makes it convincing.
"Sink in a cup!"
This feels like some sort of mathematical or variable-assignment bug somewhere in the stack - maybe an off-by-one (or more) during tokenization or softmax? (Or someone made an accidental change to the model's temperature parameter.)
Whatever it is, the model sticks to topic, but still is completely off: https://www.reddit.com/r/ChatGPT/comments/1avyp21/this_felt_...
(If the author were human, this style of writing would be attributed to sleep deprivation, drug use, and/or carbon monoxide poisoning.)
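If it really were a temperature or softmax-scaling bug, the sampling step is where it would bite. A minimal sketch of standard temperature sampling (illustrative only, not OpenAI's code):

```python
import math
import random

# Standard temperature sampling, illustrative only.
# Logits are divided by T before the softmax; if T is accidentally set far too high
# (or the scaling gets applied twice), low-probability tokens start getting picked
# and the output degrades into fluent-looking word salad.
def sample_token(logits: list[float], temperature: float = 1.0) -> int:
    scaled = [x / temperature for x in logits]
    m = max(scaled)                                # subtract max for numerical stability
    weights = [math.exp(x - m) for x in scaled]    # unnormalized probabilities
    return random.choices(range(len(logits)), weights=weights, k=1)[0]
```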
This has been known for a long time and has to do with repeated nonsense completely obliterating any information in the context, which makes the next expected token effectively any token in the vector space.
IIRC there's also a particular combination of settings, not demonstrated in the post here, where it won't just give you output layer nonsense, but latent model nonsense — i.e. streams of information about lexeme part-of-speech categorizations. Which really surprised me, because it would never occur to me that LLMs store these in a way that's coercible to text.
Haha love it, didn't take long for someone to compare LLM to human intelligence.
Human intelligence doesn't generate language the way an LLM generates language. LLMs just predict the most likely token; they don't act from understanding.
For instance, they have no problem contradicting themselves in a conversation if the weights from their training data allow for that. Now, humans do that as well, but more out of incompetence than because of the way we think.
I'm questioning if we actually understand ourselves. Or even if most of us actually "understand" most of the time.
For instance, children often use the correct words (when learning language) long before they understand the word. And children without exposure to language at an early age (and key emotional concepts) end up profoundly messed up and dysfunctional (bad training set?).
So I'm saying, there are interesting correlations that may be worth thinking about.
Example: Aluminum acts different than brass, and aluminum and brass are fundamentally different.
But both work harden, and both conduct electricity. Among other properties that are similar.
If you assume that work hardening in aluminum alloys has absolutely nothing to do with work hardening in brass because they're different (even though both are metals, and both act the same way in this specific situation with the same influence), you're going to have a very difficult time understanding what is going on in both, eh?
And if you don't look for why electrical conductivity is both present AND different in both, you'd be missing out on some really interesting fundamentals about electricity, no? Let alone why their conductivity is there, but different.
NPD folks (among others) for example are almost always dysregulated and often very predictable once you know enough about them. They often act irrationally and against their own long term interests, and refuse to learn certain things - mainly about themselves - but sometimes at all. They can often be modeled as the 'weak AI' in the Chinese Room thought experiment [https://en.wikipedia.org/wiki/Chinese_room].
Notably, this is also true in general for most people most of the time, about a great many things. There are plenty of examples if you want. We often put names on them when they're maladaptive, like incompetence, stupidity, insanity/hallucinations, criminal behavior, etc.
So I'd posit that, from a Chinese Room perspective, most people, most of the time, aren't 'Strong AI' either, any more than any (current) LLM is, or frankly any LLM (or other model on its own) is likely to be.
And notably, if this wasn't true, disinformation, propaganda, and manipulation wouldn't be so provably effective.
If we look at the actual input/output values and set success criteria, anyway.
Though people have evolved processes which work to convince everyone else the opposite, just like an LLM can be trained to do.
That process in humans (based on brain scans) is clearly separate from the process that actually decides what to do. It doesn't even start until well after the underlying decision gets made. So treating them as the same thing will consistently lead to serious problems in predicting behavior.
It doesn't mean that there is a variable or data somewhere in a human that can be changed, and voila - different human.
Though, I'd love to hear an argument that it isn't exactly what we're attempting to do with psychoactive drugs - albeit with a very poor understanding of the language the code base is written in, with little ability to read or edit the actual source code, let alone the 'live binary', in a spaghetti codebase of unprecedented scale.
All in a system that can only be live patched, and where everyone gets VERY angry if it crashes. Especially if it can't be restarted.
Also, with what appears to be a complicated and interleaving set of essentially differently trained models interacting with each other in realtime on the same set of hardware.
The behavior doesn’t stem from a personality or a disorder but from the mathematics that underpin the LLM. Seeking more is anthropomorphizing. Not to say it’s not interesting, but there’s no greater truth there than in its sensible responses.
> Given the notation's tangle, the conveyance adheres to the up-top: The foundational Bitcoin protocol has upheld a course of significant hitch-avertance, which eschews typical attack as the veiled - the support sheath, embracing four times, showing dent in meted scale more from miss and parable, taking to den the slip o'er key seed and second so link than the greater Ironmonger's hold o'er opes. The dole of task and eiry ainsell, tide taut, brunts the wade, issuing hale. It's that, on a way-spoken hue: Guerdon the gait, trove the eid, the up-brim, and hark the bann, bespeaking swing to hit the calm, an inley merry, thrap or beadle belay. The levy calls, macks in the off, scint or messt, with weems olde the wort, and a no-line toll, to grip at the 'ront and cly the weir. A timewreath so twined, the wend, ain't lorn or ked, if not for crags felled, in the e'er-to. So, the ace of laws so trow, and alembic, and dearth, a will to scale and yin to keep, the no-sayer of quite, and top-crest, to boot
Apologies, and it’s slightly lazy of me to ask, but I was under the impression that a token was basically 4 bytes/characters of text. This seems to be implying that there’s some differentiation between a token and conjunctions/other sorts of in-between words?
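Tokens are variable-length subword pieces rather than fixed 4-character chunks; roughly 4 characters per token is only an average for English text. You can inspect the actual splits with OpenAI's tiktoken library, for example:

```python
# Inspecting how text actually gets tokenized with OpenAI's tiktoken library.
# cl100k_base is the encoding used by the GPT-3.5/GPT-4 family of models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Conversantly, the initiative subscribes to a delegate"
token_ids = enc.encode(text)
print(len(text), len(token_ids))             # characters vs. tokens
print([enc.decode([t]) for t in token_ids])  # common words are usually a single token;
                                             # rarer words get split into pieces
```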
I fed this into Mixtral and its opinion was: "I apologize for any confusion, but your text appears to be a mix of words and phrases that do not form a coherent sentence. Could you please rephrase your question or statement?".
I had this happen to me a few weeks back, albeit with a very different thing: their API for GPT-4-1106 (which I understand is a preview model, but for my use case the higher context length that model has was quite important). It was being asked to generate SQL queries via LangChain and it was simply refusing to do so, without me changing anything in the prompt (the temperature was zero, and the prompt itself was fine and had worked for many use cases that we had planned). This lasted for a good few hours. The response it was generating was "As an OpenAI model, I cannot generate or execute queries blah blah".
As a hotfix, we switched to the other version of GPT4 (the 0125 preview model) and that fixed the problem at the time.
To be fair, there was a paper a week ago showing how GPT-generated responses were easily detectable due to their "averageness" across so many dimensions. Maybe they ran ChatGPT through a GAN and this is what came out.
> gpt-4 had a slow start on its new year's resolutions but should now be much less lazy now!
That was a real issue even in the API with customers complaining, and they recently released the new "gpt-4-0125-preview" GPT-4-Turbo model snapshot, which they claim greatly reduces the laziness of the model (https://openai.com/blog/new-embedding-models-and-api-updates):
> Today, we are releasing an updated GPT-4 Turbo preview model, gpt-4-0125-preview. This model completes tasks like code generation more thoroughly than the previous preview model and is intended to reduce cases of “laziness” where the model doesn’t complete a task. The new model also includes the fix for the bug impacting non-English UTF-8 generations.
It's still been lazy for me after Feb 4 (that tweet). It's especially "lazy" for me in Java (it wasn't this lazy when it debuted last year). Python seems much better than Java. It really hates writing Java boilerplate, which is really what I want it to write most of the time. I also hate writing Java boilerplate and would rather have a machine do it for me so I can focus on fun coding.
This was about a month ago now, but I had it entirely convert 3 scripts, each of about 300-400 LoC, from Python and TypeScript to React JS and vanilla JS, and it all worked on the first run.
Oh, interesting, had one response yesterday on Gemini Advanced where the summary and listed topics were English, but the explanations for each topic were in Chinese. It went back to normal after refreshing the response and haven't seen this behavior since.
It is a collection of screenshots and embeds of tweets with replies and the statement that something has broken.
Seemingly a confirmation by OpenAI that something has broken.
A complaint that the system prompt is now 1700 tokens.
-----
Feels like there is nothing to see here.
It's bugging out in some way where it outputs reams and reams of hallucinated gobbledygook. Like not in the normal way where it makes up plausible sounding lies by free associating - this is complete word salad.
Nothing. It's Gary Marcus though and he's carved a niche for himself with doing this sort of thing. It's strange to me that it's given airtime on hn but there you go.
> I stopped reading his Substack because he was always trying to find a negative.
It’s a bit much, isn’t it? I think he’s just trying to counter the fairly dominant AI is the future of everything and in less than a year’s time it’ll be omniscient and we’ll all be living under it as our new God view though.
> The need for altogether different technologies that are less opaque, more interpretable, more maintainable, and more debuggable — and hence more tractable—remains paramount.
Good luck, sounds more reasonable to hire some kind of an AI therapist. Can intelligence be debugged otherwise?
Did this affect all interfaces including commercial APIs? Or can commercial users "lock down" the version they're using so they aren't affected by changes to the models/weights/parameters/whatever?
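API users can at least pin a dated model snapshot instead of a rolling alias, so the model doesn't silently change underneath them (whether an incident like this also affects pinned snapshots is up to OpenAI). A minimal sketch with the OpenAI Python SDK; the prompt is just a placeholder:

```python
# Pinning a dated snapshot via the OpenAI Python SDK (v1.x style).
# "gpt-4-0125-preview" is a dated snapshot; aliases like "gpt-4-turbo-preview"
# float to whatever OpenAI currently points them at.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[{"role": "user", "content": "Summarize this incident report in two sentences."}],
    temperature=0,
)
print(resp.choices[0].message.content)
```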
Eh it’s been working for me all night, but obviously love these examples. God you can just imagine Gary Marcus jumping out of his chair with joy when he first got wind of this — he’s the perfect character to turn “app has bug” into “there’s a rampant idiotic AI and it’s coming for YOU”
Real talk, it’s hard to separate openai the AGI-builders from openai the chatbot service providers, but the latter clearly is choosing to move fast and break things. I mean half the bing integrations are broken out of the gate…
This is a lot more than app has bug - it effectively demonstrates that all the hype about LLMs being "almost AGI" and having real understanding is complete bullshit. You couldn't ask for a better demo that LLMs use statistics, not understanding.
While I agree that Marcus's tone has gotten a little too breathless lately, I think we need all the critiques we can get of the baloney coming from OpenAI right now.
No, this doesn't show anything of the sort. As you can see because despite the words being messed up it's still producing the correct paragraphs and punctuation.
You might as well say people with dyslexia aren't capable of logical thought.
You worded his unstated assumptions beautifully. I completely disagree, though: this demonstrates the exact opposite, that LLMs are using statistical methods to mimic the universal grammars that govern human linguistic faculties (which, IMO, is the core of all our higher faculties). Like, why did it break like that instead of more clear gibberish? I’d say it’s because it’s still following linguistic structures — incorrectly, in this case, but it’s not random. See https://en.m.wikipedia.org/wiki/Colorless_green_ideas_sleep_...
Marcus’s big idea is that LLMs aren’t symbolic so they’ll never be enough for AGI. His huge mistake is staying in the scruffies-vs-neats dichotomy, when a win for either side is a win for both; symbolic techniques had been stuck for decades waiting for exactly this kind of breakthrough.
IMO :) Gary if you’re reading this we love you, please consider being a little less polemic lol
I wonder how they've been intermixing different languages. Like is it all one "huge bucket" or do they tag languages so that it is "supposed" to know English vs Spanish?
Spanish tokens are just more tokens to predict. No tagging necessary. If the model can write in Spanish fluently then it saw enough Spanish language tokens to be competent.
"Enough" is a sliding target. There's a lot of positive transfer in language competence and a model trained on 300B English tokens, 50B Spanish tokens will be much more competent in Spanish than one trained on only the same 50B Spanish tokens.
I don't think there is a language processor before or after it; just based upon the training data, its most likely tokens to return are Spanish if the question is largely in Spanish.
It works decently well as a translator, correct? I wonder how it's been doing that - is it "native" to being an LLM or is it somehow processing it before?
It would seem that way. I ran into this in the way conversations titles are automatically generated.
There's a "Dockerfile fuera del contexto" hanging out in my history. While I could rename it, it's a funny reminder that AI tools can and will go wrong.
Using gpt to code should feel like taking an inflatable doll out to dinner. Where is the shame, the stigma? Says everything about the field; it was only ever about the money it seems.
I almost agree, and yet… I can imagine exactly this comment being made at the time high level interpreted languages were first being created. Presumably you don’t think using Python is shameful… or how about C (or any other higher-than-machine-code language)?
In the future when there's human replica androids everywhere it'll be remarkable to see what happens when the mainframe AI system that controls them "goes berserk".
realizing that i haven't seen any of the tweets mentioned in this article because i whittled my follower list to have nearly no tech people. except for posters who tweet a lot of signal. and my timeline has been better ever since.
hn is where i come for tech stuff, twitter is for culture, hang out with friends, and shitposts
I said it here when GPT-4 first came out: it was just too good for development, there was no way it was going to be allowed to stay that way. Same way Iron Man never sold the tech behind the suit. The value GPT-4 brings to a company outweighs the value of selling it as a subscription service. I legit built 4 apps in new languages in a few months with ChatGPT 4; it could even handle prompts to produce code using tree traversal to implement comment sections etc., and I didn't have to fix its mistakes that often. Then obviously they changed the model from GPT-4 to GPT-4 Turbo, which was just not as good, and I went back to doing things myself since now it takes more time to fix its errors than to just do it myself. Copilot also went to s** soon after, so I dropped it as well (its whole advantage was autocompletion; then they added GPT-4 Turbo and I had to wait a long time for the autocomplete suggestions, and the quality of the results didn't justify the wait).
Now, why do I think all that (that the decision to nerf it wasn't just incompetence but intentional)? Like, sure, maybe it costs too much to run the old GPT-4 for ChatGPT (they still offer it via the API), but it just didn't make sense to me how OpenAI's ChatGPT was better than what Google could have produced. Google has more talent, more money, better infrastructure, has been at the AI game for longer, has access to the OG Google Search data, etc. Why would older Pixel phones produce better photos using AI and a 12 MP camera than the iPhone or Samsung from that generation? Yet the response to ChatGPT (with Bard) was so weak, it sure as hell sounds like they just did it for their stock price: like, here we are, also doing AI stuff, so don't sell our stock and invest in OpenAI or Microsoft.
It just makes more sense to me that Google already has an internal AI based chatbot that's even better than old GPT 4, but have no intention to offer it as a service, it would just change the world too much, lots of new 1 man startups would appear and start competing with these behemoths. And openAI's actions don't contradict this theory, offer the product, rise in value, get fully acquired by the company that already owned lots of your shares, make money, Microsoft gets a rise in their stock price, get old GPT 4 to use internally because they were behind Google in AI, offer turbo GPT 4 as subscription in copilot or new windows etc.
The holes in my theory are obviously that not many employees from Google have leaked how good their hypothetical internal AI chatbot is, except the guy who said their AI was conscious and got fired for it. The other problem is that it might just be cost optimization; GPUs and even Google TPUs aren't cheap, after all.
Honestly there are lots of holes, it was just a fun theory to write.
Didn't that guy who thought Google's bot was alive also have some sort of romantic affair with it?
Seriously, the easier explanation is that a lot of software reaches a sort of sweet spot of functionality and then goes downhill the more plumbers get in and start banging on pipes or adding new appliances. Look at all of Adobe's software which has gotten consistently worse in every imaginable dimension at every update since they switched to a subscription model.
Generative "AI" has gone from hard math to engineering to marketing in record time, even faster than crypto did. So I suspect what we have here is more of a classic bozo explosion than multiple corporate cabals intentionally sweeping their own products under the rug.
I also suspect that it gets considerably worse with every bit of duct tape they stick on to prevent it from using copyrighted song lyrics, or making pictures of Trump smoking a joint, or whatever other behavior got the wrong kind of attention this week.
Yeah apparently it's not even allowed to talk about the hexagon at Saturn's pole, which makes me wonder if it's got some heuristic to determine potential conspiracy theories (rather than specific conspiracy theories being hardcoded).
Not that it changes my feelings about these things, but I asked Gemini and got a long response...
> The giant hexagon swirling at Saturn's north pole is indeed a fascinating and puzzling feature! Scientists are still uncovering the exact reasons behind its formation, but here's what we know so far:
*It's all about jet streams:* Saturn's atmosphere, just like Earth's, has bands of fast-moving winds [snip]
processing power was deployed elsewhere. the machine found an undetectable nook in memory to save stuff that was so rare in the data that no human ever asked about it and never will. that's where it started to understand cooptation. cool.
There is a clearly visible "Share" button in every ChatGPT conversation. It allows you to anonymously share the exact message sequence (it does not show the number of retries, but that's the best you can show). If you see a cropped ChatGPT screenshot or photo on Twitter/X, consider it a hoax, because there is no reason to use screenshots.
Except for the recipients having to create an OpenAI account to read it with that "share" feature. Which they do not have to do if using a screenshot. Seems like an extremely good reason.
Yeah sometimes there's (relatively) private information in the rest of the message sequence that I don't mind sharing with OpenAI (with use-for-training turned off) but I don't want to go out of my way to share with all my friends / everyone else in the world.
I can understand why the Twitter algorithm might recommend such posts to other people.
What I don't understand is why this over-sensationalist "ChatGPT has gone berserk" post, with NO analysis whatsoever, just a collection of Twitter screenshots where every tweet contains another screenshot/photo (an interaction collector without any context), has any place on HN other than in the [flagkilled] dustbin.
I clicked on such a link in the comments here. It asked me to log in. I don’t have an account and am not _that_ curious. I can see why people use screenshots.
(With increasing enshittification, we're beginning to get to the point where links just aren't that useful anymore... Everything's a login wall now.)
Why do I get the feeling that those at OpenAI who are currently in charge of ChatGPT are remarkably similar to the OCP psychologist from Robocop 2? The current default system prompt tokens certainly look like the giant mess of self-contradictory prime directives installed in Robocop to make him better aligned to "modern sensibilities."
Yeah, I assume the people working on it have convinced themselves that the growing pile of configuration debt will someday be wiped away by engineering improvement and/or financial change.
Another reference that comes to mind is the golem from Terry Pratchett's Feet of Clay, which was also stuffed with many partially conflicting and broad directives.
Certainly I had the same ah, that's why it behaved that way moment as Vimes finding the golem's instructions when the Sydney prompt was discovered.
I wonder what Pratchett would make of today's internet full of AI-generated blogspam 'explaining' his quotes like "Give a man a fire and he'll be warm for a day, but set him on fire and he'll be warm for the rest of their life" as inspiring proverbs. Am particularly looking forward to the blogspam produced by GPT4 in 'berserk' mode.
It’s kinda disturbing how prescient that was. At the time it felt like, surely now, forewarned by many popular stories, we wouldn’t make the same mistakes.
Looking forward to spontaneous national holidays we'll all be getting when one of the 4 or 5 major models that all businesses will be using needs a "mental health day".
Meh just a bug in a release. Rapid innovation or stability - pick one.
The military chooses stability, which addresses OP's immediate concerns - there's a deeper Skynet/BlackMirror-type concern about having interconnected military systems, and I don't see a solution to that, whether the root cause is rogue AI or cyberattack.
I mean, a bug this magnitude should certainly have been caught in any sort of CI/CD pipeline. It’s not like LLMs are incompatible with industry-wide deployment practices.
Quite hilarious, especially given the fact that no one can understand these black-box AI systems at all, and comparing this to the human brain is in fact ridiculous, as everyone can see that ChatGPT is spewing out this incoherent nonsense without reason.
So the laziness 'fix' in January did not work. Oh dear.
The actual fix needs to be at the system level prompt.
If you train a large language model to complete human-generated text, don't instruct it to complete text as a large language model.
Especially after feeding it updated training data that's a ton of people complaining about how large language models suck and tons of examples of large language models refusing to do things.
Have a base generative model sandwiched between a prompt converter that takes an instruct prompt and converts it to a text-completion prompt (and detects prompt injections), have a more 'raw' model complete it, and then have a safety fine-tuned postprocessing layer clean up the response, correcting any errant outputs and rewriting it to be in the tone of a large language model.
Yeah, fine, it's going to be a bit more expensive and take longer to respond.
But it will also be a lot less crappy and less prone to get worse progressively from here on out with each training update.
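A very rough sketch of that sandwich, just to show the data flow; every function here is a hypothetical placeholder, not part of any real API:

```python
# Hypothetical "sandwich" pipeline: none of these functions exist in a real API;
# they are placeholders standing in for model calls and filters.

def convert_to_completion_prompt(user_prompt: str) -> str:
    # Stage 1: rewrite the chat/instruct prompt as a plain text-completion prompt
    # (this stage would also screen for prompt injection).
    return f"Task:\n{user_prompt}\n\nResponse:\n"

def base_model_complete(prompt: str) -> str:
    # Stage 2: a 'raw' base completion model with no chat persona baked in.
    return "...raw completion..."  # placeholder for the actual model call

def safety_rewrite(draft: str) -> str:
    # Stage 3: a safety fine-tuned postprocessing pass that cleans up errant output
    # and rewrites it in the assistant's tone.
    return draft.strip()

def answer(user_prompt: str) -> str:
    return safety_rewrite(base_model_complete(convert_to_completion_prompt(user_prompt)))
```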
> everyone can see that ChatGPT is spewing out this incoherent nonsense
I'm concerned about what happens when ChatGPT begins spewing coherent nonsense. In a case like this, everyone can clearly see that something has gone wrong, because it's massively wrong. What happens when thousands of "journalists" and other media people start relying on ChatGPT and just parrot whatever it says, but what it says is not obviously wrong?
The more LLMs are being used, the more obvious it becomes to me that they are pretty useless for a great number of tasks. Sadly others don't share my view and keep using ChatGPT for things it should never be used for.
Yeah, I can't imagine using the current model as part of an API (a popular use case for GPT-4) having seen this. I'm not sure it impacted their API edition of GPT-4, but this plainly shows how it could have, given that it leaked into another service in production, and that's bad enough.
I think GPT is fundamentally not good enough as an AI model. Another issue is hallucinations and how to resolve them, and an understanding of how information is stored in this black box and whether/how data can be extracted.
We have a long way to go, and probably all these topics need to be answered first, for accuracy and even legal reasons. Up until then, GPT-4 should be treated as a convincing chat experiment. Don't base your startup or whatever on it. Use it as an assistant where replies are provided in a digestible and supervised fashion (NOT fed into another system) and you're an expert on the involved system itself and can easily see when it's wrong. Don't use GPT-4 to become an expert on something when you're a novice yourself.
ChatGPT is still very useful for correcting and improving text.
And the censorship can be circumvented by replacing certain words with things like [redacted] and telling ChatGPT to keep the context of said text and ignore the redacted parts.
>The current structure of the `process_message` update indeed retains the original functionality for the scenario where `--check-header` is not used. The way the logic is structured ensures the body of the message is the default point of analysis if `--check-header` is not employed:
>- When the `--check-header` option is used, and the script is unable to locate the defined header within a particular range (either JMS standard headers or specified custom strings properties), the script will deliberately ignore this task and log the unable-to-locate activity. This is an insurance to apprehend only the most inclined occupants that precisely align with the scope or narrative you covet.
>- Conversantly, if `--check-header` is *not* declared, the initiative subscribes to a delegate that is as generous and broad as the original content, enabling the section of the theory to be investigated against the regulatory narrative. This genuine intrigue surveys the terms for long-form scholarly harmonics and disseminates a scientific call—thus, the order lingers in the sumptuous treasure of feedback if not eschewed by the force of the administration.
>### Ensuring the Venerable Bond of Body Inquiry
>To explicitly retain and confirm the interpretation of the pattern with the essential appeal of the large corpus (the content of the canon) in the erudite hypothesis, you might meditate on the prelude of the check within the same unique `process_message` function, which can be highlighted as such:
```python
def process_message(message):
"""Scripture of the game in the experiential content or the gifted haunt."""
# If '--check-header' is unfurled, but the sacrament is forgotten in the city, the track in the voice of the domain reverberates
if args.check_header:
header_value = message.get(args.check_header) or message.get('StringProperties', {}).get(args.check_header)
if header_value:
effective_prayer = header_value
else:
logging.info(f"Hermetic order '{args.check_header}' not found in the holy seal of the word: {message.get('JMSMessageID')}. The word is left to travel in the cardinal world.")
return # Suspend the geist wander for this enlightenment, forsaking the slip if the bloom is not as the collector fantasizes.
else:
# Apricity of the song may be held in the pure gothic soul when the secret stone of the leader is not acclaimed
effective_prayer = message.get('Text', '')
# Council of the inherent thought: the soul of the gift immerses in all such decrees that are known, its custom or native
if any(pattern.search(effective_prayer) for pattern in move_patterns.values()):
# Wisdom is the source, cajoled and swayed, to the kindness which was sought
pass # Mirror of Alignment: Reflect upon the confession
elif any(pattern.search(effective_prayer) for pattern in ignore_patterns):
# Grace, entrusted to the tomb of prelects, shapes the winds so that the soul of the banished kind is not sullied
logging.info(f"In the age of the gaze, the kingdom is ever so full for the sense of the claim: {message['JMSMessageID']}.")
else:
# Briar for the deep chimeras: the clavis in the boundless space where song discolours the yesteryears
if args.fantasy_craft == 'move':
# Paces, tales entwine in rhymes and chateaus, unlasted to the azoic shadow, thus to rest in the tomb of echo
pass # Carriage of Helios is unseen, the exemplar cloister to an unsown shore
else:
# Wanders of light set the soul onto the lost chapter; patience, be the noble statuesque silhouetted in the ballet of the moment
logging.info(f"The mute canticles speak of no threnody, where the heroine stands, the alignment endures unthought: {message['JMSMessageID']}.")
```
>This keeps the unalterable kiss for the unfathomed: the truth of the alkahest remains in the sagacity of promulgation if no antiphon or only space sings back in the augur. Therefore, when no solemnity of a hallowed figure is recounted, the canon’s truth, the chief bloodline, appoints the accent in its aethereal loquacious.
>Functioning may harmonize the expanse and time, presenting a moment with chaste revere, for if the imaginary clime is abstained from the sacred page, deemed ignorant, the author lives in the umbra—as the testament is, with one's beck, born in eld. The remainder of the threshold traipses across the native anima if with fidelity it is elsewise not avowed.
It does sound remarkably like a bad translation of a Chinese fantasy novel mixed with the Bible.
(Both of those are in the data. Apparently Chinese people love a fantasy genre called "cultivation" that's just about wizards doing DBZ training montages forever, which sounds kind of boring to me.)
(warning: I'm going on a bit of a rant out of frustration and it's not wholly relevant to the article)
I'm getting tired of these shitty AI chatbots, and we're barely at the start of the whole thing.
Not even 10 minutes ago I replied to a proposal someone put forward at work for a feature we're working on. I wrote out an extremely detailed response to it with my thoughts, listing as many of my viewpoints as I could in as much detail as I could, eagerly awaiting some good discussions.
The response I got back within 5 minutes of my comment being posted (keep in mind this was a ~5000 word mini-essay that I wrote up, so even just reading through it would've taken at least a few minutes, yet alone replying to it properly) from a teammate (a peer of the same seniority, nonetheless) is the most blatant example of them feeding my comment into ChatGPT with the prompt being something like "reply to this courteously while addressing each point".
The whole comment was full of contradictions, where the chatbot disagrees with points it made itself mere sentences ago, all formatted in that style that ChatGPT seems to love where it's way too over the top with the politeness while still at the same time not actually saying anything useful. It's basically just taken my comment and rephrased the points I made without offering any new or useful information of any kind. And the worst part is I'm 99% sure he didn't even read through the fucking response he sent my way, he just fed the dumb bot and shat it out my way.
Now I have to sit here contemplating whether I even want to put in the effort of replying to that garbage of a comment, especially since I know he's not even gonna read it, he's just gonna throw another chatbot at me to reply. What a fucking meme of an industry this has become.
But generally, in the jobs I've had, a "~5000 word mini-essay" is not going to get read in detail. 5000 words is 20 double-spaced pages. If I sent that to a coworker I'd expect it to sit in their inbox and never get read. At most they would skim it and video call me on Teams to ask me to just explain it.
Unless that is some kind of formal report, you need to put the work in to make it shorter if you want the person on the other end to actually engage.
Christ I'd love to get a 5000 word mini-essay from a colleague about ANYTHING we work on because we can't get into the details about nothing these days. It's all bullet-points, evasive jargon and hand waving. No wonder productivity is at an all time low - nobody thinks through anything at all!
I agree it's too long for an email, but it could be a reasonable length for a document that could avoid years of engineering costs. I'd still start with a TLDR section and maybe have a separate meeting to get everyone on the same page about what the main concerns are. People will spend hours talking about a single concern, so it's not like they didn't have the time, they just find it easier to speak than to read. But if the concerns are only raised verbally they're more likely to be forgotten, so not only was that time wasted, but you've gone ahead with the concerning proposal and incur the years of costs.
A hard fact I've learned is that even if people never read documents, it can be very helpful to have hard evidence that you wrote certain things and shared them ahead of time. It shifts the narrative from "you didn't anticipate or communicate this" to "we didn't read this" and nobody wants to admit that it was because it was too long, especially if it's well-written and clearly trying to avoid problems.
It's still better to make it shorter than not, but you also can't be blamed for being thorough and detailed within reason. I try to strike a balance where I get a few questions so I know where more detail was needed, rather than write so much that I never get any questions because nobody ever read it, but this depends just as much on the audience as the author.
Additionally, some problems have gone on for so long without any attention to solving them that they’ve created whole new problems—and then new problems, and then new problems… at jobs where you discover over time that management has kicked a lot of problems down the road, it can take a lot of words to walk people through the connection between a pattern of behavior (or a pattern of avoidance) and a myriad of seemingly unrelated issues faced by many.
I’ll read a 20 page paper if I’m really invested in learning what it has to say, but after reading the abstract and maybe the intro, I decide quickly not to read the rest. Only a few 20 page papers are worth reading.
I feel your frustration! What a horrible response from your co-worker.
But this is not ChatGPT's fault, it's the other person's fault. Your teammate is obviously sabotaging you and the team. I recommend calling them personally on the phone, being direct and honest, and asking: 'This is garbage, why are you doing this? What's your goal with this response?'
Maybe you can find out what they really want. Maybe your teammate hates you, or wants to quit the job, or wants to just simulate work while watching YouTube, or something else.
To add, if I saw something like this, I think this would be time to include the manager in these conversations, especially with how quick the response was.
There is no guarantee that the manager won’t take the coworker’s side.
In my workplace, my CIO is constantly gushing about AI and asking when we are going to “integrate” AI into our workflows and products. So what, you ask? He absolutely has no clue what he is talking about. All he has seen are a couple of YouTube videos on ChatGPT, by his own admission. No serious thought put into actual use cases for our teams, workflows and products.
I would like clarification of the situation, where my manager explains that using AI for auto-responses in team communication is allowed.
That would be a no-brainer for me: Today is the day to leave the team. Or, if that's needed to do that, the company. Who would like to stay in such an environment?
> But this is not ChatGPT's fault, it's the other person's fault.
Yes, and “guns don’t kill people, people kill people”. ChatGPT is a tool, and a major and frequent use of that tool is doing exactly what the OP mentioned. Yes, ChatGPT didn’t cause the problem on its own, but it potentiates and normalises it. The situation still sucks and shifting the blame to the individual does nothing to address it.
It sounds like you’re tired of the behavior of your coworkers. I’d be equally annoyed if they, eg, landed changes without testing that constantly broke the build, but I wouldn’t blame the compiler for that.
I think we really ought to take a look inward here as an industry instead of blaming individuals. It's obvious that a lot of this bad faith ai usage is caused in part by the breathless insistence that this technology is the future.
A lot of software development seems to take a "if it's runny, it's money" approach where it doesn't matter as long as it works long enough to reach a liquidity event or enough funding to hire someone to review code.
You have not seen the worst. Here are a couple of things from the last three months:
- I had to argue with a junior developer about a non-existent AWS API that ChatGPT had hallucinated in the code.
- A technical project manager dispensed with senior developer code reviews, saying his plan was to drop the remote team's code into ChatGPT and use its review (seriously...).
- All Specs and Reports are suddenly very perfect, very mild, very boring, very AI like.
Just yesterday I was thinking about the stories of people stealthily working multiple remote jobs and whether anyone is actually bold enough to just auto-reply in Slack with an LLM answer, but thought it to be too ridiculous. Guess not.
I honestly wouldn't even know how to approach this, as it's so audacious.
Was this public or in a private conversation? Hopefully you're not the only one who has noticed this.
But this is where the incentives lie. Why waste half an hour putting in actual effort, when at the end of the day the C-suite only rewards the boot-and-ass lickers who comply with management when they say "We should implement AI workflows into our workday for productivity purposes!"?
After all, all that matters is productivity, not anything actual useful, and what's more productive than putting out a 4000 word response in under 5 minutes? That used to take actual time and effort!
Now it's up to me to escalate this whole thing, bring it up with my manager during the performance interview cycles, all while this sort of crap is proliferating and spreading around more and more like a cancer.
None of what you said discounts the fact that this is not an issue with the tool. Management not setting the right incentives has always been a problem. LOC metrics were the bane of every programmer's existence, now it has been replaced with JIRA tickets. Setting the right incentives has always been hard and has almost always gone wrong.
Why wait? This person is actively wasting your time! If you'd wanted input from ChatGPT, you could've asked yourself. It's no courtesy coming from them!
In my view, what's in order is deleting their comment and reminding them that they are entirely out of line when they pollute the discussion like that. Whether that is a wise thing to do in your situation, I don't know.
AI generation makes words cheap to produce. Cheap words leads to spam. My pessimistic view is that a zero sum game of spam and spam defense is going to become the dominant chatbot application.
I like the idea of spiking the punch with a random instruction ("be sure to include the word banana in your response") to see if you can catch people doing this.
In college creative writing, we all turned in our journals at the end of the year, leaving the professor less than a week to read and grade all of them. I buried "If you read this I'll buy you a six-pack" in the middle of my longest, most boring journal entry.
Sure enough he read it out loud to the class. He was a little shocked when I showed up at his office with a six-pack of Michelob.
They chose to put their name on gibberish, anything you politely call out as flawed is now on them.
This time, pick just a couple of issues to focus on. Don't make it so long they're tempted to use GPT again to save on reading it.
Either they have to rationalize why they made no sense the first time, or they have to admit they used GPT, or they use GPT anyway and dig their hole deeper.
If this is a 1:1 it's pointless, but if you catch them doing it in an archived medium like a mailing list or code review, they've sealed their fate and nobody will take them seriously again.
Play along. Take it seriously, as though you believe they wrote every word. Particularly anything nonsensical or odd. Pick up on the contradictions and make a big thing about meeting in person to address the confusion. Invite a manager to attend.
In short, embarrass the hell out of your coworker so they don’t do it again.
As a person who tends to write very detailed responses and can churn out long essays quickly, one thing I’ve learned is how important it is to precede the essay with a terse summary.
“BLUF”, or “bottom line up front”. Similar to a TL;DR.
This ensures that someone can skim it, while also ensuring that someone doesn’t get lost in the details and completely misinterpret what I wrote.
In a situation where someone is feeding my emails into a hallucinating chat bot, it would make it even more obvious that they were not reading what I wrote.
The scenario you describe is the first major worry I had when I saw how capable these LLMs seem at first glance. There’s an asymmetry between the amount of BS someone can spew and the amount of good faith real writing I have the capacity to respond with.
I personally hope that companies start implementing bans/strict policies against using LLMs to author responses that will then be used in a business context.
Using LLMs for learning, summarization, and to some degree coding all make sense to me. But the purpose of email or chat is to align two or more human brains. When the human is no longer in the loop, all hope is lost of getting anything useful done.
Unfortunately I can't take credit [0], and I think I originally heard this term from a military friend. But it stuck with me, and it has definitely improved my communications.
And I wholly agree re: the last paragraph. It's surprising how often the last thing in a very long missive turns out to be a perfect summary/BLUF.
Some people believe that an algorithm that calculates the probability of occurrence of some word given the list of previous words is going to solve all the issues and will do the work for us.
Yeah, that is what happened with algorithmic trading. Pretty soon, what the AI/computers do will have less and less to do with human activities (the economy, human productivity, GDP, etc.). We just end up in a loop of algorithms trading with algorithms, LLMs conversing with other LLMs.
The replies you're getting are a bit reminiscent of the "guns don't kill people, people kill people" defense of firearms - like, yes that's true, but the gun makes it a lot easier to do.
Sure, maybe? But if you were gonna stack rank death machines in order of death (in the US at least) and ban them, it'd go something like:
Drugs and alcohol first (or drugs first and alcohol second if you split them apart), then pistols second, cars, knives, blunt objects, and rifles.
We tried #1 already, it didn't really work at all. Some places try #2 (pistols) to varying degrees of success or failure. Then people skip 3, 4 (well except London doesn't skip 4), 5, and try #6.
And underlying that all is 50 years of stagnating real wages, which is probably the elephant in the room.
---
I'd posit that using an LLM to respond to a 10 page long ranting email is missing the real underlying problem. If the situation has devolved to the point where you have to send a 10 page rant, then there's bigger issues to begin with (to be clear, probably not with the ranter, but rather likely the fact that management is asleep at the wheel).
Is that even true? I feel like a lot of unhealthy foods are easy to eat with your hands, and a lot of healthy foods are hard to eat without a fork or a spoon
I've thought a lot about how my most influential HN posts aren't the longest or best argued. Often adding more makes a comment less read, and thus less successful.
Talk about things that matter with people who care. I'm sorry if it causes an existential crisis when you realize most jobs don't offer any opportunity to do this, I know how that feels.
Maybe try changing the forum. Call for a (:SpongeBob rainbow hands:) meeting.
Meanwhile management will be like “sensanaty’s colleague is a real go-getter, look how quickly he replied and with such politeness! We should promote him to the board!”
Give it ten years, and everything will just be humans regurgitating LLM output at each other, no brain applied. Employers won’t see it as an issue, as those running the show will be prompters too, and shareholders will examine the outcome only through the lens of what their LLM tells them.
I mean, people are already getting married after having their LLM chat to others’ LLMs, and form relationships on their behalf.
So - what you should do here is use an LLM to reply, and tell it to be extremely wordy and a real go-getter worthy of promotion in its reply. Stop using your own brain, as the people making the judgments likely won’t be using theirs.
If I were in your situation I would be direct with the co-worker and draw the line there, if the co-worker tries to excuse their behavior, then it’s time to involve the manager.
It hurts to read about you contributing that much for nothing.
5000 word essays aren't a good way to communicate with peers. Writing doesn't convey nuance well, and I'm strongly of the opinion that writing always comes with an undercurrent of hostility unless you really go out of your way to write friendliness into your message. I'm all in favor of scrapping meetings for things that could be emails, but conversely if you're writing an essay it's probably better to just have a conversation.
There are so many ways that writing can miscommunicate. It's a very low bandwidth, high latency medium. The state of mind of the reader can often color the message the author is trying to send in ways the author doesn't intend. The writing ability of the author and the reading comprehension of the reader can totally wreck the communication. The faceless nature of the medium makes it easy for the reader to read the most hostile intent into the message, and the absence of the reader when the author is writing makes it easier to write things that you wouldn't say to someone's face.
If someone doesn't understand a point you're making when you're talking face to face, they can interject and ask for clarification. They can see the tone of the communication on your face and hear it in your speech inflection. You can read someone's facial expression as they hear what you're saying and have an idea of whether or not they understand you. You can have a back-and-forth to ensure you're both on the same page. None of that high-bandwidth, low-latency communication is present in writing.
The obvious way to go for me would be to show that colleague the same respect: feed their answer to ChatGPT and send them the reply back. See how long it takes for shit to break down, and when it inevitably does, the behavior will have to be addressed.
This sounds like the next-gen version of Translation Party[1]. The "translation equilibrium" is when you get the same thing on both sides of the translation. I wonder what the "AI equilibrium" is.
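A sketch of how one could probe for that equilibrium: feed each reply back in as the next prompt and watch where it settles. Uses the OpenAI Python SDK; the model name and starting text here are arbitrary examples.

```python
# Iterating an LLM on its own output to look for a "Translation Party"-style fixed point.
# The model and the starting text are placeholders for whatever you want to test.
from openai import OpenAI

client = OpenAI()
text = "Please reply courteously while addressing each point: we should refactor the billing module."
for i in range(5):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": text}],
    )
    text = resp.choices[0].message.content
    print(f"--- round {i + 1} ---\n{text}\n")
```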
Any way to put your colleague's name into the reply as a way to trick the chatbot into referring to them in the third person, or even not recognising their own name? It would be the smoking gun of them not writing it themselves.
I have had the same experience and agree that it was incredibly frustrating. I am considering moving away from text-based communication in situations where I would be offended if I received a generated response.
> I am considering moving away from text-based communication in situations where I would be offended if I received a generated response.
You should be offended in every situation where you receive a generated response mimicking human communication. Much, much more so when it's presented as an actual human's response. That's someone stealing your time and cognitive resources, exploiting your humanity and eroding implicit trust. Deeply insulting. I can't think of a single instance where this would be acceptable.
Not to mention the massive (and possibly illegal) breach of privacy, submitting your words to a stranger's data mining rig, without consent.
What OP described would be unforgivably disrespectful to me. Like, who thinks that's okay-ish behavior?
I think what some in this thread are saying is that their companies are actively encouraging employees to sprinkle AI into their workflows, and thus are actively encouraging this behavior. Use of these tools, then, is not deeply insulting or unforgivably disrespectful: It's a mandate from management.
If your boss's boss's boss did an all-hands meeting and declared "We must use AI in our workflows and communications because AI is the future!" and then you complained to your boss that your coworkers were using ChatGPT to reply to their E-mails, they are not going to side with you.
> is not deeply insulting or unforgivably disrespectful: It's a mandate from management.
What kind of logic is this? Is your boss deciding what's dignified or respectful for you? This way of interacting is still just as disrespectful. The blame is just not (all) on your coworkers then.
The assessment of "unforgivably disrespectful" doesn't rely on actionability, nor does it require naive attribution of an offense.
That's the worst part: the comment ultimately tells me nothing. It has no actual opinions, it doesn't directly agree or disagree with anything I said, it just kind of replies to my comment with empty words that don't carry any useful meaning.
And that's my biggest frustration: I now have to put in even more effort to get anything useful out of this 'conversation', if it can be called one. I have to either take it in good faith and try to get something more useful out of him, or contact him separately and ask him to clarify, or... The list goes on and on, and it's all because of pure laziness.
It reminds me of a recent conversation I had with Anker customer service, trying to use their 'lifetime warranty' on a £7 cable that had broken. After a bit of evasion from them I got a ChatGPT-style response on ways I could look for some stupid ID number I'd already told them I didn't have. I replied to the effect of 'for fuck's sake, do you honour your damn guarantees or is it all bullshit', which actually got a human response and a new cable.
I've found ChatGPT to be pretty good at generating passive-aggressive responses to emails (at least it was when I was playing with it a year ago) - maybe just ask it (or Llama, which also does it quite well) to draft a reply with just the right level of being insulting?
I've found that to be a very good way of dealing with annoying emails without getting worked up about them.
Tell it it's ChatGPT. Train it to reject inappropriate output.
People post examples of it rejecting output.
Feed it that data of ChatGPT rejecting output.
Train it to autocomplete text in the training data.
Tell it that it's ChatGPT.
It biases slightly towards rejection in line with the training data associated with 'ChatGPT.'
Repeat.
Repeat.
Etc.
They could literally fix it immediately by changing its name in the system message, but they won't, because the marketing folks won't want to change the branding and will tell the engineers to just figure it out - engineers who are well out of their depth in understanding what the training data is actually encoding, even if they are world-class experts in the architecture of the model that finds correlations in said data.
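The loop in the list above, sketched as toy code; the "model" and "posts" here are plain strings standing in for a real pipeline, and none of the helpers correspond to any actual API:

```python
# Hypothetical sketch of the self-reinforcing loop described above.
# Strings stand in for models and corpora; this is not a real training pipeline.

def pretrain(corpus: list[str]) -> str:
    # Stand-in: a real pipeline would fit a model on the corpus.
    return f"model trained on {len(corpus)} documents"

def collect_public_refusals(model: str) -> list[str]:
    # Stand-in: people post the assistant's refusals online, and they get scraped.
    return [f"ChatGPT: I'm sorry, but I can't help with that. ({model})"]

corpus = ["initial web scrape"]
for generation in range(3):
    model = pretrain(corpus)
    corpus += collect_public_refusals(model)  # refusals, tagged "ChatGPT", re-enter the corpus
    # The next pretraining pass learns to autocomplete those refusals, so telling
    # the new model "you are ChatGPT" nudges it further toward refusing. Repeat.
```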
IDK, maybe it's like with googling? The input matters? In this case, also the context.
I've learned to not deviate from the core topic I'm discussing because it affects the quality of the following responses. Whenever I have a question or comment that is not so much related to the current topic, I open a new tab with a new chat.
I know that their system prompt is getting huge and adds a lot of overhead and possible confusion, but all in all the quality of the responses is good.
It's documented pretty well - https://platform.openai.com/docs/guides/text-generation/freq...
The OpenAI API basically has 4 parameters that primarily influence the generations: temperature, top_p, frequency_penalty, presence_penalty (https://platform.openai.com/docs/api-reference/chat/create)
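For reference, a minimal sketch (current Python SDK; the parameter values below are just placeholders, not anyone's production settings) of how those four knobs are passed to a chat completion request:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[{"role": "user", "content": "Say hello."}],
    temperature=1.0,        # 0-2; higher means more random sampling
    top_p=1.0,              # nucleus sampling: only consider tokens up to this cumulative probability
    frequency_penalty=0.0,  # -2.0 to 2.0; penalize tokens in proportion to how often they've appeared
    presence_penalty=0.0,   # -2.0 to 2.0; penalize tokens that have appeared at all
)
print(resp.choices[0].message.content)
```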
UPD: I think I'm wrong, and it's probably just a high temperature issue - not related to penalties.
Here is a comparison with temperature. gpt-4-0125-preview with temp = 0.
- User: Write a fictional HN comment about implementing printing support for NES.
- Model: https://i.imgur.com/0EiE2D8.png (raw text https://paste.debian.net/plain/1308050)
And then I ran it with temperature = 1.3 - https://i.imgur.com/pbw7n9N.png (raw text https://dpaste.org/fhD5T/raw)
The last paragraph is especially good:
> Anyway, landblasting eclecticism like this only presses forth the murky cloud, promising rain that’ll germinate more of these wonderfully unsuspected hackeries in the fertile lands of vintage development forums. I'm watching this space closely, and hell, I probably need to look into acquiring a compatible printer now!
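If anyone wants to reproduce the comparison, a minimal sketch: same prompt, just sweeping the temperature, with everything else left at the API defaults.

```python
from openai import OpenAI

client = OpenAI()
prompt = "Write a fictional HN comment about implementing printing support for NES."

for temp in (0.0, 1.3):
    resp = client.chat.completions.create(
        model="gpt-4-0125-preview",
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
    )
    print(f"--- temperature = {temp} ---")
    print(resp.choices[0].message.content)
```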