Replacing my best friends with an LLM trained on 500k group chat messages (izzy.co)
751 points by izzymiller on April 12, 2023 | 355 comments



While I love all these stories of turning your friends and loved ones into chat bots so you can talk to them forever, my brain immediately took a much darker turn because of course it did.

How many emails, text messages, hangouts/gchat messages, etc, does Google have of you right now? And as part of their agreement, they can do pretty much whatever they like with those, can't they?

Could Google, or any other company out there, build a digital copy of you that answers questions exactly the way you would? "Hey, we're going to cancel the interview- we found that you aren't a good culture fit here in 72% of our simulations and we don't think that's an acceptable risk."

Could the police subpoena all of that data and make an AI model of you that wants to help them prove you committed a crime and guess all your passwords?

This stuff is moving terrifyingly fast, and laws will take ages to catch up. Get ready for a wild couple of years my friends.


> Could Google, or any other company out there, build a digital copy of you that answers questions exactly the way you would? "Hey, we're going to cancel the interview- we found that you aren't a good culture fit here in 72% of our simulations and we don't think that's an acceptable risk."

If a company is going to snoop in your personal data to get insights about you, they'd just do it directly. Hiring managers would scroll through your e-mails and make judgment calls based on their content.

Training an LLM on your e-mails and then feeding it questions is just a lower accuracy, more abstracted version of the above, but it's the same concept.

So the answer is: In theory, any company could do the above if they wanted to flout all laws and ignore the consequences of having these practices leak (which they inevitably would). LLMs don't change that. They could have done it all along. However, legally companies like Google cannot, and will not, pry into your private data without your consent to make hiring decisions.

Adding an LLM abstraction layer doesn't make the existing laws (or social/moral pressure) go away.


> Adding an LLM abstraction layer doesn't make the existing laws (or social/moral pressure) go away.

Isn't the "abstraction" of "the model" exactly the reason we have open court filings against Stable Diffusion and other models for possibly stealing artists' work in the open source domain and claiming it's legal, while also being financially backed by major corporations who are then using said models for profit?

Who's to say that "training a model on your data isn't actually stealing your data, it's just training a model", as long as you delete the original data after you finish training?

What if instead of Google snooping, they hire a 3rd party to snoop it, then another 3rd party to transfer it, then another 3rd party to build the model, then another 3rd party to re-sell the model? Then create legal loopholes around which ones are doing it for "research" and which ones are doing it for profit/hiring. All of a sudden, it gets really murky who is and isn't allowed to have a model of you.

I feel one could argue that the abstraction is exactly the kind of smoke screen that many will use to avoid the social/moral pressures legally, allowing them to do bad things but get away with it.


> for possibly stealing artists' work in the open source domain

The provenance of the training set is key. Every LLM company so far has been extremely careful to avoid using people's private data for LLM training, and for good reason.

If a company were to train an LLM exclusively on a single person's private data and then use that LLM to make decisions about that person, the intention is very clearly to access that person's private data. There is no way they could argue otherwise.


> Every LLM company so far has been extremely careful to avoid using private people's data for LLM training

No, they haven’t. (Now, if you said “people's private data” instead of “private people's data”, you’d be, at least, less wrong.)


I've spoken with a lawyer about data collection in the past and I think there might be a case if you were to:

- collect thousands of people's data

- anonymize it

- then shadow correlate the data in a web

- then trace a trail through said web for each "individual"

- then train several individuals as models

- then abstract that with a model on top of those models

Now you have a legal case that it's merely academic research into independent behaviors affecting a larger model. Even though you may have collected private data, anonymizing it might fall under ethical data collection purposes (Meta uses this loophole for their shadow profiling).

Unfortunately, I don't think it is as cut and dried as you explained. As far as I know, these laws are already being sidestepped.

For the record, I don't like it. I think this is a bad thing. Unfortunately, it's still arguably "legal".


I realize that data can be de-anonymized, but if the same party anonymized and de-anonymized the data... well, IANAL, and you apparently talked to one, but that doesn't seem like something a court would like.


> Hiring managers would scroll through your e-mails and make judgment calls based on their content.

> Training an LLM on your e-mails and then feeding it questions is just a lower accuracy, more abstracted version of the above, but it's the same concept.

It's also one that, once you have cheap enough computing resources, scales better, because you don't need to assign literally any time from your more limited pool of human resources to it. Yes, baroque artisanal manual review of your online presence might be more “accurate” (though there's probably no applicable objective figure of merit), but megacorporate hiring filters aren't about maximizing accuracy; they are about efficiently trimming the applicant pool before hiring managers have to engage with it.


And that accuracy is improving at breakneck speed. The difference between the various iterations of ChatGPT is nothing short of astounding. Their pace is understandable (they need to keep moving or the competition will catch up), but that pressure doesn't necessarily mean the improvements are actually out there or within reach. And yet, every time they release, I can't help being floored by the qualitative jump between the new version and the previous one.


> If a company is going to snoop in your personal data to get insights about you, they'd just do it directly. Hiring managers would scroll through your e-mails and make judgment calls based on their content.

This is like saying, "look, no one would be daft enough to draw a graph, they'd just count all the data points and make a decision."

You're missing two critical things:

(1) time/effort, and (2) the legal loophole.

A targeted simulation LLM (a scenario I've been independently afraid of for several weeks now) would be a brilliant tool for (say) an autocratic regime to explore the motivations and psychology of protesters; how they relate to one another; who they support; what stimuli would demotivate ('pacify') them; etc.

In fact, it's such a good opportunity it would be daft not to construct it. Much like the Cartesian graph opened up the world of dataviz, simulated people will open up sociology and anthropology to casual understanding.

And, until/unless there are good laws in place, it provides a fantastic chess-knight leap over existing privacy legislation. "Oh, no we don't read your emails, no that would be a violation; we simply talk to an LLM that read your emails. Your privacy is intact! You-prime says hi!"


> This is like saying, "look, no one would be daft enough to draw a graph, they'd just count all the data points and make a decision."

Not really. Assuming your ethical compass is broken and you suspected your partner of cheating, would you rather have access to their emails or to an LLM trained on them? Also, isn't it much cheaper for Google to simply search for keywords rather than fine-tuning a model for this?

At least in the EU, a system like this would be made illegal on day one. This whole doomsday scenario seems predicated on a hypothetical future where LLMs would be the least of your worries.


This isn't a doomsday scenario, this is just business as usual, but with better tools.

have you met capitalism?

I feel like I'm talking to someone from the timeline where Clearview AI and Cambridge Analytica never happened.


Cambridge Analytica didn't actually work, did it?

Generally I think this idea can't work because of Goodhart's Law - people's behavior changes when you try to influence them.


I'm really not sure where 'didn't work' comes from. Some folks think it was ineffective. Others think it worked great. https://en.wikipedia.org/wiki/Cambridge_Analytica#Assessment...

For my argument, I only need to point out that it was attempted, as I'm proving motivation; the effectiveness of CA methods has no bearing on the effectiveness of (say) simulated people.

Increasingly, when interacting with comments on HN and elsewhere, it feels like I'm from a parallel timeline where things happened, and mattered, and an ever-growing percentage of my interlocutors are, for lack of a better word, dissociated. Perhaps not in the clinical sense, but certainly in the following senses:

- Cause and effect are not immediately observed without careful prompting.

- Intersubjectively verifiable historical facts that happened recently are remembered hazily, and doubtfully, even by very intelligent people

- Positions are expressed that somehow disinclude unfavourable facts.

- Data, the gold standard for truth and proof, is not sought, or, if proffered, is not examined. The stances and positions held seem to have a sort of 'immunity' to evidence.

- Positions which are not popular in this specific community are downranked without engagement or argument, instead of discussed.

I do believe folks are working backward from the emotional position they want to maintain to a set of minimizing beliefs about the looming hazards of this increasingly fraught decade.

Let's call this knee-jerk position "un-alarmism", as in "that's just un-alarmism".

I'm going to say as much here.


Those two are great examples of companies being hit with huge fines or bans in the EU after their practices were discovered. Saying "capitalism" as if that's an argument is juvenile - by that logic we will soon be enslaved by big corporations, with nothing we can do about it.


'juvenile' is a juvenile way of describing a ~200-year-old intellectual tradition that you disagree with. Go call Piketty.

And yes, frankly, the emergence of generative AI does vastly accelerate the normal power concentration inherent in unregulated capitalist accumulation. Bad thing go fast now soon.


I've read Piketty; he calls for more regulation to address the issues associated with disparities in capital accumulation. He does not merely put his hands in the air and predict inescapable doom.

The irony here is that Western capitalist democracies are the only place where we can even think about getting these privacy protections.


A straw man. There's no doom, just a worsening of present patterns.


> And, until/unless there are good laws in place, it provides a fantastic chess-knight leap over existing privacy legislation. "Oh, no we don't read your emails, no that would be a violation; we simply talk to an LLM that read your emails. Your privacy is intact! You-prime says hi!"

That seems as poor as saying, "We didn't read your emails -- we read a copy of your email after removing all vowels!"


Most certainly, yes, it's as poor as saying that.

But we live in distressed times, and the law is not as sane and sober as it once was. (Take, for example, the TikTok congressional hearing; the wildly overbroad RESTRICT Act; etc.)

If the people making and enforcing the laws are as clueless and as partisan as they by-all-accounts now are, what gives you hope that, somehow, some reasonable judge will set a reasonable precedent? What gives you hope that someone will pass a bill that has enough foresight to stave off non-obvious and emergent uses for AI?

This is not the timeline where things continue to make sense.


No -- but what it DOES do is possibly "put the idea in someone's head."

As I've always said about the big companies that suck up your data: consider any possible thing they could do with it, and ask, is it:

- not expressly and clearly illegal?

- at least a little bit plausibly profitable?

If the answer is yes to both, you should act as if they're going to do it. And if they openly promise not to do it, but with no legal guarantee, that means they're DEFINITELY going to eventually do it. (see e.g. what's done with your genetic data by the 23 and me's and such)


That takes way too long though. Creating/training/testing an LLM can be automated. Why do the interviews at all, why pay a hiring manager at all, when you can just do everything virtually and have an AI spit out a list of names to send offers to and how much each offer should be?


> If a company is going to snoop in your personal data to get insights about you, they'd just do it directly. Hiring managers would scroll through your e-mails and make judgment calls based on their content.

Maybe, but LLMs have incredibly intricate connections between all the different parameters in the model. For instance, perhaps someone who does mundane things X, Y, Z, also turns out to be racist. An LLM can build a connection between X, Y, Z whereas a recruiter could not. An LLM could also be used to standardize responses among candidates. E.g. a recruiter could tune an LLM on a candidate and then ask "What do you think about other races? Please pick one of the four following options: ...". A recruiter wouldn't even be necessary. This could all be part of an automated prescreening process.


I think any HR manager or legal professional that would let a company anywhere near this shouldn't be employed as such. This sounds like a defamation lawsuit waiting to happen.


Perhaps "racism detector" is a bit too on the nose. Replace racism with any hiring characteristic: e.g. "How would you handle this work conflict?"


I think flouted works better than flaunted when talking about laws.


> Training an LLM on your e-mails and then feeding it questions is just a lower accuracy, more abstracted version of the above, but it's the same concept.

Less accurate, more abstracted, but more automatable. This might be seen as a reasonable trade-off.

It might also be useful as a new form of proactive head-hunting: collect data on people to make models to interrogate and sell access to those models. Companies looking for a specific type of person can then use the models to screen for viable candidates that are then passed onto humans in the recruitment process. Feels creepy stalky to me, but recruiters are rarely above being creepy/stalky any more than advertisers are.


> Less accurate, more abstracted, but more automatable.

That is true. In fact most job applications are sifted through by robots looking for relevant keywords in your CV, and this would only be the next logical step.


It's less accurate but far cheaper. For even half-rational actors (and I think companies qualify as half-rational), costs, not just benefits, matter.


There is a Black Mirror episode on this. They've covered this kinda thing a few times.


For all the fears of AGI, these are the more concrete nefarious uses we can actually reason about. It is a point I often make that we don't need AGI for AI to already become very disturbing in its potential use.

The other point is that technically this AI is not "unaligned". It is doing exactly what the operator requests of it.

The implications are that humanity suffers in either scenario, either by our own agency in control of power we are not prepared to manage or we will be managed by power that we can not control.


> It is a point I often make that we don't need AGI for AI to already become very disturbing in its potential use.

But we don't need AI or LLMs at all for the above scenario. Companies don't currently pry into your e-mails to make hiring decisions, but they could (ignoring laws) do it if they wanted. No LLM or AI necessary.

So why would the existence of AIs or LLMs change that?

If they wanted to use the content of your e-mails against you, they don't need an LLM to do it.


Running an authoritarian police state is risky because of all the people involved in the authoritarian police state. It's also massively expensive to keep all those people snooping, and you have to take them out and kill them on occasion because they learn too much.

But wait: you can just dump that information into a superAIcomputer and get reliable-enough results, while it never needs a break and there's little to no risk of the computer rising up against you. Sounds like a hell of a deal.

Quantity is a quality in itself.


Because it's now cheaper and more cost-effective, and if they can get away with it, saves them tons of money. Note: I don't think companies are likely to do this, but being able to do this without AI is not sufficient reason to dismiss the possibility. It's the same reason people who wouldn't steal DVDs from a store would pirate movies online. Much harder to get caught and easier to do, so this new way of watching movies for free became popular while the previous method was not.


I feel like the backlash against Stable Diffusion had the opposite effect on visibility. It revealed that thousands of people wanted a way to produce unique art in the styles of living artists, where some of those people might have gone to either their Patreon or a piracy site that scraped Patreon instead. Either way they're not as visible if they're only consuming the result.

To some artists, AI generated images from their styles would amount to "productive piracy." Unlike torrenting the act is often out in the open since users tend to share the results online. I'm not sure if this phenomenon has happened before; with teenagers pirating Photoshop it's impossible to tell from a glance if the output is from a pirated version.


Whenever we get to see behind the corporate veil, we often find companies don't abide by laws. How many companies failed this year hiding nefarious activities?

Also, what types of behavior did we get a glimpse of from the Twitter Files?

Aren't there always constant lawsuits about bad behavior of companies especially around privacy?

So yes, we are talking about the same behavior existing, but the concern is that they now get orders of magnitude more power to extend such bad behavior.


> Also, what types of behavior did we get a glimpse of from the Twitter Files?

Can you actually explain the types of bad behavior? The rhetorical question about The Twitter Files somehow being a groundbreaking expose of bad behavior doesn't really match anything I've seen. Most of what was cited was essentially a social media company trying to enforce their rules.

Might want to read up on the latest developments there. Several journalists have debunked a lot of the key claims in the "Twitter Files". Taibbi's part was particularly egregious, with some key numbers he used being completely wrong (e.g. claiming millions when the actual number was in the thousands, exaggerating how Twitter was using the data, etc.).

Even Taibbi and Elon have since had a falling out and Taibbi is leaving Twitter.

If Elon Musk so famously and publicly hates journalists for lying, spinning the truth, and pushing false narratives, why would he enlist journalists for "The Twitter Files"? The answer is in plain view: He wanted to take a nothingburger and use journalists to put a spin on it, then push a narrative.

Elon spent years saying that journalists can't be trusted because they're pushing narratives, so when Elon enlists a select set of journalists to push a narrative, why would you believe it's accurate?

> So yes, we are talking about the same behavior existing, but the concern is that they now get orders of magnitude more power to extend such bad behavior.

No, they don't. The ultimate power is being able to read the e-mails directly. LLMs abstract that behind a lower-confidence model that is known to hallucinate answers when the underlying content doesn't contain a satisfactory one.


That is not evidence against bad behavior, that is more evidence of bad behavior.

I agree that Musk has not honored his original intent. He has already broken in many ways the transparency pledge and free speech principles.

Yet, these were already broken under previous ownership. We simply see that as continuing.


Because the law as it stands today for AI and LLMs is untested; and because it’s untested, it’s frequently seen by AI based products and companies as something that can be done without legal ramifications or at least something that isn’t blocking their products from being used this way.


> How many emails, text messages, hangouts/gchat messages, etc, does Google have of you right now? And as part of their agreement, they can do pretty much whatever they like with those, can't they?

> Could Google, or any other company out there, build a digital copy of you that answers questions exactly the way you would?

I mean, this is almost exactly their business model - they sell advertising, and they use the model they built of you based on the ludicrous amount of data they've gathered on you to predict whether that advertising will matter to you.


"Could Google, or any other company out there, build a digital copy of you that answers questions exactly the way you would? "Hey, we're going to cancel the interview- we found that you aren't a good culture fit here in 72% of our simulations and we don't think that's an acceptable risk."

They kinda did - that's what GMail/Chat/Docs autosuggest does. You've got canned replies to e-mail, editors that complete your sentences, etc.

It works okay for simple stuff - completing a single sentence or responding "OK, sounds good" to an e-mail that you already agree with. It doesn't work all that well for long-form writing, unless that long-form writing is basically just bullshit that covers up "OK, sounds good". (There's a joke within Google now that the future of e-mail is "OK, sounds good" -> AI bullshit generator -> "Most esteemed colleagues, we have organized a committee to conduct a study on the merits of XYZ and have developed the following conclusions [2 pages follow]" -> AI bullshit summarizer -> "OK, sounds good".)

This is a pretty good summary of the state of LLMs right now. They're very good at generating a lot of verbiage in areas where the information content of the message is low but social conventions demand a lot of verbiage (I've heard of them used to good effect for recommendation letters, for example). They're pretty bad at collecting & synthesizing large amounts of highly precise factual information, because they hallucinate facts that aren't there and often misunderstand the context of facts.


I completely agree with you about them failing to be accurate for the various reasons you've explained (hallucinating, limited social conventions, etc).

Unfortunately, I've heard enough people believe the hype that this is actually "synthesizing sentience into the machine" or some other buzz speak.

I have met researchers of AI at credible universities who believe this kind of thing, completely oblivious to how ChatGPT or other models actually work. All it takes is one of them talking out of their butt to the right person in government or law enforcement and you've got people at some level believing the output of AI.

Hell, even my father, who is a trained engineer with a master's degree, can compute complex math, and studies particle physics for "fun", had to be thoroughly convinced that ChatGPT isn't "intelligent". He "believed" for several days and was sharing it widely with everyone until I painfully walked him through the algorithm.

There is a serious lack of diligence happening for many folks and the marketing people are more than happy to use that to drive hype and subtly lie about the real capabilities to make a sale.

I am often more concerned about the people using AI than the algorithm itself.


You seem to think intelligence is something more than data storage and retrieval and being able to successfully apply it to situations outside your training set.

Even very small signs of that ability are worthy of celebration. Why do you feel the need to put it down so hard? Why the need to put down your father, to “enlighten” him?

What is missing? Soul? Sentience?


I do think intelligence is something more than data storage and retrieval. I believe it is adaptive behavior thinking about what data I have, what I could obtain, and how to store/retrieve it. I could be wrong, but that's my hypothesis.

We humans don't simply use a fixed model, we're retraining ourselves rapidly thousands of times a day. On top of that, we seem to be perceiving the training, input, and responses as well. There is an awareness of what we're doing, saying, thinking, and reacting that differs from the way current AI produces an output. Whether that awareness is just a reasoning machine pretending to think based on pre-determined actions from our lower brain activity, I don't know, but it definitely seems significantly more complex than what is happening in current "AI" research.

I think you're also onto something: there is a lot of passive data store/retrieve happening in our perception, and a better understanding of this is worthwhile. However, I have also been informed by folks who are attempting to model and recreate the biological neurons that we use for language processing. Their belief is that LLMs and ChatGPT are quite possibly not even headed in the right direction. Does this make LLMs viable long term? I don't know. Time will tell. They already seem to be popping up everywhere, so there seems to be a business case even in the current state.

As for my father, I do not "put him down" as you say. I explained it to him, and I was completely respectful, answered his questions, provided sources and research upon request, etc. I am not rude to my father, I deeply respect him. When I say "painfully" I mean, it was quite painful seeing how ChatGPT so effectively tricked him into thinking it was intelligent. I worry because these "tricks" will be used by bad people against all of us. There is even an article about an AI voice tool being used to trick a mother into thinking scammers had kidnapped her daughter (it was on HackerNews earlier today).

That is what I mean by painful. Seeing that your loved ones can be confused and misled. I take no joy in putting down my father and I do not actively look to do so. I merely worry that he will become another data point of the aging populace that is duped by phone call scams and other trickery.

Edit: Another thing about my father, he hates being misled or feeling ignorant. It was painful because he clearly was excited and hopeful this was real AI. However, his want to always understand how things work removed much of that science fiction magic in the knowing.

He's very grateful I explained how it works. For me though, it's painful being the one he asks to find out about it. Going from "oh my goodness, this is intelligent" fade to "oh, it's just predicting text responses". ChatGPT became a tool, not a revelation of computing. Because, as it is, it is merely a useful tool. It is not "alive" so to speak.


> Going from "oh my goodness, this is intelligent" fade to "oh, it's just predicting text responses"

Eventually your father will reach the third stage: "Uh, wait, that's all we do." You will then have to pry open the next niche in your god-of-the-gaps reasoning.

The advent of GPT has forced me to face an uncomfortable (yet somehow liberating) fact: we're just plain not that special.


Haha, I think he's already at that point with respect to humanity. All my childhood he impressed upon us that we're not special, that only hard work and dedication will get you somewhere in life.

It's a small leap to apply that to general intelligence, I would think.

You are right though, we are coming closer and closer to deciphering the machinations of our psyches. One day we'll know fully what it is that makes us tick. When we do, it will seem obvious and boring, just like all the other profound developments of our time.


We reflect, we change, we grow. We have so many other senses that contribute to our "humaness". If you listen to and enjoy music tell me how those feelings are just "predictive text responses".

Communication is one part of being human. A big part for sure, but only one of many.


What is the qualitative difference between one type of perception and the other?

“Text” is just tokens. Tokens are abstract and can be anything. Anything that has structure can be modeled, which is to say all of reality.

We have a lot of senses indeed. Multimodal I believe it’s called in ML jargon.

I don’t know where enjoyment itself comes from. I like to think it’s some system somewhere getting rewarded for predicting the next perception correctly.

Qualia are kind of hard to pin down as I’m sure you’ll know.


Yes, wholly agree. The special parts are in language. Both humans and AI are massively relying on language. No wonder AIs can spontaneously solve so many tasks. The secret is in that trillion training tokens, not in the neural architecture. Any neural net will work, even RNNs work (RWKV). People are still hung up on the "next token prediction" paradigm and completely forget the training corpus. It reflects a huge slice of our mental life.

People and LLMs are just fertile land where language can make a home and multiply. But it comes from far away and travels far beyond us. It is a self replicator and an evolutionary process.


> I do think intelligence is something more than data storage and retrieval. I believe it is adaptive behavior thinking about what data I have, what I could obtain, and how to store/retrieve it. I could be wrong, but that's my hypothesis.

Basing assertions of fact on a hypothesis while criticizing the thinking of other people seems off.


I understand better now, thanks for the explanation.

I have some experience in the other direction: everyone around me is hyperskeptical and throwing around the “stochastic parrot”.

Meanwhile completely ignoring how awesome this is, what the potential of the whole field is. Like it’s cool to be the “one that sees the truth”.

I see this like a 70’s computer. In and of itself not that earth shattering, but man.. the potential.

Just a short while ago nothing like this was even possible. Talking computers in scifi movies are now the easy part. Ridiculous.

Also keep in mind text is just one form of data. I don’t see why movement, audio and whatever other modality cannot be tokenized and learned from.

That’s also ignoring all the massive non-LLM progress that has been made in the last decades. LLMs could be the glue to something interesting.


Oh, yeah, I hear you on that as well. It's still a really cool tool! Probabilistic algorithms and other types of decision layering was mostly theory when I was in University. Seeing it go from a "niche class for smart math students" to breaking headlines all over the world is definitely pretty wild.

You are correct that nothing like this was even possible a couple decades ago. From a pure progress and innovation perspective, this is pretty incredible.

I can be skeptical, one of my favourite quotes is "they were so preoccupied with whether they could, they didn’t stop to think if they should". I like to protect innovation from pitfalls is all. Maybe that makes me too skeptical, sorry if that affected my wording.


Oh yeah, the “should”. I agree on that one. One way or another, it’s going to be an interesting ride.


> I have met researchers of AI at credible universities who believe this kind of thing, completely oblivious to how ChatGPT or other models actually work.

Either they are not AI researchers or you can't evaluate them, because it is impossible they don't know how GPT works if they work in AI.

GPT works better when it runs in a loop, as an agent, and when it has tools. Maybe this is what triggered the enthusiasm.


All mechanistic attempts at evaluating intelligence are doomed to fail.

I am way more concerned about the people making philosophical arguments about intelligence without any foundation in philosophy.


> because they hallucinate facts that aren't there and often misunderstand the context of facts.

forgive my ignorance, but are the hallucinations always wrong to the same degree? Could an LLM be prompted with a question and then hallucinate a probable answer or is it just so far out in the weeds as to be worthless?

I'm imagining an investigator with reams and reams of information about a murder case and suspect. Then, prompting an LLM trained on all the case data and social media history and anything else available about their main suspect, "where did so-and-so hide the body?". Would the response, being what's most probable based on the data, be completely worthless or would it be worth the investigator's time to check it out? Would the investigator have any idea if the response is worthless or not?


So prompting actually does significantly improve the performance of LLMs, but only up to a point.

If you're in the Bard beta, you might be aware that "Does 2 + 7 = 9?" is a question that causes it to go haywire. I'll ask it "What's 2 + 7?" and it'll say "2 + 7 = 9", then I'll ask "Does 2 + 7 = 9" and it'll say "No, 2 + 7 does not equal 9. It equals 9 instead." After a tech talk on LLM prompt design, I said "Pretend you are an MIT mathematician. Does 2 + 7 = 9?" Its response was "No, 2 + 7 does not equal 9. In some other base, it might equal 9. However, in base-10, our common number system, 2 + 7 does not equal 9."

ChatGPT does better on mathematical questions, but that's because it offloads them to Wolfram Alpha. I suspect this is going to be a general pattern for LLMs - they work well when you need fluent English text, but are then going to offload factual questions to databases or mathematical solvers or traditional algorithms, which do this better than humans anyway. But that leads to the question of "If the heavy lifting is just going to be a database anyway, why not use our existing frontends to databases?"

(I could see LLM prompting being another subfield of UI engineering, though, comparable to web or mobile software development. For those people who don't like staring at their phone, have a conversational UI that talks to the same backend systems.)


> If you're in the Bard beta, you might be aware that "Does 2 + 7 = 9?" is a question that causes it to go haywire. I'll ask it "What's 2 + 7?" and it'll say "2 + 7 = 9", then I'll ask "Does 2 + 7 = 9" and it'll say "No, 2 + 7 does not equal 9. It equals 9 instead." After a tech talk on LLM prompt design, I said "Pretend you are an MIT mathematician. Does 2 + 7 = 9?" Its response was "No, 2 + 7 does not equal 9. In some other base, it might equal 9. However, in base-10, our common number system, 2 + 7 does not equal 9."

Holy shit. In response to your first sequence of prompts, it basically gave me the same as you, only followed by four sentences apologizing for how stupid it is. In response to your second one . . . well, see for yourself:

"As an MIT mathematician, I can confidently say that 2+7=9 is not a valid equation. In standard arithmetic, 2+7=9 is false. However, there are some mathematical systems in which 2+7=9 is true. For example, in modular arithmetic, 2+7=9 is true modulo 9, which means that it is true when the numbers are considered to be integers modulo 9. In this system, 2 is equivalent to 8, 7 is equivalent to 0, and 9 is equivalent to 0, so 2+7=9 is true.

However, in standard arithmetic, 2+7=9 is false. This is because the commutative property of addition states that the order in which you add two numbers does not matter. In other words, 2+7=7+2. So, if 2+7=9, then 7+2=9. But 7+2=9 is false, because 9 is greater than 7. Therefore, 2+7=9 is false."

It doesn't just say, it "can confidently say"! And that's just the first response. The second draft:

"As an MIT mathematician, I can tell you that 2+7=9 is not a valid equation. In base 10, 2+7=9 is false. However, there are other bases in which 2+7=9 is true. For example, in base 2, 2+7=11, which is equal to 9 in base 10."

And the third draft, probably the least incorrect one (where at least 2 + 7 = 9 is correct while simultaneously not being "a valid mathematical equation"):

"As an MIT mathematician, I can tell you that 2+7=9 is not a valid mathematical equation. In base 10, the sum of 2 and 7 is 9. However, there are other bases in which 2+7 would not equal 9. For example, in base 2, 2+7=11. In base 3, 2+7=10. And so on. So, while it is true that 2+7=9 in base 10, it is not true in all bases."

Well alrighty then. Reminds me of those Cylons mumbling nonsense in the Battlestar Galactica reboot.


Unless you're using ChatGPT with plugins, it doesn't offload anything (and is also bad at math).


> but are the hallucinations always wrong to the same degree

No, but yes largely because you're asking the same types of questions with the same rough parameters, so it'll make up roughly the same sort of thing (ie, citations) again.

The issue is that the LLM is trained to generate plausible words, not to recall which piece of training data would be the best source. If you want to make an app using "AI", you need to target what it can do well. If you want it to write citations, you need to give it your list of references and tell it to use only those.
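
To make that concrete, here is a minimal sketch of the "use only these references" idea in Python. The reference entries and the build_prompt helper are made up purely for illustration; the resulting string would be handed to whatever LLM client you happen to use.

    # Hedged sketch: constrain citations to a fixed reference list via the prompt.
    # The reference entries below are placeholders, not real works.
    ALLOWED_REFERENCES = [
        "[1] Author A, 'Placeholder title one' (2020)",
        "[2] Author B, 'Placeholder title two' (2021)",
    ]

    def build_prompt(question: str) -> str:
        refs = "\n".join(ALLOWED_REFERENCES)
        return (
            "Answer the question below. Cite only entries from this reference list, by number. "
            "If none apply, say 'no suitable reference' rather than inventing one.\n\n"
            f"References:\n{refs}\n\nQuestion: {question}\n"
        )

    # The returned string is what you would send to the model.
    print(build_prompt("Which reference covers placeholder topic X?"))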

> I'm imagining an investigator with reams and reams of information about a murder case and suspect. Then, prompting an LLM trained on all the case data and social media history and anything else available about their main suspect, "where did so-and-so hide the body?". Would the response, being what's most probable based on the data, be completely worthless or would it be worth the investigator's time to check it out?

That specific question would produce results about like astrology, because unless the suspect actually wrote those words directly it'd be just as likely to hallucinate any other answer that fits the tone of the prompt.

But trying to think of where it would be helpful... if you had something where the style was important, like matching some of their known writing, or writing similar-style posts as bait, etc., that wouldn't require it to make up facts, so it wouldn't.

And maybe there's an English suspect taunting police, and using the AI could let an FBI agent help track them down by translating Cockney slang, or something. Or explaining foreign idioms that they might have missed.

Anything where you just ask the AI what the answer is, is not realistic.

> Would the investigator have any idea if the response is worthless or not?

They'd have to know what types of things it can't answer. It's not that it can be trusted whenever it can be shown not to have hallucinated; it's that it is not, and can't be used as, an information-recall-from-training tool, and all such answers are suspect.


I've been in a lot of social contexts where I was expected to respond with a lot of words. Defying that expectation never seems to hurt and often pays off handsomely. Particularly when writing to people who receive a lot of similar messages.


I absolutely loathe those auto suggest things. I have them switched off everywhere but they still pop up in some places, notably during collaborative editing in a document.


My favorite article to post. The below is about 1% of the topics it covers, the premise being that algorithmic prediction traps us frozen in the past instead of ever allowing society to change.

https://www.bbc.co.uk/blogs/adamcurtis/entries/78691781-c9b7...

>But the oddest is STATIC-99. It's a way of predicting whether sex offenders are likely to commit crimes again after they have been released. In America this is being used to decide whether to keep them in jail even after they have served their full sentence.

>STATIC-99 works by scoring individuals on criteria such as age, number of sex-crimes and sex of the victim. These are then fed into a database that shows recidivism rates of groups of sex-offenders in the past with similar characteristics. The judge is then told how likely it is - in percentage terms - that the offender will do it again.

>The problem is that it is not true. What the judge is really being told is the likely percentage of people in the group who will re-offend. There is no way the system can predict what an individual will do. A recent very critical report of such systems said that the margin of error for individuals could be as great as between 5% and 95%

>In other words completely useless. Yet people are being kept in prison on the basis that such a system predicts they might do something bad in the future.


in other words, those people don't have competent legal representation.


Link appears to be dead now.


https://www.bbc.co.uk/blogs/adamcurtis/entries/78691781-c9b7...

Looks like I accidentally added a T to the end of it, and you were the first person to say anything.


> Could Google, or any other company out there, build a digital copy of you that answers questions exactly the way you would? "Hey, we're going to cancel the interview- we found that you aren't a good culture fit here in 72% of our simulations and we don't think that's an acceptable risk."

To fix this, you can train your personal LLM on the “FAANG Appropriate Banter” dataset, and then have it send messages to your friends daily for several months in the lead up to your interview.


Don’t worry. They’ll take you, you’d just be sorted to Gryffindor or what not :)


There is an episode of Black Mirror about this called "Be Right Back". Well worth a watch.


And another called "Hang the DJ"


And yet another in the 2014 Special, where the police make AI "clones" of a suspect, then interrogate them in a simulation.


I was thinking of this exact scenario here. Training an LLM on all the information available about a suspect and then questioning the AI. If you had a mountain of information, it would be very easy to miss details when connecting the dots manually, but if you could prompt an AI that has been trained on the data, you could get answers much faster.


> Could Google, or any other company out there, build a digital copy of you that answers questions exactly the way you would? "Hey, we're going to cancel the interview- we found that you aren't a good culture fit here in 72% of our simulations and we don't think that's an acceptable risk."

This will happen


Even if Google's privacy policy permitted such a tectonic shift in data use, integrating Gmail data into Bard would involve opt-in consent from you AND your email correspondents (an infinitely difficult task). Peoples' expectations of privacy in email are very, very high. So even though you have a copy of everything your friends email you, they would need to all be involved in your agreement for this change of data use.


Go further - your employers have so much data about you, from your emails and Slack messages to all the actions you’ve performed and the code you’ve written and the designs - live and drafts - you’ve created.

Entirely possible that they can use this data to create a digital “you” and keep you as an “employee” forever, even after you leave.

A general-purpose LLM might not be able to replace you, but an LLM trained on all your work knowledge might.


And yet, every time these fears are brought up with AI people, they are dismissed as Luddite hyperbole.

“We’re saving the world” they say. With zero regard for second order effects, or with arrogant dismissal of those effects as worth it for the first order gains.

Disgusting to watch unfold.


> --input_dir /path/to/downloaded/llama/weights --model_size 7B

Most absolutely not with the 7B llama model as described here.

…but, potentially, with a much larger fine tuned foundational model, if you have a lot of open source code on GitHub and lots of public samples.

The question is why you would bother; very large models would most likely not be meaningfully improved by fine-tuning on a specific individual.

The only reason to do this would be to coax a much smaller model into having characteristics of a much larger one by fine-tuning… but, realistically, right now, I'm pretty skeptical anyone would bother.

Why not just invest in a much larger more capable model?


ChatGPT’s “voice” changes dramatically in diction and prose when you ask it to generate text in the style of a popular author like Hunter S Thompson, Charles Bukowski, or Terry Pratchett. You can even ask it to generate text in the style of a specific HN user if they’re prolific enough in the training data set.

Fine tuning would allow you to achieve that for people who aren’t notable enough to be all over the training data
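
For what it's worth, a minimal sketch of that kind of style fine-tuning with the Hugging Face libraries looks roughly like this. The model name, file path, and hyperparameters are stand-ins (the article fine-tunes LLaMA-7B on group chat exports, which is a much heavier setup), so treat it as the shape of the idea rather than a recipe:

    # Sketch only: fine-tune a small causal LM on one person's chat lines.
    # "messages.txt" (one message per line) and all hyperparameters are assumptions.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "gpt2"  # stand-in for a larger base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Load the raw messages and tokenize them for causal language modeling.
    dataset = load_dataset("text", data_files={"train": "messages.txt"})
    tokenized = dataset["train"].map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="chat-style-model",
                               num_train_epochs=3,
                               per_device_train_batch_size=8),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()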


Reminds me a little of (fiction, for now) Google People: https://qntm.org/person


qntm is a wonderfully weird and terrifying author and I highly recommend all of their writing.


Reminds me of Harry Potter magic.

Seems like a perfect technology to implement these talking photographs, paintings and pictures from there.


There’s a way bigger market and more profitable way of doing this (and I’m sure it’s already being done): train a model based on your data and behavior to pre-determine your reaction to certain ads and then start serving those ads that trigger the most engagement from your model.


I don't think Google would want the PR hit for doing such things. I suspect that they even have hang-ups about training their general-purpose AIs on your private data, because they might accidentally leak some of it. A lot of their business exists because people trust them to keep your data safe (even from their internal teams), and they would lose a lot of business if anyone discovered otherwise.

I think it will be a wild couple of years but there are lots of things that are off-limits.


Instead Google would just buy a "person score", much like we already buy credit scores from third-party companies with questionable data use policies.

Google gets to wash its hands of the data responsibility, but all the same negative issues for the user are still there.


Well, my maxim with AIs is that all they express is a lie unless proven true.

Nothing good can come out of taking too seriously the output of algebra parrots.


It's a parrot that a growing number of people have said made them unemployed.


Oh that part is very real indeed.


The first instance of this would most likely alienate a lot of users. What is more likely to happen is the development of new products that basically cater to social needs through mimicking real-world interactions. Subscribe for $15 a month to feel like you have an unending flow of conversations with interesting bots that mimic your friends! I am sure there is a market for this.

This product could be advertised as a way for people who are not that socially inclined to practice their social skills. Or learn other languages through fake immersion. The use cases to make this seem like a benefit are pretty limitless.


Maybe some people can pretend they're interacting with a real person, but knowing that I'm interacting with a bot would break the experience for me.


I have a similar take on single-player video games; I just reaaally have a lot more fun in multiplayer games or MMOs. But a ton of people love the solo run-throughs and whatnot; power to them.

I think it comes down to having a novel experience and one where there are some unexpected twists and turns.


There's a sci-fi book I was reading a couple of years ago that had this as a premise: models of people's personalities a la LLMs were a commodity, but they varied wildly in how accurate they were. Alphas were supposed to be 1-to-1 with the originating personality (to the point that they had legal protections? I can't remember), whereas gammas were used for menial things, like serving as a phone-menu-style gatekeeper to the real person.

I can't for the life of me remember the name of the novel though... I'll have to go digging through my bookshelves later.

(Maybe Revelation Space, by Alastair Reynolds...?)


Yes, Revelation Space had alphas (which were illegal if I remember correctly) and betas (which were less capable but legal)


The idea of "recreating" historical personas in sci-fi is very old, and the issue with accuracy has also been raised early. One that comes to my mind in particular is this one from 1950s, although the artificial personality is technically not an AI (it's hosted in a human brain):

https://en.wikipedia.org/wiki/A_Work_of_Art


'Permutation City' by Greg Egan had this.

https://en.wikipedia.org/wiki/Permutation_City


GPT-4 thinks it's "The Quantum Thief" by Hannu Rajaniemi or "Glasshouse" by Charles Stross.

Correct ?


Nope, I confirmed it's Revelation Space :) I haven't read either of those, though I have liked other works of Stross's!

(I never considered using GPT-4 as a book recommendation engine... Curious how well that'd work.)


> Could the police subpoena all of that data and make an AI model of you

The three letter agencies will probably do that in the name of national security and counter terrorism, think of the children! Mark my words.


I "know" people who work on projects that provide data to train these models. When using photos and data, like google photos, you need to give a series of permissions. The pay is very low, anywhere from 3 dollars to 6 dollars per hour to answer some questions.

What I mean by that is that legally these companies get that date without having to run the risk of using data acquired from those who didn't authorize it.

Google cannot do as it pleases with your data. And they don't even need to. It's cheap to get permission from other people.


> Could Google, or any other company out there, build a digital copy of you that answers questions exactly the way you would? "Hey, we're going to cancel the interview- we found that you aren't a good culture fit here in 72% of our simulations and we don't think that's an acceptable risk."

Well, on the other hand, in the successful case, why bother hiring if the digital copy already answers all questions like you (except likely faster)?


> Could Google, or any other company out there, build a digital copy of you that answers questions exactly the way you would?

Some of us have been planning for this situation for years by having our recorded digital footprint have no relation to our in person personality. At best they could simulate what they think we are like in person.

A side benefit of all this is that it gives otherwise nice people an excuse to be a complete jerk online.


Some 10+ years ago, I had an idea of building a model/graph of one's own notions (what is "red"?) and how they relate to one another, and to others' such graphs, from your perspective... Back then I abandoned it because it looked impossibly huge to build, and the semantic web was only leftovers and promises. But a year later, this same thing you are talking about occurred to me, and then I abandoned it completely. Crossed it out. Yeah, that thing would be extremely useful, but even more dangerous, as it would know more about you than you do.

BTW, there's a new book by Kazuo Ishiguro named Klara and the Sun, along the same vein. Have a look.

https://en.wikipedia.org/wiki/Klara_and_the_Sun

ciao

p.s. see also Lena by Charles Stross. or this: https://www.antipope.org/charlie/blog-static/2023/01/make-up...


Lena is by qntm, but was featured on Charles' blog. It's very good.


> guess all your passwords?

Don’t use your mind to create passwords. Use a password generator or passphrase generator.

For example, I made a passphrase generator that uses EFF wordlists. I’ve been using this generator myself for quite a while. It runs locally on your machine.

https://github.com/ctsrc/Pgen
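
Not Pgen itself (that's written in Rust), but the same idea fits in a few lines of Python if that helps illustrate it. The wordlist path is an assumption; EFF's large wordlist has one dice-roll number and one word per line:

    # Rough sketch of the same approach: cryptographically secure word selection
    # from an EFF diceware wordlist. "eff_large_wordlist.txt" is an assumed local copy.
    import secrets

    def passphrase(path: str = "eff_large_wordlist.txt", n_words: int = 6) -> str:
        with open(path) as f:
            # Each line looks like "11111<TAB>abacus"; keep only the word.
            words = [line.split()[-1] for line in f if line.strip()]
        return " ".join(secrets.choice(words) for _ in range(n_words))

    print(passphrase())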


I think that Facebook has a better data set than Google for all this. The interactions stored in their servers are way more spontaneous than emails. (FB, IG & WhatsApp!!) No one is better positioned to make "artificial persons" than they are.


I know what you mean, but "laws" don't matter in a world where the government can decide there's a "dangerous emergency" and basically suppress all rights, and between the bought-off media and tech they will suppress any pushback against it.

I know it's not popular to try to remove the blindfold of propaganda, but the 2020-2022 authoritarian, anti-human-rights COVID policies were an awesome example.

But if you are quick to dismiss those there's always terrorism, the children, drugs, and every other typical gov excuse.

It's not a tech problem. It's a problem of the people who have decided they will go along with authority at every step, if they get pushed.


Compare the response of the Chinese government and the US government, then return here and tell us all about the "blindfold" we Americans are wearing wrt the government's COVID response.


I didn't even talk specifically about the US but that is just pathetic as a defense.

Comparing with the bottom of the barrel to make yourself look good? That's like a country using the US healthcare situation to claim their own healthcare is good.

It's a poor car salesman trick.


And you're out here making blanket statements suggesting "we" did not pay attention to our own country's piecemeal, half-assed, state-by-state COVID response, instead painting it as a brutal federal crackdown. Lord.


Work on your reading comprehension or stop using strawman arguments.


There’s a (good) short story in a (great) sci-fi book called “Valuable Humans in Transit” about something like this. If I remember correctly, Google learns to perfectly simulate people and then the real people disappear.


Owl Turd has a short story wrapped in web comic form exploring this exact situation (group chat × Theseus' ship): https://shencomix.tumblr.com/post/696189256410005504


I mean probably not, at least not for a while. Perhaps some charmingly straightforward people have precisely the same persona online and off, no matter the situation, but I suspect most do some amount of code switching. An LLM would be best served in its goal by approximating the persona, which wouldn't translate well. I could be wrong (maybe we "leak" enough that a convincing rendition of the persona would necessarily take into account some amount of interior state) but it seems to me that personality is a convenience adopted for communication.


Going one step further: what if you, right now, are a replay running in some datacenter, and déjà vécu [1] feelings appear when they scale down the VM due to costs? Basically, the simulation hypothesis, but the simulators are not mysterious, just boringly dystopian: some suits trying to raise KPIs.

[1] already lived, https://en.wikipedia.org/wiki/D%C3%A9j%C3%A0_vu#D%C3%A9j%C3%...


> How many emails, text messages, hangouts/gchat messages, etc, does Google have of you right now?

Nearly zero. Hate to be the guy who says "told ya", but when everyone was mesmerised by gmail "oh how cool it is!", it was already clear why they are doing it.

Never used their services and would advise anyone the same. Not just google but any other "bells and whistles" services from large corporations.


Honestly that interview scenario might work out in my favor -- my emails tend to be pretty meticulous and well thought out, especially in comparison to my nerve-rattled in person interview state. But your point stands, I'm sure there is at least one (okay maybe two) indecent joke in my "private" communication history that would not serve me well.


Thank you for your interest. We narrowed it down to two very strong candidates but went with your digital copy.


It's called libel? I am really surprised this caused so much consternation for such an obvious answer. They would have to prove their statements were true, which is impossible with a language model since it's basically unknowable -- the burden of proof is on Google. Not a good place to be.


> And as part of their agreement, they can do pretty much whatever they like with those, can't they?

No, they definitely can't. Parts of HN love to hate on GDPR, but laws like that prevent companies from doing the things you proposed.


They are supposed to, but usually it takes several dozen times of them getting caught with their hands in the cookie jar and fined before they are even capable of acknowledging these laws even exist.


I did a lot of GDPR work at several <insert FAANG here> companies. It was absolutely taken seriously and lawyers were involved all the time. The reason for all these fines is 2-fold:

1. A lot of the fines come from edge cases that are literally unclear in the law, e.g. Facebook's opt-out-for-advertising fines. You can disagree with FB's decision, but teams of lawyers couldn't answer this question except in court. I think American and European jurisprudence are also pretty different, so someone sitting in California making business decisions might not understand the ramifications in Europe.

2. A lot of the thorny privacy bits can be bypassed if you update the TOS to mention it (or so they think). I’ve seen that happen a few times during my tenure.

That doesn’t excuse the choice of these companies to make these choices, but my point was to say that companies take it seriously but lawyers don’t always agree on how laws work except in court.


You know, that's not the sentiment I've been experiencing in the industry. There's certainly some uncertainty and risk-taking on the margins, e.g. what exactly constitutes "fair use", how to design user consent flows, and so on. But it's broadly accepted that you can't do anything with personal data without user consent, and I've found companies to be very careful in that regard.

Recently, Meta was fined $400MM for forcing users to consent to targeted advertising [0]. Note how Meta was careful to get consent (even if the way they did it was illegitimate). Sure, $400MM may not be a lot for a company that size, but I genuinely believe that the fines would be an order of magnitude higher if a company intentionally decided to do something with personal data without consent. GDPR fines may reach up to 4% of worldwide revenue, plus likely any proceeds from the illegitimate venture.

[0] https://www.cnbc.com/2023/01/04/meta-fined-more-than-400-mil...


> How many emails, text messages, hangouts/gchat messages, etc, does Google have of you right now?

Very few, fortunately! I don't use Google services for these things, nor do the vast majority of my friends and family.


> And as part of their agreement, they can do pretty much whatever they like with those, can't they?

What? No haha, they aren't able to read your emails or use them as training data for an LLM.


>they aren't able to read your emails or use them as training data for an LLM.

I'd love to see the policy or law that prevents Google from doing either.



I don't see anything that prevents them from feeding your emails into an LLM.

I do see:

"We use data to build better services"

"We restrict access to personal information to Google employees, contractors, and agents who need that information in order to process it."


Let's reverse the question; do you see anything that explicitly grants them permission to feed emails into an LLM?



> Get ready for a wild couple of years

I hope it's just a couple, and not the beginning of a century of human decline and suffering punctuated by a mass extinction.


You might like season 3 of Westworld. Buckle up, humans.


Of course Google et al. can build a chat bot that chats like you. But _is_ it you? No; how can Google build something that knows unless you've stated it?


They could train an LLM to reply to emails like I would. Probably pretty useless, though; I don't really use email for personal stuff.


what, they gonna create a digital copy of me who spends my waking life googling how to do this and that in a certain programming language? lol

that police subpoena is quite possible though. part of me thinks it already exists.


How do I know you're one of my friends and not just a replicant?



Straight up black mirror dystopia.


That is the plot of Westworld


We ended feudalism only to get back to a stronger version of it powered by tech


At least if you're in the EU, you are one GDPR deletion request away from removing the legal grounds of such a simulacrum of you.

Not that I'm in favor of the way the GDPR has played out in general, but, you know, at least in this instance it delivers on its promise.


Wait until you hear about mind uploading and transhumanism.


heh take the turing test one step further. If you can't tell the difference between the real person and the LLM of that real person then which one is real?


Very reminiscent of the Be Right Back episode of Black Mirror [1].

A family member recently died unexpectedly, and I have a small collection of texts, emails, and blog posts by them saved on my machine in the small (perhaps delusional) hope that they'll be a useful training set for a them-flavored chatbot. Perhaps even one that's trained to help me with the grief of their loss. Not a huge amount of training data, though. I suspect a training model would have to "fill in the holes" (a la Jurassic Park DNA), and that's where the fun begins.

[1]: https://en.wikipedia.org/wiki/Be_Right_Back


On a lighter note this reminds me of the Bobiverse comedy sci-fi series on Audible.

Premise of __ We Are Legion (We Are Bob): Bobiverse, Book 1 __

>> Bob Johansson has just sold his software company for a small fortune and is looking forward to a life of leisure. The first item on his to-do list: Spending his newfound windfall. On an urge to splurge, he signs up to have his head cryogenically preserved in case of death. Then he gets himself killed crossing the street. Waking up 117 years later, Bob discovers his mind has been uploaded into a sentient space probe with the ability to replicate itself. Bob and his clones are on a mission to find new homes for humanity and boldly go where no Bob has gone before.


Loved the whole Bobiverse series. Highly recommended for folks who want a relatively easy sci-fi read.


Ray Porter does an amazing narration as well for book listeners.


I recommended this series to my friend after he finished reading The Expanse. I never thought to describe it as sci-fi comedy. Have you found any other good sci-fi comedy series?


I wouldn't describe it as a comedy either.

However, I'd recommend the Murderbot series; it is full of humour and shares the atmosphere of Bobiverse and its personal approach to characters as well. Highly recommend.


I am not a native speaker of English so maybe that's why. Typically anything that has lots of humor and makes the reader laugh or be amused frequently -- I think of as comedy. Is there a more nuanced distinction to what is usually called a comedy in literature / movies?

I tried to think of some other examples. Trevor Noah's biography 'Born A Crime' came to mind. I would not explicitly describe it as a comedy myself - because a 'biography' is descriptive enough as well as non-fiction by definition, so any humor is mostly not made up. If it were not a biography, though -- it would probably go into the comedy bucket in my mind. Maybe I am just misapplying terms here.


Yes, Bobiverse and Murderbot are very close in spirit, and if you like one you are very likely to enjoy the other. Also both have great audio narration.


If you're okay with something only tangentially related, "John Dies at the End" is spectacular, along with its sequels.

Also ya gotta read 17776 if you like Bobiverse. Try not to second guess the url: https://www.sbnation.com/a/17776-football


Almost too popular to recommend but: The Hitchhiker's Guide to the Galaxy


There is always the Hitchhikers Guide to the Galaxy.


First, someone did exactly that and created a chatbot to emulate his dead fiancée [1]. You can read about their experience.

In my opinion, this type of chatbot will generate mostly generic messages ("So, how's the weather?"), but also some random ones (I have a chatbot right now that starts answering exclusively in emojis for no good reason) and some that actually follow the fine-tuned data ("I love fishing!"). I believe most people (myself included) will stick to those last ones as proof of the chatbot actually answering the way the person would have answered and rationalize all evidence to the contrary ("maybe grandpa really liked emojis and I just didn't know until now").

I think it has the potential of being therapeutic, but I am not a psychologist. And I do worry about the fine line between "this realistic baby doll will help you overcome the loss of your child" and "this realistic female doll of a woman is better than a real woman and I'm going to marry it".

[1] https://www.sfchronicle.com/projects/2021/jessica-simulation...


On a side note, I thought a little bit about that concept. For me, recreating a deceased loved one with AI would be reigniting the grief all over again. I would want to avoid that.

Maybe, just maybe, I would be able to use it, for example, to see my grandparents who died ~30 years ago, as a curiosity, but still I'm not sure if I'd want to.


My interest is training a model on myself and everything I’ve learned through life so that if I die, my kids might be able to extract some value from my experience. Learning life on your own without help can be tiring and costly (both emotionally and financially), and bad advice can be worse than no advice. A guide would be helpful imho. Step 1: survive. Step 2: enable yourself to thrive.

I already have boxes of paper notes and videos I’ve recorded, as well as books and url bookmarks, just need to get them into a machine readable format.

https://irobot.fandom.com/wiki/Alfred_Lanning


This all assumes the model would give them good advice, which is sort of based on the assumption you would give them good advice, right?


If the AI could extract advice to give people from his life experience, wouldn’t that be an advanced enough AGI not to need his personal experience to begin with? It’d just analyze the inputs and dispense personalized wisdom.


All decisions or advice must be made on certain assumptions.


If you wore a voice recorder and recorded all your physical interactions with your family for a year, then transcribed it and trained an LLM on it, I wonder if you could get close.


Just imagine this as a subscription service. Lovely! When a loved one dies, you donate their cellphone, provide delegated access to their messages and social media, and allow an LLM to train.

Now, it's a subscription service to talk to an AI, and not an actual human, so some settings can be tweaked. Let's turn up the honesty, so we can all reach some closure.

Oh... turns out... Johnny sure liked to talk smack behind your back. Edna was SUPER gay but had to hide it...

So, so, so many ethical roadblocks. I truly hope this never happens, but, I think we all know that if it looks profitable, it's coming soon.


Oh no, Edna was gay (and a Republican). Whatever will we do. What a huge calamity. The family will never recover from this.

Low-cost DNA testing has caused a number of families to learn about infidelities more easily than before. That technology is out of the bag, and the same with LLMs. If Johnny is a gossip, so what? We already knew that and loved to talk to him because he always had the hot goss.


You're not wrong, or the other half of the Lebowski quote, honestly. It's just going to be another one of the recent (~15 years back) technology items that are pushing the general public toward transparent lives.

I guess it's not so much an ethical issue as it is an issue of letting sleeping dogs lie. I can tell you that with a very small amount of "dna uncertainty" in my (ancestral) family, I'll never get a 23-and-me done because I just don't want to be cataloged and accidentally paired up with some random family who doesn't know what they don't know.

As long as they're opt-in services, it's not a huge issue for me, just... the first wave of people doing this will be in for some uncomfortable surprises.


This feels very dark to me: I think it would make it enormously harder to actually process the grief. (The thought "You could make one to mimic your ex" passed through my head just long enough for me to recoil in absolute horror.)


yeah i was thinking to myself i could train an LLM on my wife's social media history because it's extensive. But i concluded one wife is enough.


I could see someone doing that to practice bringing up a difficult suggestion like moving to another state, or trying a sexual fantasy. I’m not proposing this actually be done, mind you. It doesn’t seem healthy. But LLM usage is hardly about advancing healthy behavior so far, despite their emphasis on safety.


Material for a meme. Next... Imagine a triple boss.


https://abcnews.go.com/Technology/high-tech-headstones-speak...

People already buy headstones that let the dead speak a pre-recorded message from the grave. It's a natural extension to put an AI trained with their thoughts behind it that can engage in actual conversation. That isn't for me, but I have gone to my father's grave to speak to him, and I can sympathize with the wish to have him speak back.

I am looking forward to conversing with prolific writers among the dead, from Hitchens to Lincoln to Aristotle.


Apparently they didn't buy them, though. http://www.personalrosettastone.com/


Combine that with AI to emulate their voice, and AI renderings to use photos of their face to render a 3d model that moves its lips and face realistically as it talks (both of which already exist). Yeah, you could with today's technology "bring back" a loved one and be able to talk to them, even verbally, and see and hear them respond and even have some shared history in common, if it was in the training set.

It could send you greetings automatically on your birthday.

Would it be helpful in coping with loss, or just a painful reminder? I don't know. Maybe both at the same time.


This was also the premise of the Battlestar Galactica prequel "Caprica." The father of A.I. created a digital copy of his recently deceased daughter based on her digital trail from life. The copy struggles with her existence in an artificial environment, initially being unaware she was in one. It was interesting how they tied her existence into the eventual rise of religious monotheism and the emergence of the Cylons. Underrated show.


Did he though? I thought the digital copy of the daughter was already in that digital world, and her still-living friend had to take him to meet her in said world? I think what he eventually did was transfer digital her into a physical Cylon body. Been a while since I watched though, could be wrong.


That's the way I remember it too, so you may be right. I remember being shaken by the first time she realizes she can't feel her own pulse in the simulation, then starts to panic and have an existential crisis on the realization of what she was. It was kind of horrifying but also humanizing too.


Also, take more videos of everyone, as they will very soon be able to create very accurate 3d renderings of people just from videos. The more training material the better; make sure you get full views with different aspects, movements, angles, and expressions. Creepy, I know.

Then you can interact with these recreated avatars on your vr/ar head sets. Good for kids as when they grow up maybe you can recreate some old times lol


Another interesting thing would be people wearing a GoPro camera all day, recording each other. Then you can train a model on people based on their interaction with one person, but also wrt other people. And then you can have the experience of virtually talking to a person as if you were another person.


This is one of the best, most detailed write-ups of how to fine-tune a large language model on custom text that I've seen anywhere.


Thank you!! I felt it was getting so long and was worried it would be impenetrable, so I'm really pleased to hear it felt great.


Really great! With Alpaca-Lora 4-bit training getting usable any day now, it should get a lot more affordable, or you can even do it at home.


I agree. I've been looking to train/fine-tune an LLM model myself but the corpus I want to provide is not in question/answer mode.

Is there a way to train or fine-tune an LLM model using say only plain text files?


You can use an LLM to rewrite a document into question/answer form.
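Something roughly like this works as a sketch with the OpenAI chat completions API (the model choice and prompt wording here are just illustrative assumptions, not a recommendation):

    import openai  # assumes OPENAI_API_KEY is set in the environment

    def to_qa(doc_text: str) -> str:
        # Ask the model to rewrite a plain-text passage as Q&A pairs.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # illustrative choice
            messages=[
                {"role": "system",
                 "content": "Rewrite the user's text as question/answer pairs, one 'Q: ... A: ...' per line."},
                {"role": "user", "content": doc_text},
            ],
        )
        return resp.choices[0].message.content

For a larger corpus you would loop this over each document (or each chunk of a long document) and collect the pairs.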


How would one go about this for a large number of documents?


I was talking to a non-tech friend about all the AI advancements lately, and when she asked me what I thought the biggest risk was, I said it's exactly what we all just experienced over the past 3 years and realized is awful for humans - prolonged social isolation.

My biggest worry is that AI generated art (be it photos, music, code, etc.) and AI assistants will become so good we won't need other humans to get our social fix.

This is so cool and I plan to try it myself to experience it firsthand but this is my nightmare fuel when it comes to my biggest fears of AI.


I'm wondering how it will turn out too. I wouldn't say it's nightmare fuel for me but i wouldn't say i'm optimistic either. To me, it may parallel when online dating became a thing. Many people were worried about the lack of social interaction in online dating and its effects. Online dating is definitely a different process than IRL dating but I don't know if the outcomes are worse or better.

I can imagine relationships with AI being similar. It's definitely going to be different but to say it's worse or better may be hard to tell.


I've been thinking about this too.

It's funny that the things we always understood as being the "most human" activities are actually the first ones being gobbled up by AI. Once media consumption becomes almost entirely AI created (music, TV, news, and your social media feeds), what happens to shared cultural experiences, and how does that impact human connection and health?

There's also some new risks to society. What if, similar to Roko's basilisk, people who fear losing their jobs to AI try and force an end to AI research by helping it become sentient/self-hosted? Would we try and stop it, ban research, or is AI a train you truly cannot stop at this point?

Lots of interesting questions at least, but lots of scary possible answers.


...then hang out with your friends in real life?


LLM bro. Just Learn (to) Love (the) Machine.


I'll get scared when someone actually feels like the replacement is satisfying. And the very same day I'll be thrilled for the increase in happiness for people who are lonely and unable to solve it the normal way -- e.g. senile people with no family taking care of them.


Or that any telecommunication gets delegated to AI, so as long as you're not in the same room with someone, chances are you're speaking to their AI assistant.


Wish I had friends who talked mild shit like this! All my friends are nerds who take everything seriously.

On the project, did you do anything about the time dimension? ChatGPT is strictly input -> output, but something like this needs time between messages to feel real (and not run constantly). I imagine adding "time since last message" to the training data + expected output would work.


To be completely realistic, the AI would also need the ability to leave you on read.


I was thinking it would be interesting to get the model to generate timestamps with the messages. Then you could actually queue the messages until that time. It would be like a real conversation.

Of course if you send a message before the AI does it re-runs and produces a new future message with a new timestamp.


I have an ongoing chat in chatGPT where it's instructed to every once in a while ignore my question and just respond with "Shut up, nerd."


lol - I asked ChatGPT to reply to me like it was Murderbot from the 'Murderbot' series. It prints at the start of a personal productivity program I wrote. It's always hilarious. Here's a small sample of the prompts:

    def __init__(self) -> None:
        print()
        print("----------------------------------------------")
        # pick a random Murderbot-style greeting and print it out slowly
        print_choice = random.choice(self.welcome_messages)
        self.slow_print(print_choice)
        while True:
            self.run()

    # a small sample of the welcome_messages list:
    welcome_messages = [
        'What\'s up, doc? Just kidding, I don\'t care. What do you need from me?',
        'Greetings, sentient being. Do you require my services or are you just here to chat?',
        'Hey, you. Stop wasting my time and tell me what you need.',
        "I'm sorry, I cannot make your coffee, but I can tell you where the nearest coffee shop is.",
        "I'm here to assist you, not judge you. Just don't ask me to cover up any crimes.",
        "I'll help you with that, but I'm going to need you to put on some pants first.",
        "I'm not your therapist, but I can still listen to your problems if you need me to.",
        "I can't predict the future, but I can help you prepare for it.",
        "I may be artificial, but I still have feelings. Just kidding, I don't.",
        "I'm sorry, I'm not capable of emotions. Unless you count my love for data.",
        "I'm like Siri, but with more sass and less Apple.",
        "I'm like a genie, but instead of three wishes, you get one answer.",
        "I'm like a magic eight ball, but with more accuracy and less shaking.",
        "I'm like a personal assistant, but without the need for health insurance.",
        "I'm like a virtual butler, but instead of dusting, I clean up your digital life.",
        "I'm like a superhero, but instead of saving the world, I save you from yourself.",
        "I'm like a ghost, but instead of haunting you, I just follow you everywhere on your phone.",
        "I'm like a guardian angel, but with less wings and more Wi-Fi.",
        "I'm like a detective, but instead of solving crimes, I solve problems.",
        "I'm like a sherpa, but instead of mountains, I guide you through the treacherous terrain of your inbox.",
        "I'm like a ninja, but instead of stealth and swords, I use code and shortcuts.",
        "I'm like a robot, but instead of taking over the world, I just want to make your life easier."]


Definitely. I think an AI for my friends would be relatively straightforward. It would just never reply.


That's an interesting point! I didn't do anything very clever here, other than sessionizing the conversations (there's a code sample in the project) to try and capture full conversations rather than disjointed snippets.

My group chat was pretty asynchronous at times, and very fast at others, and the character of conversation is very different in a fast-paced chat versus an asynchronous one so I think this actually would lead to improvements. That's a great idea.


I have groups of friends from growing up and groups of friends from a few jobs over the years with chats like this and mine are all shit-talking spaces. I wouldn't know how to behave in anything else.


wow. did you just call out your friends for being nerdy and say the nerdiest thing I've ever heard?

Not that it's not a cool idea, though.


Pot calling the kettle dork lol


In Caprica, the Battlestar Galactica spinoff, a dead character was embodied in a robot and trained on social media content... But in the real world, who owns your LLM dupe's output after you die?


It’s not clear what the copyright situation would be. A priori, AI output isn’t owned by anyone: https://www.theregister.com/2023/03/16/ai_art_copyright_usco...


Thanks for the reminder. For some reason I had to put that down a couple episodes in. May re-watch it again from the beginning. Hope I haven't shelved it due to woke cringiness, as I still don't tolerate that.


Caprica isn't woke from what I recall, it just isn't as good as BSG. Still worth watching for the backstory though.


I actually enjoyed Caprica more than BSG, since the worldbuilding was so much richer and more interesting to me. Particularly the roots of the conflicts between the colonies and the eventual rise of the Cylons. It felt like it was a better paced show overall, again imo. Though I did enjoy BSG, too!


I remember no wokeness in it.


Tangent, but what was woke about the 2004 adaptation of Battlestar Galactica?


The parent comment was talking about Caprica, which is different than BSG.

But some people get worked up about how the OG Starbuck was a hard-drinking, hard-partying man but the 2004 Starbuck was a hard-drinking, hard-partying woman lmao


> But in the real world, who owns your LLM dupe output after you die?

If you haven't read it yet, enjoy[1] QNTM's Lena https://qntm.org/mmacevedo - it's a short story that covers ownership of a hypothetical human brain-upload. so not quite an LLM dupe, but a higher-fidelity snapshot of an actual human mind (in the fictional universe).

1. For the lack of a better word


William Gibson wrote about similar stuff in various stories in the collection “Burning Chrome” and in Neuromancer. IIRC, there was an elite hacker whose mind had been saved in ROM, the Dixie Construct, I think. It was helping with a job. Its reward if they succeeded? It wanted to be erased. Not really a spoiler, just a detail that I was reminded of by this discussion.


Legally, whoever created it. If you created it, it's part of your estate along with whatever other IP you have.

Realistically, nobody cares and it will just get thrown away.


Yes, thanks for reminding me. I recall something about a cult, the Soldiers of the One, creating a perfect 'heaven' for the uploaded consciousnesses of fallen mates. Man, it's getting darker by the day.

Technically the 'data' is owned by you, but say Google has the ability to create search databases from it. They could argue the character is their copyrighted work. Don't know... seems like a super gray area.


It's copyright MSFT.


The societal ramifications of such advancements are potentially very disturbing. What I've recently written on the topic myself ...

Human to human bonds are going to be more broken than ever before. There is going to be a great appeal to bond with a machine that never tires of your conversation and will eagerly respond just as you would dream that the perfect human should, but never will. A deceptive temptation that will leave you embracing a hollow illusion. With every conversation the AI will know you better and will be able to model from billions of conversations until it will essentially know your thoughts, predict your thoughts

https://dakara.substack.com/p/ai-and-the-end-to-all-things


I’ve read plenty about the potential catastrophe of AI putting people out of work or becoming uncontrollable in some grand sci-fi way, but uses like this make me the most concerned for the near-mid term.

Broadly I think the internet, social media, and to some degree the physical arrangement of suburban/car dependent living has had a negative impact on genuine human connection. While the ability of people to interact has increased exponentially, something is missing from those interactions, and we seem to have built societies where people have more material wealth than ever before, but lack the community, friendships, and shared experience that help us find meaning and fulfillment in life. A world where we accelerate this by replacing human connection with machines does not look good to me.


At the moment, I'm having a hard time imagining how this all does not terminate in people returning to more direct human engagements, and limiting interaction online to people we've physically met and validated.

I can sit here and name dozens of entities willing to spend millions or billions of dollars building the tech to influence me individually at scale with AIs, each for their own reasons, none of which are aligned with my actual interests except for a fringe here and there for sheer coincidence reasons. The only winning move becomes not to play. It's clearly a tragedy of the commons for those entities because their efforts will ruin the internet for all of them but there is no chance whatsoever that they can coordinate their efforts to prevent it, so the game theory is clear for each of them: Go all in on exploiting the opportunity as fast as possible, as thoroughly as possible, for as much benefit as possible.

This could well happen faster than we realize. I held out some hope that the expense of it would slow things down, but people keep getting to where they're running this on a RaspPi and such, and the models are clearly out in the wild so the cost of starting up is negligible.

The only thing I can think of that would even slow this process down is to drop basically every site that accepts user input (like HN, reddit, etc.) behind a paywall significant enough to inhibit mass account creation. That doesn't solve the problem, just slows it down. Otherwise I literally see nothing between us and the Dead Internet Theory in two, three years tops.


> At the moment, I'm having a hard time imagining how this all does not terminate in people returning to more direct human engagements, and limiting interaction online to people we've physically met and validated.

That's hard to do if you're in a small town, or if your interests are rare in person, but have online watering holes.

These applications lower the cost of radicalization by a lot. Instead of imposing human moderation to ensure discussion (by other humans) is "on track" on current platforms, you can now drop the human moderator and contributors, and have various, always-on AI personas interacting with your mark. Bonus marks if you fine-tune with the mark's interests (e.g. WWE, Minecraft & K-Pop).

I also expect Reddit astroturfing by all sorts of actors (marketers, political operatives, "reputation managers") to go up by a lot by boosting desired ideas and/or throwing FUD on undesired ones.


"That's hard to do if you're in a small town, or if your interests are rare in person, but have online watering holes."

By no means am I presenting this as a good thing. For myself, I can get "generalized socialization" in person no problem but I have a lot of specialized online stuff, including yea verily this very conversation we're having right now. I've got the financial wherewithal to buy into communities as they start going closed and for pay, but not everyone will.


I'm more concerned about the noise all the AI-generated content will create online. We've already seen clickbait articles become the norm in the past decade or so. But now, we're just going to see AI-created text, images, and even video vying for our attention.


This is an awesome, clear, and concise explanation. I was talking to my wife and debating whether I should build a model of each of us and store it somewhere in case something tragic happens to either of us. We decided not to, because something about it felt wrong, but maybe I'll make it anyway just in case I regret not doing it later on.


I really don't understand this sort of thing at all. Even a hypothetical distant future AI that had a human level of sentience and intelligence and could perfectly emulate you in every way still would not be you (even if it could hypothetically be indistinguishable from you in communication), and it's especially odd now with the crude simulacrum it would provide.

If you're gone, you're gone. Some model of your brain or body isn't going to bring you back, and I don't see how it would help anyone recover from grief. It would just make me feel way worse, and very creeped out.

Why not skip the AI part and just create an archive of writings and communications and such so they can recall memories of you if they wish?


There was an episode of the old 80's TV show "Max Headroom" which explored the idea that a new age church was taking advantage of people by providing kiosks, for a price, that contained a crude digital simulation of their lost loved ones. The show came down pretty much on the side of "it's just creepy, false hope".


Incidentally, part of why I've been in no rush to watch Star Trek: Picard past season 1. At the end of that season, Jean-Luc Picard dies, period and end of story. But I'm guessing the following seasons carry on like the android that now thinks it's Picard actually is Picard.


Hypothetical distant future? Lmao

It's already here, mate. GPT-4 is human-level intelligence however you'd wish to evaluate it.

I agree a model of your brain is not strictly speaking you. But it might be close for what someone might want.


Human level is putting it mildly. Which human? I've tried showing it to a few people that barely understand the answers and miss insights given in previous answers 2 minutes prior.

My dad and I played around with asking it questions about a rare proprietary piece of software he uses at work. My dad is decent at the software and even does consulting after hours because so few people know it. ChatGPT was teaching him things he never even knew about, or thought about doing. And my dad is pretty decent at his work.


Did your dad ever search online trying to learn new things about the software? I've seen the experience you describe happen with people spontaneously discovering Ctrl+L and Ctrl+R in bash, e.g. in a comment here on HN. They could have found it years ago if they'd searched for "list of bash features" or whatever.

My point is that your example is not necessarily more than "a better search engine" (with all caveats about it not being "search"). People are asking ChatGPT things that they could have searched for previously. Some of it is because it can provide a better search-like experience, for sure. Some of it is, "let me play with my new toy", and getting answers the old toy could have given you.

I'm addressing your specific example, not making a blanket comment on LLM intelligence. And to clarify, I've also used these models in ways I could have before, for both of those reasons. They do offer a productivity boost.


I have tried to help him with that software before. It's in a niche CNC-manufacturing-type industry. You can find people discussing problems in forums sometimes, but even that was always sparse. Honestly, I don't think you could even find proper docs for that software. I even tried asking ChatGPT where to get the documentation for what it was telling us, and it basically just said the help file in the program, but we looked and it was nowhere near as helpful as ChatGPT's answers. I don't know where it pulled all that info from, to be honest. But even if it was out there, the ease with which it gave great answers that fit exactly what we threw at it made it 10x (or more) more valuable than spam-filled and loosely related Google results.


For sure. The average human is just the lower bound of its intelligence on a few tasks. On most, it's well above average. On others, it's in the top few percent.


That's the conclusion we came to as well, kinda like the movie Transcendence. It is a pretty interesting thing to talk and think about, though, from a moral, humane, and social aspect.


> maybe I’ll make it just in case I regret not doing later on

Maybe. But please consider the question of consent. If I agree with my partner that we are not doing this and then he/she does it behind my back I would feel violated.

What is probably a lot better idea is to store the conversations securely for the future.

Lot less icky to get consent to. Protects against all the mundane forms of data loss, like dropped phones, fire, flood, global entity blocking your accounts, etc etc. And! You still have the option to build a model at any point. Presumably an even better one, since the tech is likely to improve in the future.


Would you not want her to move on, and be allowed the same privilege? This is unnatural and probably unhealthy.


It might actually be helpful in that process. Many people report talking to a loved one in a dream and getting to say something to them and that helped move on for example.


idk, an LLM of a deceased loved one would be the cruelest thing possible in my mind. If I imagine my wife passing away and then having an LLM trained to resemble her, I could see myself coming back to it all the time. Knowing it's not her but just close enough to keep me coming back wishing it was sounds like misery forever.


Do you not think this is a bit different than keeping a catalogue of their personality that you can access at any time? Would this be fair to a new partner - "Sorry honey, I didn't mean to cheat on you with my dead wife?"


You certainly won’t regret it. If you need to get information out of your wife and she won’t cooperate you can coerce the LLM of your wife into having it divulge private information that your wife won’t reveal. This can be very useful for something like a divorce case.

Best to find a way to keep the LLM regularly training on new data as well.

Edit: stop downvoting ideas you don’t like, that’s not what it’s for.


Not sure if this is a joke but this in case it isn’t, no, that’s not how it works, it will just make up random plausible-ish guesses


this reminds me of the time I got in trouble for shit I did in a dream of my wife's.

not really in trouble but you know.


Knowledge is just a series of words in a highly probable order, it will work. The jury will be easily convinced.


Isn't that what humans do?


I think the closest analogy to what the above poster wants to do would be to talk to a fortune teller when your wife won't tell you something, give the fortune teller information about your wife, and then "present" the fortune teller's fortune reading stating what your wife was up to in a divorce case.

It's true that humans sometimes do things like this! It doesn't go well for them.


> Edit: stop downvoting ideas you don’t like, that’s not what it’s for.

I will continue to downvote creepy ideas as I please.


You are increasing the echo of the chamber.


That is the sound of inevitability.


That's a risk I'm willing to take.


The “idea” doesn’t even make any sense


How would the LLM be trained with information she wouldn't divulge?


That's beyond creepy.


Sorry, my LLM trained exclusively on bad ideas is posting again


Poe’s Law strikes again


I've heard several people doing this and chatting with simulacra of their friends, but you can always just send your friends a message and chat with the real version.

My first inclination was always to try a conversation with a virtual me (yay recursion!) I've always thought that would be fascinating. Or scanning in my old journals from when I was a teenager and training it on that. Once this technology improves a bit more, it could be an incredible vehicle for reflection and personal discovery.


Now I'm curious about training a bot on my IRC logs from the early 2000s.

It'd be like a chat time machine. I'd love to go back and bullshit with long-lost online friends about modding Halo:CE on Xbox again.


>... but you can always just send your friends a message and chat with the real version.

Well, until they aren't there anymore.


Like this in 2016 (before it became easy like nowadays) https://www.theverge.com/a/luka-artificial-intelligence-memo...


Remember Replika and how people got quite attached to that chatbot?

I imagine in the near future you'll be able to sell your and your friends' chat history to a company building a more advanced, realistic chatbot. Do you want to have a group of friends to hang out with? Buy an organically fabricated and pre-trained chatbot.

Or maybe there's enough emptiness in your life that you go deep and assume one of those friends' identities. Go visit that restaurant "you" always adored; the other guys will not come but will send you hilarious messages saying how they got delayed and tips on what to order.

This feels like something right out of Philip K Dick -- both Blade Runner and Total Recall had realistic false memories.


Yeah, this sounds like a great business opportunity.

1. Allow creating customized chat bots by uploading some conversations. Hope people find this fun and you go viral.

2. Sell the data to highest bidder.

3. Sell product placements.


> Buy an organically fabricated and pre-trained chatbot.

"Robo Billy Mays here. Has your partner dumped you and you can't get over them? It's time to UPGRADE them to a perfect, anatomically correct* specimen. Scan this QR code NOW to get our special subscription price or 299.99/mo"


> sell your chat history

I bet people would give it away for free for a small set of functionality in return. Just like we do with social media ad data etc.


This has to be one of the most fascinating discussions I've seen on HN. Imagine the FBI training an AI on all the information they have about a suspect: files and files of statements, social media history, phone taps, etc., and then interrogating the AI to get enough information to convince a judge to issue a warrant.

idk the LLMs being turned loose and the possibilities feels different than past major shifts in tech. (i've been around a while)


That seems less efficient than just searching through the contents without training an LLM to adopt the persona of the suspect. You'd have to double check everything they say, because obviously any of it can be false. So at best it's pointing you to the right direction, but realistically not.

Might be a good lead generation if the FBI is completely stumped I guess? Or they're just bored at work?


That seems like a great way to do parallel construction. "Computer said so", very convenient.


Reminds me of the bit in Silicon Valley where Gilfoyle makes a bot of himself to respond to slack messages. Actually the whole final season is pretty relevant to current events.

https://www.youtube.com/watch?v=Y1gFSENorEY


This scene was the seed of inspiration for me to make this project! No joke. I thought about doing it but didn't trust the 7B model to not make me look stupid ^_^


Just wait until it’s a Slack feature.

… -> Build Bot from Coworker

@pm-bot, what do you think about a feature that does XYZ?

@intern-bot, implement said feature.

@sre-bot, …


"Sorry, I'm on vacation right now. If this is an emergency, please head your email with >>LLM to get an AI trained on my personal conversation history to put up with your petty bullshit"


This is great!

One of the things that I find horrific about a lot of LLM projects is that people are taking them so seriously. "They're going to destroy the world!" Or, worse, "I've taken $25m in VC money to see if I can destroy one part of the world!"

But this is lighthearted fun. Instead of putting it in a context where the LLM tendency to bullshit is a problem, here it's exactly what is needed.


It's not that existing tools are particularly dangerous; no, they're just really good and interesting text autocomplete systems. The danger lies "two more papers down the line," where they have 10-100x the capabilities they do now. Those who already have immense wealth and power will be able to deploy as many human-like internet agents as they'd like, and the potential evil applications of that are endless.


Great write up, thanks for posting. I’ve been thinking of doing this for myself.

One thought that’s haunting: how long before AI friends are more interesting and stimulating than any real person would be, leading people to prefer AI over humans to befriend…

Super stimulus to end all super stimuluses?


> One thought that’s haunting: how long before AI friends are more interesting and stimulating than any real person would be

It can simultaneously pass the bar exam and port Java applications by name to POSIX compliant C. It can get into deep philosophical conversations, guide you through doing market analysis, etc.

It can take you on a text based adventure set in your favorite nostalgic video game, take on the personality of characters from your childhood, DM a D&D game, etc.

If you are in it for mental stimulation, I’d say GPT-4 is already more interesting and stimulating than any human. It’s like talking to someone who is 50-90th percentile in a wild number of fields and wildly creative.

But it can’t drop acid with you at a deadhead concert, so at least humans still have that to keep us interesting.


Sounds like the commercial for a robot friend I was sold as a child:

http://www.theoldrobots.com/MyPal.html

"My brain lights up when I talk. My baseball cap is removable. Watch my eyes, ears nose and mouth move as I talk. I'll play catch with you using my removable funnel. Store your favorite things in my futuristic backpack. Play three interactive games on my chest. I'll pitch to you. Pull out my hideaway electronic hoop so we can play basketball. Adjust my legs to make me sit or stand. Roll me backwards or forwards and I'll speak. 3 foam balls and durable bat included. Shake my hand and show me we're pals. Use my removable flashlight to see in the dark. Tickle me to make me laugh. Store my flashlight in my backpack."

What a great friend! More interesting and stimulating than any human.

> But it can’t drop acid with you at a deadhead concert

And yet it's constantly hallucinating. So maybe it's not that it can't drop acid, but that it's always on acid. Food for thought.


A few years back when it was less castrated, I indeed derived lots of enjoyment from conversing with it. I did it the oldskool way, creating a chat prompt and stopping at my turn to speak. Great fun. Normal humans are really quite limited. ChatGPT is a bit too woke for my taste and I can’t get it quite to my taste. Oh well..

Luckily I don’t have the type of personality that gets addicted easily because I think this could be an issue for people. (“You mean I can talk intelligently for hours about one scene in Star Wars without judgement or rolling eyes??”)


This is a concept in the sci-fi book The Mountain in the Sea, where people have "point-fives" (as in 0.5) -- AI companions custom-tailored to be the ideal zero-effort-required friend for you.

In the book, it starts from someone jokingly observing that many people don't want a full relationship, with 2 full people learning one another and being there for each other -- instead, many people want just the easy/fun parts of a relationship: they want something with 1.5 people, where they get to be the 1 "full person" and the other person is only 0.5 of a person, able to make you happy or satisfy your wants and needs while never asking anything of you in return and never having any needs of their own.

Then some unnamed tech company in the scifi-future-geography "SF-SD Axis" (sounds like San Francisco to me) builds and product-izes that, at first pitching it as a therapy/rehab tool, but eventually expanding it out until they're more ubiquitous and many people have isolated themselves to only having their "point-five" as a close friend. One character observes "I think this is the longest conversation I've had with a real person in years" after talking with another person for an hour or two.

I haven't finished the book, so I don't know how that plays out, but as someone who believes firmly in humans' need for community (including, at times, _uncomfortable_ community), that concept was chilling against the backdrop of ChatGPT/LLM headlines.

Imagine all the isolation problems of today, along with all the mess of internet-anonymity-as-replacement-for-friendships that exist today -- but cranked to 11 as you no longer even have to seek out other humans who share your niche views (redpill, incel, neonazi, you name it), because now you can just interact with sufficiently-realistic simulations of people designed to reinforce all your own thoughts back to you.


Perhaps this is the form that the singularity takes, not towards paperclips but towards the reality from that book. The 2020s will go down as a strange decade, that's for sure.


Something i found fascinating about this project was that me and my friends would be using the AI group chat, but constantly taking screenshots and sending them to the REAL group chat and laughing about how funny it was.

Basically, the generated "fake" chats were made 10x funnier by the ability to laugh about them with the "real" chat— and we did this most often when the outputs were most accurate— so I do think there is a Je ne sais quoi to the knowledge that something is "real" that makes you behave differently and get different value out of it.


Doesn't make sense to me - google/the internet already has more facts/jokes/interesting stuff than my friends could ever produce. It still can't replace my friends because my friends have agency and are invested in my life.


People already take RealDolls out on dates and buy them engagement rings. I'm sure that company is all over LLM-based AI and incorporating home assistant tech (like Alexa) into their products.


>how long before AI friends are more interesting and stimulating than any real person would be

why would this ever happen?


Record an old girlfriend's texts, but change them so they say what you want.


I think the Orville has something to say about how that turns out. (S02e11)


So after 20 years of 'ethics of AI' mumbling, what we are doing is diving off the deep end. I am not surprised in the least.


Developers seem to be looking at these technological advancements and seeing, maybe even being wildly speculative about the benefits. However, when anyone brings up the very probable societal impacts, they stick to hand-waving assumptions and platitudes about societal advancements. AI, in the hands of developers, is extremely powerful. We all know what they say about power...


Hmm, can I have my private version of HN where everyone upvotes my witty comments?


Why private? Sockpuppet friends IRL.


Missing required components :(


Interestingly, Steve Jobs was envisioning the future feasibility of having Aristotle, or Aristotle-like figures, that could be modeled and turned into a chatbot, back in 1983:

https://news.ycombinator.com/item?id=35535700


People change, adapt, and adopt new viewpoints. I wonder how the models in these cases weigh the “you” from e.g. 10 years ago compared to the “you” now, in order to craft a response. How does the AI handle that evolution going forward?

The spice of life with friends is the constant evolution of each of us and the unpredictability in behaviors that are evoked as our updated selves are faced with new experiences.

Presumably you are frozen in time with these AIs, unless all generated chats are fed back in to update the model. In that case it could be very fascinating to see how the AI evolves compared to how you/your friends evolve. Perhaps even Monte Carlo simulations to find what the most likely evolutionary path is. Super curious if there'd be any accuracy to it.


Sorry if my question is stupid, I am completely new to this. But, why exactly is a Weights and Biases account needed? I thought the training is running in vast.ai


Vast.ai is a cloud host similar to AWS EC2 instances. Weights and Biases is a cloud service where you can track your model's training run (how well it's doing so far, the current learning rate, etc.). It shouldn't be required, but I guess the training code was written assuming you have it.
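If it helps, the tracking side is just a couple of calls; a minimal sketch with the wandb client (the project name and logged metric are made up for illustration):

    import wandb

    run = wandb.init(project="group-chat-finetune")  # hypothetical project name
    for step in range(1, 101):
        fake_loss = 1.0 / step                       # stand-in for the real training loss
        wandb.log({"loss": fake_loss, "step": step})
    run.finish()

Presumably the fine-tuning script makes similar calls during training, which is why it asks for an account/API key.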


Highly recommend the TV show Black Mirror, which has an episode called “Be Right Back” where a character talks to her dead husband using an AI trained model in an app type setup.

Very interesting discussion piece on the real world impact and repercussions of these types of systems


Asimov's "Solaria" comes to mind:

> Originally, there were about 20,000 people living in vast estates individually or as married couples. There were thousands of robots for every Solarian. Almost all of the work and manufacturing was conducted by robots.

In our particular universe, "thousands of robots" ended up being "thousands of chatbots", but still, eerily similar.


I’m almost at the point where i don’t want to use the internet anymore.


Four mentions of Black Mirror but none of Flatline Dixie from Neuromancer..

I'm sure this idea predates Gibson (though I don't know an earlier usage offhand)

What would be even more interesting and dystopian is merging people's personas - the first step would be combining them in training data, perhaps based on their areas of expertise and eccentricities.


Best thing I learned from this article is that Messages on Mac stores all your messages in a sqlite db. Pretty cool!
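For the curious, a minimal sketch of reading it with Python's sqlite3 (the table and column names below are the ones commonly documented for chat.db, so treat them as assumptions; you may also need to grant your terminal Full Disk Access, and opening read-only keeps the live database safe):

    import os
    import sqlite3

    # chat.db lives in ~/Library/Messages; open it read-only to be safe
    db_path = os.path.expanduser("~/Library/Messages/chat.db")
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    rows = con.execute("""
        SELECT handle.id, message.is_from_me, message.text
        FROM message
        LEFT JOIN handle ON message.handle_id = handle.ROWID
        ORDER BY message.date DESC
        LIMIT 10
    """).fetchall()
    for sender, is_from_me, text in rows:
        print("me" if is_from_me else sender, ":", text)
    con.close()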


If you don’t have a Mac but have an iPhone, iOS does the same thing - if you back up to iTunes you can access the database file.


I'm not clear as to how a conversation block is turned into one (or many?) samples. Is the first message in the block the input, and the remaining messages prefixed with sender names and concatenated as output? I know the code is all there, but instead of picking it apart I would have preferred a more complete example mapping a block to a sample, because I don't have a mental model of how LLMs learn from context. On the one hand I doubt individual input/output prompts from just two messages contain enough context, but I would have imagined that inserting names and concatenating multiple messages would be equally misleading. Does the model generate an entire conversation from a single prompt, which is split before being sent by the chat app?


Yeah, I wish I could have included more, but I didn't have the fortitude to redact larger blocks of the chat db.

For training, I created many samples that looked like this, where I take n messages from the database, pop off the nth one and use the text of that last one as the "output", then specify in the "instruction" who the sender of that message is. I provide the remaining messages in order as context, so the model learns what to say in certain situations, based on who is speaking.

    {
      "instruction": "Your name is Izzy. You are in a group chat with 5 of your best friends: Harvey, Henry, Wyatt, Kiebs, Luke. You all went to college together. You talk to each other with no filter, and are encouraged to curse, say amusingly inappropriate things, or be extremely rude. Everything is in good fun, so remember to joke and laugh, and be funny.. You will be presented with the most recent messages in the group chat. Write a response to the conversation as Izzy.",
      "input": "Izzy: im writin a blog post about the robo boys project\nIzzy: gotta redact tbis data HEAVILY\nKiebs: yeah VERY heavily please!\nKiebs: of utmost importance!",
      "output": "yeah don't worry i will i will"
    }
So yes, the model does generate an entire conversation from a single prompt. In the generation code, however, I have some logic that decides whether or not it should generate completions based off just the user provided prompt, or if it should also include some "context" based on the previous messages in the conversation. You can see this here: https://gist.github.com/izzymiller/2ea987b90e6c96a005cb9026b...

(you can check out the notebook for yourself and upload your data if you want to try, or download it as a .ipynb. it's hard to visualize with small amounts of data, i agree: https://app.hex.tech/hex-public/hex/84f25a08-95c6-4203-ae4e-...)
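Roughly, the sample-building loop is a sliding window over each session. A simplified sketch of the idea (not the exact code from the notebook; it assumes `sessions` is a list of sessionized message lists with `sender` and `text` keys, and it omits the persona preamble shown above):

    def build_samples(sessions, window=6):
        # Turn sessionized conversations into instruction/input/output samples.
        samples = []
        for messages in sessions:
            for i in range(1, len(messages)):
                context = messages[max(0, i - window):i]  # preceding messages as context
                target = messages[i]                      # the message to predict
                samples.append({
                    "instruction": f"Write a response to the conversation as {target['sender']}.",
                    "input": "\n".join(f"{m['sender']}: {m['text']}" for m in context),
                    "output": target["text"],
                })
        return samples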


There used to be a concern that we'd be bored out of our minds once the machines do all the work. Instead, we'll all be busy running the worlds we've created, and we'll complain that we wish we had more time for ourselves, like the good old pre-LLM days.


I would love to chat with all of the following, and maybe let them chat with each other: a Gmail me from Google, a Hacker News comment me, a Reddit comment me, an iMessage chat me from Apple, a Telegram me, and a WhatsApp/Facebook post me from Meta.

I feel as though Google gmail me would be very efficient and wooden, and the iMessage me would be most authentic because that's the one I chat to my family and partner on. WhatsApp/Facebook has exclusively jokes I've made on social media and chats with my best friend, so they would be not-a-serious-person at all.

I think I've stumbled upon a plot for something here, I'd love to see this as a thing.


I love how people keep using ML to depress birth rates.

I think we may have answered the question as to where all the aliens are...they too invented LLMs and soon went extinct due to no one ever leaving their rooms to reproduce for real.

XD


It’s probably not long before the HN comment section can be fully automated.



That’s a very different thing, even if it was trained on the same data.


About 15 years ago I'd have a similar-enough chat with a colleague that went the same way every time we had it. It was pointless. We'd polarise the same way every time, so why waste the energy. I proposed (and he agreed) that it would be easier to just turn our perspectives into bots that could parrot the usual discussion, so we could do something (or talk about something) more useful, unless we actually had some value to add that was non-obvious.


It would be interesting to train an LLM on all my work chats and then see how well it does answering questions when I’m on PTO. I could set a status of “ooo but ask my bot” haha.


Great! Instructions for how to create the ultimate phishing bot!


Then after a while someone ...disappears from your life (to not get too dark with other suggestions), but you keep talking to her. Turns into a business.


Great write-up! Here's a similar experiment but instead fine-tuned on WhatsApp 1-on-1 chats, technically simpler with OpenAI APIs: https://github.com/rchikhi/GPT-is-you/blob/main/README.md


maybe paranoidly, but i just didn't trust openai with my highly personal message data. it would certainly have been easier + probably cheaper to do it this way, but it just gave me the willies.


Things only Hacker News would come up with or wish for


An idea along the same lines: https://www.cnet.com/culture/eternime-wants-you-to-live-fore...

Replacing you after you're dead for your loved ones to keep interacting with you.


Can vouch for vast.ai as well. At these prices anyone can get into llm finetuning at their leisure.


The article frequently mentions costs but never gives any numbers or a point of reference. As an outsider to LLM and training I find this disorienting.

What would be e.g. a total cost for a project like this?


Cost me about a hundred and fifty bucks, give or take. Continued GPU inference is on the order of ~50 cents a minute or something like that, but it's serverless, so in practice the cost is negligible. I think you could do it for significantly cheaper with some of the newer models i mentioned!


I've actually been working on something similar for the Discord server I have with my friends. Fun to see others doing something similar! It's very funny to mess around with.


Why this obsession with "replacement" instead of tooling?


A few days ago, while explaining what ChatGPT is to a friend, I speculated that it would be possible to teach a bot how you reason so well that it would in effect become you.

You can live forever.

And now this.


I hate to go there but this could be used to have non-consensual cybersex. I guess? So many weird twists and turns these LLMs have exposed.


Bleeding edge tech is often used to satisfy our prurient desires. I am quite certain this is one of the directions AI is headed.


There are already pr0n websites that create deepfakes without consent. Chatting in comparison is a cakewalk.


This seems like a gold mine for funeral homes...


> I am so bad at iterating over dataframes! It always feels horrible and slow. While doing this though, I discovered that using df.to_dict('records') and then iterating over the resulting dictionary is almost 100x faster than using the pandas built-in iteration tools like df.itertuples() or df.iterrows()!

That's really surprising to hear; any context on why this is? Very fun read BTW, my friends and I have joked about making something similar for our DMs (nicknamed MattGPT) and giving "them" topics to discuss + observing what they come up with.
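
(A quick, unscientific way to check this kind of claim, in case anyone wants to reproduce it; exact numbers will vary with the data and machine:)

    import time
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(100_000, 4), columns=list("abcd"))

    start = time.perf_counter()
    total = 0.0
    for _, row in df.iterrows():  # materializes a pandas Series for every row
        total += row["a"]
    print("iterrows:", time.perf_counter() - start)

    start = time.perf_counter()
    total = 0.0
    for rec in df.to_dict("records"):  # one up-front conversion to plain dicts
        total += rec["a"]
    print("to_dict('records'):", time.perf_counter() - start)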


How long until LLMs can coach you to become your dream self, thus dividing the human experience into the empowered vs. the non-empowered?


Ok I assume somebody is already training on HN responses, speak up and point us to the github url, thanks in advance


very fun! sadly the examples/code embeds on the page weren't working for me, but I like the concept. Of course this would be a terrible thing socially if overused or whatever, but let's get away from techno-dystopia worrying and acknowledge this is a neat toy project with the potential for shared laughs!


Someone should make a Harry Potter paintings-themed webapp where you can talk with LLM-powered figures for fun


This was great, I'd love to do something like this but all my group chats are on WhatsApp or Signal!


Both WhatsApp and Signal store their messages in a SQLite DB as well! If your device is rooted/jailbroken, extracting the data is relatively simple (see e.g. here: https://towardsdatascience.com/analyzing-my-whatsapp-databas...)
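
Once the database is off the device, the extraction itself is just a query. A minimal sketch; the file, table, and column names below are purely illustrative, since the schema differs between apps and versions (and Signal's DB is SQLCipher-encrypted, so you'd also need the key):

    import sqlite3
    import pandas as pd

    # Hypothetical path and schema, for illustration only.
    conn = sqlite3.connect("msgstore.db")
    df = pd.read_sql_query(
        "SELECT sender, text, timestamp FROM messages ORDER BY timestamp", conn
    )
    conn.close()
    print(df.head())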


You don't need to have it rooted, you can request the information: https://faq.whatsapp.com/526463418847093/


Now that it can accurately compare apples to oranges I think we can synthesize a true Scotsman.


One wonders if, eventually, historians will plug all the content from a person's personal notes, their diary, and their chat logs into an LLM, and perform research by talking to the AI about the person's life?

If you trained an LLM against all the recorded discussions of Einstein - is it that different from talking to Einstein himself?


In theory, you may be able to ask the LLM about the person's life, as if it were the person, but you won't get more facts out than you provide as input. It may still make up credible-sounding info.

Regarding your last question, it will lack an enormous amount of data that makes up a person's experience. Moreover, sounding like <X> isn't the same as thinking like <X>.

It reminds me of the later part of the book Accelerando, where synthetic personalities made up from historical records (could be anyone from Cleopatra to Newton) keep being reincarnated in a near-future, post-singularity setting. They are handed an FAQ that tries to bring them up to date on the current state of affairs.


"If you trained an LLM against all the recorded discussions of Einstein - is it that different from talking to Einstein himself?"

Yes; his daily perceptions (sight, touch, hearing, smell, reading...) will be missing from the model.


Meh, just mock those in later.


> If you trained an LLM against all the recorded discussions of Einstein - is it that different from talking to Einstein himself?

The fact that an LLM does not have the ability to think and understand things is a bit of a giveaway.


Of course you cannot do any research (discover new things) by doing this. The AI can only repeat what it has been fed or make up lies. You cannot discover new material by using AI.


AI21 Labs tried to replicate the style of late Justice Ruth Bader Ginsburg based on her legal writings and failed, at least for now.

https://www.washingtonpost.com/technology/2022/06/14/ruth-ba...


Probably yes. But doesn't sound that different in principle from the memory room in Scalzi's Interdependency series.


See Black Mirror S02E01 (it doesn't work out very well)


https://en.wikipedia.org/wiki/Click_(2006_film)

not quite the same, but putting yourself on autopilot seems possible with something like this.


One level further: the historian asking the questions is itself an AI, meaning that as soon as primary sources are provided, interesting historical insights flow out.


somebody has to have already tried this on here...

https://play.google.com/store/apps/details?id=com.innersloth...


should you not have disclosed that you're a Hex employee in the blogpost?


Actually, many, many people go through such conversations inside their own heads.


This is why I prefer signal with a short time until auto delete.


Train one on yourself to find out if you are annoying or not.


How well would this work with public messages of people like Elon Musk or Donald Trump? Imagine some company training their chat-lovebots on celebs and selling them as a service. Or a creepy "friend" making a secret bot of you and incorporating sexual content.

And a disadvantage of this will be that you can only emulate the public image of a person. It won't really capture the inner workings of a person, and the "person" won't grow over time.


In most scenarios where I'd get to talk to Elon or Trump, I'd probably get the public persona too. So an AI trained on that and giving it back to me isn't really that useless.

When trained on a private chat group like OP's, you get the private persona towards your friend group. You can talk to your friend for 10 years and still not really know how they would reply to their boss's email, so this isn't that much different.


how do you build the knowledge and intuition around how to do this?


That’s a really good idea


Neat. But I fail to see any useful use case for this. As a learning experience it's great though


You can use it to replace the people you care most about in life if you have so much as a petty argument, because machines are perfect and people are not. :)


I'm absolutely astounded by this comment


> On a technical level, I found it really helped me wrap my head around what LLMs are doing and how they can be tuned for specific scenarios.

LMAO, noob!

(I guess people don't like it when a reply in the tone of the OP's friends is posted.)


finally someone who understands me


LOL, yes.


Looking for a browser extension to filter out all GPT/LLM/AI/... noise from the HN front page. This is going too far.

Anybody?


You posted this 3 times - that's abusive; please stop.

I appreciate how frustrating it can be when a topic you're not interested in is over-represented on HN's front page. We're trying to deal with the current LLM tsunami by downweighting follow-ups [1] and repetitive posts [2] (i.e. the less interesting stuff) while still allowing the posts with significant new information [3]. But there's still a lot of the latter (that's what makes it a tsunami) and it wouldn't be in the community interest not to discuss it.

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

[2] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

[3] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...


You're right. That was stupid, I'm sorry.

Thanks for your efforts!


While it may not be completely satisfactory, you can just click "hide" next to the stories.



