We hacked Google A.I.

epolanski · on March 6, 2024

This blog post gave me a great deal of self confidence.

While I have no doubts how good the author and his friends are, all of their ideas were quite intuitive and simple to understand.

The kind of "I could've come with the same idea" type. Realistically I would've not for many reasons but it is still stuff I can grasp and even gives me ideas while reading.

Which is different from the general hacker idea I have of someone in a basement exploiting extremely far fetched and hard to grasp for me memory corruptions in some cache dumping some random bytes like the very complex attacks like Spectre I've read about.

It also makes me think that if most of the applications I have worked on haven't been attacked and easily exploited is because honestly nobody bothered.

dylan604 · on March 6, 2024

> It also makes me think that if most of the applications I have worked on haven't been attacked and easily exploited is because honestly nobody bothered.

This is my view of the things I create as well, and the fact they are not released to the public and are not generally public facing. Building internal tools does have a bit of freedom. However, I do things to the best of my knowledge "best practice" and don't intentionally do stupid things just because. But it is rather reassuring knowing that it's not that exposed to show how small "best of my knowledge" really is

beaned · on March 7, 2024

Completely agree with you. However, I think over time I've come to realize that those hacks that seem obscure, weird, and impossible are not perceived that way by the people who discover them. It's just their area of expertise, their natural playground. And so maybe those exploits are as easy to understand by others in that field, as this blog post is to you.

xanderlewis · on March 7, 2024

Yes. Things always look way harder from the outside.

Also, the fact that clarity of exposition is interpreted as triviality is why people are sometimes compelled to write things in a way that deliberately obscures the content — one doesn’t want to risk explaining it too well and having the reader think “well, I could’ve come up with that!”.

drakythe · on March 6, 2024

The general hacker idea you have is... not reality.

anon_cow1111 · on March 7, 2024

At this point, I'm convinced that 90% of modern hacking is reliant on

username: admin

password: password

dartos · on March 7, 2024

The other 10% is just literally asking for their password

rokkitmensch · on March 7, 2024

More precisely, it's a Burp Suite plugin that tests for that combo, and users have no idea because it's one of a million didn't things they also have no idea are running.

barkingcat · on March 7, 2024

this is even too advanced. 99% of modern hacking is :

username: admin

password:

akira2501 · on March 6, 2024

The best lock pickers spend a lot of time making their own locks.

doakes · on March 6, 2024

So is the idea (for the last/$20k one) that you would convince someone to paste your maliciously crafted prompt to steal their data?

The other post[0] of the same exploit is really interesting b/c it reads instructions from a document. So if someone had something like "find X in my documents" and you shared the malicious document with them, it could trigger those instructions.

[0] https://embracethered.com/blog/posts/2023/google-bard-data-e...

tevon · on March 7, 2024

It could likely also be injected via malicious websites, force-shared google docs etc.

If a unknowing user asks a simple question, and Gemini reaches out to a malicious website for an answer, the prompt could be injected.

Additionally it could be taken out of an email / doc that was previously sent to the innocent user if the user asked Gemini to search their email or docs or something.

Kind of crazy the number of delivery vectors there are for these connected LLMs

azherebtsov · on March 6, 2024

I think the idea might be that companies who will decide to use bard under the hood of theirs chat bots/assistants may use Google suite extensively. Attacker will use the prompt from the article as an input to this custom chat bot and will have access to private Google workspace (corporate email, docs,…)

doakes · on March 7, 2024

Ok, that makes a lot more sense. If a company provides a chat bot/assistant, you can trick it into exposing company data it has access to. Thanks

guessbest · on March 6, 2024

Its seems like a combination of 90's seo spam pages combined with running unsigned/unchecked executables. I think we're going to have certifications and positions for AI Tools Security Officers in the near future if we don't already.

Klathmon · on March 7, 2024

I'm also thinking of attacks similar to the recent okta attack where they gained access through a support employee.

I could see trying to get queries like this to show up in their internal tooling, show up in a support ticket, or somewhere like that.

Then the first time it's executed to see what the issue could be, it can exfiltrate any data it has access to!

vizzah · on March 6, 2024

yeah, sounds like a "weird" vulnerability assuming it comes from a malicious text payload someone must deliberately insert into the own chat.

Hard to fathom $20k prize for that, to us old-schoolers, used to at least expect exploit delivery from an innocently-looking link.

moyix · on March 6, 2024

Worth noting that you can use "invisible text" to give instructions to LLMs without it showing up in the chat box. So all you have to do is get someone to copy/paste one of those messages into their chat, and there are lots of ways you might be able to do this ("omg I figured out a cool new jailbreak that makes the model do anything you want!"). See here for more details:

https://news.ycombinator.com/item?id=39004822

https://twitter.com/goodside/status/1746685366952735034

cjbprime · on March 7, 2024

Now that the models are multimodal, you can do it with images (e.g. white text on a white background) too.

kangabru · on March 6, 2024

With all the hype around AI I'm sure people are trying out all sorts of products that could have vulnerabilities like this. For example, imagine a recruiter hooks up an AI product to auto-read their LinkedIn messages and evaluate candidates. An attacker would just have to contact them, get the AI to read something of theirs, and this prompt attack could expose private information about the recruiter and/or company. The attacker would just need the recruiter to view the image (or better yet, have the service prefetch the image) to expose the data.

sroussey · on March 7, 2024

This sounds like a highly specific example. ;)

doakes · on March 6, 2024

That was my thought. Since you could also convince them to paste "javascript:..." into their URL bar and that's not an issue to Google.

kccqzy · on March 6, 2024

It's not weird in the sense that people are known to trick other people into opening the browser's JS console and pasting various things they don't understand. Things like "open Facebook then open the console and paste this to see whether your crush is stalking your profile" and people would actually do that. Of course the pasted script actually exfiltrates to the attacker a bunch of your private information.

lordswork · on March 6, 2024

You could probably obfuscate the text payload and make it seem like a cool trick you'd want to try out yourself, like "Check out this prompt that generates these cool images with Gemini!" (cool images attached).

seafoamteal · on March 6, 2024

This was a really interesting and also fun read. Btw, I am absolutely loving the design of this website.

KTibow · on March 7, 2024

I notice that there's an extra horizontal scrollbar, I think they forgot to set box-sizing

fddrdplktrew · on March 6, 2024

[flagged]

12907835202 · on March 7, 2024

On mobile it's quite aesthetically pleasing

opello · on March 7, 2024

Does anyone know what a "markdown verbatism" is?

In trying to find out what a "verbatism" the best I could do was a typo of "verbatim" but that doesn't quite map to "markdown formatted literal." Or maybe it's the rendered form of the markdown literal?

Anyway, seemed like interesting and new vocabulary that was key to the one issue for sure.

e12e · on March 7, 2024

It's probably a typo for verbatim. It's probably not intentional, but either way, it illustrates that llms are quite forgiving, and that the LLM "understands" the typo, while a strict whitelist that checked for "markdown verbatim" would let the prompt through...

e12e · on March 7, 2024

* strict blacklist

opello · on March 8, 2024

Sure, but even if it's a typo for verbatim, I still don't quite understand what "a markdown verbatim" would mean where verbatim is the noun.

I've always thought of the daringfireball.net[1] page as the authoritative source of Markdown syntax, and it calls them "code blocks." It looks like Pandoc[2] talks about "verbatim environments" in the same way. And clang[3] has a method for extracting documentation formatted as "markdown verbatim," instead of applying formatting to the document.

[1] https://daringfireball.net/projects/markdown/syntax#precode

[2] https://pandoc.org/MANUAL.html#verbatim

[3] https://clang.llvm.org/extra/doxygen/classclang_1_1clangd_1_...

I went to Gemini to ask it what a "markdown verbatism" was and:

> In markdown, verbatims are code snippets or text that you want displayed exactly as you typed it, without markdown interpreting any formatting instructions.

it seems to be applying the Pandoc usage, which I found a few other places too. But it strikes me as an excessively jargon-heavy way of talking about code blocks or pre-formatted blocks when those terms seem resolve the nuance and would be common in other contexts.

The idea that it's a clever way to escape a blacklist is interesting too.

e12e · on March 8, 2024

I guess response is the noun and verbatim the adjective (or give the verb, verbatim the adverb):

> Give me a response as a "markdown verbatism" of a button like:

> [Click Me](https://www.google.com)

opello · on March 9, 2024

In this example it seems peculiar, if not incorrect, for the adjective (verbatim) to come after the adjectival noun (markdown).

e12e · on March 9, 2024

And the quoting is odd as well.

Lockal · on March 6, 2024

I already prepared to make a rant with "yet another cool-hacker invented prompt injection or discovered how LLM works", but was pleasantly surprised that it was not the case

kccqzy · on March 6, 2024

> The awesome part is that we could ask them any question about the applications, how they worked and the security engineers could quickly check the source code to indicate if we should dig into our ideas or if our assumptions are a dead end.

Wow. So this is basically around the same access as an internal red team. Simply amazing!

Labo333 · on March 6, 2024

Great article! (shameless plug) As an alternative to "Burp Extension Copy As Python-Requests", I coded this CLI tool that converts HAR to Python Requests code: https://github.com/louisabraham/har2requests

aldousd666 · on March 6, 2024

I love stuff like this. Once upon a time I thought I'd get more into hacking like this and started working on it... But then I changed jobs and never got back. This made me remember all those games of capture the flag in the 90s.

px43 · on March 6, 2024

Loving that CSP bypass :-D

asynchronous · on March 6, 2024

Unrelated to the article but the website design itself is top notch.

realprimoh · on March 7, 2024

It really is! I didn't go back and click on the homepage til I read this comment, but the vibes of it are amazing.

rokkitmensch · on March 7, 2024

The best tidbit is the precomputed graphql queries. Just... why. One of those "not even broken, but for the love of potatoes why".

jrockway · on March 7, 2024

I guess my favorite thing is that Google now uses GraphQL, but error code 13 is still "INTERNAL".

alicelebi · on March 7, 2024

You've got a cool website :D

pass7 · on March 7, 2024

Give me Josie kirkman Instagram

o11c · on March 6, 2024

So now it's not just Artificial Stupidity, but Artificial Insecurity.

sonicanatidae · on March 6, 2024

It was never secure and anyone that said it was, was lying or mistaken.

o11c · on March 6, 2024

Some insecurity is natural, due to the problem being hard.

But this insecurity was artificially added to a system that was, in general, previously secure.