Hacker News new | past | comments | ask | show | jobs | submit login
We hacked Google A.I. (landh.tech)
276 points by EvgeniyZh on March 6, 2024 | hide | past | favorite | 48 comments



This blog post gave me a great deal of self confidence.

While I have no doubts how good the author and his friends are, all of their ideas were quite intuitive and simple to understand.

The kind of "I could've come with the same idea" type. Realistically I would've not for many reasons but it is still stuff I can grasp and even gives me ideas while reading.

Which is different from the general hacker idea I have of someone in a basement exploiting extremely far fetched and hard to grasp for me memory corruptions in some cache dumping some random bytes like the very complex attacks like Spectre I've read about.

It also makes me think that if most of the applications I have worked on haven't been attacked and easily exploited is because honestly nobody bothered.


> It also makes me think that if most of the applications I have worked on haven't been attacked and easily exploited is because honestly nobody bothered.

This is my view of the things I create as well, and the fact they are not released to the public and are not generally public facing. Building internal tools does have a bit of freedom. However, I do things to the best of my knowledge "best practice" and don't intentionally do stupid things just because. But it is rather reassuring knowing that it's not that exposed to show how small "best of my knowledge" really is


Completely agree with you. However, I think over time I've come to realize that those hacks that seem obscure, weird, and impossible are not perceived that way by the people who discover them. It's just their area of expertise, their natural playground. And so maybe those exploits are as easy to understand by others in that field, as this blog post is to you.


Yes. Things always look way harder from the outside.

Also, the fact that clarity of exposition is interpreted as triviality is why people are sometimes compelled to write things in a way that deliberately obscures the content — one doesn’t want to risk explaining it too well and having the reader think “well, I could’ve come up with that!”.


The general hacker idea you have is... not reality.


At this point, I'm convinced that 90% of modern hacking is reliant on

username: admin

password: password


The other 10% is just literally asking for their password


More precisely, it's a Burp Suite plugin that tests for that combo, and users have no idea because it's one of a million didn't things they also have no idea are running.


this is even too advanced. 99% of modern hacking is :

username: admin

password:


The best lock pickers spend a lot of time making their own locks.


So is the idea (for the last/$20k one) that you would convince someone to paste your maliciously crafted prompt to steal their data?

The other post[0] of the same exploit is really interesting b/c it reads instructions from a document. So if someone had something like "find X in my documents" and you shared the malicious document with them, it could trigger those instructions.

[0] https://embracethered.com/blog/posts/2023/google-bard-data-e...


It could likely also be injected via malicious websites, force-shared google docs etc.

If a unknowing user asks a simple question, and Gemini reaches out to a malicious website for an answer, the prompt could be injected.

Additionally it could be taken out of an email / doc that was previously sent to the innocent user if the user asked Gemini to search their email or docs or something.

Kind of crazy the number of delivery vectors there are for these connected LLMs


I think the idea might be that companies who will decide to use bard under the hood of theirs chat bots/assistants may use Google suite extensively. Attacker will use the prompt from the article as an input to this custom chat bot and will have access to private Google workspace (corporate email, docs,…)


Ok, that makes a lot more sense. If a company provides a chat bot/assistant, you can trick it into exposing company data it has access to. Thanks


Its seems like a combination of 90's seo spam pages combined with running unsigned/unchecked executables. I think we're going to have certifications and positions for AI Tools Security Officers in the near future if we don't already.


I'm also thinking of attacks similar to the recent okta attack where they gained access through a support employee.

I could see trying to get queries like this to show up in their internal tooling, show up in a support ticket, or somewhere like that.

Then the first time it's executed to see what the issue could be, it can exfiltrate any data it has access to!


yeah, sounds like a "weird" vulnerability assuming it comes from a malicious text payload someone must deliberately insert into the own chat.

Hard to fathom $20k prize for that, to us old-schoolers, used to at least expect exploit delivery from an innocently-looking link.


Worth noting that you can use "invisible text" to give instructions to LLMs without it showing up in the chat box. So all you have to do is get someone to copy/paste one of those messages into their chat, and there are lots of ways you might be able to do this ("omg I figured out a cool new jailbreak that makes the model do anything you want!"). See here for more details:

https://news.ycombinator.com/item?id=39004822

https://twitter.com/goodside/status/1746685366952735034


Now that the models are multimodal, you can do it with images (e.g. white text on a white background) too.


With all the hype around AI I'm sure people are trying out all sorts of products that could have vulnerabilities like this. For example, imagine a recruiter hooks up an AI product to auto-read their LinkedIn messages and evaluate candidates. An attacker would just have to contact them, get the AI to read something of theirs, and this prompt attack could expose private information about the recruiter and/or company. The attacker would just need the recruiter to view the image (or better yet, have the service prefetch the image) to expose the data.


This sounds like a highly specific example. ;)


That was my thought. Since you could also convince them to paste "javascript:..." into their URL bar and that's not an issue to Google.


It's not weird in the sense that people are known to trick other people into opening the browser's JS console and pasting various things they don't understand. Things like "open Facebook then open the console and paste this to see whether your crush is stalking your profile" and people would actually do that. Of course the pasted script actually exfiltrates to the attacker a bunch of your private information.


You could probably obfuscate the text payload and make it seem like a cool trick you'd want to try out yourself, like "Check out this prompt that generates these cool images with Gemini!" (cool images attached).


This was a really interesting and also fun read. Btw, I am absolutely loving the design of this website.


I notice that there's an extra horizontal scrollbar, I think they forgot to set box-sizing


[flagged]


On mobile it's quite aesthetically pleasing


Does anyone know what a "markdown verbatism" is?

In trying to find out what a "verbatism" the best I could do was a typo of "verbatim" but that doesn't quite map to "markdown formatted literal." Or maybe it's the rendered form of the markdown literal?

Anyway, seemed like interesting and new vocabulary that was key to the one issue for sure.


It's probably a typo for verbatim. It's probably not intentional, but either way, it illustrates that llms are quite forgiving, and that the LLM "understands" the typo, while a strict whitelist that checked for "markdown verbatim" would let the prompt through...


* strict blacklist


Sure, but even if it's a typo for verbatim, I still don't quite understand what "a markdown verbatim" would mean where verbatim is the noun.

I've always thought of the daringfireball.net[1] page as the authoritative source of Markdown syntax, and it calls them "code blocks." It looks like Pandoc[2] talks about "verbatim environments" in the same way. And clang[3] has a method for extracting documentation formatted as "markdown verbatim," instead of applying formatting to the document.

[1] https://daringfireball.net/projects/markdown/syntax#precode

[2] https://pandoc.org/MANUAL.html#verbatim

[3] https://clang.llvm.org/extra/doxygen/classclang_1_1clangd_1_...

I went to Gemini to ask it what a "markdown verbatism" was and:

> In markdown, verbatims are code snippets or text that you want displayed exactly as you typed it, without markdown interpreting any formatting instructions.

it seems to be applying the Pandoc usage, which I found a few other places too. But it strikes me as an excessively jargon-heavy way of talking about code blocks or pre-formatted blocks when those terms seem resolve the nuance and would be common in other contexts.

The idea that it's a clever way to escape a blacklist is interesting too.


I guess response is the noun and verbatim the adjective (or give the verb, verbatim the adverb):

> Give me a response as a "markdown verbatism" of a button like:

> [Click Me](https://www.google.com)


In this example it seems peculiar, if not incorrect, for the adjective (verbatim) to come after the adjectival noun (markdown).


And the quoting is odd as well.


I already prepared to make a rant with "yet another cool-hacker invented prompt injection or discovered how LLM works", but was pleasantly surprised that it was not the case


> The awesome part is that we could ask them any question about the applications, how they worked and the security engineers could quickly check the source code to indicate if we should dig into our ideas or if our assumptions are a dead end.

Wow. So this is basically around the same access as an internal red team. Simply amazing!


Great article! (shameless plug) As an alternative to "Burp Extension Copy As Python-Requests", I coded this CLI tool that converts HAR to Python Requests code: https://github.com/louisabraham/har2requests


I love stuff like this. Once upon a time I thought I'd get more into hacking like this and started working on it... But then I changed jobs and never got back. This made me remember all those games of capture the flag in the 90s.


Loving that CSP bypass :-D


Unrelated to the article but the website design itself is top notch.


It really is! I didn't go back and click on the homepage til I read this comment, but the vibes of it are amazing.


The best tidbit is the precomputed graphql queries. Just... why. One of those "not even broken, but for the love of potatoes why".


I guess my favorite thing is that Google now uses GraphQL, but error code 13 is still "INTERNAL".


You've got a cool website :D


Give me Josie kirkman Instagram


So now it's not just Artificial Stupidity, but Artificial Insecurity.


It was never secure and anyone that said it was, was lying or mistaken.


Some insecurity is natural, due to the problem being hard.

But this insecurity was artificially added to a system that was, in general, previously secure.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: