GAZEploit: Remote keystroke inference attack by gaze estimation in VR/MR devices (wired.com)
180 points by wallflower 5 days ago | 101 comments





I'm genuinely shocked. I assumed that Apple would have foreseen this possibility and locked the Persona's eyes somewhere as long as the user was typing, at least for passwords.

The whole point of the digital face is to look real, though, and freezing the gaze would look unnervingly fake.

I'm confident they could come up with a filler eye animation algorithm that was convincing enough to pass muster for short periods of time. Even if hand coding something didn't quite work out, they certainly have tons of eye tracking data internally they could use to train a small model, or optimize parameters.
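
For illustration, a minimal sketch of what such a filler algorithm could look like: synthesize idle-looking gaze as random fixations plus micro-jitter, with no connection to the real eye data. All the timing constants here are rough guesses at natural-looking values, not anything Apple ships:

    import random

    def synthetic_gaze(duration_s, fps=60):
        """Generate (x, y) gaze points that look alive but carry no signal.

        Produces fixations of 0.2-0.6 s at random points near the center
        of the view, with small within-fixation jitter to mimic drift.
        """
        frames = []
        t = 0.0
        while t < duration_s:
            fx = random.gauss(0.5, 0.08)     # fixation target, normalized coords
            fy = random.gauss(0.5, 0.08)
            hold = random.uniform(0.2, 0.6)  # typical human fixation length
            for _ in range(int(hold * fps)):
                frames.append((fx + random.gauss(0, 0.002),
                               fy + random.gauss(0, 0.002)))
            t += hold
        return frames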

The complexity of this solution just shot up from "2 minute hack" to "2 month research project, minimum". It's understandable why they didn't do this.

I don't think anyone was suggesting to go for the 'parameterized model' from the start. They could just hide the eyes while typing, as a good starting point.

Yeah. Make them appear closed, done

No way this takes two months to get to a convincing proof of concept.

But you could at least dampen out or randomize eye travel while looking at the keyboard. Fully reproducing eye output is a recipe for disaster, and that should have been obvious.

It's about tradeoffs; the device is barely 7 months old at this point. Thankfully the fix is fairly obvious too.

OTOH, once you as an outsider know that the AVP sometimes lies to you about where the wearer is looking, why would you ever trust it?

For example, you could then use the AVP to stare at people and then claim afterwards you were doing no such thing.


Add a faint glow to indicate they're typing and the continued face animation is a stand-in.

Throw people for a loop and switch your headset keyboard to Dvorak. When they scan your eye movements and apply them to QWERTY, they'll be confused AF!

Well, the attacker still only has to try one other password. If you get locked out after one attempt and nobody knows that you use Dvorak, your defense works, but if there are three attempts, they can also add Colemak to their list of things to try ;)
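
And the remap is trivial for the attacker, since the attack recovers key positions rather than letters. A toy sketch over the three letter rows (illustrative only):

    # The same 30 physical key positions under two layouts.
    QWERTY = "qwertyuiopasdfghjkl;zxcvbnm,./"
    DVORAK = "',.pyfgcrlaoeuidhtns;qjkxbmwvz"

    def reinterpret(position_decoded: str) -> str:
        """Re-read a QWERTY-decoded string as if Dvorak were active."""
        return position_decoded.translate(str.maketrans(QWERTY, DVORAK))

    # A Dvorak typist entering "hidden" shows up as "jghhdl" when the
    # attacker assumes QWERTY; one extra guess recovers it:
    print(reinterpret("jghhdl"))  # -> hidden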

add sunglasses to the avatar while typing

Someone hire this person please.

Just have them close their eyes? That's what I do when I have to recall my password anyway.

Just do the same thing the external display does and show a 'cloudy eyes' version when the user is interacting with the keyboard.

If I were implementing it and wanted to obscure, I'd blur the whole screen momentarily, probably with a small message. I really doubt that's ideal for a commercial offering, though. I'm not really worried about unnerving people if I'm using an avatar, that comes with the territory as it is.

Why? Most people are capable of fixating at a single point with basically no perceptible eye movement.

It would, wouldn't it?

I'd suggest blurring the face in a "password input context" (like password fields on the web with their redacted display text), but I suspect that that'd go against what Apple wants the Vision Pro experience to look like.


Then it shouldn't be used for secure input.

> I assumed

Oh man, this is my favorite part of the Apple Design Cycle!

1. Apple announces a new feature that is suspiciously invasive and only marginally useful (e.g. iCloud Screening, Find My, OCSP, etc.)

2. Self-conscious, Apple releases a security whitepaper that explains how things should work but doesn't let anyone audit their system

3. Users assume that things are okay because the marketing tells them it is okay, and do not ever consider the potential for an exploit

4. The data leaks, either to advertisers, Apple employees, warrantless government allies, government adversaries or OEM contractors

5. Apple customers attempt to absolve themselves of responsibility ("How was I supposed to know?")

I've seen this process so many times at this point that I'm just apathetic to it all. Maybe one day people will learn to stop assuming the best when there is literally no evidence corroborating it.


What data leaks? What are you talking about?

https://www.ifixit.com/News/33801/apple-genius-caught-steali...

https://arstechnica.com/tech-policy/2023/12/apple-admits-to-...

https://apple.stackexchange.com/questions/445122/is-icloud-p...

Various oversight issues of that nature. Note: we could have known about all of these exploits beforehand if Apple's supposedly-private infrastructure was meaningfully accountable.


Please note this is fixed:

> The researchers alerted Apple to the vulnerability in April, and the company issued a patch to stop the potential for data to leak at the end of July


They released AirTags without thinking about stalking, so I'm not that shocked.

This has to be a deliberate lie, since it is so easily proven wrong.

Here is the keynote: https://www.youtube.com/live/JdBYVNuky1M?si=46vw7FG3SjWWBezn

9:25 is when they talk about unwanted tracking.


Ok, if you want me to be more specific. (Given that we are talking about keynotes, which are, by design, marketing mistruths.)

They thought a bit about stalking, but not enough to alter the experience or release tools for non-Apple owners to avoid being tracked.

Sure, there are some "industry leading features", but no one else in the industry decided to co-opt a network of ~1 billion devices to provide location updates.

Sure, Apple made it very difficult to track an AirTag on a person, for the owner's privacy. But that also means that the non-owner is less able to find it.

It takes about 3-5 days (although it's been up to two weeks in some cases) before my various iPhones twig that an errant AirTag is with me.

Now you might see me as someone who is anti-Apple, or has an agenda against Apple. That's not the case.

The issue is, when you create a device like this, and marry it to such a capable platform, you have to own the side effects. It took something like _6 months_ to release an Android AirTag detector, which means it was very much an afterthought. Had they talked to any domestic violence support groups, they would have been told very clearly how these devices would be used. (I suspect they did, but that would have destroyed the product vision too much, so it was downgraded.)


What?? It had much better anti-stalking features at launch than its competitors like Tile.

I wasn't aware that Tile blocked an entire phone operating system from detecting their product.

Tile also has the advantage of not being able to provide any useful location data finer than a few hundred meters (unless you use the beeper).

The spatial resolution that AirTags are capable of, because of the network of auto-enrolled iOS devices, is far, far greater than the shit Tile could hope to dream of.


Did Tile (or any similar product) have infrastructure that allowed unsuspecting people to know that they were being tracked? AirTags had that.

Only if you had an iPhone. (That has now changed, belatedly.)

Just because Tile is a fly-by-night type organisation doesn't mean Apple can get away with being so lacklustre about safety.

They _knew_ that this was a risk, but chose not to mitigate it until much later on. Had they bothered to listen to the nagging voices, they wouldn't have been surprised.


Actually, well put. I agree

If you look at the video, it's not only the eyes here. There's a huge head movement too. Having a keyboard so large in your FOV that you have to turn your head to type something is a contributing factor.

I wonder what the accuracy is if you drop the eye tracking and only do head tracking on that demo.


It would be interesting to see both isolated.

I don’t think eye tracking alone would give you the necessary bounds for inferring the keyboard size. For one, eyes flit around more and also are harder to see.

I also wonder how easily this attack is foiled by different key clusters. E.g. it looks like they're relying on large head movements at opposite ends of the keyboard to infer the bounds.

But keyboard use can be very clustered, which would foil the ability to know how wide the user has made the keyboard.
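
To make the objection concrete, a rough sketch of the bounds-inference step (purely illustrative, not the authors' actual pipeline): if the extremes of the observed fixations are taken as the keyboard's corners, clustered typing shrinks the estimated bounds and every key guess shifts:

    def infer_keys(fixations, cols=10, rows=3):
        """Map fixation points to (row, col) key guesses.

        Assumes gaze has visited the keyboard's true extremes; if typing
        stays clustered, min/max underestimate the bounds and the
        guesses land on the wrong keys.
        """
        xs = [x for x, _ in fixations]
        ys = [y for _, y in fixations]
        left, right = min(xs), max(xs)
        top, bottom = min(ys), max(ys)
        w = (right - left) or 1e-9
        h = (bottom - top) or 1e-9
        return [(min(int((y - top) / h * rows), rows - 1),
                 min(int((x - left) / w * cols), cols - 1))
                for x, y in fixations]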

I imagine it also breaks when the user moves the keyboard


Finally all those banks with randomised input grids on their websites are validated!

It'd be pretty cyberpunk if the mitigation to this is to have your eyes digitally obscured when typing in sensitive data.

And we know the only viable option would be simulated mirror shades

But then a would-be attacker could simply read what you type in the reflections!

The shades "reflect" a password that takes you to a honeypot

And perhaps replaced with a cartoon ‘x’ if your life signs terminate while you are using the device

Shades of the Lotus Notes “Visual Hash”

https://security.stackexchange.com/questions/41247/changing-...


This deserves a separate submission.

That is so bad it almost has to be a deliberate method to extract passwords.


This is remarkable. Enterprise software is its own microcosm of pain.

Eye tracking data is incredibly sensitive and privacy-concerning.

HN tends to dislike Microsoft, but they went to great lengths to build a HoloLens system where eye tracking was both useful and safe.

The eye tracking data never left the device and was never directly available to the application. As a developer, you registered targets or gestures you were interested in, and the platform told you when the user, for example, looked at your target to activate it.
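
Roughly this shape of API, sketched in Python for illustration (names invented, not the actual HoloLens SDK):

    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class GazeTarget:
        name: str
        on_activated: Callable[[], None]  # platform fires this; no coordinates

    @dataclass
    class GazePlatform:
        """Platform-side service: the raw gaze stream never crosses this line."""
        _targets: list = field(default_factory=list)

        def register_target(self, target: GazeTarget):
            self._targets.append(target)

        def _process_raw_gaze(self, raw_stream):
            # Internal only: hit-test raw gaze against registered targets
            # and fire callbacks. Apps never see positions, only events.
            ...

    # App side: you learn "your button was activated", nothing more.
    platform = GazePlatform()
    platform.register_target(GazeTarget("play_button", lambda: print("activated")))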

Lots of subtlety and care went into the design, so yes, the first six things you think of as concerns or exploits or problems were addressed, and a bunch more you haven't thought of yet.

If this is a space you care about, read up on HoloLens eye tracking.

It's pretty inexcusable if Apple is providing raw eye tracking streams to app developers. The exploits are too easy and too prevalent. [EDIT ADDED: the article is behind a paywall, but it sounds from comments here like Apple is not providing raw eye tracking streams; this is about 3rd parties watching your eyes to extract your virtual typing while you are on a conference call]


> if Apple is providing raw eye tracking streams to app developers

Apple is not doing that. As the article describes, the issue is that your avatar (during a FaceTime call, for example) accurately reproduces your eye movements.


Isn't that a distinction without a difference? Apple isn't providing your real eye movements, but a 1:1 reproduction of what it tracks as your eye movements.

The exploit requires analysing the avatar's eyes, but as they're not natural movements but replicated ones, there should be a lot less noise. And of course, since you need to intentionally focus on specific UI targets, these movements are even less natural and fuzzy than if you were looking at your keyboard while typing.


The difference is that you can't generalize the attack outside of using Personas, a feature which is specifically supposed to share your gaze with others. Apps on the device still have no access to what you're looking at, and even this attack can only make an educated guess.

This is a great example of why ‘user-spacey’ applications from the OS manufacturer shouldn’t be privileged beyond other applications: Because this bypasses the security layer while lulling devs into a false sense of security.

> ‘user-spacey’ applications from the OS manufacturer shouldn’t be privileged beyond other applications

I don't think that's an accurate description, either. The SharePlay "Persona" avatar is a system service just like the front-facing camera stream. Any app can opt into using either of them.


That app gets a real-time gaze vector, which, unless I've misunderstood something, non-core apps don't get.

Which app?

I should have said avatar service.

But the technology is there. That is the concern.

The technology to reproduce eye movements has been around since motion pictures were invented. I'm sure even a flat video stream of the user's face would leak similar information.

Apple should have been more careful about allowing any eye motion information (including simple video) to flow out of a system where eye movements themselves are used for data input.


"technology to reproduce eye movements has been around since motion pictures were invented"

Sure, but as with everything, it's when it becomes widespread that the impact changes. The technology was around, but now it could be on everyone's face, tracking everything you look at.

If this were added to TVs, so every TV was tracking your eye movements and reporting them back to advertisers, there would be an outcry.

So this is just slowly nudging us in that direction.


To be clear, the issue this article is talking about is essentially "during a video call the other party can see your eyes moving."

I agree that we should be vigilant when big corps are adding more and more sensors into our lives, but Apple is absolutely not reporting tracked eye-movement data to advertisers, nor do they allow third-party apps to do that.


It's not a problem with the technology.

The problem is the edge case where it's used for two different things with different demands at the same time, and the fix is to...not do that.

> Apple fixed the flaw in a Vision Pro software update at the end of July, which stops the sharing of a Persona if someone is using the virtual keyboard.


"fixed the flaw "

Or

"Ooopps, so sorry you caught us. Guess we'll have better luck keeping this hidden next time."


Keeping what hidden? Caught who? The eye-tracking technology is literally a core part of the platform. What is it you're trying to say?

From the article's first sentence:

" lot about someone from their eyes. They can indicate how tired you are, the type of mood you’re in, and potentially provide clues about health problems. But your eyes could also leak more secretive information: your passwords, PINs, and messages you type."

Do you want that shared with advertisers? With your health care provider?

The article isn't about the technology, it is about sharing the data.


Who are you saying shared what data with whom?

How are they getting the data you claim is shared with them?

Does HoloLens also use a keyboard you can type on with eye movement? If not, this seems unrelated to this attack at all. If yes, then how would it prevent this attack where you can see the person's eyes? It doesn't matter if the tracking data is on-device only or not, as you're broadcasting an image of the face anyway.

Not when I used it; you had to "physically" press a virtual keyboard with your hands.

I disagree strongly. I don't want big tech telling me what I can and can't do with the device I paid for and supposedly own "for my protection". The prohibition on users giving apps access to eye tracking data and MR camera data is paternalistic and, frankly, insulting. This attitude is holding the industry back.

This exploit is not some kind of unprecedented new thing only possible with super-sensitive eye tracking data. It is completely analogous to watching/hearing someone type their password on their keyboard, either in person when standing next to them or remotely via their webcam/mic. It is also trivial to fix. Simply obfuscate the gaze data when interacting with sensitive inputs. This is actually much better than you can do when meeting in person. You can't automatically obfuscate your finger movements when someone is standing next to you while you enter your password.
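
Something like this, as a minimal sketch, assuming the compositor knows when a secure field has focus (all names here are hypothetical):

    import random

    def avatar_gaze(true_gaze, secure_field_focused, parked=(0.5, 0.5)):
        """Choose the gaze the avatar / front display renders.

        true_gaze still drives input selection on-device; during secure
        input the rendered gaze is parked at a fixed point with a little
        jitter, so the face looks alive but leaks nothing.
        """
        if secure_field_focused:
            return (parked[0] + random.gauss(0, 0.003),
                    parked[1] + random.gauss(0, 0.003))
        return true_gaze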


You are an expert user, so of course you will demand extra powers.

The vast majority of people are not expert users, so for them having safe defaults is critical to their safety online.

> It is completely analogous to watching/hearing someone type their password on their keyboard,

Except the eye gaze vector is being delivered in high fidelity to your client so it can render the eyes.

Extracting eye gaze from normal video is exceptionally hard. Even with dedicated gaze cameras, it's pretty difficult to get under 5 degrees of uncertainty (without training or optimal lighting).
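
For scale, a quick back-of-the-envelope (my assumed numbers, not from the article): 5 degrees of error at a plausible half-meter keyboard distance is several centimeters, more than a typical virtual key width:

    import math

    distance_m = 0.5   # assumed distance from eye to virtual keyboard
    error_deg = 5.0    # gaze uncertainty achievable from ordinary video
    slop_cm = distance_m * math.tan(math.radians(error_deg)) * 100
    print(f"{slop_cm:.1f} cm")  # ~4.4 cm of slop, spanning multiple keys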


Apple does not provide eye tracking data. In fact, you can't even register triggers for eye position information; you have to set a HoverEffectComponent for the OS to highlight targets for you.

Video passthrough also isn’t available except to “enterprise” developers, so all you can get back is the position of images or objects that you’re interested in when they come into view.

Even the Apple employee who helped me with setup advised me not to turn my head, but to keep my head static and use the glance-and-tap paradigm for interacting with the virtual keyboard. I don’t think this was directly for security purposes, just for keeping fatigue to a minimum when using the device for a prolonged period of time. But it does still have the effect of making it harder to determine your keystrokes than, say, if you were to pull the virtual keyboard towards you and type on it directly.

EDIT: The edit is correct. The virtual avatar is part of visionOS (it appears as a front camera in legacy VoIP apps) and as such it has privileged access to data collected by the device. Apparently until 1.3 the eye tracking data was used directly for the gaze on the avatar, and I assume Apple has now either obfuscated it or blocks its use during password entry. Presumably this also affects the spatial avatars during shared experiences as well.

Interestingly, I think the front display blanks out your gaze when you’re entering a password (I noticed it when I was in front of a mirror) to prevent this attack from being possible by using the front display’s eye passthrough.


"privacy-concerning"

Like checking out how you are zeroing in on the boobs. What would sponsored ads look like once they also know what you are looking at every second? Even some medical ad, where your eyes check out the actress's body.

"Honey, why am I suddenly getting ads for Granny Porn?".



HoloLens 2 certainly has support for passing gaze direction; not sure about the first one.

I think the headset makers are pretty much in alignment that it's a feature that needs permissions, but they'll provide it to the app with focus.

Apple is a lot more protective.


I personally view this as gatekeeping, which should be outright illegal.

As far as I know, eye tracking isn't available in visionOS[0]

This article snippet is behind a paywall but it seems like it’s talking about the eyes that are projected on the outside of the device.

So basically it’s no more of an exploit than just tracking someone’s actual eyes.

0: https://forums.developer.apple.com/forums/thread/732552


Go behind the paywall here: https://archive.ph/44zwN

The article is talking about avatars in conference calls, which accurately mirror your eye position. Someone else on that call could record you and extract your keyboard inputs from your avatar.

Enabling "reader mode" bypasses the paywall in this instance


I think the underlying flaw here is that pointing your eyes at a virtual keyboard in space to type passwords is just a poor input method. Take away the VR headset and do the same thing and the flaw still exists.

Now I want to make a keyboard where you shine a laser pointer at the key you want to press, and your cat jumping up is what actually triggers the button press.


> I think the underlying flaw here is that pointing your eyes at a virtual keyboard in space to type passwords is just a poor input method

FWIW, while you can do that, it's much easier to just poke the keys or use Siri

a folding bluetooth keyboard with built in trackpad has become a must have travel accessory for me :)


I don’t have letters on any of my keys and switch between keyboard layouts frequently. I never look at my keyboard, am I still vulnerable?

Definitely not. It seems that the keyboard on Apple Vision Pro is an onscreen keyboard you type with using your eyes. The Vision Pro also broadcasts your eye movements to a screen on the front of your headset, and the combination of the two is what leaks your password. If you are just in VR looking at a virtual keyboard to type, it's no big deal. If you are typing on a physical keyboard and people are videotaping your eyes, it's no big deal. The combination of the two is the problem.

> I never look at my keyboard

The article title:

> Gaze estimation

It doesn't seem like it


That’s my thought too, but maybe we subconsciously move our eyes when we type, and that’s still a side channel.

First author here, happy to answer questions in this thread. :) Here is our pre-print: https://www.arxiv.org/abs/2409.08122

How many people are typing with their eyes to begin with? Aren't they using their hands far more often? Cool attack, but I'm not sure there's much real attack surface here if no one is typing with their gaze while using an avatar.

You look at the letter and pinch... that's how I do it. Not often, and not during FaceTime calls. But yeah... possible.

Yeah, I know you can type that way, but I have a Quest 3, and after watching the video I would think no one actually types that way. It looks to be easily twice as slow and way more annoying than just using your fingers with hand tracking.

The video is definitely exaggerated because they’re moving their whole head.

Typing with your eyes is much faster and more subtle than what they show here.


What if the keyboard was put in the user's off-hand and they typed on it by tapping their palm? Then the keyboard wouldn't be in a fixed position to correlate eye movement against it.

It's a probabilistic attack. Of course there are workarounds, and it doesn't work when people are touch typists who don't look at their keyboards...

Brilliant though, just brilliant.
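
To illustrate "probabilistic", a toy sketch (not the paper's actual model): score candidate keys with a Gaussian likelihood around each fixation and keep a ranked list instead of a single answer:

    import math

    def key_likelihoods(fixation, key_centers, sigma=0.03):
        """Rank candidate keys for one fixation.

        key_centers: {'a': (x, y), ...} in the same normalized space as
        the fixation; sigma models gaze noise. Returns (prob, key) pairs,
        best first, summing to 1.
        """
        raw = {k: math.exp(-((fixation[0] - x) ** 2 + (fixation[1] - y) ** 2)
                           / (2 * sigma ** 2))
               for k, (x, y) in key_centers.items()}
        total = sum(raw.values()) or 1.0
        return sorted(((v / total, k) for k, v in raw.items()), reverse=True)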


We need more factors of authentication. And the number required should increase with the seriousness of the operation.

Buying lunch - 1
Selling your home - 10


As if there weren't enough reasons to learn touch typing.

You type by looking at the letters on the keyboard

When I type in VR, I do it with a physical keyboard.

I think the key problem with all the data we’re sharing, including telemetry, is that even when specific inputs like passwords aren’t directly visible, the information still narrows down the possible key and password spaces.

Just today, another news of credentials flying away: https://news.ycombinator.com/item?id=41535901

Can this also be done with normal video over Zoom?

Not really. You need to know the size of the keyboard, know the shape of people's eyes, and have enough temporal and optical resolution to work out where they are pointing.

Even with optimal conditions (i.e. dedicated cameras, no eye makeup, and correct positioning), uncalibrated gaze has at least a 5 degree uncertainty.


Video for those who can’t get past the paywall

https://youtu.be/DPYT8IH-R18?si=5tcQ3NJltxROJDUq


> as long as we get enough gaze information that can accurately recover the keyboard, then all following keystrokes can be detected

That’s a pretty big assumption. Also, I guess the user has to be stationary - stay in the camera’s field of view and not move their head in a way that would obstruct the image.

Unless this is about intercepting in-device data; but in this case it seems easier to address.


> Also, I guess the user has to be stationary - stay in the camera’s field of view and not move their head in a way that would obstruct the image.

The user is always stationary in relation to the headset and the cameras in it.


Yes but not to the video feed that the other person sees.

If you move around, your head moves too. If you stand up, you momentarily go out of frame before it applies a delayed sync. The idea being that it matches what a regular webcam would do.


I see what you mean. I doubt it's very common for people to move around that much whilst trying to type their password on the on-screen keyboard though. Sure there may be cases where the attack fails but I bet they'd be few and far between.


