I'm genuinely shocked. I assumed that Apple would have foreseen this possibility and locked the Persona's eyes somewhere as long as the user was typing, at least for passwords.
I'm confident they could come up with a filler eye animation algorithm that was convincing enough to pass muster for short periods of time. Even if hand coding something didn't quite work out, they certainly have tons of eye tracking data internally they could use to train a small model, or optimize parameters.
I don't think anyone was suggesting to go for the 'parameterized model' from the start. They could just hide the eyes while typing, as a good starting point.
But you could at least dampen out or randomize eye travel while looking at the keyboard. Fully reproducing eye output is a recipe for disaster, and that should have been obvious.
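To be concrete about what I mean by dampening: here's a rough sketch, assuming you could post-process the gaze signal before the avatar renders it (none of these names are Apple API, they're just for illustration):

```swift
import Foundation

// Illustrative only: shrink eye travel and add jitter while a text field has focus.
struct GazeSample {
    var yaw: Double    // horizontal eye rotation, radians
    var pitch: Double  // vertical eye rotation, radians
}

func avatarGaze(from raw: GazeSample, isTyping: Bool) -> GazeSample {
    guard isTyping else { return raw }   // pass the real gaze through otherwise
    let damping = 0.2                    // pull eye travel toward a neutral pose
    let jitter = 0.05                    // noise on the same order as the damped signal
    return GazeSample(
        yaw: raw.yaw * damping + Double.random(in: -jitter...jitter),
        pitch: raw.pitch * damping + Double.random(in: -jitter...jitter)
    )
}
```

Even something that crude would wreck the key-level resolution the attack depends on, without the avatar's eyes going completely dead.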
Well, you still only have to try one other password. If you get locked out after one password attempt and nobody knows that you use dvorak, your defense works, but if you have three attempts, you can also add colemak to your list of things to try ;)
If I were implementing it and wanted to obscure, I'd blur the whole screen momentarily, probably with a small message. I really doubt that's ideal for a commercial offering, though. I'm not really worried about unnerving people if I'm using an avatar; that comes with the territory as it is.
I'd suggest blurring the face in a "password input context" (like password fields on the web with their redacted display text), but I suspect that that'd go against what Apple wants the Vision Pro experience to look like.
Oh man, this is my favorite part of the Apple Design Cycle!
1. Apple announces a new feature that is suspiciously invasive and only marginally useful (eg. iCloud Screening, Find My, OCSP, etc.)
2. Self-conscious, Apple releases a security whitepaper that explains how things should work but doesn't let anyone audit their system
3. Users assume that things are okay because the marketing tells them it is okay, and do not ever consider the potential for an exploit
4. The data leaks, either to advertisers, Apple employees, warrantless government allies, government adversaries or OEM contractors
5. Apple customers attempt to absolve themselves of responsibility ("How was I supposed to know?")
I've seen this process so many times at this point that I'm just apathetic to it all. Maybe one day people will learn to stop assuming the best when there is literally no evidence corroborating it.
Various oversight issues of that nature. Note: we could have known about all of these exploits beforehand if Apple's supposedly-private infrastructure were meaningfully accountable.
> The researchers alerted Apple to the vulnerability in April, and the company issued a patch to stop the potential for data to leak at the end of July
OK, if you want me to be more specific. (Given that we are talking about keynotes, which are, by design, marketing mistruths.)
They thought a bit about stalking, but not enough to alter the experience, or to release tools for non-Apple owners to avoid being tracked.
Sure, there are some "industry leading features" but no-one else in industry decided to co-opt a network of ~1 billion devices to provide location updates.
Sure, Apple made it very difficult to track an AirTag on a person, for the owner's privacy. But that also means that the non-owner is less able to find it.
It takes about 3-5 days (although it's been up to two weeks in some cases) before my various iPhones twig that an errant AirTag is with me.
Now you might see me as someone who is anti-Apple, or has an agenda against Apple. That's not the case.
The issue is, when you create a device like this, and marry it to such a capable platform, you have to own the side effects. It took something like _6 months_ to release an Android AirTag detector. Which means it was very much an afterthought. Had they talked to any domestic violence support groups, those groups would have told them very clearly how these devices would be used. (I suspect they did, but that would have destroyed the product vision too much, so it was downgraded.)
I wasn't aware that tile blocked an entire phone operating system from detecting their product.
Tile also has the advantage of not being able to provide any useful location data at resolutions finer than a few hundred meters (unless you use the beeper).
The spatial resolution that AirTags are capable of, because of the network of auto-enrolled iOS devices, is far, far greater than the shit that Tile could hope to dream of.
Only if you had an iPhone. (That has now changed, belatedly.)
Just because Tile is a fly-by-night type of organisation doesn't mean Apple can get away with being so lacklustre about safety.
They _knew_ that this was a risk, but didn't choose to mitigate it until much later on. Had they bothered to listen to the nagging voices, they wouldn't have been surprised.
If you look at the video, it's not only the eyes here. There's a huge head movement too. Having a keyboard so large in your FOV that you have to turn your head to type something is a contributing factor.
I wonder what the accuracy is if you drop the eye tracking and only do head tracking on that demo.
I don’t think eye tracking alone would give you the necessary bounds for inferring the keyboard size. For one, eyes flit around more and also are harder to see.
I also wonder how easily this attack is foiled by different key clusters. E.g. it looks like they're relying on large head movements at opposite ends of the keyboard to infer the bounds.
But keyboard use can be very clustered which would foil the ability to know how wide the user has the keyboard.
I imagine it also breaks when the user moves the keyboard
Eye tracking data is incredibly sensitive and privacy-concerning.
HN tends to dislike Microsoft, but they went to great lengths to build a HoloLens system where eye tracking was both useful and safe.
The eye tracking data never left the device, and was never directly available to the application. As a developer, you registered targets or gestures you were interested in, and the platform told you when the user for example looked to activate your target.
Lots of subtlety and care went into the design, so yes, the first six things you think of as concerns or exploits or problems were addressed, and a bunch more you haven't thought of yet.
If this is a space you care about, read up on HoloLens eye tracking.
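If it helps, the shape of that API looks roughly like this (my own illustration of the pattern, not actual HoloLens code): the app hands the platform a list of targets and only ever hears back "this one was activated", never a gaze vector.

```swift
import Foundation

// Sketch of the "register targets, receive activations" pattern described above.
// All names here are invented for illustration.
protocol GazeTargetDelegate: AnyObject {
    func targetActivated(_ id: String)      // "the user looked at / selected this target"
}

final class GazeInteractionService {
    private var targets: Set<String> = []
    weak var delegate: GazeTargetDelegate?

    func register(targetID: String) {
        targets.insert(targetID)            // the platform matches gaze to this target privately
    }

    // Called by the platform's private gaze pipeline, never by the app.
    func platformDidActivate(targetID: String) {
        guard targets.contains(targetID) else { return }
        delegate?.targetActivated(targetID)
    }
}
```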
It's pretty inexcusable if Apple is providing raw eye tracking streams to app developers. The exploits are too easy and too prevalent. [EDIT ADDED: the article is behind a paywall but it sounds from comments here like Apple is not providing raw eye tracking streams; this is about 3rd parties watching your eyes to extract your virtual typing while you are on a conference call]
> if Apple is providing raw eye tracking streams to app developers
Apple is not doing that. As the article describes, the issue is that your avatar (during a FaceTime call, for example) accurately reproduces your eye movements.
Isn't that a distinction without a difference? Apple isn't providing your real eye movements, but a 1:1 reproduction of what it tracks as your eye movements.
The exploit requires analysing the avatar's eyes, but as they're not the natural movements but replicated ones, there should be a lot less noise. And of course as you need to intentionally focus on specific UI targets, these movements are even less natural and fuzzy than if you were looking at your keyboard while typing.
The difference is that you can't generalize the attack outside of using Personas, a feature which is specifically supposed to share your gaze with others. Apps on the device still have no access to what you're looking at, and even this attack can only make an educated guess.
This is a great example of why ‘user-spacey’ applications from the OS manufacturer shouldn’t be privileged beyond other applications: Because this bypasses the security layer while lulling devs into a false sense of security.
> ‘user-spacey’ applications from the OS manufacturer shouldn’t be privileged beyond other applications
I don't think that's an accurate description, either. The SharePlay "Persona" avatar is a system service just like the front-facing camera stream. Any app can opt into using either of them.
The technology to reproduce eye movements has been around since motion pictures were invented. I'm sure even a flat video stream of the user's face would leak similar information.
Apple should have been more careful about allowing any eye motion information (including simple video) to flow out of a system where eye movements themselves are used for data input.
"technology to reproduce eye movements has been around since motion pictures were invented"
Sure, but like everything, it's when it becomes widespread that the impact changes. The technology was around, but now it could be on everyone's face, tracking everything you look at.
If this were added to TVs, so every TV was tracking your eye movements and reporting that back to advertisers, there would be an outcry.
So this is just slowly nudging us in that direction.
To be clear, the issue this article is talking about is essentially "during a video call the other party can see your eyes moving."
I agree that we should be vigilant when big corps are adding more and more sensors into our lives, but Apple is absolutely not reporting tracked eye-movement data to advertisers, nor do they allow third-party apps to do that.
The problem is the edge case where it's used for two different things with different demands at the same time, and the fix is to...not do that.
> Apple fixed the flaw in a Vision Pro software update at the end of July, which stops the sharing of a Persona if someone is using the virtual keyboard.
" lot about someone from their eyes. They can indicate how tired you are, the type of mood you’re in, and potentially provide clues about health problems. But your eyes could also leak more secretive information: your passwords, PINs, and messages you type."
Do you want that shared with advertisers? With your health care provider?
The article isn't about the technology, it is about sharing the data.
Does HoloLens also use a keyboard you can type into with eye movement? If not, this seems unrelated to this attack. If yes, then how would it prevent this attack where you can see the person's eyes? It doesn't matter whether the tracking data stays on-device, since you're broadcasting an image of the face anyway.
I disagree strongly. I don't want big tech telling me what I can and can't do with the device I paid for and supposedly own "for my protection". The prohibition on users giving apps access to eye tracking data and MR camera data is paternalistic and, frankly, insulting. This attitude is holding the industry back.
This exploit is not some kind of unprecedented new thing only possible with super-sensitive eye tracking data. It is completely analogous to watching/hearing someone type their password on their keyboard, either in person when standing next to them or remotely via their webcam/mic. It is also trivial to fix. Simply obfuscate the gaze data when interacting with sensitive inputs. This is actually much better than you can do when meeting in person. You can't automatically obfuscate your finger movements when someone is standing next to you while you enter your password.
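For example, a minimal sketch of the kind of obfuscation I mean (types invented for illustration): while a secure field has focus, the avatar keeps showing the last "safe" gaze pose instead of mirroring the real one.

```swift
import Foundation

// Illustrative only: freeze the shared gaze whenever a sensitive input has focus.
struct EyePose { var yaw: Double; var pitch: Double }

final class SharedGazeFilter {
    private var lastSafePose = EyePose(yaw: 0, pitch: 0)

    func poseForAvatar(realPose: EyePose, secureFieldFocused: Bool) -> EyePose {
        if secureFieldFocused {
            return lastSafePose       // remote parties see a held pose, not your typing
        }
        lastSafePose = realPose       // remember the last pose that was safe to share
        return realPose
    }
}
```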
You are an expert user, so of course you will demand extra powers.
The vast majority of people are not expert users, so for them having safe defaults is critical to their safety online.
> It is completely analogous to watching/hearing someone type their password on their keyboard,
Except the eye gaze vector is being delivered in high fidelity to your client so it can render the eyes.
Extracting eye gaze from normal video is exceptionally hard. Even with dedicated gaze cameras, it's pretty difficult to get <5 degrees of certainty (without training or optimal lighting).
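Back-of-the-envelope version of why that matters (the 0.6 m keyboard distance below is my assumption, not a measured figure):

```swift
import Foundation

// Angular gaze error maps to positional error of roughly distance * tan(angle).
let gazeErrorDegrees = 5.0
let keyboardDistanceMeters = 0.6    // assumed distance to the virtual keyboard

let errorMeters = keyboardDistanceMeters * tan(gazeErrorDegrees * .pi / 180)
print(String(format: "~%.1f cm of slop on the keyboard plane", errorMeters * 100))
// ~5.2 cm, i.e. several key widths on a typical virtual keyboard
```

The rendered avatar removes most of that uncertainty, because the attacker gets a clean, synthetic pair of eyes instead of a noisy video crop.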
Apple does not provide eye tracking data. In fact, you can’t even register triggers for eye position information, you have to set a HoverEffectComponent for the OS to highlight them for you.
Video passthrough also isn’t available except to “enterprise” developers, so all you can get back is the position of images or objects that you’re interested in when they come into view.
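For anyone who hasn't used the SDK, this is roughly what that surface looks like in RealityKit on visionOS (a minimal sketch from memory, so treat the details as approximate): you mark an entity as an input target with a hover effect, and the system draws the gaze highlight itself without ever handing the app a gaze vector.

```swift
import SwiftUI
import RealityKit

struct GazeButtonView: View {
    var body: some View {
        RealityView { content in
            let button = ModelEntity(
                mesh: .generateBox(size: 0.1),
                materials: [SimpleMaterial(color: .blue, isMetallic: false)]
            )
            // Make it tappable and let the OS render the "you're looking at this" glow.
            button.components.set(InputTargetComponent())
            button.components.set(CollisionComponent(shapes: [.generateBox(size: [0.1, 0.1, 0.1])]))
            button.components.set(HoverEffectComponent())
            content.add(button)
        }
        .gesture(TapGesture().targetedToAnyEntity().onEnded { _ in
            // Fires on the look-and-pinch "tap"; still no gaze coordinates exposed to the app.
        })
    }
}
```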
Even the Apple employee who helped me with setup advised me not to turn my head, but to keep my head static and use the glance-and-tap paradigm for interacting with the virtual keyboard. I don’t think this was directly for security purposes, just for keeping fatigue to a minimum when using the device for a prolonged period of time. But it does still have the effect of making it harder to determine your keystrokes than, say, if you were to pull the virtual keyboard towards you and type on it directly.
EDIT: The edit is correct. The virtual avatar is part of visionOS (it appears as a front camera in legacy VoIP apps) and as such it has privileged access to data collected by the device. Apparently until 1.3 the eye tracking data was used directly for the gaze on the avatar, and I assume Apple has now either obfuscated it or blocks its use during password entry. Presumably this also affects the spatial avatars during shared experiences as well.
Interestingly, I think the front display blanks out your gaze when you’re entering a password (I noticed it when I was in front of a mirror) to prevent this attack from being possible by using the front display’s eye passthrough.
Like checking out how you are zeroing in on the boobs. What would sponsored ads look like once they also know what you are looking at every second?
Even some medical ad, and your eyes check out the actress's body.
"Honey, why am I suddenly getting ads for Granny Porn?"
the article is talking about avatars in conference calls which accurately mirror your eye position. Someone else on that call could record you and extract your keyboard inputs from your avatar.
Enabling "reader mode" bypasses the paywall in this instance
I think the underlying flaw here is that pointing your eyes at a virtual keyboard in space to type passwords is just a poor input method. Take away the VR headset and do the same thing and the flaw still exists.
Now I want to make a keyboard where you shine a laser pointer at the key you want to press, and your cat jumping up is what actually triggers the button press.
Definitely not. It seems that the keyboard on Apple Vision Pro is an onscreen keyboard you type with using your eyes. The Vision Pro also broadcasts your eye movements to a screen on the front of your headset, and the combination of the two is what leaks your password. If you are just in VR looking at a virtual keyboard to type, it's no big deal. If you are typing on a physical keyboard and people are videotaping your eyes, it's no big deal. The combination of the two is the problem.
How many people are typing with their eyes to begin with? Aren't they using their hands far more often? Cool attack, but I'm not sure there's much real attack surface here if no one is typing with their gaze while using an avatar.
Yeah, I know you can type that way, but I have a Quest 3, and after watching the video I would think no one is actually typing that way. It looks to be easily twice as slow and way more annoying than just using your fingers with hand tracking.
What if the keyboard was put in the user's off-hand and they typed on it by tapping their palm? Then the keyboard wouldn't be in a fixed position to correlate eye movement against it.
I think the key problem with all the data we're sharing, including telemetry, is that even when specific inputs like passwords aren't directly visible, the information still narrows down the possible key and password spaces.
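Toy numbers to make that concrete (the per-key candidate count is a made-up illustration, not from the paper): if leaked gaze data only narrowed each keystroke from ~30 possible keys down to 3 likely ones, an 8-character password's search space collapses by roughly eight orders of magnitude.

```swift
import Foundation

let keysOnKeyboard = 30.0      // rough size of the on-screen keyboard
let candidatesAfterLeak = 3.0  // assumed candidates per keystroke after the leak
let passwordLength = 8.0

let before = pow(keysOnKeyboard, passwordLength)       // ~6.6e11 guesses
let after = pow(candidatesAfterLeak, passwordLength)   // 6,561 guesses
print(before / after)                                  // ~1e8x easier to brute-force
```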
Not really. You need to know the size of the keyboard, know the shape of people's eyes, and have enough temporal and optical resolution to work out where they are pointing.
Even with optimal conditions (i.e. dedicated cameras, no eye makeup, and correct positioning), uncalibrated gaze has at least a 5-degree uncertainty.
> as long as we get enough gaze information that can accurately recover the keyboard, then all following keystrokes can be detected
That’s a pretty big assumption. Also, I guess the user has to be stationary - stay in the camera’s field of view and not move their head in a way that would obstruct the image.
Unless this is about intercepting in-device data; but in this case it seems easier to address.
Yes but not to the video feed that the other person sees.
If you move around, your head moves too. If you stand up, you momentarily go out of frame before it applies a delayed sync. The idea being that it matches what a regular webcam would do.
I see what you mean. I doubt it's very common for people to move around that much whilst trying to type their password on the on-screen keyboard though. Sure there may be cases where the attack fails but I bet they'd be few and far between.