I think this would be really useful to UX researchers! The folks I know in that space often set up screen-sharing sessions with users, and watch them interact with the UI in real time. Adding gaze tracking to this process seems like a natural progression, allowing the researchers to see where the user expects a button or UI feature to be, even if the user can't articulate that in real time.
I think this is impressive work. My lab does eye tracking locally with really expensive equipment. It would be interesting to conduct visual/face processing studies through the web and include eye tracking data with this library. Granted, the accuracy is low compared to specialized equipment and the predictions are highly variable, but other contributors could enhance its accuracy. I suspect that the variability in shape, size, and structure of human faces contributes greatly to the high-variance/low-resolution predictions. Maybe categorizing the user's face shape a priori would enhance predictions. That is, the library would first estimate the "type of face" and use those data to inform the eye tracking. This might not make sense--just a thought.
Just curious, but does anyone who pays attention to this space/technology know if eye tracking (with commodity-ish hardware) has gotten to the point of "replacing the mouse"?
E.g. at the OS level, the pointer basically follows your gaze.
Eyes don't point at things we look at as accurately as a mouse does (they only aim to within the fovea, about 1 degree of visual angle, corresponding to roughly 1 cm on the screen), and eye trackers are often far worse.
This summer I built a system that uses slight head movements for refinement, allowing hands-free mousing as fast as a trackpad. However, eye trackers good enough for the system to be pleasant cost $10,000+. I have ideas that would allow it to work with a $100 eye tracker, but even then it has the downside that the computer vision for your mouse will peg one core.
As someone who hates mice and hopes one day to have this sort of tech, I'm trying to understand what you mean by "Eyes don't point at things we look at as accurately as a mouse does" - how is it that I can choose to "look at" an individual pixel on my screen, or an individual serif on a letter, and then choose another one far less than 1 cm away, and my eye focus shifts to it? That shift is not detectable by a camera?
(Edit update: by "detectable" I mean theoretically detectable by a really good camera and software. In other words, it seems you are arguing the tech is impossible even theoretically due to some aspect of biology. Am I following you correctly there? Thanks)
I'm also not an expert, but I suspect a confounding problem is the fact that the eyes never stop moving, even when it feels to us like we're looking at a single point: https://en.wikipedia.org/wiki/Eye_movement#Saccades
Not an expert, but you actually focusing on a pixel is your brain doing image processing, not a function of your eyeball.
Your field of vision (and focus) is much bigger than 1 degree, so any part of the image that you are "looking at" is something the brain is doing to the image as post processing, not something your eyeball is doing.
I researched this over a summer once by downloading the latest scientific papers on gaze tracking. The results were pretty disappointing to me. Then I figured I was doing the research all wrong, because I already knew from high school biology that your eye makes micro-movements all the time to keep the retina stimulated and to keep a larger area in focus at the same time.
So I opened Wikipedia and looked up the smallest micro-movements the eyes make. Based on that angle and the average distance between your eye and the screen, it's easy to see that you can never replace the mouse with gaze as a pixel-perfect pointing device.
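Back-of-envelope (my numbers, not measurements: ~1 degree of involuntary jitter, ~60 cm viewing distance, ~96 dpi screen):

    // Rough estimate of on-screen error from angular eye jitter alone.
    // All constants are assumptions, not data from any particular tracker.
    const viewingDistanceCm = 60;                    // typical eye-to-screen distance
    const jitterDegrees = 1;                         // fovea / micro-movement scale
    const errorCm = viewingDistanceCm * Math.tan(jitterDegrees * Math.PI / 180);
    const pxPerCm = 96 / 2.54;                       // assuming a ~96 dpi display
    console.log(errorCm.toFixed(2) + " cm, about " + Math.round(errorCm * pxPerCm) + " px");
    // prints roughly "1.05 cm, about 40 px" -- and that's before any tracker error

So even a perfect tracker is already tens of pixels away from "pixel perfect".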
However! If you think outside the box, you might get a fairly accurate gaze tracker and a different GUI design to work together. That vision (no pun intended) is more of a long-term one. An easy short-term use would be automatically setting GUI window focus based on gaze. That alone might save you a keyboard-mouse switch, and as long as you have no more than four windows on your screen, you can make it work with the current tech already; a rough sketch follows.
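Something like this (a sketch only; focusWindow() is a placeholder for whatever your window manager exposes, and the gaze estimate comes from the tracker):

    // Map a coarse gaze estimate to one of four screen quadrants and
    // focus the corresponding window. Works even with low-accuracy gaze,
    // since each target is a quarter of the screen.
    function quadrantFromGaze(x, y, screenW, screenH) {
      const col = x < screenW / 2 ? 0 : 1;
      const row = y < screenH / 2 ? 0 : 1;
      return row * 2 + col;                          // 0..3, reading order
    }

    function onGaze(x, y) {
      const quadrant = quadrantFromGaze(x, y, screen.width, screen.height);
      focusWindow(quadrant);                         // placeholder: focus the window in that quadrant
    }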
"you can never replace the mouse with gaze for a pixel perfect pointing device"
And how about getting into the precision range a finger has on a touchscreen?
Do you think that's possible?
Because touch works pretty well, if the UI is good (big) enough ...
I researched it some years ago and the precision was just too low. Furthermore, I believe it has to work REALLY well. In particular, it would be interesting to know whether there is some kind of health risk.
E.g. if the tracking is off (i.e. the predicted x,y coordinate differs from the target you had in mind with your gaze), I guess one would try to auto-correct the error with a slightly different gaze, i.e. by looking slightly below the target. I'm not sure whether that's a problem or not.
We don't have this problem with the mouse, since we control it via relative movement and don't set specific x,y coordinates.
Maybe one could solve this by also tracking the head? E.g. make an educated guess via gaze tracking and let the user refine it with her ... nose. ;)
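Roughly what I have in mind (all thresholds are guesses; headDelta stands in for whatever head/nose-pose estimate you'd get):

    // Coarse gaze places the cursor; small head movements refine it.
    // On a big gaze jump, snap to the new estimate and refine from there.
    let cursor = { x: 0, y: 0 };
    const SNAP_DISTANCE = 150;                       // px; a guess, tune per tracker

    function update(gaze, headDelta) {               // headDelta: { dx, dy } from head pose
      const dist = Math.hypot(gaze.x - cursor.x, gaze.y - cursor.y);
      if (dist > SNAP_DISTANCE) {
        cursor = { x: gaze.x, y: gaze.y };           // large jump: trust the gaze
      } else {
        cursor.x += headDelta.dx;                    // small: refine with the head (or nose)
        cursor.y += headDelta.dy;
      }
      return cursor;
    }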
I was wondering if this sort of cursor control would be possible with, say, Google Glass. The software is already able to take input in the form of winking to snap pictures. So, couldn't you have a cursor overlaid on the Glass's screen, line it up with where you want it on your computer screen, and then wink to place it there?
I think this could be useful for digital signage. My client uses HTML to display content, and it runs in a browser, which makes it easy to design and deploy content. He says one of the challenges is getting seamless user interactivity with his browser-based digital signage.
This, coupled with a webcam recognizing hand gestures, e.g. http://hadi.io/gest.js/, could help capture user interaction inside his apps without too much hassle, but right now it's a janky solution compared to a real native Kinect setup (I don't think it's even possible to effectively connect to a Kinect in the browser).
Well, after I figured out that I could use clicks to calibrate the tracker it was somewhat accurate (ca. 100-200 px), though the variance was quite large.
I think it's a really nice idea, but not yet precise enough for tasks such as user studies.
100px gives you around 70 zones in your average desktop window. More, when you consider shared hits between zones. That's enough to do basic heat maps of a web site (such as for ad impression tracking), coarse navigation, or integration into things like editors/IDEs.
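A heat map at that resolution is just a matter of binning the predictions into ~100 px cells (sketch; assumes you already get x,y gaze predictions from somewhere):

    // Bin gaze predictions into a coarse grid for a heat map.
    const CELL = 100;                                // px, roughly the tracker's accuracy
    const cols = Math.ceil(window.innerWidth / CELL);
    const rows = Math.ceil(window.innerHeight / CELL);
    const heat = new Uint32Array(cols * rows);

    function recordGaze(x, y) {
      const c = Math.min(cols - 1, Math.max(0, Math.floor(x / CELL)));
      const r = Math.min(rows - 1, Math.max(0, Math.floor(y / CELL)));
      heat[r * cols + c] += 1;                       // one more hit in that zone
    }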
I was thinking about ad impressions as well; but God forbid websites are ever able to use my camera for eye tracking by bypassing the webcam permissions (or through the removal of those permissions in the future? I expect anything when it comes to content monetisation).
It sadly did not work for me either, even though I tried "calibrating" it by moving my mouse around the screen; when I moved the mouse out of the window it was just as bad as before.
Open the game and follow the mouse with your eyes while you click each of the eight edges/corners of the game's viewport, then click the center.
Now you can let go of the mouse (you don't need to move it out of the game) and the game should follow your gaze.
This demo is impressive. It's accurate enough for me to isolate the orange ball from the cluster.
On a website, the calibration could be done in a modal overlay: "Look at the dot and click it wherever it appears." After the Nth click, the modal goes away and the user lands on the website ready to gaze.
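Since the model learns from clicks, the overlay itself can be dead simple: show a dot, the user looks at it and clicks it, repeat N times (sketch; dot positions and styling are arbitrary):

    // Minimal calibration overlay: each click on the dot doubles as a
    // labelled training example for a click-calibrated tracker.
    const overlay = document.createElement('div');
    overlay.style.cssText = 'position:fixed;left:0;top:0;right:0;bottom:0;background:#fff;z-index:9999';
    const dot = document.createElement('button');
    dot.style.cssText = 'position:absolute;width:20px;height:20px;border-radius:50%;background:red';
    overlay.appendChild(dot);
    document.body.appendChild(overlay);

    const spots = [[0.1, 0.1], [0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.9, 0.9]];  // fractions of the viewport
    let i = 0;
    function placeDot() {
      dot.style.left = spots[i][0] * window.innerWidth + 'px';
      dot.style.top = spots[i][1] * window.innerHeight + 'px';
    }
    dot.addEventListener('click', function () {
      if (++i >= spots.length) overlay.remove();     // done: reveal the site
      else placeDot();
    });
    placeDot();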
Nice work, well done! One of the best ones we've seen.
I am one of the founders at xLabs, and we too have been working on self-calibrating, real-time, interactive webcam eye tracking for a number of years, and have a number of demos and commercial products.
We'd love your feedback. Check out http://xlabsgaze.com/ (core tech) and our first commercial product https://eyesdecide.com (design effectiveness and usability testing SaaS).
100px accuracy. I thought eye tracking could do better than that in general? At 100px accuracy, WebGazer is more useful to advertisers than it is to consumers, as I can't see any consumer-friendly applications that would be useful with that precision. If that's the case, adoption won't grow. Are there other comparable eye tracking libraries out now?
These would probably work:
- switching focus between windows
- application selection à la Mission Control
Whereas these would not work at all nicely (or without jitter) at 100px precision, without an external sensor (touch/mouse/etc.):
- coarse navigation through a document via a sidebar
- real-time inventory management in an HTML5 game
- Selective zoom of embedded images
>Imagine what you could do with eye tracking within an IDE.
You provided really good cases for the rest, but for this I can't quite picture a need (other than maybe scrolling, which is arguably faster using vim movements already). Do you have any ideas?
Maybe it could be used for modal keybindings, where your gaze changes modes, e.g. focus between the panes of an IDE: hjkl moves the cursor when you're looking at the code, traverses the file hierarchy when you're looking at the sidebar, and enumerates tabs when you're looking at the tab widgets.
Kind of like how keys are reused in Vim yet have similar meanings across each mode. You don't have to remember so many unique keybindings.
Some editors let you switch between panes with keybindings, but once you have more than a few or some in weird shapes, you end up needing tmux's `C-b q` pane selector. Seems like gaze could replace most of that.
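A sketch of that gaze-modal dispatch (pane selectors and the handler names are invented; assumes the tracker keeps lastGaze up to date):

    // Route hjkl to different handlers depending on which pane is under
    // the user's gaze. Pane-sized targets are well within 100px accuracy.
    let lastGaze = { x: 0, y: 0 };                   // updated by the gaze tracker

    function paneUnderGaze() {
      for (const pane of document.querySelectorAll('.editor, .sidebar, .tabs')) {
        const r = pane.getBoundingClientRect();
        if (lastGaze.x >= r.left && lastGaze.x <= r.right &&
            lastGaze.y >= r.top && lastGaze.y <= r.bottom) return pane;
      }
      return null;
    }

    document.addEventListener('keydown', function (e) {
      if ('hjkl'.indexOf(e.key) === -1) return;
      const pane = paneUnderGaze();
      if (!pane) return;
      if (pane.classList.contains('editor')) moveCursor(e.key);        // placeholder handlers
      else if (pane.classList.contains('sidebar')) traverseTree(e.key);
      else if (pane.classList.contains('tabs')) cycleTabs(e.key);
    });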
There are a number of activities in an IDE which require keystrokes or mouse movement, or changing contexts. The capability to browse through code or documentation without having to change contexts would be valuable to me; and I can't be the only one able to finish typing a thought while my eyes move on to read other code.
Minor use cases, perhaps, but something that operates intuitively has the potential to be pretty awesome.
If you have an Apple device you don't need to disable it: webcam video streams (aka getUserMedia()) aren't supported in Safari. In other browsers you always need to give permission before the camera is switched on.
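For reference, the gate is getUserMedia itself: calling it triggers the browser's permission prompt and rejects if the user declines (or throws if the API is missing, as on Safari at the time):

    // The camera cannot be switched on silently; this call pops the
    // browser's permission prompt and only resolves if the user accepts.
    // (Modern promise-based API shown; older code used prefixed navigator.getUserMedia.)
    async function requestCamera() {
      if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
        throw new Error('getUserMedia is not supported in this browser');
      }
      return navigator.mediaDevices.getUserMedia({ video: true, audio: false });
    }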
I was wondering if that was the problem for me, but I'm so near-sighted that if I were close enough to actually see the cursor, there's no way the camera could see my eyes to track them. :-(
In the demo, I had to click on the screen a couple of times to get the dot to start moving around. Then, it seemed to want to follow my mouse quite a bit as I moved and clicked.
I was able to move the red dot quite a bit when I sat closer to the webcam, really opened my eyes, and moved left to right, but the action was flipped for me: looking left moved the dot right, and vice versa. (I'm using an Apple Cinema Display… I've had issues before where it automatically flips the screen around in some cases.)
Although not very accurate, it's certainly an interesting experiment.
The instructions were on the page. Get it looking at your face properly, and then look directly at the mouse cursor and click it, on spots around the page. Make sure you don't move your head, just your eyes.
It didn't work for me at all; the red dot never appeared on my screen, though the face-outline in the preview image did fit my face quite well (when it wasn't detecting my chin as my mouth).
Hi, one of the authors here. It was probably not very clear on our part, but it's meant to train the eye tracking model as you naturally use the website. So it takes advantage of your interactions over time, like if you're using Gmail.
So if you're doing the demo on the blank page, click in a couple of places around the screen (while looking there as you would normally), and you should start seeing the prediction.
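Integration-wise it's basically just a gaze listener plus begin(); roughly like this (see the project page for the full options):

    // Assumes webgazer.js is already loaded on the page. Once begin() is
    // called, the model keeps improving from clicks and cursor movement
    // as the user interacts with the page normally.
    webgazer.setGazeListener(function (data, elapsedTime) {
      if (data == null) return;                      // no prediction yet
      console.log('gaze at', data.x, data.y, 'after', elapsedTime, 'ms');
    }).begin();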
I'm on a thunderbolt display (with integrated camera) sitting back about 3 feet. After 5 or so clicks against the corner regions, tracking became surprisingly accurate. I'm impressed!
Very interesting project. I was excited to try this, but the quality of the tracking is not there yet. The face detection was off most of the time unless I positioned my camera just right, and even then the eye tracking was no good. But they are only using a webcam, and doing this right (see http://www.tobii.com/xperience/) requires multiple IR emitters and an IR camera.
Cool idea and opens up a lot of interesting applications. Unfortunately, the demos don't work reliably (on my machine the tracking was completely off the wall).
cool trick but we all know who will end up using this the most:
advertisers. another way for them to "gauge" their reach and brand appeal. or something.
literally, I feel like with every new thing that comes to the browser, a marketer at some company thinks, 'how can we exploit this to put more junk in the face of other people?'
I am not against advertising, I just think most of them have no limits in terms of what they'll exploit at the expense of their own users.
Well, advertisers couldn't use this without asking permission for a user's webcam, which would be a non-starter. I choose the less cynical path and say this could be tremendously useful for remote user testing.
Remote usability testing would be fantastic. This could be useful for local testing as well. Having a library or tool that could generically sit in front of a site to show where users look when taking action X sounds beneficial to me.
I can also imagine on a lighter note this being used for games where you look to where you want to move a character (maybe blink twice for confirmation).
https://webgazer.cs.brown.edu/collision.html
Click around the screen where you're currently looking to calibrate