Don't Get Burned By Heatmaps (gazehawk.com)
87 points by bkrausz on June 14, 2011 | 23 comments



"After reviewing the individual heatmaps, however, we get the impression that the site might have a bit too much going on."

Normal A/B split tests have a similar issue. Let's say you have a sales page and you're testing button colors: a red buy button, a blue one, a green one. And let's say blue wins overall with a high level of confidence.

Without drilling down (and most systems I've worked with are bad at this or can't do it at all) you can't tell whether certain types of users actually preferred other variants. For example, visitors from East Asia might convert WAY better with the red button (red being a lucky color in China). And visitors coming from certain sites might always convert better on the green.

Problem is, your split-test results only show the overall picture, so you optimize for blue... when your A/B tool could have analyzed the data and suggested segments for you. Does this tool exist (without making users guess segments up front)? If not, there's a ton of money being left on the table.
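
The core analysis isn't hard to do yourself once you log a segment alongside each conversion event. A rough sketch in Python (the column names and toy data are made up, not taken from any particular tool):

    # Sketch of per-segment A/B analysis with a chi-squared test.
    # Everything here is hypothetical placeholder data; real tests
    # need far larger samples than this toy example.
    import pandas as pd
    from scipy.stats import chi2_contingency

    events = pd.DataFrame({
        "segment":   ["east_asia"] * 4 + ["other"] * 4,
        "variant":   ["red", "red", "blue", "blue"] * 2,
        "converted": [1, 0, 0, 0, 0, 0, 1, 1],
    })

    for segment, group in events.groupby("segment"):
        # Contingency table: variant x (not converted, converted)
        table = pd.crosstab(group["variant"], group["converted"])
        chi2, p, dof, expected = chi2_contingency(table)
        rates = group.groupby("variant")["converted"].mean()
        print(segment, rates.to_dict(), f"p={p:.3f}")

The part a tool would really have to automate is searching over candidate segmentations, and correcting for the multiple comparisons that search introduces.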


Actually, this is a problem with most entry-level tools. Professional-level tools like Omniture Test and Target or Optimost with Autonomy Segments handle MVT testing per segment, with segments defined by external variables (appended data or data from Dataxu et al.), business variables (return customer, prospect, etc.), or plain online variables (IP-based geo, repeat vs. new visitor, etc.).

"Professional level", btw, means "costs a lot", "is painful to use", and "has a poor interface"; more recent tools give up some of the power (and the cost) for ease of use.

However, as you point out, these easier MVT (or split-test) tools focus on showing the "winner" or "winning variables", with "winner" defined as "most impactful on the total traffic, or a sample of it, during this time". You're right that this isn't the final answer... but it's a good starting place: if you solve for the majority of users, that's usually the biggest single step. Solving for smaller groups afterwards can often yield lower-volume but higher-value conversions... if you have time.

That being said, including segment variables in the analysis is a wonderful thing to do, and I encourage it. I look forward to the day when Visual Website Optimizer and other "entry level" tools include these in their analyses by default... and we can see the end of Optimost, Omniture TnT, and the other "enterprise" tools.


Seems to me like you are talking about Cohort Analysis: http://www.avc.com/a_vc/2009/10/the-cohort-analysis.html

But you are right, a tool for this would be nice.


Very cool stuff, Brian! It'd be interesting to try to cluster users based on what bits of the page they look at (women 65+ look at navigation breadcrumbs significantly more than average, say).


GazeHawk intern here -- I wrote the post. Our post next week is going to be focused more on using different clustering metrics as a way of making the data easier to understand.

We definitely have plans for running some big studies where we can do demographic comparisons, since that sort of information helps us bring a lot of value to our customers, but it might not be for a few weeks.


Seems like you need a heatmap for the variance as well as the mean.
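
Concretely, given one normalized heatmap array per user, that's just a per-pixel mean and variance. A minimal numpy sketch (random placeholder data stands in for the real per-user maps):

    # Per-pixel mean and variance across users' individual heatmaps.
    # `user_maps` is assumed to be one normalized 2-D array per user;
    # the random data here is a placeholder.
    import numpy as np

    user_maps = [np.random.rand(480, 640) for _ in range(30)]

    stack = np.stack(user_maps)    # shape: (n_users, height, width)
    mean_map = stack.mean(axis=0)  # the usual aggregate heatmap
    var_map = stack.var(axis=0)    # high wherever users disagree

    # A spot that is hot in mean_map but also hot in var_map is being
    # driven by a subset of users, not by everyone.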


That's an interesting idea... I'll see if anything comes of it.


This was my first thought also (although 7 hours later!). It may be interesting to look at skew as well as variance (kurtosis is probably going too far): the difference between a diffuse normal distribution and split peaks...

It may even be interesting to do some kind of k-means; it would be awesome to see that correlate with demographics...
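
Roughly like this (a sketch with scikit-learn; the per-user heatmaps and the choice of k are placeholders):

    # Cluster users by where they looked: flatten each user's heatmap
    # into a feature vector and run k-means. All data is placeholder.
    import numpy as np
    from sklearn.cluster import KMeans

    user_maps = [np.random.rand(48, 64) for _ in range(30)]

    features = np.stack([m.ravel() for m in user_maps])  # (n_users, h*w)
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(features)

    for k in range(3):
        # Average the maps in each cluster to get one heatmap per
        # viewing "style", then check it against demographics.
        cluster_mean = features[labels == k].mean(axis=0).reshape(48, 64)
        print(f"cluster {k}: {(labels == k).sum()} users")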


To ensure a heatmap does not “eliminate the element of time”, the heatmap could show later-looked-at spots with more desaturated colors.

Of course, this has the same problem as the main point – that you can’t tell whether a spot is medium-saturated because everyone looks at it in the middle of browsing or because some people see it at the beginning and others see it only at the end. Still, this problem, like the article’s, could be fixed by moultano’s suggestion of making a heatmap for the variance as well as the value.

Or you could sacrifice the display of time, and display variance as saturation – less-variant, more-sure spots would have more saturated colors. This would be easier to read than two separate heatmaps for value and variance.
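
A rough sketch of that encoding (the mean and variance maps here are placeholders; in practice they'd come from the per-user heatmaps):

    # Encode gaze intensity as hue/value and certainty (inverse
    # variance) as saturation. Inputs are assumed to be in [0, 1].
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import hsv_to_rgb

    mean_map = np.random.rand(480, 640)  # placeholder
    var_map = np.random.rand(480, 640)   # placeholder

    hsv = np.zeros(mean_map.shape + (3,))
    hsv[..., 0] = 0.7 * (1.0 - mean_map)  # hue: blue (cold) to red (hot)
    hsv[..., 1] = 1.0 - var_map           # washed out where users disagree
    hsv[..., 2] = 0.3 + 0.7 * mean_map    # brighter where gaze is dense

    plt.imshow(hsv_to_rgb(hsv))
    plt.axis("off")
    plt.show()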


During my final year project for my BSc, I tried to address the problem of losing the temporal data from eye tracking in heatmaps with an accelerated-replay heatmap animation. Here's an example:

http://www.youtube.com/watch?v=L319pLmzHVc&feature=chann...

This video shows a selection of sittings from radiologists reporting on chest x-rays.

If I recall correctly, it is sped up to approximately 5× realtime.
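
The core mechanism is simple: deposit each gaze sample into an accumulator and decay it every frame so old fixations fade out. Roughly (a reconstruction of the idea, not the original code; the gaze-sample format is assumed):

    # Animated heatmap with exponential decay, so the display
    # "forgets" old fixations. Samples are assumed to be (t, x, y)
    # tuples in seconds and pixels; the data here is a placeholder.
    import numpy as np
    from PIL import Image

    HEIGHT, WIDTH, FPS, DECAY = 480, 640, 30, 0.9
    samples = [(i / 60.0, np.random.randint(WIDTH), np.random.randint(HEIGHT))
               for i in range(600)]

    heat = np.zeros((HEIGHT, WIDTH))
    frame_idx, next_frame_t = 0, 0.0
    for t, x, y in samples:
        while t >= next_frame_t:  # emit frames up to the sample time
            gray = np.clip(heat * 255, 0, 255).astype(np.uint8)
            Image.fromarray(gray).save(f"frame_{frame_idx:04d}.png")
            heat *= DECAY         # old gaze fades out
            frame_idx += 1
            next_frame_t += 1.0 / FPS  # step by 5.0 / FPS for a 5x replay
        heat[y, x] += 1.0             # deposit the new fixation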


Nice visualization! We have something similar that we generate:

https://s3.amazonaws.com/gazehawk-public/example_track.mp4

Definitely a step up from heatmaps, though I think there's a lot to be said for providing data as a static image rather than a video, especially when results are being added to a PowerPoint or PDF.


Though I'm pretty optimistic about the possibility of algorithmic user interface design (or at least quantitative evaluation), I've yet to see any conclusive studies on the matter. Anybody got any hard info on this?


I think it's really difficult to rigorously evaluate that sort of thing. Different websites serve different purposes and need different designs.


Yes, I suppose different sites will need slightly different fitness functions, but I'm confident in the future of automated design.


Take GazeHawk as an example: they claim to work using your system's built-in webcam. Being interested in eye tracking for non-advertising purposes, I did a few calculations, and I fail to see how this is possible at typical monitor distances and typical webcam resolutions.

Does anyone have any insight as to how this is done? Currently, I'm highly skeptical that these heatmaps are the least bit accurate.


What calculations did you do?

As cofounder of GazeHawk, I've written on different aspects of this topic previously [1, 2]. Is that information helpful / can you elaborate on your skepticism?

[1] http://www.gazehawk.com/blog/on-accuracy/

[2] http://www.quora.com/GazeHawk/What-broad-computer-vision-tec...


Thanks for the links, they were very interesting.

I did basic trigonometry, and came up with an estimate of about 50 pixels of accuracy using a high-res, third-party webcam. That's why I doubt claims about built-in webcams, since they're typically pretty low resolution.

Your first link mentions an accuracy of around 70 pixels on a MacBook Pro, which is impressive but doesn't strike me as impossible (assuming a FaceTime HD camera, which is 1280x720, I believe).
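
For the curious, here's roughly the geometry I mean (every physical parameter below is an assumption, not a measurement):

    # Back-of-the-envelope gaze resolution from webcam resolution.
    # Every number below is an assumed typical value.
    import math

    eyeball_radius_cm = 1.2   # typical human eyeball
    view_dist_cm = 60.0       # eye-to-screen distance
    screen_width_cm = 33.0    # roughly a 15" laptop display
    screen_width_px = 1280
    cam_h_fov_deg = 60.0      # webcam horizontal field of view
    cam_width_px = 1280       # a "high-res" webcam

    # Full gaze sweep across the screen, in degrees of eye rotation:
    gaze_range = 2 * math.degrees(math.atan(screen_width_cm / 2 / view_dist_cm))

    # That rotation shifts the pupil center by roughly:
    pupil_shift_cm = 2 * eyeball_radius_cm * math.sin(math.radians(gaze_range / 2))

    # Size of one webcam pixel at the viewing distance:
    cam_fov_cm = 2 * view_dist_cm * math.tan(math.radians(cam_h_fov_deg / 2))
    pupil_shift_px = pupil_shift_cm / (cam_fov_cm / cam_width_px)

    print(screen_width_px / pupil_shift_px, "screen px per webcam px of pupil motion")

With these numbers, one webcam pixel of pupil movement covers on the order of a hundred screen pixels, so sub-pixel pupil localization is essentially mandatory to get anywhere near 50-70px.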


While the resolution of the webcam is (obviously) important in determining accuracy, I feel like you may be conflating two terms here. Specifically, going from the resolution of the webcam to an estimate of so many pixels of accuracy using basic trigonometry will necessarily depend on the method you're using to convert the webcam input into eye-tracking data.

The <70 pixel figure for GazeHawk's accuracy is based on testing against real, labeled training data. That is the distance, on the screen, by which our calculated gazepoint differs from the true location at which the user's gaze was directed. It is only loosely correlated with webcam resolution, in that a higher webcam resolution corresponds to a larger pipeline: more input pixels being dumped into the eye-tracking algorithm. I could be wrong, but it sounds like you're discussing the size of the eyes in the input image.

Also, at this point a discussion of accuracy vs. precision becomes germane. The use of higher resolution video as an input can often impact one but not the other.


"I feel like you may be conflating two terms here."

Probably. :-)

"I could be wrong, but it sounds like you're discussing the size of the eyes in the input image."

I believe I am. I assume that increased pixel count in the eye region corresponds directly with increased accuracy. This could be accomplished by either moving the camera closer to the eye, or by using a higher resolution camera.


Modern eyetracking solutions can be quite accurate. I've used SR Research's Eyelink II [1] and have seen results from Tobii [2] eye trackers that are impressive too, considering their passive nature.

[1] http://www.sr-research.com/EL_II.html

[2] http://www.tobii.com/en/analysis-and-research/global/product...


The SR Research I can totally understand, since it's head-mounted, proprietary hardware. The Tobii is a bit more impressive, but I think it's using IR emitters to light up the pupil, and I assume it's using their own camera. Very impressive monitoring rates, though.

My doubt concerns claims to track eye movements reliably using a built-in webcam, and available light.


Pardon my ignorance, but I find your efforts quite insignificant. If you were to ask me where most people looked on that example, I could accurately tell you what got the most looks first, second, third, etc., without ever seeing your heatmap. Do I really need a third party to tell me people like boobs more than Ron Paul?

I'd like to add that the length of eye contact is irrelevant next to a more valuable metric: interpretation. Let's say we could determine (or even narrow down) users' interpretation of content and heatmap its relevance to their visit. If we could take a proactive stance, we could predict future visits and adjust the content accordingly, rather than being reactionary and simply saying (after the user is gone) that people looked at boobs more than Ron Paul.

Just my two cents, sorry if I was a dick.


Did you even read the article?

The aggregate confirms what you're saying: that people will look down the middle at all the half-naked people.

But look at the individual heatmaps: it seems like a not-insignificant number of people followed the text more than the images, some drifted all over the place, etc.

That's the entire point of the post, I think: aggregates can be deceiving when the distribution is not even close to uniform.



