Hacker News
How Gmail’s Image Caching Affects Open Tracking (mailchimp.com)
149 points by superchink on Dec 13, 2013 | 57 comments



How Gmail's Image Caching Affects Tracking, short form:

* Loading images is now enabled by default rather than disabled by default, meaning that a larger portion of emails will be tracked, because it's more likely tracking images will be loaded.

* Images are now loaded through a proxy, which means that all tracking images will no longer provide information like cookies, IP associated with the account, etc - the only information they'll provide is "this specific email was viewed by someone, somewhere."

There is still an option to disable loading images by default. Toggling that option still results in images being loaded through a proxy, so the second item above still applies.

As far as privacy goes, the potential level of privacy has increased (the proxy now allows you to load images if you desire without leaking IP etc.). The average level of privacy from the change is a mixed bag - more basic tracking (open tracking) will occur due to the change of default, but with the trade off that more advanced tracking (e.g. tracking IPs, setting cookies for correlation with non-email site visits a.k.a. remarketing) will no longer be possible.

There is no net change to how hard it is to verify whether an address is a valid GMail address - that's already possible by simply talking to a Google mail server.


> * Images are now loaded through a proxy, which means that all tracking images will no longer provide information like cookies, IP associated with the account, etc - the only information they'll provide is "this specific email was viewed by someone, somewhere."

No, if you know who you are mailing, you can add their ID to the link of your tracking image, so you know exactly who opened the mail.

<img src="http://www.tracking.com/image.gif?user_id=USER_ID&other_tracking_info">
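To make the per-recipient idea concrete, here is a minimal Python sketch. The tracker host comes from the comment above; the `user_id` and `campaign` parameter names and the helper functions are hypothetical. Whoever fetches the image, Google's proxy included, has to request the full URL, so the ID survives the proxy.

```python
import html
from urllib.parse import urlencode, urlsplit, parse_qs

# Host taken from the comment; path and parameter names are hypothetical.
TRACKER = "http://www.tracking.com/image.gif"

def tracking_url(user_id, campaign):
    """URL that identifies one recipient of one campaign."""
    return TRACKER + "?" + urlencode({"user_id": user_id, "campaign": campaign})

def tracking_pixel(user_id, campaign):
    # html.escape turns "&" into "&amp;" so the attribute is valid HTML.
    url = html.escape(tracking_url(user_id, campaign))
    return f'<img src="{url}" width="1" height="1" alt="">'

def recipient_from_request(url):
    """What the tracker recovers from a logged request, even one made by a proxy."""
    return parse_qs(urlsplit(url).query)["user_id"][0]
```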


You've got no guarantee about when that image will be loaded by their proxy, making statistics next to useless.

Google could prefetch those images. Even if initial tests suggest otherwise, they can change this on a whim, or they could prefetch with a delay, or they could prefetch only a percentage of those images, depending on ever-changing heuristics. The only useful info advertisers would get from this is that the email was sent to a Gmail account.

Even if they don't prefetch, they can detect duplicates, especially from links coming from domains known to generate tracking pixels, and nobody could do this better than Google. They can also get rid of images that aren't visible to the user (e.g. transparent, white or light gray pixels, or images that are too small to be part of the content). Tricks like generating images with unique content only work for images with actual content to show.
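One way the "too small to be part of the content" check could look, sketched in Python under stated assumptions: it only handles GIFs, reads the width and height from the GIF header, and the 2-pixel cutoff is an arbitrary, hypothetical threshold. Nothing here describes what Google actually does.

```python
import struct

# A widely used 1x1 transparent GIF, the classic tracking pixel.
PIXEL_GIF = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
             b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")

def gif_size(data):
    """Return (width, height) from a GIF's logical screen descriptor, or None."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    # Bytes 6-9 hold width and height as little-endian unsigned 16-bit ints.
    return struct.unpack("<HH", data[6:10])

def looks_like_tracking_pixel(data, max_side=2):
    """Hypothetical heuristic: a GIF no larger than max_side x max_side pixels."""
    size = gif_size(data)
    return size is not None and size[0] <= max_side and size[1] <= max_side
```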

And it significantly raises the cost of email campaigns too, just as with spam. I don't have spam hitting my Inbox these days, and that's not because spam has become impossible. Light spam (e.g. promotions from companies you've got a relationship with) has been moved to the Promotions tab. And I can't remember the last time I saw real spam hitting my Inbox.

All in all, I'm happy that they are introducing this feature. It's better for regular folks like me - you know, the kind of people who always click Show Images, because promotional messages are hard to read otherwise (on purpose).

I do hope they provide the option to turn it off. Google is pretty bad at providing choices these days.


If you are sending a marketing email to a person, an easy image that has unique data is their name. :) And I didn't even think hard on this. Pretty much any content that could have been easily done as text is easily done as a "dynamic" image.

Now, they can detect duplicates, but only after they have fetched the contents. Unless I am mistaken (highly possible).


Of course, but it's a whac-a-mole game.

Such instances can be detected (e.g. if you see 100 emails with the same HTML but with slightly different image URLs). Google can then prefetch those images, or it can re-enable the opt-in for displaying images just for those emails.
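A rough Python sketch of that duplicate detection: strip the query string from every `<img src>`, group messages whose HTML is then identical, and flag large groups in which every raw message still differs. The regex, the `min_count` of 100 (taken from the example above) and the function names are assumptions for illustration; a regex is not a real HTML parser.

```python
import re
from collections import defaultdict

# Captures: opening of the tag up to src=", the URL, and the closing quote.
IMG_SRC = re.compile(r'(<img\b[^>]*\bsrc=")([^"]*)(")', re.IGNORECASE)

def template_key(html):
    """The message HTML with every image URL's query string removed."""
    return IMG_SRC.sub(lambda m: m.group(1) + m.group(2).split("?", 1)[0] + m.group(3), html)

def suspicious_batches(messages, min_count=100):
    """Groups of at least min_count messages that share a template but whose
    raw HTML is all different, i.e. they differ only in image query strings."""
    groups = defaultdict(list)
    for html in messages:
        groups[template_key(html)].append(html)
    return [g for g in groups.values()
            if len(g) >= min_count and len(set(g)) == len(g)]
```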

If I were to do email tracking, I would just filter the GMail accounts out of the statistics, because you can't be sure when GMail's proxy loads those images, and what you're interested in is the conversion rate (not the total number of people that opened their emails; you only care about totals for emails sent and clicks). But a service like MailChimp is not interested in doing this, because MailChimp is a third party that's interested in showing big numbers to its customers.
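The statistics approach described above, as a hedged Python sketch: open rate computed only over non-Gmail recipients, click rate over everyone sent to. The function name and the input shape (plain lists of addresses) are hypothetical.

```python
def campaign_stats(sent, opened, clicked, unreliable_open_domains=("gmail.com",)):
    """Open rate excluding domains whose opens are proxy-loaded; click rate over all."""
    def domain(addr):
        return addr.rsplit("@", 1)[-1].lower()

    sent = set(sent)
    # Only recipients whose opens we still trust count toward the open rate.
    open_base = {a for a in sent if domain(a) not in unreliable_open_domains}
    open_rate = len(open_base & set(opened)) / len(open_base) if open_base else None
    # Clicks are real user actions, so every recipient counts here.
    click_rate = len(sent & set(clicked)) / len(sent) if sent else None
    return {"open_rate": open_rate, "click_rate": click_rate}
```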

And putting these numbers aside, the privacy issues related to IP tracking, or the security issues are gone. So I think this is good.


Certainly whac-a-mole. Didn't mean to imply otherwise.

For myself, no need to filter them out without evidence. Keep a few controlled accounts to periodically try and see what the delay is. And get extra suspicious if all images are opened at once on a mass send out.
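The "extra suspicious" check could be sketched like this in Python. The 5-minute window and the 90% fraction are made-up thresholds, not measured values; you would calibrate them against the controlled accounts mentioned above.

```python
from datetime import datetime, timedelta

def opens_look_prefetched(open_times, sent_at, window=timedelta(minutes=5), fraction=0.9):
    """Heuristic: if almost every open lands within a short window right after
    the send, the pixels were probably fetched by a machine, not read by people."""
    if not open_times:
        return False
    early = sum(1 for t in open_times if sent_at <= t <= sent_at + window)
    return early / len(open_times) >= fraction
```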


There is an option to turn it off.


"Who" here is referring to things like IP data - e.g. a mapping between email address and actual physical entity. Yes, you can know that an email sent to a certain email address was opened (hence the "this specific email" - associated with an email address). No, you can't know that the email address foo@bar.com is associated with someone located in Scottsdale, AZ who subscribes to Comcast.


> As far as privacy goes, the potential level of privacy has increased (the proxy now allows you to load images if you desire without leaking IP etc.). The average level of privacy from the change is a mixed bag - more basic tracking (open tracking) will occur due to the change of default, but with the trade off that more advanced tracking (e.g. tracking IPs, setting cookies for correlation with non-email site visits a.k.a. remarketing) will no longer be possible.

> There is no net change to how hard it is to verify whether an address is a valid GMail address - that's already possible by simply talking to a Google mail server.

Thank you.

I read through an unhealthy number of comments on the various threads about this (my train was delayed, and I'd finished the book I brought). Very few people actually seemed to get the key takeaways (your last two paragraphs) correct.


I think the problem is that Ars Technica was the first big site to cover it and they did a pretty poor job. The headline was "Gmail blows up e-mail marketing...", which is pure hyperbole and confused a lot of people (myself included) about what was going on.

Here's a nice technical blog post from someone who actually knows what they're talking about: emailexpert.org/gmail-breaks-email-marketing-again/


There's an interesting tradeoff -- marketers no longer know your location, browser, client or how many times you've opened the email. However, any spammer now instantly knows that the email address is valid.

It's probably an improvement, but not all the way there. For actual privacy, what GMail needs to do (and I realize this is somewhat infeasible due to the amount of email they receive) is instantly open and cache every single email to every single email address (including non-existent addresses).


If you want to know if an email address is valid, you connect to gmail's server and send "RCPT TO:<example@gmail.com>" and they will tell you if it's valid or not.


Are you sure about this? Just tested it out:

    openssl s_client -connect smtp.gmail.com:465 -crlf

    220 mx.google.com ESMTP u17sm2709629qeb.4 - gsmtp
    helo
    250 mx.google.com at your service
    auth login
    334 VXNlcm5hbWU6
    < BASE_64 USERNAME> 
    334 UGFzc3dvcmQ6
    < BASE_64 PASSWORD> 
    235 2.7.0 Accepted
    MAIL FROM: <my_email>
    250 2.1.0 OK u17sm2709629qeb.4 - gsmtp
    rcpt to: <my_email>
    250 2.1.5 OK u17sm2709629qeb.4 - gsmtp
    rcpt to: <emaildne39g39jd9j9jfsdk@gmail.com>
    250 2.1.5 OK u17sm2709629qeb.4 - gsmtp

I get an OK with BS emails too...


You should telnet to port 25 without logging in. Once you're authenticated, the server treats the message as potentially outgoing mail and is more likely to accept and queue it.

    MAIL FROM:<tedu@tedunangst.com>
    250 2.1.0 OK g15si484689qej.92 - gsmtp
    RCPT TO:<tedunangst1233141@gmail.com>
    550-5.1.1 The email account that you tried to reach does not exist. Please try
    550-5.1.1 double-checking the recipient's email address for typos or
    550-5.1.1 unnecessary spaces. Learn more at
    550 5.1.1 http://support.google.com/mail/bin/answer.py?answer=6596 g15si484689qej.92 - gsmtp
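The same unauthenticated port-25 probe, sketched with Python's standard `smtplib`. Assumptions: `gmail-smtp-in.l.google.com` as the MX host (look it up via DNS for `gmail.com` rather than trusting this), a placeholder sender address, and a network where outbound port 25 is reachable (many residential ISPs block it). `rset()` sends RSET so the transaction is discarded and no message is sent. Server behaviour can change at any time, so treat the verdict as a hint.

```python
import smtplib

def classify_rcpt_reply(code):
    """Map the SMTP reply code to RCPT TO onto a rough verdict."""
    if code in (250, 251):
        return "accepted"
    if code == 550:  # e.g. Gmail's 550-5.1.1 "account ... does not exist"
        return "rejected"
    if 400 <= code < 500:
        return "try-later"  # temporary failure: greylisting, rate limits, etc.
    return "unknown"

def probe_recipient(address, mx_host="gmail-smtp-in.l.google.com",
                    sender="probe@example.com", timeout=10):
    """Unauthenticated RCPT TO probe; nothing is ever transmitted (not run here)."""
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
        smtp.ehlo("example.com")      # placeholder HELO name
        smtp.mail(sender)             # MAIL FROM
        code, _msg = smtp.rcpt(address)
        smtp.rset()                   # abort the transaction before any DATA
    return classify_rcpt_reply(code)
```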


Cool, thanks!


That will only tell you if the account exists. The request for an image tells you more: that that account is actually used.


I'm not sure spammers particularly care if they have a lot of email addresses that people don't really read. Google was already bouncing messages to invalid recipients, so completely nonexistent email addresses were already filtered.

Granted, it's one step further to know if you're getting past the spam filter. But I feel like that's testable using your own accounts.


> There's an interesting tradeoff -- marketers no longer know your location, browser, or how many times you've opened the email. However, any spammer instantly knows that the email address is valid.

Before, however, if you never clicked on "show images" then spammers knew nothing at all.


Yes, that's the tradeoff.


As far as I can tell, Gmail is not returning open data until the email is actually opened. Gmail is pretty good about identifying spam and putting it into recipients' spam folders, which are usually not opened.


> However, any spammer now instantly knows that the email address is valid.

That information is worth orders of magnitude less than it used to be, especially for Google hosted email as their spam protection is near-perfect. My email address is available in the clear in a number of archived mailing lists among other places, and I'm not getting any spam at all.


Google might only be turning it on for email that's trusted. They could be leaving images turned off for messages in the spam folder.


The blog post is optimistic, but it will have an effect e.g. https://mailchimp.com/assets/images/features/main_segmented.... Suddenly they don't know the location, OS, browser and referrer. They only know the refined open rate. Good.


They know your location, OS, browser from when you signed up for the list in the first place. You did that online if we're talking about e-mail marketing and not some developer listserv. MailChimp in particular does record geo-ip information at signup, so its features based on recipient location should still work the same.


I assume this means that Google caches images only on demand, the first time the user opens the email. Otherwise there is no way for them to track the first open.


Yeah, I was wondering that as well. If they tested 'around the office', as the post seems to indicate, they might have missed the effect that for every extra person that opens an email the picture is coming from google's cache and thus the open is not counted.


> for every extra person that opens an email the picture is coming from google's cache and thus the open is not counted

Each e-mail has a unique image URL; Google has to make a request for each individual mail even if they're caching images.


OK, makes sense. A file hash might fix that and save Google some bandwidth, but I imagine they do not want to break tracking completely for now...


To hash the file you first have to download it. Downloading it is the tracking action, not showing it. Deduplication does not affect open tracking pixels.
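A toy Python model of that point, assuming a proxy that caches by URL and deduplicates stored bodies by SHA-256. `fetch` stands in for the HTTP request; every new per-recipient URL still reaches the tracker before any hash can be computed. The class and the fake tracker are hypothetical.

```python
import hashlib

class DedupingImageProxy:
    """Toy image proxy: caches by URL, stores bodies deduplicated by content hash.
    `fetch` is any callable url -> bytes standing in for a real HTTP GET."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.by_url = {}  # url -> SHA-256 hex digest of the body
        self.blobs = {}   # digest -> body bytes (one copy per distinct image)

    def get(self, url):
        if url not in self.by_url:
            body = self.fetch(url)  # the origin (tracker) sees this request
            digest = hashlib.sha256(body).hexdigest()  # only possible after downloading
            self.blobs.setdefault(digest, body)
            self.by_url[url] = digest
        return self.blobs[self.by_url[url]]

# Hypothetical tracker that logs every request and returns the same pixel to all.
tracker_log = []

def fake_tracker(url):
    tracker_log.append(url)
    return b"GIF89a-same-pixel-for-everyone"

proxy = DedupingImageProxy(fake_tracker)
for uid in ("alice", "bob", "alice"):
    proxy.get(f"http://tracker.example/p.gif?u={uid}")

# tracker_log holds 2 URLs (alice, bob): one open per recipient is still recorded.
# proxy.blobs holds 1 body: deduplication saved storage, not the request.
```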


How does this help user privacy? Currently your open action is not tracked via images, since they're disabled by default. With this, anyone can find out when you displayed their email. That seems rather crappy.

Nothing stops Gmail from doing the loading on their side (to hide UA, IP, etc.) but only when you ask for it.

What's Google's motivation for this? Do they do emails that need to be tracked? Are they doing this for themselves to avoid having to special-case their own emails?

Wouldn't it be better to work on a standardized way to embed images in email, so that recipients can get nicely-rendered emails without exposing themselves to action tracking?


> What's Google's motivation for this?

Possibly, to get you to use more of their advertising services, which is in-line with lots of their recent changes (like removing all organic shopping results in favor of paid listings, and adding e-mail ads to Gmail's promotion tab).

Prior to this change, a dozen or so of Google's competitors offered e-mail remarketing. That's where you insert an image into your marketing mails to set a cookie in the recipient's browser when they open the mail, then you can later advertise to that person across the web. For example, you could show banners on the NY Times site advertising your Black Friday sales only to people that opened your Black Friday sale preview e-mail.

With Google proxying images, regardless of cache/deduplication policy, only Google can sell e-mail remarketing to Gmail users now. They already sell web remarketing through AdWords/DoubleClick. That means companies advertising with Google's competitors will have to move money to Google.


> Wouldn't it be better to work on a standardized way to embed images in email

We have that already: it's part of MIME. External images are used only for tracking, potential bandwidth savings, and incompetence.
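For reference, a minimal sketch of MIME-embedded images using Python's standard `email` package: the image travels inside the message as a `multipart/related` part and the HTML refers to it by `cid:`, so the client never contacts an outside server. The addresses and the placeholder image bytes are made up.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Stand-in image bytes (a 1x1 GIF); a real newsletter would embed its logo etc.
LOGO = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
        b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")

def build_message():
    msg = EmailMessage()
    msg["Subject"] = "Newsletter"
    msg["From"] = "news@example.com"
    msg["To"] = "reader@example.com"
    msg.set_content("Plain-text version for clients that skip HTML.")
    logo_cid = make_msgid(domain="example.com")  # looks like "<...@example.com>"
    msg.add_alternative(
        f'<p>Hello!</p><img src="cid:{logo_cid[1:-1]}" alt="logo">', subtype="html")
    # Turn the HTML part into multipart/related holding the image; the client
    # resolves cid: from the message itself instead of fetching a URL.
    msg.get_payload()[1].add_related(LOGO, "image", "gif", cid=logo_cid)
    return msg
```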


At least for me, Gmail's image settings have never worked reliably. Maybe they just capitulated?


They work fine for me. There are two types of images, embedded images and external images. Embedded ones were always shown; it's the external ones that are the issue.


Really? Image loading should be pretty easy to get right.


Am I the only one who just has Thunderbird in plain-text mode with Gmail and never displays images?


This actually brings up a good point. Does this affect IMAP users, or any users who aren't using a Google Gmail app (iOS, web, Android, etc.)? My guess is no?


Unless Google rewrites the mails I'd say that IMAP/POP users shouldn't be affected by this.


I disabled images in emails too. It's never anything interesting, and almost always it's some kind of marketing/advertising nonsense.

Oh, and I don't use the web frontend to gmail as it really got confusing. Where is my inbox that shows all the mails I got and not only some categories?


It's quite simple to reset Gmail to its normal, older inbox. Hover your mouse over the word 'Inbox' in the list on the left, hit the down arrow, and you get a list of different inbox styles to try.


Plain text is great for ASCII. But try reading a plain text email in a right-to-left language with English words (like technical terms, abbreviations, etc) between sentences. I usually have to copy-and-paste it into Word and do some formatting before it becomes comprehensible at all.


I like HTML and being able to emphasize points in my email. I also like being able to use bullet lists. Apart from that, I keep things clean.

And I've dropped gmail for fastmail since gmail stopped being good a few years back.

So yeah. You may be an outlier, but there are lots of outliers out there :)


Q: If I set 'display images' to 'off', will Google still retrieve images when I open my email?

If so, then anyone can include an invisible image and always know when I open the email, whereas before they had no way of doing this.


Why couldn't google show images by default before turning on the cache? I assume it's a security issue, but would be interested to hear the reason in detail.


Among other reasons, imagine me sending support@example.com an email with <img src="http://localhost:3000/carefully-constructed-url" /> if I knew Example.com was a Rails shop in January 2013. That could have been oodles of fun. localhost:3000 is one of the many, many examples of things that could be put there. Other examples include probing for internal redmine instances, attempting to compromise dev/staging servers which are firewalled from outside traffic, etc etc.

This is not a risk if Google proxies the image -- they'll proxy a 404, because Gmail's servers don't have privileged, cookied access to apps on your internal network, dev boxes, etc.
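The URLs in that kind of attack are often recognizable from the host alone. A rough Python sketch of such a check, the kind a client loading images directly (or any server fetching URLs on someone's behalf) would need; it is an illustration under assumptions, not Gmail's logic, and because it does not resolve DNS names, something like `internal.corporate.com` slips through.

```python
import ipaddress
from urllib.parse import urlsplit

def points_inside(url):
    """True if an image URL names localhost or a literal private, loopback,
    link-local or unspecified IP address. DNS names are not resolved."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no usable host: refuse rather than guess
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; a real check would resolve it and test every address
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_unspecified
```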


Good point about an outsider potentially poking at internal.corporate.com. Though you could only trick support@example.com into making a GET request in this manner, right? Which ideally doesn't change data, but obviously exposes bigger attack area for vulnerabilities like the rails one.


Rails in January 2013 was mentioned specifically because a series of security bugs allowed attackers to achieve remote code execution with specially crafted URL parameters.

Thus someone could get a remote shell on your box running as the rails account, not just access to an internal application.


We sent a newsletter out today with Mailchimp and didn't notice any difference in opens. I have image display off by default, but I think most people don't care, or are using a mobile device and are quite happy to see the images. I personally think making it harder to track opens is a good thing. Like Mailchimp says, make your content worth viewing.


There's a nice technical explanation of what's going on here: http://emailexpert.org/gmail-breaks-email-marketing-again/


Wonder if anyone's tested setting no-cache/expires headers on the beacon. I'd expect Google to honor the cache headers, since it's a good citizen of the web.
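For anyone who wants to run that test, a minimal beacon server in Python that serves a 1x1 GIF with headers asking caches not to store it. The handler name, port and header set are assumptions; whether Google's proxy honours them is exactly the open question, and this sketch does not answer it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# 1x1 transparent GIF used as the beacon body.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
         b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")

# Headers asking every cache (an image proxy included) not to reuse the response.
NO_CACHE_HEADERS = {
    "Content-Type": "image/gif",
    "Cache-Control": "no-cache, no-store, must-revalidate, max-age=0",
    "Pragma": "no-cache",                        # for HTTP/1.0 caches
    "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",  # a date in the past
}

class BeaconHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every request that reaches us is logged; the path carries the recipient id.
        self.log_message("open: %s", self.path)
        self.send_response(200)
        for name, value in NO_CACHE_HEADERS.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

def serve(host="0.0.0.0", port=8000):
    """Run the beacon until interrupted (not called automatically)."""
    HTTPServer((host, port), BeaconHandler).serve_forever()
```

Call `serve()` on a publicly reachable machine, mail yourself a message whose `<img>` points at it, then open the message several times: if the proxy honours the headers, each open should produce a new `open:` line in the log.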


I was waiting to hear from you guys and the answer was pretty much what I expected. Thanks for the update, Mailchimp is such a great service.


Well, Google can never enable pre-caching of images, for a simple reason: it would allow an instant check if the email address is valid. Just send a mail and wait a bit, and you'd know if the address was valid.


You know how long it takes gmail servers to respond to an invalid recipient with a 550? About two seconds.


2 seconds? Not true at all, it's just tens of milliseconds. I didn't time it exactly, but it seems 'instant', so it's probably less than 100 ms.


Maybe it hiccuped? It seemed instant for a valid address, but a little slower for an invalid one. I assumed it keeps used addresses cached but has to go digging to confirm a negative, though it could be a fluke. Anyway, it's faster than waiting for the image proxy.


> it would allow an instant check if the email address is valid.

What's bad about that?

Not to mention you can do it anyway: just try to send an email to an address and see if they accept it; if they do, send the server a reset (RSET) command so it discards the email.


Not if Google also requested images from invalid addresses. But usually you'll get a failure in the SMTP transaction if the email address is not valid. Otherwise a simple typo would send your important email into a black hole.



