HTTP GZIP Compression remote date and time leak (jcarlosnorte.com)
118 points by LukasReschke on Feb 22, 2016 | 34 comments




I always always always write code to just use the numeric time in seconds or milliseconds since the epoch. Time is only converted into messy Gregorian cruft to display to the user. Anything else is begging for crazy bugs.

In a SQL database a time field should be an unsigned bigint.
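
A minimal sketch of that approach in Python (variable names are hypothetical): store the raw epoch value, and convert only at the display boundary.

    import time
    from datetime import datetime, timezone

    # Store the raw epoch value (seconds since 1970-01-01 UTC); no timezone baggage.
    created_at = int(time.time())

    # Convert to messy Gregorian form only when showing it to a user.
    print(datetime.fromtimestamp(created_at, tz=timezone.utc).isoformat())
    # e.g. '2016-02-22T18:04:05+00:00'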


While that works, it'd be a real annoyance to deal with when doing a lot of the debugging queries I wind up writing for work doing batch ETL. As far as I have found, there's no built-in function that would take that epoch timestamp and convert it into a usable date that could then be truncated for daily binning etc.
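
(For what it's worth, a sketch of doing that daily binning client-side in Python, assuming epoch seconds; not a substitute for an in-database function:)

    from datetime import datetime, timezone

    def day_bin(epoch_seconds):
        # Truncate an epoch timestamp to its UTC calendar day for binning.
        return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")

    print(day_bin(1456164000))  # '2016-02-22'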


The script just seems to assume that whichever date is in the mtime field is local time.


Exactly. So, if every server (or a significant fraction) reports in UTC, the heuristic is no longer reliable.


Which could already be the case: spec-wise, the gzip timestamp is supposed to be POSIX time, which is in UTC and provides no timezone information.


It would appear 2 of the 3 examples he gives at the end of the article are returning UTC. Only Bing returns Pacific Time.


Came here to say exactly this!


Basically it comes down to this: many webservers use gzip to compress data. Gzip creates a header which has a date field in it. Most webservers fill this date field with zeros; about 10% use the actual date, including the timezone, which can reveal the location of the server.

Tor is not to blame here, and probably neither are gzip or most webservers. It's an unforeseen side effect of a combination of tools.

The article provides a script to test if a server reveals the date and timezone or not.
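
A rough Python sketch of the same check (this is not the author's script; the URL and interpretation are illustrative):

    import struct
    import urllib.request
    from datetime import datetime, timezone

    def gzip_mtime(url):
        # Ask for a gzip response and read the raw bytes (urllib does not decompress).
        req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
        with urllib.request.urlopen(req) as resp:
            header = resp.read(10)  # the fixed gzip header is 10 bytes
        if header[:2] != b"\x1f\x8b":
            return None  # server did not answer with gzip
        mtime, = struct.unpack("<I", header[4:8])  # little-endian uint32 MTIME field
        if mtime == 0:
            return None  # spec: zero means "no time stamp available"
        return datetime.fromtimestamp(mtime, tz=timezone.utc)

    print(gzip_mtime("http://example.com/"))  # hypothetical target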


Those servers are out of spec, then. The RFC says it's supposed to be POSIX time, i.e., always in UTC. https://tools.ietf.org/html/rfc1952


Perhaps those servers are running Windows, where the time is local time and the server doesn't want to spend the effort to do the conversion?


The spec predicted in 1996 that this would happen: "(Note that this may cause problems for MS-DOS and other systems that use local rather than Universal time.)"


Are you seriously suggesting Windows doesn't know the concept of UTC time? And secondly suggesting that fiddling with timezone offsets is an expensive conversion?


The system clock on Windows is not UTC; it is local time. On Unix the clock is always UTC and is converted to local time for display purposes.


That is the same as Windows. Where are you getting your information?


No, there was a time when Windows stored local time -- also synced to BIOS clock. I don't think that's true anymore, though.


It remains true in Windows 10, although it can be changed via some registry fudging.


> Most webservers fill this date field with zeros; about 10% use the actual date, including the timezone

They can't include the timezone, since the field is a unix timestamp; there's no room for a timezone.

Apparently the script just assumes whichever date is stored in the mtime field is a local date (when the spec suggests UTC/GMT): https://github.com/jcarlosn/gzip-http-time/blob/master/time....
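
The difference is easy to see by decoding the same mtime both ways (the value here is hypothetical); the two readings differ by the UTC offset, which is exactly the signal the heuristic leans on:

    from datetime import datetime, timezone

    mtime = 1456160400  # hypothetical value read from a gzip header

    print(datetime.fromtimestamp(mtime, tz=timezone.utc))  # what the spec intends
    print(datetime.fromtimestamp(mtime))  # naive local reading, roughly what the script assumes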


I think you can blame gzip. Tools shouldn't insert unnecessary metadata into files. At least not without asking or warning. What does putting a timestamp into gzip add? Or the operating system, for that matter?


> What does putting a timestamp into gzip add?

Roundtripping the file's mtime through compression/decompression is convenient. Same reason the original filename can be stored (to roundtrip it in case the filename gets truncated at some point, e.g. by the gzip archive moving through an MS-DOS system). Here's how the spec defines it:

    MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed.
Sadly the spec then goes on to recommend leaking the compression date:

            The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.)  If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started.  MTIME = 0 means no time stamp is
            available.
However, note that the spec calls for a unix timestamp, which is ~UTC and leaves no room for a timezone. Reading the PoC[0], it apparently assumes any non-zero mtime is immediate (ignoring cached assets) and local.

[0] https://github.com/jcarlosn/gzip-http-time/blob/master/time....
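
For illustration, CPython's gzip module exposes this field directly, so both behaviors are easy to reproduce:

    import gzip

    data = b"<html>hello</html>"

    leaky = gzip.compress(data)           # stamps the current time into the header
    clean = gzip.compress(data, mtime=0)  # 0 means "no time stamp available" (RFC 1952)

    print(leaky[4:8])  # nonzero MTIME bytes
    print(clean[4:8])  # b'\x00\x00\x00\x00'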


This problem bit me in the butt when I was trying to set up my blog on S3. On every sync, all the pages, which were gzipped on my machine (CloudFront doesn't do compression on the fly), would upload again. Then I discovered `gzip -n`, which avoids storing the original filename and timestamp.


Just as a heads up, CloudFront does compress on its own if you set it to. It's super helpful when you have less control over the origin.

http://docs.aws.amazon.com/AmazonCloudFront/latest/Developer...


Nice! Back when I started using it, it didn't have that (or I didn't look carefully enough).


It's a fairly recent addition.


tl;dr for comments:

Article seems to fail to notice that gzip timestamps are specced to be timezone-independent, and two of the three example domains at the end of the article adhere to that spec. We don't know what happens to the article's "10%" number after servers that adhere to the spec are removed from the data. For all we know bing.com could be the only service that leaks this info.


This is where this date & time retrieval mechanism sort of breaks:

1) The server's timezone is set to one other than its geographical location's.

2) The gzip-compressed page or asset is cached, so the embedded date/time is from an unknown point in the past.

So this reduces the good guesses a bit.


This is interesting, I wonder if there are more related issues. There is a lot of stuff that embeds timestamps.

The first I could think of is the TLS handshake, but it's GMT, and several implementations no longer set it to a valid value (they use a random value instead). Various image formats use timestamps; they seem to be timezone-unaware.

Probably something to look into, and probably a good idea to question whether we need timestamps everywhere. If they don't serve a useful purpose, better to skip them. (They also happen to be a major issue for things like reproducible builds.)


Somewhat related, JPEG files have the potential to leak a whole lot of information (through EXIF metadata) if put online straight off a camera/phone. Things like time but also camera model and even precise GPS location. That's why I always use `jhead -purejpg` to strip that info.
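
If Pillow is handy, a similar effect can be had in Python, though unlike jhead this re-encodes the image rather than stripping losslessly:

    from PIL import Image  # assumes Pillow is installed

    im = Image.open("photo.jpg")  # hypothetical input
    im.save("photo_clean.jpg")    # Pillow drops EXIF unless explicitly passed via exif=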


A few years ago, the FBI was able to catch a hacker by examining the EXIF metadata in an image posted by his girlfriend: http://gizmodo.com/5901430/these-breasts-nailed-anonymous-ha...


The concerns are valid, but it is ironic that this is being served from an HTTP-only page.

(Why does gzip need a compression date again?)


AFAIK it's not the compression date but the last-modified date stored in the filesystem metadata. It's likely there so that `gzip file; gunzip file.gz` keeps the mtime information intact.


The problem is it's any date, so which date is stored depends on the library and the data source. If you're gzipping and caching dynamic content, there's no filesystem mtime, so you'll probably get the compression time instead (unless the remote end or library defaults to 0).


That's why you always use deflate.


When is this actually relevant?

It really shouldn't be in the context of hidden services, considering your setup has to be seriously flawed for the webserver to be able to divulge any information about its physical location.

Perhaps it'd help with attacking some RNGs, but beyond that it's really just a novelty.



