The DNS prefetching was exactly what I was thinking of. With GIS, if my site shows up in Google (or elsewhere), I can now track exactly how many people see my site, and from where, without them ever clicking on a link.
You are correct in a general sense but it seems you're stretching the
truth a little bit. Since you only get three octets from GIS, it is
impossible to know "exactly" how many people visit your site. As for the
"from where," you could get a rough idea of location through GeoIP on
three octets, but the result would be generalized. The generalized data
would still be useful, but it would be lacking in resolution and
reliability compared to GeoIP on the full four octet IP address.
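
To make the "three octets" point concrete, here is a minimal sketch using Python's standard ipaddress module. The visitor addresses are made up (taken from the documentation ranges) and the helper name is mine, not anything from the GIS draft. It only shows that distinct visitors collapse into the same /24 once the last octet is dropped, so unique prefixes undercount unique visitors.

```python
import ipaddress

def truncate_to_three_octets(ip: str) -> ipaddress.IPv4Network:
    """Drop the last octet, keeping only the /24 the address belongs to."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)

# Hypothetical visitors; three of them share the same first three octets.
visitors = ["203.0.113.5", "203.0.113.77", "203.0.113.200", "198.51.100.9"]

prefixes = {truncate_to_three_octets(ip) for ip in visitors}

print(len(visitors))   # number of distinct visitors
print(len(prefixes))   # number of distinct /24 prefixes actually visible
for prefix in sorted(prefixes):
    # Each /24 could stand for any of 256 addresses.
    print(prefix, prefix.num_addresses)
```

Any count built on the truncated prefixes is therefore an estimate at best, and NAT makes it fuzzier still.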
The approaching exhaustion of IPv4, and how it is handled in
practice, could make a real mess of GIS. If your ISP
starts handing out IPv4 addresses in the private address space to
customers and does transparent PNAT, then GIS breaks badly for all
customers of said ISP. In the case of large ISPs, GIS could actually
make things slower.
The part I have no clue about is how GIS works with IPv6. I haven't read
the IETF draft, so I'll just shut up and hope someone more knowledgeable
chimes in here.
> The generalized data would still be useful, but it would be lacking in resolution and reliability compared to GeoIP on the full four octet IP address.
How is that? ARIN doesn't delegate IP address space to users or ISPs in smaller segments than 1024K IP addresses. So it seems that 3 octets is enough to map to a physical location. How does the additional octet give you additional geolocational abilities?
Good question. The answer is in the details. The RIRs (Regional
Internet Registries: ARIN, RIPE, APNIC, ...) do allocate large blocks
as you state, but those large blocks are divided into subnets. When
you realize the subnets have routers and routers often provide their
GPS coordinates, you can see how geolocation can become more accurate
with more address bits. That's just one of the ways. Another is that
subnet assignments are often public, and the company or organization
holding an assignment often has a known location. Still another
approach is looking up locations based on AS/ASN. And yet another is
GPS reporting (think mobile Android/iOS). There are probably other
ways that I don't know. The important part to realize is that all of
these methods are employed and combined to build out geolocation
databases. Geolocation by IP is far from perfect, but often it can be
surprisingly accurate.
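
As a rough illustration of why the extra bits matter, here is a small sketch of a most-specific-match lookup against a toy geolocation table. The table entries, labels, and the locate helper are all invented for the example; real GeoIP databases are built and queried differently. The point is only that a subnet smaller than a /24 cannot be singled out when the query itself is a /24.

```python
import ipaddress

# Hypothetical geolocation entries; a real database is far larger.
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "ISP region (coarse)",
    ipaddress.ip_network("203.0.113.64/26"): "branch office (fine)",
}

def locate(query):
    """Return the label of the most specific entry covering the whole query network."""
    matches = [net for net in GEO_TABLE if net.supernet_of(query)]
    if not matches:
        return None
    return GEO_TABLE[max(matches, key=lambda net: net.prefixlen)]

# The same visitor, seen with all four octets and with only three.
full = ipaddress.ip_network("203.0.113.70/32")
truncated = ipaddress.ip_network("203.0.113.70/24", strict=False)

print(locate(full))       # can match the /26 entry
print(locate(truncated))  # can only match the /24 entry
```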
Even with the full IP address, GeoIP won't necessarily tell you much. My location has been reported as York, Cambridge, City of London and the Netherlands, all of which are 100+ miles away from my actual location.
I'm a network neophyte, so go slow, but can you explain how?
Is it this?
"Basically, when your browser makes a DNS request, the DNS server will now forward the first three octets (123.45.67) of your IP address to the target web service."
So say you search for something on google; google returns its search results page, your browser gets the page, looks at all the links, asks DNS for the IP's to all those links' addresses, and DNS auto-sends YOUR (truncated) IP to all those addresses' servers?
I guess I'm unclear on why it would do that. If the truncated IP coming to a CDN isn't coming with an actual request, how do they know that at some time later your actual request is from your truncated IP? (I also don't understand why a CDN would use some sort of DNS address as a geolocation strategy, but I guess that's another discussion.)
> So say you search for something on google; google returns its search results page, your browser gets the page, looks at all the links, asks DNS for the IP's to all those links' addresses, and DNS auto-sends YOUR (truncated) IP to all those addresses' servers?
Yes.
> I guess I'm unclear on why it would do that.
The DNS prefetching done by the browser exists to save your time.
Instead of waiting to do a DNS lookup until you click on a link in the
current page, the browser does DNS lookups on all links in the page as
soon as the page is loaded. By the time you're done deciding which link
to follow, the browser is already done with the initial step required to
follow any link on the page.
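
If it helps, here is a rough sketch of the idea in Python. It is not how any real browser implements prefetching; the class and function names are made up for illustration. It collects the hostnames from the links in a page and resolves them ahead of time, so the answer is already cached when a link is clicked.

```python
import socket
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkHostCollector(HTMLParser):
    """Collect the hostnames of all absolute <a href="..."> links in a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                host = urlparse(value).hostname  # None for relative links
                if host:
                    self.hosts.add(host)

def hosts_in_page(html):
    collector = LinkHostCollector()
    collector.feed(html)
    return collector.hosts

def prefetch(hosts):
    """Resolve every host up front; failed lookups are stored as None."""
    cache = {}
    for host in sorted(hosts):
        try:
            cache[host] = socket.gethostbyname(host)
        except OSError:
            cache[host] = None
    return cache

page = """
<a href="http://www.example.com/a">one</a>
<a href="http://www.example.org/b">two</a>
<a href="/relative/link">three</a>
"""
print(sorted(hosts_in_page(page)))
# prefetch(hosts_in_page(page)) would perform the actual DNS lookups.
```

Note that every one of those lookups happens whether or not you ever click the link, which is what the privacy concern earlier in the thread is about.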
> If the truncated IP coming to a CDN isn't coming with an actual request, how do they know that at some time later your actual request is from your truncated IP? (I also don't understand why a CDN would use some sort of DNS address as a geolocation strategy, but I guess that's another discussion.)
You seem to have misread the description.
A CDN is a group of multiple servers and all of them could, in theory,
respond to your request for a specific web page. The servers in the
group are spread out all over the globe, but all of them share the same
domain name. When you look up the IP address of the shared domain name,
this new GIS draft sends your truncated IP address to the DNS server of
the CDN so it can choose the server in the group that is "closest" to
you.
> The DNS prefetching done by the browser exists to save your time. Instead of waiting to do a DNS lookup until you click on a link in the current page, the browser does DNS lookups on all links in the page as soon as the page is loaded. By the time you're done deciding which link to follow, the browser is already done with the initial step required to follow any link on the page.
My apologies; I was unclear. I (think I) get the DNS prefetching idea (your browser asks DNS for all the IPs on a page in the hope that one will be hit, so it won't have to spend time doing it later when a link is actually clicked), but why would DNS send anything to the site that it's getting an address for? (And under what protocol?)
When my browser asks DNS for an IP for "www.foo.com", why does "www.foo.com" need to know I asked for it?
I think the phrase "target web service" from the article is misleading. This is about passing part of the client's IP address to the authoritative nameserver for the target web service. As I understand it, here is an example: Let's say I'm on the east coast, I'm using Google DNS on the west coast as my DNS server, and I want to load foo.edgecast.com. foo.edgecast.com has two servers, one on the east coast and one on the west coast. When I perform a lookup for foo.edgecast.com, I talk to Google's recursive resolver, which then talks to Edgecast's authoritative nameserver. Without EDNS, Edgecast doesn't get any information about me; it just knows the request came from Google on the west coast, so it gives out the IP address of its west coast server. With EDNS, Edgecast gets enough of my IP address from Google to know that I'm on the east coast, so it gives out its east coast server's IP address.
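
Here is a toy simulation of that east/west example, in case it makes the flow clearer. Everything in it is hypothetical: the addresses come from the documentation ranges, the region table and function names are invented, and no real DNS traffic is involved. It only models what the authoritative nameserver can see in each case.

```python
import ipaddress

# Hypothetical server addresses for foo.edgecast.com.
SERVERS = {"east": "198.51.100.10", "west": "198.51.100.20"}

# Hypothetical mapping the authoritative nameserver uses to guess a region.
REGION_BY_NETWORK = {
    ipaddress.ip_network("203.0.113.0/24"): "east",  # the client's network
    ipaddress.ip_network("192.0.2.0/24"): "west",    # the recursive resolver
}
DEFAULT_REGION = "west"

def region_of(address):
    for network, region in REGION_BY_NETWORK.items():
        if address in network:
            return region
    return DEFAULT_REGION

def authoritative_answer(resolver_ip, client_subnet=None):
    """Pick a server from the client subnet if supplied, else from the resolver."""
    if client_subnet is not None:
        region = region_of(client_subnet.network_address)
    else:
        region = region_of(ipaddress.ip_address(resolver_ip))
    return SERVERS[region]

def recursive_lookup(client_ip, resolver_ip, send_client_subnet):
    """The resolver forwards only the first three octets, never the full address."""
    subnet = None
    if send_client_subnet:
        subnet = ipaddress.ip_network(f"{client_ip}/24", strict=False)
    return authoritative_answer(resolver_ip, subnet)

client, resolver = "203.0.113.25", "192.0.2.53"
print(recursive_lookup(client, resolver, send_client_subnet=False))  # west server
print(recursive_lookup(client, resolver, send_client_subnet=True))   # east server
```

In both cases the nameserver, not the web server, is the party that sees the truncated address.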
AAaaahhhh... THIS makes sense. So it's not the actual web server that's getting my truncated IP, it's the web server's provider's NAMESERVER. So if I'm hosting a website, but not its nameserver (say I'm using GoDaddy or whathaveyou for that), only GoDaddy's nameserver would get the truncated IP if my site shows up on Google's search page; not my actual web server.
RequestPolicy users can disable Link prefetching and DNS prefetching from the Advanced tab in the RequestPolicy preferences. Everyone else can search for "prefetch" in about:config and do it from there.
Also, prefetching is disabled by default if the page containing the link is opened over HTTPS.