Hacker News
The result of pinging all the Internet IP addresses (securityartwork.es)
162 points by Nyr on Feb 7, 2013 | 53 comments



HD Moore (creator of metasploit) had a talk about doing something like this. He scanned all of IPv4 hitting common TCP and UDP ports, and collected data about the services running.

He found a lot of cool stuff. For instance, there are apparently 7 Windows NT 3.51 boxes with SNMP open sitting on the internet, and about 300 IIS servers that hand out the same session ID cookie when you log on.

The visualizations are also really nice. The talk is here: http://www.irongeek.com/i.php?page=videos/derbycon2/1-1-2-hd...


That's actually quite interesting! Thanks for the link; it's always great to hear talks by the people "behind the magic."

Also, I thought this quote on the page you linked was pretty humorous: "[HD Moore] is the Chief Architect of Metasploit, a popular video game designed to simulate network attacks against a fictitious globally connected network dubbed “the internet”."


"The increased complexity came from the disk storage resources; "

It seems they just stored whether they got a response or not.

4 billion IPs, and if all you need to store is whether you received a response, that's only half a gig of RAM (2^32 bits = 512 MB). mmap and call it a day?
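To make the arithmetic concrete, here is a minimal one-bit-per-address bitmap sketch in Python (all names are mine, purely illustrative):

```python
TOTAL_IPS = 2 ** 32

def bitmap_size_bytes(n_bits: int) -> int:
    """Bytes needed to store one bit per address."""
    return (n_bits + 7) // 8

def mark_responded(bitmap: bytearray, ip_as_int: int) -> None:
    """Set the bit for an address that answered the ping."""
    bitmap[ip_as_int >> 3] |= 1 << (ip_as_int & 7)

def has_responded(bitmap: bytearray, ip_as_int: int) -> bool:
    return bool(bitmap[ip_as_int >> 3] & (1 << (ip_as_int & 7)))

print(bitmap_size_bytes(TOTAL_IPS))  # 536870912 bytes = exactly 512 MB

bm = bytearray(bitmap_size_bytes(256))  # tiny /24-sized demo, not the full table
mark_responded(bm, 57)
print(has_responded(bm, 57), has_responded(bm, 58))  # True False
```

In a real scanner the full 512 MB bytearray would be backed by an mmap'd file, as the comment suggests.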


They recorded the whole response. "...and the second one simply writes down the response packets received." They also randomized the order in which they scanned so they might have needed to keep track of already scanned addresses. "Although we have sent just a single packet per IP, we messed the scans to prevent a network receiving a high number of consecutive packets."


You can use an LFSR[0] to efficiently (it is deterministic and needs minimal state) convert a sequential walk over [1, 2^n - 1] into one that appears to "randomly" jump around. (A maximal-length LFSR visits every value except zero.)

[0]: http://en.wikipedia.org/wiki/Linear_feedback_shift_register
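A minimal sketch of the idea in Python, using a 5-bit Galois LFSR for readability (the tap mask below is a known maximal-length polynomial for 5 bits; a real scan would use a 32-bit maximal mask from the tables referenced in [0]):

```python
def lfsr_walk(nbits=5, taps=0b10100, seed=1):
    """Galois LFSR: with maximal-length taps it visits every value
    in [1, 2**nbits - 1] exactly once, in a scrambled order."""
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps
        if state == seed:       # cycled back to the start: done
            return

order = list(lfsr_walk())
print(len(order))                            # 31 = 2**5 - 1
print(sorted(order) == list(range(1, 32)))   # True: each value exactly once
```

The whole scanner state is one integer, so there is nothing to store about "already scanned" addresses.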


Or just reverse the bits in the address.
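For illustration (my own sketch, not the commenter's code): mirroring the 32 bits is a self-inverse bijection, so consecutive inputs land far apart in the address space:

```python
def reverse_bits32(x: int) -> int:
    """Mirror a 32-bit integer bit-for-bit."""
    r = 0
    for _ in range(32):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

# Consecutive inputs jump across the whole address space:
print([reverse_bits32(i) for i in range(4)])
# [0, 2147483648, 1073741824, 3221225472]
```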


Yeah, I can think of a handful of ways to do this easily with far fewer resources than stated...


This reminds me of a research project that was done at my university a few years back[1], except they were specifically scanning for web servers. Out of the 3,690,921,984 addresses they scanned, 18,560,257 had web servers running on port 80[2].

[1]: http://cs.acadiau.ca/~dbenoit/research/webcensus/Web_Census/...

[2]: http://cs.acadiau.ca/~dbenoit/research/webcensus/Web_Census/...


Based on Shodan there are at least 87,494,410 web servers [1] on port 80 now. For port 443 there are currently 14,918,407 servers listed [2], and for port 8080 there are 7,570,586 [3].

[1]: http://www.shodanhq.com/search?q=port%3A80

[2]: http://www.shodanhq.com/search?q=port%3A443

[3]: http://www.shodanhq.com/search?q=port%3A8080


It would be interesting to see a 255x255 box heatmap. No real benefit other than enjoyment ;)



Awesome!! Thanks!


It actually wouldn't be too crazy to do a complete bitmap with one of those gigapixel panning/zooming tools. It's only 65k x 65k 1-bit pixels, or 512 MB.


Should that be a 256x256 box heatmap?


Indeed it should.


Sorted by a Hilbert curve!
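For anyone curious, the standard distance-to-coordinates mapping (the iterative algorithm given in Wikipedia's Hilbert curve article) is short; an order-16 curve would cover the full 65536x65536 IPv4 grid:

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve filling a
    2**order x 2**order grid to (x, y) coordinates."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:              # rotate this quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

print([hilbert_d2xy(1, d) for d in range(4)])
# [(0, 0), (0, 1), (1, 1), (1, 0)] -- the basic U shape
```

The point of using it for IP maps is locality: consecutive addresses land on adjacent pixels, so CIDR blocks show up as compact squares.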


Here's a piece I did a while back using a similar technique:

http://www.myspace.com/kruhft/photos/19626037#%7B%22ImageId%...

Myspace lowres'd it and I can't find the original, but if there's any interest I can keep looking. I did a ping 'scatter' scan of a random set of IP addresses and then mapped them into 2d space with the boxes representing the ping times. I thought it looked cool in the end :)


Absolutely beautiful... really nice stuff. Would love to have something like this of our (company's) network as a present for our IT staff. ;-)


It's beautiful - it would make a great poster!


This reminds me, gently, of two things.

1) The Opte Project (http://www.opte.org/maps/)

2) A hoax torrent download "HACKER TOOL EVERY IP ADDRESS EVER" (http://imgur.com/7CMCceQ)


I think "http://home.comcast.net/~suprtwo/" might be the funniest part of that image. I'd forgotten just how amazingly cobbled together suprnova was.


As someone who works with this, I would like to know how they can be sure their results are reliable. Just starting a sender and a receiver thread simply won't do: at that rate congestion happens, and we'll start seeing packet loss. With a stateless approach, the only thing you can do to prevent this is arbitrarily slowing down the rate of packets being sent, and with that approach it is going to take way longer than ten hours if you scan from one location.

It works perfectly well as long as your results are not used for anything important, I guess. But if you have customers who need reliable results, this naïve approach simply doesn't cut it, in my experience.


During the scan we monitored the bandwidth, and we used control pings to check the whole time that the server could send and receive pings. We took certain monitoring and slow-down measures. Sure, it was not perfect, but reliability was considered during the experiment.


So, what measures were taken?

The problem is not sending packets fast enough. It's not about bandwidth. The problem is sending them just fast enough, which is impossible if you're scanning statelessly with just ICMP echoes.

Let's say you're on 100 Mbit Ethernet, but your uplink is only 8 Mbit. If you send packets at a rate of 10 Mbit/s, packet loss will happen. And you're not the only one using the network either, so this can happen way earlier. And that's only the part of the network that you control; there might be a lot of hops between you and the host you're sending packets to. And with your approach (the way I understand it) you're not going to notice packet loss.

I might be making too many assumptions here, but ten hours is just too short a time period for a network of that size to get a reliable result. I'm very sceptical. But please prove me wrong, because it will definitely make my job easier.

I guess you could publish the code, so I could test it myself.


What if you did a much slower (i.e., reliable) scan of a small sample? Then you could compute the probability of false negatives in the fast search and get a much more accurate count.
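A sketch of that sampling idea in Python (every name and the toy numbers here are mine, purely illustrative): re-probe a random sample of the fast scan's "no response" verdicts with a slow, careful probe, and use the hit rate as an estimate of the false-negative rate.

```python
import random

def estimate_false_negative_rate(fast_results, slow_probe,
                                 sample_size=1000, rng=None):
    """fast_results: dict ip -> bool (the fast scan's verdict).
    slow_probe: callable ip -> bool (a careful, rate-limited re-probe).
    Returns the fraction of sampled 'down' verdicts that were wrong."""
    rng = rng or random.Random()
    negatives = [ip for ip, up in fast_results.items() if not up]
    sample = rng.sample(negatives, min(sample_size, len(negatives)))
    misses = sum(1 for ip in sample if slow_probe(ip))
    return misses / len(sample)

# Toy check: half the hosts are up, and the "fast scan" wrongly
# marks every up host whose address is divisible by 10 as down.
truth = {ip: ip % 2 == 0 for ip in range(10_000)}
fast = {ip: up and ip % 10 != 0 for ip, up in truth.items()}
rate = estimate_false_negative_rate(fast, lambda ip: truth[ip],
                                    sample_size=2000,
                                    rng=random.Random(0))
print(round(rate, 2))  # close to the true 1000/6000 ~ 0.17
```

Scaling the sampled rate by the number of negative verdicts then corrects the overall count.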


I guess it depends on what kind of results you want. It's possible to scan the public IPv4 address space reliably, but it requires a bit more effort than just sending out packets to see what you get back over a relatively short time frame.

You can split the address space up across several different scanners on different physical links. You can estimate the RTT to a network segment you're scanning and base your timings on that. Probing with TCP packets can yield better results than ICMP packets for this type of activity. There are so many variables involved.

Build a tool that allows you to send ICMP packets at a fixed rate (preferably in the kernel, or even without an OS at all if you're into that; getting precise timings in user land is hard), or just a tool that sleeps between packets, with the possibility of not sleeping at all. It's an educational experience. Scan a relatively small range of addresses bound to hosts on the other side of the world at different speeds and see the difference in results. Maybe there's a good tool for that already.

Whenever I read about "We've scanned/product X can scan the internet in X hours" I'm very sceptical. Unless the results are verifiable in some way (which is hard to guess/estimate for such a large sample) or the approach they took seems like a sane one (very subjective I guess), I assume they don't know what they're doing. The reason I assume this is because I've been there myself.


2^32 pings in 10 hours is ~120kHz. So I imagine that the complaining companies were the ones assigned /8 blocks (IBM, Apple, Ford etc.). They probably noticed the 500 pings/second aimed at their address space.
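The arithmetic behind those figures checks out:

```python
total_ips = 2 ** 32
seconds = 10 * 3600
rate = total_ips / seconds
print(round(rate))        # ~119305 pings/second overall
print(round(rate / 256))  # ~466/s landing inside any single /8
```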


I'd love to know the story behind the three responses they received from 10.0.0.0/8.

I am also curious about this: "With the extracted data more interesting analysis can be done,...such as the issue with network and broadcast addresses (.0 and .255)." Why do responses from .0 or .255 have to be an issue? My cable modem sits on a /20. It seems that there are a number of valid ip addresses ending in 0 or 255 in this range:

  $ ipcalc XX.XX.57.26/255.255.240.0
  Address:   XX.XX.57.26          XXXXXXXX.XXXXXXXX.XXXX 1001.00011010
  Netmask:   255.255.240.0 = 20   11111111.11111111.1111 0000.00000000
  Wildcard:  0.0.15.255           00000000.00000000.0000 1111.11111111

  Network:   XX.XX.48.0/20        XXXXXXXX.XXXXXXXX.XXXX 0000.00000000
  HostMin:   XX.XX.48.1           XXXXXXXX.XXXXXXXX.XXXX 0000.00000001
  HostMax:   XX.XX.63.254         XXXXXXXX.XXXXXXXX.XXXX 1111.11111110
  Broadcast: XX.XX.63.255         XXXXXXXX.XXXXXXXX.XXXX 1111.11111111
  Hosts/Net: 4094                 Class A


Yes, if you're using ranges larger than /24, addresses ending with 0 and 255 are valid addresses; however, because some people making network equipment are dumb, you will have reduced reachability compared to an address not ending with 0 or 255.

Re: responses from 10/8; they may have some connectivity to local 10/8 resources; or it's possible someone was sending them fake ping responses, and the network path they're on doesn't do proper ingress filtering (many don't).

Some consumer routers filter traffic to/from addresses ending in .0 or .255 in a naive effort to prevent smurfing.


I've seen ISPs that use 10.x.x.x addresses internally, so a tracepath from the local network would show an intermediate router with a 10.x.x.x address.


I assume the 10.0.0.0/8 responses were from other hosts on their network?


You would hope they conducted the test with a machine connected to the internet with a publicly routeable address.


They couldn't have done the test otherwise. The significance of 10.0.0.0/8 is that this network is reserved for LAN use.


I'm aware of the significance of the netblock; that is why I brought the issue up. Why is it that you think they could not have conducted the test on a machine sitting behind a NAT box?


Just yesterday I thought that sending a packet of death to every public IPv4 address would make a fun Friday-evening project.


Don't do it from your home computer. The FBI will be at your door before Friday is over.


[deleted]


> That's roughly 1/134 of the total energy released by the sun in a year.

Can't be right; that would make it larger than all the energy consumed in the world in a year.


But is it that only 7% of all IP addresses are allocated to real live hosts, or (more likely, given the number of 0% counts) that large swathes of the Internet just routinely ignore ICMP and drop it on the carpet?

This has been complained about for years, and it means the most obvious approach to diagnosing problems fails 93% of the time...


I remember throwing together a super basic python script that would query random IP addresses at port 80 and see which ones sent back a response. Basically an attempt to find random web pages by IP address. Seemed like most of my hits were router status pages or Apache server responses.
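The core of such a script fits in a few lines; this is my own sketch of the idea, not the commenter's code (a real version would also skip reserved ranges like 10/8, 127/8, and 224/4, which is omitted here):

```python
import random
import socket

def random_public_ish_ip(rng=None):
    """Pick a random dotted quad; no filtering of reserved ranges."""
    rng = rng or random.Random()
    return ".".join(str(rng.randrange(1, 255)) for _ in range(4))

def has_web_server(ip, port=80, timeout=1.0):
    """True if something accepts a TCP connection on the given port."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage would be a loop like `ip = random_public_ish_ip()` followed by `has_web_server(ip)`, fetching `http://<ip>/` on a hit.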


That's how Shodan (shodanhq.com) started :)


10 hours is rather impressive! I am working on a similar project and always wonder who else may be doing this, so I can learn from their methods. Not many of these have been published, though. It's also a good reminder that the Internet is not that vast after all.


Since you've mastered IPv4, how about some IPv6 love? My server is awaiting your ICMP packet!


Doesn't this make you feel kinda small?

I mean the internet address space for IPv4 is now so tiny relative to our computing resources that visualizing and interpreting the data is fairly easy.

Of course storing a response packet for every IPv6 address might cost slightly more on S3.


Interesting stuff. The data would be a lot easier to read if they only reported a couple of significant figures, though. Having nine decimal digits actually obscures easy interpretation by humans.


A good next step in the IPv4 space crunch would be to reclaim all these netblocks that aren't interested in participating in the open internet :)


Coolest thing I've seen all day.

I'd be really curious to know how long the full scan took. Couldn't find the info in the article.


The article says "After 10 hours", so I'm assuming that's how long it took (that took me a while to find, also). Actually pretty amazing that they were able to do it so quickly.


"After 10 hours" we reached the end of the internet. :)

Shocked at how fast they were able to ping all the IPs


To ping the entire internet in 10 hours you would generate approximately 60 Mbps of ICMP traffic.
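The ~60 Mbps figure follows from the packet rate; assuming minimum-size Ethernet frames (my assumption, since a bare echo request fits in one):

```python
pps = 2 ** 32 / (10 * 3600)  # ~119,305 packets/second to cover IPv4 in 10 hours
frame_bytes = 64             # minimum Ethernet frame size
mbps = pps * frame_bytes * 8 / 1e6
print(round(mbps))           # ~61 Mbit/s, in line with the 60 Mbps estimate
```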

The site is down for me, but I assume they used multiple machines to do this. I SYN scanned about 70% of the globally routed prefixes last month and it took a little over 4 days from a single box (but I was doing some detailed packet captures that hurt disk IO).


It took only 10 hours on a single server, more info here: http://www.securityartwork.es/2013/01/21/how-much-does-it-ta...


"After 10 hours, we got the following results: Ping overall results answered: 284,401,158 IP addresses responded to the ping, i.e. 7% of systems."

Looks like 10 hours unless I'm mistaken (it is pretty late here)


All IPv4 addresses, presumably.


Hello, world.



