
Dumb question, but how would they know the content is different unless they're also crawling incognito and comparing the results?



Keeping you honest with incognito crawling is something they have to do anyway, to catch various tricks and scams - malware served up to users, etc.
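
The crudest version of that check is just fetching the same URL twice and diffing. A minimal sketch (not Google's actual pipeline; the user-agent strings and the bare hash comparison are illustrative, and a real system would normalize dynamic content before comparing):

    # Fetch the same URL as Googlebot and as a plain browser, then
    # compare what came back. Identical hashes prove nothing, but
    # differing ones are a cloaking signal worth a closer look.
    import hashlib
    import urllib.request

    GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                    "+http://www.google.com/bot.html)")
    BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

    def fetch(url, user_agent):
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read()

    def looks_cloaked(url):
        bot_view = hashlib.sha256(fetch(url, GOOGLEBOT_UA)).hexdigest()
        user_view = hashlib.sha256(fetch(url, BROWSER_UA)).hexdigest()
        return bot_view != user_view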


So robots.txt is meaningless if they have to violate it anyway to check blocked-off pages for malicious content.


Well, if you're blocking access to their crawler, I'd imagine they'd have no need for an incognito crawler to check for malicious content. Why would they care, if that content isn't ending up in their index anyway?

Presumably, the incognito crawlers are only used on sites that have already granted the regular crawler access. That's content that ends up in their index which they want to vet.
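
In code terms, the distinction is just whether the page is fetchable at all. A minimal sketch using Python's stdlib robots.txt parser (the domain and path are hypothetical):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # If this returns False, Googlebot never fetches the page, it never
    # enters the index, and there is nothing for incognito crawls to vet.
    print(rp.can_fetch("Googlebot", "https://example.com/private/page"))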


Google has numerous robots that don't say Googlebot in the user-agent; they look just like Android cell phones. That's how they spot malicious sites, sites trying to game SEO, and the like. They aren't within Google's published CIDR blocks and appear to just use wireless networks.
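
For context, the documented way a site verifies a real Googlebot is a reverse DNS lookup plus a forward-confirm, which is exactly the check these stealth crawlers would fail, since they resolve to ordinary networks rather than googlebot.com. A minimal sketch of that verification (error handling simplified):

    import socket

    def is_verified_googlebot(ip):
        # Reverse-resolve the client IP; genuine Googlebot hosts live
        # under googlebot.com or google.com.
        try:
            host = socket.gethostbyaddr(ip)[0]
        except socket.herror:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP.
        try:
            return ip in socket.gethostbyname_ex(host)[2]
        except socket.gaierror:
            return False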


I'm picturing Google Street View cars driving around with a box of Pixels in the back, connecting to open WiFi and trying sites, and that's why Google can now narrow down your location from which SSIDs are available.


Speaking of rolling around with a box of Android devices:

https://www.theguardian.com/technology/2020/feb/03/berlin-ar...

Also, I would've sworn that happened circa 2015 and not 2020. The passing of time for the last few years has such a muddled feeling.



