
So what should his robots.txt look like? At the moment it is:

   User-Agent: *
   Disallow: /21000/




It's mostly sufficient. /21000/ will not match "http://picolisp.com/21000", which is the first URL in the sequence, but the remaining URLs look like "http://picolisp.com/21000/!start?*Page=+2", so Googlebot will likely only continue to download a single page once it has re-read the robots.txt.
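For what it's worth, Python's standard-library robotparser agrees (a quick sketch, not part of the original post; the two URLs are the ones quoted above):

    from urllib.robotparser import RobotFileParser

    # Feed in the site's current robots.txt rules.
    rp = RobotFileParser()
    rp.parse([
        "User-Agent: *",
        "Disallow: /21000/",
    ])

    # "/21000" does not start with "/21000/", so the first page stays crawlable.
    print(rp.can_fetch("*", "http://picolisp.com/21000"))                   # True
    # The paginated URLs do start with "/21000/", so they are blocked.
    print(rp.can_fetch("*", "http://picolisp.com/21000/!start?*Page=+2"))   # False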

Which is what you deserve for using non-standard URL formats.


Hold on, a slash at the end is not standard?


No, I'm saying /21000/ will match a path with a directory named /21000 but not a file named /21000.

When I say "non-standard", I mean that if the website's URLs looked like "/21000/foo" and "/21000/foo?page=2", it would have been easier to craft a "Disallow" rule that successfully blocked all of the desired pages.


   User-Agent: *
   Disallow: /21000
or

   User-Agent: *
   Disallow: /
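Either variant closes the gap. Checking the /21000 rule with the same robotparser sketch as above (again just illustrative, not from the original thread):

    from urllib.robotparser import RobotFileParser

    # "Disallow: /21000" is a plain prefix match, so it covers both URL shapes.
    rp = RobotFileParser()
    rp.parse(["User-Agent: *", "Disallow: /21000"])
    print(rp.can_fetch("*", "http://picolisp.com/21000"))                   # False
    print(rp.can_fetch("*", "http://picolisp.com/21000/!start?*Page=+2"))   # False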



