I automate it by pulling URLs out of HN, programming subreddits, etc. Right now my only source of page content is the Common Crawl, which is why relatively few web pages are indexed so far. That will change.
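For the HN part, a minimal sketch of what "pulling URLs out" could look like, assuming the public Hacker News Firebase API (`/v0/item/<id>.json`, whose items carry `type`, `url`, `score`, `dead` and `deleted` fields). This is not necessarily how it's done here; the helper names `fetch_item` and `story_urls` are hypothetical.

```python
import json
import urllib.request

# Public Hacker News API endpoint for a single item (story, comment, poll, ...).
HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"


def fetch_item(item_id):
    """Fetch one HN item as a dict (network call; returns None for missing items)."""
    with urllib.request.urlopen(HN_ITEM_URL.format(item_id), timeout=10) as resp:
        return json.load(resp)


def story_urls(items):
    """Return (url, score) pairs for live link stories.

    Comments, dead/deleted items, and text-only posts such as "Ask HN"
    (which have no "url" field) are skipped.
    """
    out = []
    for item in items:
        if not item or item.get("type") != "story":
            continue
        if item.get("dead") or item.get("deleted"):
            continue
        url = item.get("url")
        if url:
            out.append((url, item.get("score", 0)))
    return out
```

Filtering on `type == "story"` plus the presence of `url` is what separates outbound links worth indexing from discussion-only posts.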

A next step is to index entire sites, not just individual pages, based on the positive votes their links get.
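One way to decide which sites to index in full is to sum the votes of every submitted link per host and keep hosts above a threshold. The sketch below assumes the input is a list of `(url, score)` pairs and that `www.` is the only prefix worth folding; the function names and the default `min_total_score=50` are hypothetical, not part of the actual system.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Normalize a URL to a site key: lowercase hostname without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def sites_to_crawl(links, min_total_score=50):
    """Return sites whose links' summed scores reach the threshold, best first."""
    totals = defaultdict(int)
    for url, score in links:
        site = site_of(url)
        if site:  # skip strings that don't parse to a hostname
            totals[site] += max(score, 0)  # ignore negative scores
    chosen = [s for s, t in totals.items() if t >= min_total_score]
    return sorted(chosen, key=lambda s: (-totals[s], s))
```

Summing per host rewards sites that are upvoted repeatedly rather than one viral page; a real ranking might also weight by source or recency.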