> Who chooses the white list, and why should I trust them? Is it democratically chosen?

You could have user-compiled lists of sites to show in search results.

Let the users pick the lists they want to see, and communities can create and distribute lists within themselves.
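
As a rough sketch of what that could look like, assuming each community just publishes a plain text file of domains and users merge the lists they subscribe to (the list URLs below are made up):

    # Hypothetical sketch: each community list is a plain text file with one
    # domain per line; a user merges the lists they subscribe to.
    from urllib.request import urlopen

    SUBSCRIBED_LISTS = [
        "https://example.org/lists/science.txt",          # made-up URLs
        "https://example.org/lists/retro-computing.txt",
    ]

    def load_allowlist(urls):
        domains = set()
        for url in urls:
            with urlopen(url) as resp:
                for line in resp.read().decode("utf-8").splitlines():
                    line = line.strip()
                    if line and not line.startswith("#"):
                        domains.add(line.lower())
        return domains

    allowlist = load_allowlist(SUBSCRIBED_LISTS)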




Great idea, but why build a search engine at all in this case? You can use DDG + your filter and see only the results from your whitelist.

Could easily be implemented for any current search engine.

To a large extent, this is what you already do when you view a page of search results: you filter them based on your understanding of which sites and results hold value.
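
Roughly, that post-filtering step could be as simple as something like this, assuming you already have a list of result URLs out of whichever engine you use (how you fetch them varies per engine and is left out):

    # Hypothetical post-filter: keep only results whose host is on the allowlist.
    from urllib.parse import urlparse

    def on_allowlist(url, allowlist):
        host = urlparse(url).hostname or ""
        # Accept the listed domain itself or any subdomain of it.
        return any(host == d or host.endswith("." + d) for d in allowlist)

    def filter_results(results, allowlist):
        return [r for r in results if on_allowlist(r["url"], allowlist)]

    # Example with made-up results:
    results = [
        {"url": "https://blog.example.org/post", "title": "A post"},
        {"url": "https://ads.example.com/landing", "title": "An ad"},
    ]
    print(filter_results(results, {"example.org"}))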


> why build a search engine at all in this case?

On a public scale, you could make an argument for tighter integration/better privacy with the lists. For example:

    Browser -----Request-to-SE-----> Search Engine
      ^                                   |
      |                      Unfiltered Results (In YAML/JSON)
      |                                   |
      |                                   V
      |--Desired Results------ Local Filtering/Rendering

On a private scale, if you are only crawling sites on the allow list, then you have a much better chance of maintaining a local database of the sites that show up in the search results.
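
As a toy sketch of what that local database could be, here is a minimal in-memory inverted index; the pages would come from whatever crawl of the allow-listed sites you run (the URLs and text below are made up):

    # Hypothetical sketch: a tiny in-memory inverted index (term -> URLs).
    from collections import defaultdict

    index = defaultdict(set)

    def add_page(url, text):
        for term in text.lower().split():
            index[term].add(url)

    def search(query):
        terms = query.lower().split()
        if not terms:
            return set()
        hits = set(index[terms[0]])
        for term in terms[1:]:
            hits &= index[term]          # keep pages containing every term
        return hits

    # Example with made-up pages from allow-listed sites:
    add_page("https://example.org/a", "static site generators compared")
    add_page("https://example.org/b", "search engines and crawlers compared")
    print(search("compared crawlers"))   # -> {'https://example.org/b'}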

Edit: This could possibly make it easier to set up distributed search as well, since each node could index a given list and then distribute it similarly to DNS. I don't really know how well that would work though, just an idea.
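
One very rough way that assignment could look, purely illustrative (the node names are made up, and a real system would want consistent hashing plus replication):

    # Hypothetical sketch: deterministically map each community list to the
    # node responsible for indexing it, so any client knows whom to query.
    import hashlib

    NODES = ["node-a.example", "node-b.example", "node-c.example"]  # made up

    def node_for_list(list_name, nodes=NODES):
        digest = hashlib.sha256(list_name.encode("utf-8")).digest()
        return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]

    print(node_for_list("science"))
    print(node_for_list("retro-computing"))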


Aside from this, a big reason to build it is that it seems a lot simpler than writing a giant web crawler à la Google, and is thus a good target for an open-source solution. The lack of one is the biggest problem with DuckDuckGo.


Do you have thoughts on implementing the distributed search?

I'm thinking about playing around with this in my spare time, but that part seems the hardest to do.


Don't start from scratch; take a look at yacy (https://yacy.net/), which already does most of what is discussed here.


I would think everyone would run their own “crawler”, but maybe you could use a ledger and delegate sites to different workers. If you’re whitelisting sites, you could maybe only crawl a link or two deep.

It’d take a lot of cycles while the network is small, but if you get it growing you could even have sub-networks with their own whitelist additions (and every user has a blacklist).
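
A bare-bones sketch of that depth-limited crawl, restricted to allow-listed domains, using only the standard library; a real crawler would also need robots.txt handling, politeness delays, and better deduplication:

    # Hypothetical sketch: crawl allow-listed sites at most `max_depth` links deep.
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

    def allowed(url, allowlist):
        host = urlparse(url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in allowlist)

    def crawl(seeds, allowlist, max_depth=2):
        seen, frontier = set(), [(u, 0) for u in seeds]
        while frontier:
            url, depth = frontier.pop()
            if url in seen or depth > max_depth or not allowed(url, allowlist):
                continue
            seen.add(url)
            try:
                html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
            except (OSError, ValueError):
                continue                  # skip unreachable or malformed URLs
            parser = LinkExtractor()
            parser.feed(html)
            for href in parser.links:
                frontier.append((urljoin(url, href), depth + 1))
        return seen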


That’s what directory sites offered once upon a time. It was a pretty good way to discover new content back then. I spent a lot of time on dmoz when I wanted to find information about various topics.



