
What makes this better than https://duckduckgo.com ?



Not saying that it's better, but one of the main selling points of DeuSu seems to be that it's fully open source and has an independent search index. DuckDuckGo, if I remember correctly, is not 100% open source and gets its search index from Yahoo (or maybe Bing, not sure).


If it's not good, then it doesn't matter whether it's OSS or not.


Good for what? Even though this isn't good for use as an everyday general-purpose search engine, it could be good for a particular use case, perhaps with some adaptation, or for learning from.


To be frank, I don't know why people would use it. Far better alternatives exist.

> it could be good for a particular use case

Namely?

> or for learning from.

The author admitted in the GitHub README that the code quality is rather bad. I also don't see a link to the search index, the only valuable component of this project.


I will publish the index for download in a few weeks. I'm currently working on the documentation. Oh, and I will publish the raw crawl data too. Everything together is about 2.5 TB.

There is also a free API in beta test right now. It will probably be ready for official release next week.
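To make the API mention concrete, here is a minimal client sketch. The endpoint URL, query parameter name, and JSON response shape are all assumptions for illustration only; the actual beta API is not documented in this thread and may look quite different.

    # Hypothetical client for a DeuSu-style search API.
    # The endpoint URL, the "q" parameter and the response format are
    # assumptions -- the real beta API may differ.
    import json
    import urllib.parse
    import urllib.request

    def search(query, base_url="https://example.org/api/search"):
        """Send a query string and return the parsed JSON result list."""
        url = base_url + "?" + urllib.parse.urlencode({"q": query})
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    if __name__ == "__main__":
        for hit in search("open source search engine"):
            print(hit.get("title"), hit.get("url"))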


I think such a publication is very important. I have no idea whether open-source search engines could ever work.

But if they can, I think a big part of it would be separating the crawl index from the UI, prioritisation, etc. Different people could work on those two ends of the problem and apply different philosophies.

Search only forums? Reject porn using XYZ method? Great! But they can all use the same database, or pick from a common pool of community databases. A rough sketch of that separation is below.
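To show what that split might look like, here is a minimal sketch: a shared index of documents, with the filter and ranking policies supplied by whoever builds the front end. The Document type, the policies, and the in-memory index are invented for illustration; a real index would be an on-disk inverted index, not a Python list.

    # Sketch: a shared crawl index with pluggable filter and ranking policies.
    from dataclasses import dataclass
    from typing import Callable, Iterable, List

    @dataclass
    class Document:
        url: str
        text: str

    def search(index: Iterable[Document],
               query: str,
               keep: Callable[[Document], bool],
               score: Callable[[Document, str], float]) -> List[Document]:
        """Run one query against a shared index with caller-supplied policies."""
        hits = [d for d in index if keep(d) and query.lower() in d.text.lower()]
        return sorted(hits, key=lambda d: score(d, query), reverse=True)

    # Two different "front ends" over the same index:
    forums_only = lambda d: "/forum" in d.url                     # search only forums
    count_terms = lambda d, q: d.text.lower().count(q.lower())    # naive scoring

    index = [Document("https://example.org/forum/t1", "open source search"),
             Document("https://example.org/blog/p1", "open source search search")]
    print([d.url for d in search(index, "search", forums_only, count_terms)])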


That's great news, thanks for the info. Sorry for sounding harsh; for a side project this is impressive.

Have you also published the ranking mechanism? That way people might contribute to improving it.


It's all open-source. So, yes.


It would be great if you could share (at least) some information about the kind of hosting setup you're using, how much bandwidth it takes, and how long it took to crawl and index the 2B pages.


4 servers in total.

2 are used for crawling, index building and raw-data storage. Quad-core, 32 GB RAM, 4 TB HDD and a 1 Gbit/s internet connection on each of these. They are rented and in a big data center. Crawling uses "only" about 200-250 Mbit/s of bandwidth.

2 servers for the webserver and queries. Quad-core, 32 GB RAM. One with 2x 512 GB SSD, the other with only 1x 512 GB SSD. These servers are here at home. I have cable internet with 200 Mbit/s down, 20 Mbit/s up. Static IPs, obviously.

A full crawl currently takes about 3 months.
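A quick back-of-envelope check of these figures, taking the 2B pages mentioned above, roughly 90 days per crawl, and the midpoint of the stated 200-250 Mbit/s; the derived pages-per-second and average page size are estimates, not numbers stated in the thread.

    # Rough numbers implied by the figures above (estimates only).
    pages = 2_000_000_000
    crawl_seconds = 90 * 24 * 3600            # ~3 months
    bandwidth_bytes_per_s = 225e6 / 8         # midpoint of 200-250 Mbit/s

    pages_per_second = pages / crawl_seconds
    avg_page_bytes = bandwidth_bytes_per_s / pages_per_second

    print(f"~{pages_per_second:.0f} pages/s")             # roughly 260 pages/s
    print(f"~{avg_page_bytes / 1024:.0f} KiB per page")   # roughly 107 KiB/page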


Thank you for your work. Keep it up.

Wired you a small donation, as I think it is important to have alternatives.


Thank you!




