I'd like to invite everyone to try out DontBeEvil.rip, an experimental search engine for developers.
tl;dr
$ alias rip="curl -G -H 'Accept: text/plain' --url https://dontbeevil.rip/search --data-urlencode "
$ rip 'q=Heartbleed bug'
DontBeEvil.rip is a year-long experiment to see if a small team can build a developer-focused search engine that is self-sustaining on $10 monthly subscriptions.
It works by only indexing high-quality resources that are relevant to developers. You won't get useless listicles because we'll never crawl them. Relevant URLs are harvested from HN, Stack Overflow, programmer Reddit, and a few others. Page content comes mostly from the Common Crawl project.
The limited, but awesome, features in this first release are:
- Expressions! Experience the power of Elasticsearch’s Simple Query Strings.
- REST API. Just change `text/plain` to `application/json` in the above alias.
- CLI. Just use curl in the terminal. Simple as.
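If you'd rather switch formats per call, here is a rough sketch of the tl;dr alias as a shell function. The endpoint, the `q` parameter and the two Accept values are the ones from the alias above; the function name `rip_fmt` and the `RIP_CURL` override are made up for this example and are not part of the service.

```shell
# Hypothetical helper: rip_fmt json|text QUERY...
# RIP_CURL is an illustrative override (e.g. RIP_CURL=echo) that shows the
# arguments without touching the network; it defaults to curl.
rip_fmt() {
  local accept
  case "${1:-}" in
    json) accept='application/json' ;;
    text) accept='text/plain' ;;
    *) echo "usage: rip_fmt json|text QUERY..." >&2; return 2 ;;
  esac
  shift
  # -G sends the url-encoded data as a query string on a GET request.
  "${RIP_CURL:-curl}" -G -H "Accept: $accept" \
    --url https://dontbeevil.rip/search \
    --data-urlencode "q=$*"
}

# Example calls (commented out so sourcing this file makes no requests).
# The operators are Elasticsearch Simple Query String syntax: quotes for a
# phrase, + for AND, - to exclude a term. This assumes the query is passed
# through to Elasticsearch unchanged.
# rip_fmt text 'Heartbleed bug'
# rip_fmt json '"context manager" +python -django'
```

Running `RIP_CURL=echo rip_fmt json 'Heartbleed bug'` prints the curl arguments instead of sending the request, which is handy for checking the quoting.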
Hacker News, Stack Overflow, arXiv abstracts, 2M GitHub repos, and programmer Reddit (up to 2020) are being indexed right now. There's much more to come in the next few months.
I'd love to hear your questions, comments and suggestions in the comments below.
This is mostly just raw data; it isn't that useful (yet).
Piping curl output straight into my terminal feels a bit dangerous from a security standpoint. I'd rather use my browser and be able to view the results as a JSON tree. Providing raw access to ES has a high risk-to-reward ratio.
The search results for https://dontbeevil.rip/search?q=python%20context%20manager%2... are off-topic hits on SO records. I even put the name of the Python package (from the stdlib, no less) in the query string.
I was able to find what I needed on devdocs.io in less than 10 seconds.
https://devdocs.io/python~3.10/library/contextlib#contextlib...
In no way am I trying to discourage you, but until the basics are in place, search over arXiv abstracts is way less useful than just SO and docs (language and libraries).
I would recommend returning text/plain by default and .json if someone asks for it (in the URL); not everyone can set headers. I'd also put an about page at the root of your site; plain text is fine.
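To make the suggestion concrete, here is a minimal sketch of the selection rule I have in mind, written as a shell function. `pick_format` is a hypothetical name and I'm only guessing at how the server is structured; the point is just the order of the checks: a `.json` suffix in the URL wins, then the Accept header, and text/plain is the fallback.

```shell
# Hypothetical helper: pick_format URL_PATH [ACCEPT_HEADER]
# Prints the content type the server would respond with under the
# suggested rule. Not based on the site's actual implementation.
pick_format() {
  local url_path accept
  url_path=$1
  accept=${2:-}
  # 1. An explicit .json suffix in the URL needs no custom headers.
  case "$url_path" in
    *.json) echo 'application/json'; return 0 ;;
  esac
  # 2. Otherwise honor an Accept header that asks for JSON.
  # 3. Everything else (including no header at all) gets plain text.
  case "$accept" in
    *application/json*) echo 'application/json' ;;
    *) echo 'text/plain' ;;
  esac
}
```

With this rule, `pick_format /search` gives text/plain, while `pick_format /search.json` and `pick_format /search 'application/json'` both give application/json.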