Hacker News

I've found that about 80% of the bullshit my search engine crawler finds is on Russian and Chinese IPs (especially the Alibaba Cloud).

Not that there isn't legitimate content, but there seems to be very little effort put toward policing the bad as long as it primarily targets a western demographic.

This is not by any means a new development.




Oh wow, I was looking for your search engine a few months back, but couldn't recall the name, and my searching turned up nothing, even here on HN somehow.

I just came across the open-sourcing blog post. Thank you for doing this. Search is something I'm quite interested in. Is there a public mailing list somewhere to discuss marginalia-related stuff? I'll probably look into contributing at a later point (I've also always wondered how well sqlite+webtorrent would fare as a search engine, for instance -- not saying I'll make a PR for that, but I'd be curious to investigate stuff like that).


I'm still working on how best to run this as an open source project. No list or anything just yet. I'm half considering moving to GitHub from git.marginalia.nu, but I'm still ironing out the details. There are also a lot of rough edges to smooth out when it comes to contributor experience; running the thing still requires a lot of awkward manual steps.

If you want you can pop me an email (see my profile) and I'll let you know when the details are more clear.


Don't forget Bayes' theorem: a lot of crap comes from ru/cn domains <=/=> most ru/cn domains are crap.
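
To make the distinction concrete, here is a rough sketch with made-up numbers. The counts below are purely hypothetical (only the 80% share echoes the figure mentioned upthread), and `conditional_rates` is a name invented for this illustration. It computes P(ru/cn | crap) directly from the counts, then uses Bayes' theorem to get P(crap | ru/cn):

```python
def conditional_rates(crap_rucn, crap_other, ok_rucn, ok_other):
    """Return (P(ru/cn | crap), P(crap | ru/cn)) from a 2x2 table of site counts."""
    total = crap_rucn + crap_other + ok_rucn + ok_other
    p_crap = (crap_rucn + crap_other) / total   # prior: share of all sites that are crap
    p_rucn = (crap_rucn + ok_rucn) / total      # share of all sites hosted on ru/cn
    p_rucn_given_crap = crap_rucn / (crap_rucn + crap_other)
    # Bayes' theorem: P(crap | ru/cn) = P(ru/cn | crap) * P(crap) / P(ru/cn)
    p_crap_given_rucn = p_rucn_given_crap * p_crap / p_rucn
    return p_rucn_given_crap, p_crap_given_rucn


# Hypothetical counts, for illustration only (not measured data).
share_of_crap, crap_rate = conditional_rates(
    crap_rucn=80, crap_other=20, ok_rucn=320, ok_other=9580
)
print(f"P(ru/cn | crap) = {share_of_crap:.2f}")
print(f"P(crap | ru/cn) = {crap_rate:.2f}")
```

With these invented counts, 80% of the crap sits on ru/cn hosts, yet only 20% of ru/cn hosts are crap. Different counts could of course push the second number well above 50%; the point is only that the first figure alone doesn't determine it.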


The majority of English-language ru/cn domains being crap seems like a safe bet. I'm sure lots of legitimate businesses in those countries have such domains, but those websites aren't relevant to an English speaker not living in those countries.


If I'm interested in reducing the amount of crap, it's only the left-hand side of that inequality I care about.



