
Who stops people from running their own search engines?

As if you cannot look up the address ranges of your own country and then crawl all of them for websites that may be hosted by people living locally.

As if you cannot do the same with a foreign country that interests you.

Maybe you could even find a list of only residential IP ranges, so you can be sure you're finding web servers run by individuals and not corporations.
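A minimal sketch of the enumeration step, assuming you already have the address blocks you care about as CIDR strings (regional internet registries publish delegation files that can be turned into such a list). The helper names `candidate_urls` and `probe` are hypothetical, not part of any existing tool; the sketch only builds plain `http://` URLs and checks whether anything answers.

```python
import ipaddress
import urllib.error
import urllib.request
from typing import Iterator, Optional


def candidate_urls(cidr_blocks: list[str]) -> Iterator[str]:
    """Yield an http:// URL for every usable host address in the given CIDR blocks."""
    for block in cidr_blocks:
        # strict=False tolerates blocks written with host bits set, e.g. "192.0.2.5/30"
        network = ipaddress.ip_network(block, strict=False)
        for host in network.hosts():  # skips network and broadcast addresses
            yield f"http://{host}/"


def probe(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code if a web server answers, else None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 403/404 still means a web server is listening at that address
        return err.code
    except OSError:
        # URLError, refused connections and timeouts all derive from OSError
        return None
```

For example, `list(candidate_urls(["192.0.2.0/29"]))` yields six URLs, `http://192.0.2.1/` through `http://192.0.2.6/` (192.0.2.0/24 is a documentation range, so nothing real is hosted there). Running `probe` over a whole country would need concurrency and rate limiting on top of this.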

And if somehow "port scanning" (sending an HTTP request to a residential IP) is illegal in your dystopian country, you can always start by scraping a site you're interested in; there will always be at least one link to another domain somewhere.
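Pulling the off-site links out of a seed page needs nothing beyond the standard library. A minimal sketch, assuming you only want absolute http(s) links that leave the page's own host; `external_links` is a hypothetical helper name:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(page_url: str, html: str) -> list[str]:
    """Return unique absolute http(s) links that point to a different host than page_url."""
    parser = LinkCollector()
    parser.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    found: list[str] = []
    seen: set[str] = set()
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.netloc.lower() == own_host or absolute in seen:
            continue
        seen.add(absolute)
        found.append(absolute)
    return found
```

On a page containing a relative link, a `mailto:` link and an off-site link, `external_links` returns only the off-site one, which is the "at least one more link to another domain" to follow next.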

For large-scale servers Python is shit, but that doesn't mean you can't spend a few weekends writing your own Python crawler for your needs. It's so easy that you don't need to be a programmer to do it, and if you really care about this at all, a bit of a startup hurdle won't make you lose interest immediately.
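To give a sense of how small the core of such a crawler is, here is a breadth-first sketch. It deliberately leaves fetching and link extraction to caller-supplied functions (the `fetch` and `extract_links` parameters are hypothetical) so it can be tried offline. A real crawler would also need rate limiting and robots.txt handling; the standard library's `urllib.robotparser` covers the latter.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    extract_links: Callable[[str, str], list[str]],
    max_pages: int = 100,
) -> dict[str, str]:
    """Breadth-first crawl from the seed URLs; returns {url: page body} for fetched pages."""
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated seeds, in order
    seen = set(frontier)                    # every URL ever queued, to avoid revisits
    pages: dict[str, str] = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        body = fetch(url)
        if body is None:  # fetch failed or page missing: skip it
            continue
        pages[url] = body
        for link in extract_links(url, body):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

Usage with a fake in-memory "web", where each page body is just a space-separated list of the pages it links to:

```python
web = {"a": "b c", "b": "c d", "c": "", "d": "a"}
pages = crawl(["a"], web.get, lambda url, body: body.split())
print(list(pages))  # ['a', 'b', 'c', 'd']
```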

And if it really does, there's always options like https://yacy.net/

You should see these things more like real life. If you wanted to know more about your own neighbourhood, what better way is there than to go outside and walk around your neighbourhood and see things with your own eyes?

Maybe that's just my opinion, but the status quo is no one's fault but your own, because I never had this problem.




Content producers get sucked into walled gardens. Even if it's just an internet discussion, nobody will ever read your shit on Discord once it's older than a week. IRC had some of the same problems, but only to a degree. So user content is shrinking, and what remains on the open net is corporate content and bot farms.

You used to be able to crawl forums and find deep technical discussions. Not anymore. And if a term was ever part of a news cycle, you get walls of Google-selected propaganda.


>Who stops people from running their own search engines?

This is basically the reason my team and I are building an alternative set of YouTube recommendations. You can check them out here:

https://channelgalaxy.com

I was just tired of YouTube steering me back to the same old small niche of videos, many times giving me repeat recommendations for stuff I'd already seen. Our algorithm is designed to surface smaller channels and find more obscure content.
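The comment doesn't say how the ChannelGalaxy algorithm works. Purely as an illustration of the two goals it mentions (no repeats, more weight on smaller channels), one simple approach is to drop already-watched videos and divide a relevance score by a logarithm of the channel's subscriber count. The `Video` fields, the relevance score and `popularity_weight` are all assumptions for this sketch, not the site's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    video_id: str
    channel_subscribers: int
    relevance: float  # hypothetical similarity score from some upstream model; higher is better


def rerank(candidates: list[Video], watched_ids: set[str],
           popularity_weight: float = 1.0) -> list[Video]:
    """Drop already-watched videos, then favour relevant videos from smaller channels."""
    fresh = [v for v in candidates if v.video_id not in watched_ids]

    def score(v: Video) -> float:
        # log1p keeps the damping finite for 0 subscribers; the exponent controls
        # how hard large channels are pushed down (0.0 means pure relevance)
        damping = (1.0 + math.log1p(v.channel_subscribers)) ** popularity_weight
        return v.relevance / damping

    return sorted(fresh, key=score, reverse=True)
```

With the default weight, a slightly less relevant video from a 1,000-subscriber channel outranks one from a 10-million-subscriber channel; setting `popularity_weight=0.0` falls back to plain relevance order.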


Yes, we like alternatives. Thanks for building this!

(Casual observation: try matching titles without spaces. I searched 'thisoldtony' and got nothing, but 'this old tony' matched.)
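One common way to make 'thisoldtony' match 'This Old Tony' is to compare normalized forms, with case folded and spaces and punctuation stripped. A hypothetical sketch, not ChannelGalaxy's actual search code:

```python
def normalize(text: str) -> str:
    """Case-fold and keep only letters and digits, so spacing and punctuation don't matter."""
    return "".join(ch for ch in text.casefold() if ch.isalnum())


def title_matches(query: str, title: str) -> bool:
    """True if the normalized query appears anywhere in the normalized title."""
    return normalize(query) in normalize(title)
```

`title_matches("thisoldtony", "This Old Tony")` and `title_matches("this old tony", "This-Old-Tony")` are both `True`, while word order still matters (`"tony old"` does not match).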


Good suggestion and noted.

Thanks!


Even if your neighbor hosts a website for your local football club or whatever, it will almost certainly use some hosting service and not a local machine. The number of websites self-hosted from a residential home must be a tiny, tiny fraction of all "interesting" websites.


First, the internet has grown a lot, so the time, cost, and hardware needed to retrieve and index it have grown too.

Second, the quantity of intentionally fake noise has grown even faster. The spam problem you have to solve is much harder than it was 30 years ago, and any naive approach will simply fail to find the needle in the haystack.


> Who stops people from running their own search engines?

This question reveals a failure to understand the equipment, labor, and bandwidth costs of running a search engine.


Really, do you have an estimate of what it would cost?


That's like if someone says "a car is $18,000, just make your own car," another person laughs and says "you don't really think you can build a street-legal car from scratch without making a car company," and someone else says "really, what do you estimate a car company would cost?"

It's completely unnecessary to make that estimate. It's a nonsense proposition, since any two implementations can be two orders of magnitude apart in cost, and it's a question that should never be asked of someone who hasn't done it.

Which is weird, because if you are who I think you are, you've done this in a trivial way, focusing on tiny sites.

And who knows? Maybe you're about to tell me that you've indexed several tens of thousands of pages yourself, that nobody's helping you, that it runs on two computers, and that it's Not That Difficult (tm).

Of course, then someone compares that engine to a practical search engine that also covers modern sites, and therefore needs to run automated browsers to cope with their AJAX nonsense, and has to hit them every hour to stay up to date.

And then you look at the disk cost.

Microsoft spends about $6 billion a year on Bing.

DuckDuckGo has more than 200 staff and raised $170+ million before its first profitable quarter.

I think it's very easy for someone to put a homebrew HTML chess game on the phone store and then turn around and insist they know what it takes to run EA.


I'm not disputing that you can sink inordinate amounts of money into a search engine, but on the flip side, I am running a search engine.

It doesn't index tens of thousands of pages; it has a peak capacity of about 100 million documents. I can crawl over a billion documents per month.
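For scale, a back-of-the-envelope conversion of those two figures, assuming a 30-day month and a perfectly steady crawl rate (both simplifications): a billion documents a month averages out to roughly 386 documents per second, and at that rate the crawler fetches as many documents as the index can hold in about three days.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def docs_per_second(docs_per_month: float, days_per_month: float = 30.0) -> float:
    """Average crawl rate, assuming the monthly volume is spread evenly over the month."""
    return docs_per_month / (days_per_month * SECONDS_PER_DAY)


crawl_rate = docs_per_second(1_000_000_000)                   # about 385.8 documents/second
index_capacity = 100_000_000                                  # peak capacity quoted above
days_to_fill = index_capacity / crawl_rate / SECONDS_PER_DAY  # about 3.0 days

print(f"{crawl_rate:.0f} docs/s, {days_to_fill:.1f} days")   # 386 docs/s, 3.0 days
```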

I don't really see anyone suggesting competing with Google or Bing off a PC in your garage, but it is absolutely and demonstrably feasible to build complementary services without any budget at all.

It doesn't require huge numbers of developers, it doesn't require a small country's allotment of bandwidth, and it doesn't require data-centers full of prohibitively expensive hardware.


> has a peak capacity of about 100 million documents

This is much larger than expected.


"Who stops people from running their own search engines?" It's not who, it's $$$ / time / knowledge (pick 1, 2, or all 3).



