Bitmagnet Allows People to Run Their Own Decentralized Torrent Indexer Locally (torrentfreak.com)
180 points by mgdigital 7 months ago | 97 comments



Who remembers Skytorrents (https://news.ycombinator.com/item?id=13423629)? Posted as a "Show HN" here, it was a DHT-sourced index and stack written in C with no JavaScript, no cookies, no ads, no tracking. Skytorrents was unbelievably fast, friendly, and complete, and this translated into rapid adoption and traffic growth that caused the site to shut down due to server costs after just a year (https://torrentfreak.com/skytorrents-dumps-massive-torrent-d...).

It was a shame that the technology behind Skytorrents was never open-sourced; it was the best torrent crawler and site I've ever seen, and I would have liked to see how it worked so well.


I guess the takeaway is that a public torrent site can (probably) never be sustainable while acting in users' best interests?


Skytorrents was great, but I never knew it had been written in C, nor that it had been posted here. Thanks for the background info.


I feel like writing a fast torrent tracker isn't particularly hard. The hard part is the legal risk.


This can't be overstated. Legal exposure is by far the largest risk of running a torrent site. Maybe one day..


Never heard of it, what an interesting read


Interesting.

I want to seamlessly distribute my music through bittorrent, but since I have a small fan base (and thus, a very small potential seeding pool), I found it difficult to connect all the moving parts.

I'll give Bitmagnet a try as an indexer.

I did find that IPFS, with a pinning service (I won't shill the one I used in particular), was a bit easier and just worked for everyone who tried to use it.

But I'd like to get my bittorrent presence up to the "just works" level also.


Have you considered setting up a web seed?

https://www.ubuntubuzz.com/2021/06/how-to-add-web-seeds-to-t...


No I hadn't!

This seems exciting. Have you done it successfully?


It is pretty common for open source distribution of operating systems, and it works well.


You don't need an indexer for that. All you need is a static list of all your music, and people can click the links to download it. Give them qBittorrent and it'll run smoothly.


Sure, but I'd like to go beyond magnet links and make it properly decentralized, with the discovery happening in the DHT and torrent swarm.


There is no point in running your own indexer if it's only for your own music though. If your goal is to run a more "communal" indexing point, it also makes sense to build it cooperatively with other artists, listing only content you all care about and want to distribute (rather than doing a filtering after the fact with a general indexer)

I like your idea of restoring the good name of a technology that just works better, and I think it pairs really well with bringing a humane touch back to how we use the internet and computers, putting people back at the center rather than thinking about tools first!


Maybe check out how it's done here: https://archive.org/details/the-fanimatrix-div-x-5.1-hq. They don't seem to provide magnet links, but basically you create a torrent from your music. Host it somewhere (Cloudflare R2 would be good for free egress) in the right structure. Add the webseed endpoint to your torrent file and create a magnet link. Put all this stuff on a website in the download section. Let your users download it however they want.
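
If it helps, here's a rough sketch of that last step in Python; the infohash and URLs are placeholders, and ws is the magnet parameter used for web seeds (BEP 19):

    import urllib.parse

    def make_magnet(infohash_hex, name, webseed_url, tracker=None):
        # xt = exact topic (the BTIH infohash), dn = display name,
        # ws = web seed URL (BEP 19), tr = optional tracker
        params = [("dn", name), ("ws", webseed_url)]
        if tracker:
            params.append(("tr", tracker))
        return "magnet:?xt=urn:btih:%s&%s" % (
            infohash_hex, urllib.parse.urlencode(params))

    # placeholder infohash, just for illustration
    print(make_magnet("0123456789abcdef0123456789abcdef01234567",
                      "my-album", "https://example.com/my-album/"))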


Well for your use case of distributing your own music, Bitmagnet wouldn't be necessary.

What a DHT crawler like Bitmagnet does is the following:

1. Take a few initial "bootstrap" torrents and ping them to see which IP addresses are seeding that file
2. Ping those IP addresses and ask what other files/torrents they're seeding
3. Ping those torrents to see which IP addresses are seeding that torrent

Rinse repeat.

So to distribute your music to fans, you'd just want to put magnet links to your music on your site.


That's not correct. Torrent swarms and the DHT are separate. Each torrent basically forms its own small network of TCP connections to exchange data specific to that one torrent, while the DHT is a single network shared by all clients that speak the protocol, carried over short-lived UDP query-response exchanges.
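
To make the query-response part concrete, here's a minimal sketch of a single KRPC ping (BEP 5) in Python, assuming the well-known bootstrap node router.bittorrent.com is reachable:

    import os, socket

    def bencode(x):
        # minimal bencoder, just enough for KRPC messages
        if isinstance(x, dict):
            return b"d" + b"".join(bencode(k) + bencode(v)
                                   for k, v in sorted(x.items())) + b"e"
        if isinstance(x, bytes):
            return str(len(x)).encode() + b":" + x
        if isinstance(x, int):
            return b"i" + str(x).encode() + b"e"
        raise TypeError(x)

    node_id = os.urandom(20)  # a random 160-bit node ID for ourselves
    ping = bencode({b"t": b"aa", b"y": b"q", b"q": b"ping",
                    b"a": {b"id": node_id}})

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(5)
    s.sendto(ping, ("router.bittorrent.com", 6881))
    data, addr = s.recvfrom(4096)
    print(addr, data)  # a bencoded response containing the remote node's ID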


they're saying they jump between one and the other. ask DHT, then ask swarm, then ask DHT, then ask swarm.


That doesn't make sense because those are concurrent processes, not serial ones.


You have to be participating in a torrent swarm in order to bootstrap yourself into DHT at all. Bittorrent's DHT is not a network independent of torrent swarms. You need the address of a peer who is already part of the network in order to join it, and you have to get a list of those addresses from somewhere.


> You have to be participating in a torrent swarm in order to bootstrap yourself into DHT at all.

This is not correct.

> Bittorrent's DHT is not a network independent of torrent swarms.

This is also wrong. People have used the DHT for non-torrent-related purposes.

> You need the address of a peer who is already part of the network in order to join it, and you have to get a list of those addresses from somewhere.

And nothing dictates that that has to be obtained via the bittorrent peer protocol. https://stackoverflow.com/a/11089702/1362755


Which of these methods is not obtaining the address of a node in a swarm, or hitting a tracker for a list of nodes in a swarm?

> People have used the DHT for non-torrent-related purposes.

This is simply non-responsive. People have used a DHT overlaid on the collection of torrent swarms for non-torrent related purposes.

> And nothing dictates that that has to be obtained via the bittorrent peer protocol.

This is a silly distinction. You also don't need to join a swarm to get on the DHT if I join a swarm to get on the DHT, write down the addresses I get on a piece of paper, then email you those addresses, which you plug into your handwritten specialized client that only knows how to join the DHT.


> Which of these methods is not obtaining the address of a node in a swarm, or hitting a tracker for a list of nodes in a swarm?

The "find bootstrap nodes via a bunch of DNS records, then persist long-lived contacts locally" approach is quite common.

> People have used a DHT overlaid on the collection of torrent swarms for non-torrent related purposes.

No, the software does not involve joining any bittorrent swarms at all.


Sorry bro, the8472 is correct on all counts. Misunderstanding DHTs and BitTorrent and how they relate is very common.


You don't need to ping anyone to crawl the DHT. You can passively wait and you'll get DHT queries in the form of "I'm looking for people seeding XYZ. Do you have a list?". You can just save those somewhere and you'll accumulate a list of active and new torrents.

Writing a DHT crawler is super fun; I suggest everyone get a cheap VM and write/run one.
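
A toy version of the passive approach might look like this; it assumes other nodes already have you in their routing tables (a real crawler would also answer queries per BEP 5 so they keep talking to you):

    import socket

    def bdecode(data, i=0):
        # minimal bencode decoder; returns (value, next_index)
        c = data[i:i+1]
        if c == b"i":                        # integer: i<digits>e
            j = data.index(b"e", i)
            return int(data[i+1:j]), j + 1
        if c == b"d":                        # dict: d<key><val>...e
            d, i = {}, i + 1
            while data[i:i+1] != b"e":
                k, i = bdecode(data, i)
                v, i = bdecode(data, i)
                d[k] = v
            return d, i + 1
        if c == b"l":                        # list: l<val>...e
            lst, i = [], i + 1
            while data[i:i+1] != b"e":
                v, i = bdecode(data, i)
                lst.append(v)
            return lst, i + 1
        j = data.index(b":", i)              # byte string: <len>:<bytes>
        n = int(data[i:j])
        return data[j+1:j+1+n], j + 1 + n

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 6881))
    while True:
        pkt, addr = sock.recvfrom(65536)
        try:
            msg, _ = bdecode(pkt)
        except (ValueError, IndexError):
            continue
        if isinstance(msg, dict) and msg.get(b"q") in (b"get_peers", b"announce_peer"):
            ih = msg.get(b"a", {}).get(b"info_hash")
            if ih:
                print(addr[0], msg[b"q"].decode(), ih.hex())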


It's kinda like the owl with the tootsie pop:

How many cups of coffee does it take to internalize XOR distance? :-)


that'd be the slow mode, though


> I want to seamlessly distribute my music through bittorrent, but since I have a small fan base (and thus, a very small potential seeding pool), I found it difficult to connect all the moving parts.

Out of curiosity, why not just distribute via R2 or similar, or archive.org if you don't need AuthN/AuthZ? What does the complexity (to you and listeners) of BitTorrent buy you?


Well, I do distribute via some centralized platforms (including some odious ones like Spotify).

But I'd like to put forward a practice that demonstrates that the tools that have been smeared as anti-artist (chiefly but not only bittorrent) are actually compelling tools for independent distribution.


Thanks, I love it!


You can also encourage people to host your content with IPFS, removing the need for a pinning service if there are enough of them.


Won't you run into the problem that you will have to regenerate torrents every time you update your music collection?



Post your magnet link and I’ll host it on my Amsterdam-based server


"Unlike well-moderated torrent sites, Bitmagnet adds almost any torrent it finds to its database. This includes mislabeled files, malware-ridden releases, and potentially illegal content. The software tries to limit abuse by filtering metadata for CSAM content, however."

This is by far the biggest hurdle for something like this. You'll eventually end up with a centralized curator again.


I have a DHT indexer implementation I mentioned in https://news.ycombinator.com/item?id=39425381. It definitely finds everything, and you need a metric to sort for quality. That metric can be seeder/leecher counts from a trusted tracker, or peer count on the DHT. https://www.coveapp.info/ uses this to grade its search results.


Or connect with trusted curators over a decentralized network, similar to the fediverse?


Or a baked-in, use-based rating system. That would be cool if it could be protected from abuse.


Then you have a user reputation problem instead… me and my 5 million alts say this is the REAL korn_blind.mp3.exe.

Nothing is new under the sun since the internet was created; these are the same problems that email has tried and failed to solve for 50 years.


It's going to be an unpopular take, but crypto solves this. As in currency: make people attest to things by locking up a very small amount of money.


This has been used by record labels in the past. I can't remember which system tried to do reputation (was it Kazaa?), but some of the most-upvoted songs were the current hits replaced with a ~10s loop playing for a few minutes.

Now, do you think you can outvote an RIAA-enforcement-group equivalent if they decide to spend money on this?


Bitcoin originally evolved from the "hashcash" proof-of-work system, which was intended to be a scalable anti-spam measure (force an attacker to generate a unique proof that is expensive to generate but cheap to verify). And you are basically describing proof-of-stake (force participants to stake, and if they misbehave you slash their stake). I wasn't kidding about "nothing is new under the sun" ;)
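
For illustration, a toy hashcash-style proof of work; the challenge string and difficulty are made up, but it shows the expensive-to-mint, cheap-to-verify asymmetry:

    import hashlib, itertools

    def mint(challenge: bytes, bits: int) -> int:
        # grind nonces until the hash has `bits` leading zero bits
        target = 1 << (256 - bits)
        for nonce in itertools.count():
            digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce

    def verify(challenge: bytes, nonce: int, bits: int) -> bool:
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        return int.from_bytes(digest, "big") < (1 << (256 - bits))

    nonce = mint(b"post:hello-world", 20)  # ~2^20 hashes of work on average
    print(nonce, verify(b"post:hello-world", nonce, 20))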

In general, though, others are right that the problem often is that attackers are more willing to play these games than individuals: you are not willing to put up 10 bucks to post on a web forum, but an attacker might think $10 to get a wonderful commercial offer delivered to 1000 people sounds really good! You won't spend 10c to upvote each individual song, but the RIAA will! Etc. It needs to be highly asymmetrical, and ideally have minimal/zero cost for "honest" users with an exponential penalty for attackers.

Obviously we are not living in a post-email-spam world unfortunately, and what we have is basically the "lightning network" with gatekeepers offloading and centralizing a lot of the problem so users don't have to deal with it. But you aren't the first person to make the observation that these are related problems!


> And you are basically describing proof-of-stake (force participants to stake, and if they misbehave you slash their stake). I wasn't kidding about "nothing is new under the sun"

That's sort of the central point of my post....


> It's going to be an unpopular take, but crypto solves this. As in currency: make people attest to things by locking up a very small amount of money.

So claims Elon on Twitter, a platform that, to anyone who uses it, is very obviously overrun with bots who have found it very profitable to validate their spam accounts and get preferential listings for a mere eight dollars, while legitimate users are scared off because they (fairly) do not trust a service with no functioning security department with their payment information.


Ad blockers seem to show that we can do curation without the centralized aspect. uBlock has a couple hundred lists, each with thousands or more filters. A user chooses to enable or disable whatever they want, with no central control.

Furthermore, the filter itself can be made to limit disclosure of the original data while still providing a binary decision on whether to filter a piece of content. Hashing, Bloom filters, AI models and so on are common tools for filtering data like emails and link reputation.

Is it possible to use bots, or an RIAA-enforcement-group equivalent, to get ad blockers to green-light specific ads? Personally I trust EasyList as long as the community does, and I would discard it the moment it lost the community's trust. That makes it a kind of centralized curator, but also very much not.
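
As a sketch of the limited-disclosure idea, a curator could publish something like this toy Bloom filter instead of the raw list; it answers "probably blocked / definitely not blocked" without enumerating the entries:

    import hashlib

    class Bloom:
        def __init__(self, m_bits=1 << 20, k=4):
            self.m, self.k = m_bits, k
            self.bits = bytearray(m_bits // 8)

        def _positions(self, item: bytes):
            # k independent hash positions derived from SHA-256
            for i in range(self.k):
                digest = hashlib.sha256(bytes([i]) + item).digest()
                yield int.from_bytes(digest[:8], "big") % self.m

        def add(self, item: bytes):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, item: bytes):
            return all(self.bits[p // 8] & (1 << (p % 8))
                       for p in self._positions(item))

    blocklist = Bloom()
    blocklist.add(b"tracking.example.net")
    print(b"tracking.example.net" in blocklist)  # True
    print(b"innocent.example.org" in blocklist)  # False (almost certainly)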


they shouldn’t do anything aside from a disclaimer

just opt out of being an arbiter, like public utilities do


Unfortunately the majority of it will be malware or illegal porn in the blink of an eye without a good way to separate it.


street smarts

community indexer

why download anything from a list of torrents, okay use the same set of rules


Things like this need good signal to noise ratio to survive. Just throwing your hands in the air and saying anything goes doesn't work because users don't want to search through 10 billion mislabelled things to find the right one.


that’s what’s going to happen though


All the CSAM torrents are fed-run honeypots. Just don't accidentally download them and you should be fine.


How do you know? And why would you know?


The really interesting part for me is what this technology might lead to in the end, which is decentralized, community-based curation. An index with whitelisted curation would be indistinguishable from a website, but it would not need a domain name or an IP address to function.


The problem with decentralised software is that you don't want to host other people's illegal content. I once tried out ZeroNet, which downloads entire decentralised websites that anyone can post things to. Although I never found CSAM directly on their Reddit equivalent, there were people posting advertisements for ZeroNet CSAM sites. The idea that I was downloading and automatically redistributing content like that is disturbing, and ZeroNet is dead for a good reason. It's a pool that is asking to be peed in, even if the abusers themselves are a tiny minority.


Bitmagnet may download metadata about CSAM content, which is automatically deleted with fairly high accuracy. You would never be redistributing it. No outgoing peer protocol is currently implemented. This is planned but it will give users control of what they're sharing rather than indiscriminately sharing everything.


https://www.coveapp.info/ approaches this in a similar fashion. I don't believe there are any legal issues in collecting metadata in an automated fashion from a public network, so creating a search index for personal use from this is fine. However, if you do click and start seeding questionable content, then it becomes an issue. https://www.coveapp.info/#dht-indexer


Another alternative is https://github.com/the8472/mldht which, contrary to magnetico, strives to be a nice citizen (its author is active in the bittorrent community AFAIU)


I have worked with the8472 to get Bitmagnet's BEP5 & BEP51 implementations working and ensure it's a good citizen on the network - there is more to be done and more protocols to be implemented, but unlike Magnetico, BM is not simply scraping without responding to incoming requests.


We had a discussion here: https://github.com/bitmagnet-io/bitmagnet/issues/11. With the related changes, Bitmagnet shouldn't have the blatant misbehavior of Magnetico (anymore). Though I haven't looked at its in-the-wild behavior, so I can't vouch for how spec-compliant the implementation is in practice.


I'm also open to more feedback, if there are specific areas that need attention let me know :)


Googled the bittorrent community with the acronym "AFAIU". Lol.


Nice! I made my own DHT crawler in C for a science project in high school.

git: http://ni.4a.si./sijanec/travnik/tree/src/dht.c


[flagged]


I have no personal opinion on the code, but it was a high school project. Relax.


> modern C ijbol

Can you clarify what this is?



Implementing sample_infohashes opens your torrent client to abuse. UDP responses are much larger than queries and this protocol (BEP51) allows attackers that can spoof source IP addresses to use a large number of clients as mules for amplified distributed denial of service attacks.


Individual DHT nodes should only see a trickle of packets. A few kilobytes per second. Even less per remote IP. So they can set fairly strict rate limits.

And ISPs are in a much better position to solve this problem anyway. They should ask for higher peering fees from peers that don't do source-filtering.
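
The kind of per-IP limit being described might look like this token-bucket sketch; the rates and the cost formula are made-up numbers, not what any particular client uses:

    import time
    from collections import defaultdict

    class TokenBucket:
        def __init__(self, rate=2.0, burst=8.0):
            self.rate, self.burst = rate, burst
            self.tokens, self.last = burst, time.monotonic()

        def allow(self, cost=1.0):
            # refill proportionally to elapsed time, capped at burst
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

    buckets = defaultdict(TokenBucket)  # one bucket per remote IP

    def should_respond(src_ip: str, response_len: int) -> bool:
        # charge larger responses more, which blunts amplification
        return buckets[src_ip].allow(cost=response_len / 512)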


True. Although an ISP can say they do source filtering and then fail to implement it properly. For example my ISP at home implements source filtering on IPv4, but not on IPv6.


Has anyone tried to fix this by making queries bigger, and limiting the response to the size of the query?


Bitmagnet has rate limiting on incoming UDP requests (both overall and per-IP) so I don't know that it would be vulnerable; if there's anything else that should be done to mitigate any risk I'd like to know.


I do not mean bitmagnet here, I mean other bittorrent clients that respond to sample_infohashes.


Outgoing rate limits; and also an attack could exhaust the overall rate with useless traffic.


Incoming and outgoing are both limited; I think the worst such an attack could do is prevent responding to legitimate incoming queries, and that shouldn't slow down the DHT crawler in a noticeable way.


I'm the author of https://github.com/anacrolix/torrent (started in 2013) and https://github.com/anacrolix/dht (started in 2015). I have a DHT indexer implementation I developed in 2021. It's currently closed source but available for use as part of https://www.coveapp.info/. I have found that after several hours the search is excellent and stays up to date with ease.


Some of the features in your screenshots look interesting, but why is Cove closed source?


Mainly because I am not ready to share my DHT indexing implementation. It's the culmination of years of working with BitTorrent and DHTs and I'd like to get something back for it one day. However I do dogfood Cove and want people to have that experience too.


I would think the only thing you would get out of such work is a sternly worded cease-and-desist from a big entity, so you might as well open-source this and let somebody else take the torch in case you are shut down.

I mean, I'd use such software (and huge databases like Skytorrents') for research purposes, because to me distribution is hugely interesting as a hobby, but many courts won't see it that way, and we know a lot of them are influenced by copyright holders.


Thanks for your insight!


The web UI is super slow. It's slow with the default view of 10 results per page and unbearably slow at 100. These aren't big numbers and my computer isn't old; something was done poorly here, and setting the default number of results per page to 10 is clearly not the correct fix.

(For the record, it only has a thousand results so far.)

Also, the content classification is so bad it might as well not exist. How does a torrent with "Playboy" in the title get classified as 'Unknown' instead of 'XXX'? Even torrents with "Porn" or "XXX" in the title get classified as Unknown. A simple Bayesian classifier should have this covered; it's not a task that needs heavy-duty AI to solve.
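
For what it's worth, the kind of classifier I mean fits in a few lines; the training data below is invented purely for illustration:

    import math, re
    from collections import Counter, defaultdict

    def tokens(title):
        return re.findall(r"[a-z0-9]+", title.lower())

    train = [
        ("Playboy Magazine Collection", "xxx"),
        ("Hot XXX Scenes 1080p", "xxx"),
        ("Ubuntu 22.04 Desktop amd64", "software"),
        ("Debian 12 netinst ISO", "software"),
    ]

    class_counts = Counter(label for _, label in train)
    word_counts = defaultdict(Counter)
    for title, label in train:
        word_counts[label].update(tokens(title))
    vocab = len({w for c in word_counts.values() for w in c})

    def classify(title):
        scores = {}
        for label in class_counts:
            total = sum(word_counts[label].values())
            score = math.log(class_counts[label] / len(train))
            for w in tokens(title):
                # Laplace smoothing so unseen words don't zero out a class
                score += math.log((word_counts[label][w] + 1) / (total + vocab))
            scores[label] = score
        return max(scores, key=scores.get)

    print(classify("Playboy 2007 Complete"))  # -> 'xxx'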

This whole thing seems way too half-baked for me.


Try https://www.coveapp.info/, I mentioned it here: https://news.ycombinator.com/item?id=39425381. I wrote a custom search implementation for torrents that uses keyword matching and boosts search scores based on popularity of torrents on the DHT. It feels very much like TPB search but not so aggressively focused on seeder counts (i.e. better matching on file names can outrank a higher seeder count).
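
As a sketch of that flavor of ranking (the function and weights here are made up, not Cove's actual formula):

    import math

    def score(keyword_match: float, dht_peer_count: int) -> float:
        # relevance dominates; popularity is only a gentle multiplier
        return keyword_match * (1.0 + math.log1p(dht_peer_count) / 10.0)

    # a strong name match on a modest swarm outranks a weak match on a huge one
    print(score(0.9, 50))    # ~1.25
    print(score(0.4, 5000))  # ~0.74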


Why is there an ffmpeg dependency if the app is only working with metadata?


It uses ffprobe for metadata, and ffmpeg for transcoding.
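
For reference, this is the usual way ffprobe gets invoked for that kind of metadata; the filename is a placeholder:

    import json, subprocess

    # ask ffprobe for container and stream info as JSON
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", "sample.mkv"],
        capture_output=True, text=True, check=True).stdout
    meta = json.loads(out)
    print(meta["format"].get("duration"),
          [s.get("codec_name") for s in meta["streams"]])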


Makes sense, thanks.


Can I take a wild guess that you're using Firefox? I have noticed Firefox performance is much worse than in any other browser. I think this is due to issues in the Angular and Material components being used.

As stated in the big red notice on the website, the software is currently in alpha preview. Given time, Bitmagnet will try to mitigate the Firefox performance issues (its <5% market share means it's prioritised accordingly), but it's important that users of all browsers have an acceptable experience in the app.


This project only started a few months ago. It's just young and being worked on - no need to dis this as half-baked.


It takes several seconds to load 100 results from a list of 1000, with no search term used. I don't know how that's even possible. I think half-baked is a mild way of putting it, particularly if the excuse is that it hasn't been in the oven very long.


This looks interesting, but I'm a bit worried about the CSAM / illegal stuff part: could a user get in trouble because they have traces of it in their crawled index? Also, how large is the index after indexing for a few months?


An indexer doesn't download content. The only information you'll have is the name of a torrent, potentially its files, and who is interested in those files.

But that's the technical view, what happens in court might be totally different.


In order to get information such as the name of the torrent and its files from the hash, you do need to connect to someone in the swarm to download that metadata. You won't know what it is until after you've already connected.
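
For context, the metadata fetched this way (BEP 9) is just the torrent's "info" dictionary, which decodes to roughly this shape; no file content is involved (the values below are placeholders):

    # roughly what a decoded multi-file "info" dict looks like (BEP 3)
    info = {
        b"name": b"Some Album",
        b"piece length": 262144,
        b"pieces": b"...",  # concatenated 20-byte SHA-1 digests, elided here
        b"files": [
            {b"length": 41943040, b"path": [b"01 - Opening Track.flac"]},
            {b"length": 1024, b"path": [b"README.txt"]},
        ],
    }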


Connecting to an unknown machine and asking what they have is like knocking on a stranger's door and asking what they're selling. Them mentioning something nefarious and you leaving in response is very obviously not a crime.


There probably is nefarious content you can spot just from the filenames, but not everything is like that. Moreover, you "only" know they distribute it; you don't do it yourself.


Considering many countries block torrent sites I wouldn't chance it.


The real question is: metadata is data, so are there any limits on how much data well-behaved clients/servers can push through the DHT, such that you can be reasonably sure what ends up on your machine isn't poisoned enough to get you into trouble with law enforcement?


At least in the case of https://coveapp.info, the metadata you fetch from users while scraping is disassembled into a form for efficient searching only. The only part remaining in an identifiable form is the infohash.
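
A sketch of that idea: keep only token-to-infohash postings and discard the raw metadata (the names below are hypothetical, not Cove's internals):

    from collections import defaultdict

    index = defaultdict(set)  # token -> set of infohashes

    def ingest(infohash: str, name: str):
        # keep only the searchable tokens; the raw name is not stored
        for token in name.lower().split():
            index[token].add(infohash)

    def search(query: str):
        # intersect the posting sets for each query token
        sets = [index.get(t, set()) for t in query.lower().split()]
        return set.intersection(*sets) if sets else set()

    ingest("0123456789abcdef0123456789abcdef01234567", "Ubuntu 22.04 Desktop")
    print(search("ubuntu desktop"))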


How does Bitmagnet compare to Aquatic? https://github.com/greatest-ape/aquatic


Isn't this dangerous? I thought that in order to get details of any torrent from the DHT you must connect to it. This would automatically set off DMCA complaints from MPAA/RIAA etc. right?

(also how do they ever prove you actually downloaded any usable part of a torrent?!)


You must connect to peers, yes, but metadata is separate from torrent data.


but copyright holders do not need to show you actually DID anything besides connect to the torrent in order to accuse you; that's all they ever do...



