Build your own BitTorrent (codecrafters.io)
457 points by romac on Oct 19, 2023 | 128 comments



Hi! Sarp here, author of the Build your own BitTorrent challenge on Codecrafters.

Some back story: After being laid off from my FAANG job, I found myself very unmotivated to go back. I started looking for interesting programming projects to revive my interest in coding. While nomading, I discovered Codecrafters on Nomadlist and really liked the interaction model: push code to git and pass successive stages. The gamification helped me focus, and the projects allowed me to go deeper on software I used (SQLite, Git, Redis, etc.). I even picked up a new language (Go) to do the challenges with. After completing all the challenges on the site, I ran out of things to do. This is when I decided to build a BitTorrent client which was one of the highly voted ideas on the site.

I learned many new things by building a BitTorrent client: the BitTorrent protocol, how torrent files are structured, encoding issues, pipelining network requests, URL-encoding binary values, using channels in Go, etc.

I’d love any feedback on the challenge. Also happy to answer any questions!


It's not uncommon for invite-only trackers to be very prescriptive about which torrent clients can and can't be used with their private tracker. Any ideas on how to overcome this obstacle to wider adoption?


Is this something you expect to use as your main torrent client?


How do they tell one client from another?


User agents, fingerprinting, etc. There are certainly ways to mask your client, but these would be considered cheating by most private trackers and would be grounds for a ban.


There's a whole ton of bluster in the torrent community. Bittorrent is a simple protocol, to the point of being naive (and therefore a fun toy network app project). Clients identify themselves to the tracker by user agent; there's really nothing else to fingerprint against. Claims to the contrary are almost certainly bullshit to scare people out of editing their user agent.

Clients also self-report the amount of data transferred. That's not great in a community that fetishizes share ratios. I've heard an op say "there's no excuse for having a ratio less than 1", which makes as much mathematical sense as the parent who told my (math) teacher friend "this is [private school], no student should be below average".
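
The arithmetic behind that objection can be sketched in a few lines. This assumes a closed swarm in which every downloaded byte was uploaded by another member; the member names and GB figures are made up purely for illustration. Because the totals match, a member with a ratio above 1 implies some other member with a ratio below 1.

```python
# Hypothetical closed swarm: (uploaded, downloaded) in GB per member.
# Every byte one member downloads was uploaded by another member,
# so the two column totals are equal by construction.
members = {
    "seeder": (30, 0),
    "alice": (8, 10),
    "bob": (2, 15),
    "carol": (0, 15),
}

total_up = sum(up for up, _ in members.values())
total_down = sum(down for _, down in members.values())


def ratio(up, down):
    """Share ratio; treated as infinite for a member who never downloaded."""
    return float("inf") if down == 0 else up / down


# Members whose ratio is below 1 even though the swarm as a whole balances.
below_one = [name for name, (up, down) in members.items() if ratio(up, down) < 1]
```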

You can theoretically verify upload/download numbers because the total amount uploaded in a swarm should equal the total amount downloaded, but there are all kinds of reasons why the numbers wouldn't match. Maybe a client lost connection and couldn't send its final announce. Maybe one client is sending bad data (I'm not sure how that is reported, might be implementation specific). And clients only send transfer total updates when they connect to the tracker to change status or request more peers, so every client will have a different degree of staleness.

Even if you can tell that someone in a swarm is lying, who is your culprit? As long as they're not being egregious, there's no way to tell.
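
To make the self-reporting concrete, here is a minimal sketch of how a client might build a tracker announce. The query parameter names (info_hash, peer_id, port, uploaded, downloaded, left, event) come from BEP 3; the tracker URL, peer ID, and user agent string are hypothetical placeholders, and the request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_announce(tracker_url, info_hash, peer_id, port,
                   uploaded, downloaded, left, event="started",
                   user_agent="ExampleClient/0.1"):
    """Build (but do not send) an HTTP announce request.

    The transfer totals and the User-Agent header are whatever the
    client chooses to report; the tracker has no independent view of them.
    """
    query = urlencode({
        "info_hash": info_hash,    # 20 raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,        # 20 bytes chosen by the client
        "port": port,
        "uploaded": uploaded,      # self-reported byte counts
        "downloaded": downloaded,
        "left": left,
        "event": event,
    })
    return Request(f"{tracker_url}?{query}", headers={"User-Agent": user_agent})


# Hypothetical values for illustration only.
req = build_announce(
    "http://tracker.example/announce",
    info_hash=bytes(range(20)),
    peer_id=b"-EX0001-123456789012",
    port=6881,
    uploaded=0,
    downloaded=0,
    left=1048576,
)
```

Nothing in this request is verifiable by the tracker on its own, which is why both the totals and the client identity are a matter of trust.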


There are at least some differences, such as HTTP/2 usage, or maybe algorithm usage/bugs in newer versions. Whether or not most tracker staff actually bother to attempt fingerprinting, IDK.


> "there's no excuse for having a ratio less than 1"

Maybe the context of the quote was in regard to a private tracker?

The "ratio" in terms of private trackers isn't always the real ratio of GBs uploaded or downloaded.

There are for example some private trackers that grant additional credit for longer seed-time or they declare specific torrents "freeleech" so they don't "cost" ratio.

In the end they are just some of the measures private trackers take to strengthen their network, but they lead to a confusing definition of "ratio".


Yes, but this tracker did not offer any such things. The only reason the mean ratio of active members was > 1 was because of the steady stream of users being banned for low ratios.


How did you learn to build a BitTorrent client? I love the idea of codecrafters and books that walk you through building something but I always struggle if I don't have something to get me started.


My starting point was searching for tutorials and asking ChatGPT to implement a torrent parser :) There are great blog posts [0] for building a BitTorrent client. Along the way, I referenced open-source implementations and the BitTorrent Protocol Specification as well [1].

[0] https://blog.jse.li/posts/torrent/

[1] https://www.bittorrent.org/beps/bep_0003.html
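
For anyone wondering what a torrent parser involves, here is a minimal sketch of a bencode decoder following the four value types described in BEP 3 (integers, byte strings, lists, dictionaries). The function name and return shape are my own choices, not taken from the challenge or any particular client, and it skips strictness checks such as rejecting leading zeros.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at data[i].

    Returns (value, index_after_value). Strings are returned as bytes
    because torrent files contain raw binary (e.g. SHA-1 digests).
    """
    lead = data[i:i + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":                      # list: l<item>...e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencode at offset {i}")
```

Usage would look like `value, _ = bdecode(open(path, "rb").read())`, after which the `info` dictionary is available under the key `b"info"`.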


I've actually just begun work on a project involving BitTorrent. I don't think I will be implementing a client, though I might. I'm certainly going to have to implement generation of .torrent files and magnet links, so I've been learning a little bit about the protocol. Interesting to see this pop up; maybe I can use it to help me make headway.
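
As a rough sketch of what generating that metadata involves: the info hash is the SHA-1 of the bencoded `info` dictionary (BEP 3), and a magnet link carries it as `xt=urn:btih:<hex>` (BEP 9). The helper names, the single-file layout, the small piece length, and the tracker URL below are assumptions for illustration only.

```python
import hashlib
from urllib.parse import quote


def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists and dicts as bencode."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_info(name: str, content: bytes, piece_length: int = 16384) -> dict:
    """Build a single-file info dictionary with concatenated SHA-1 piece hashes."""
    pieces = b"".join(
        hashlib.sha1(content[i:i + piece_length]).digest()
        for i in range(0, len(content), piece_length)
    )
    return {"name": name, "length": len(content),
            "piece length": piece_length, "pieces": pieces}


def magnet_link(info: dict, tracker: str = "") -> str:
    """Build a magnet URI from the SHA-1 of the bencoded info dictionary."""
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    link = f"magnet:?xt=urn:btih:{info_hash}&dn={quote(info['name'])}"
    if tracker:
        link += f"&tr={quote(tracker, safe='')}"
    return link
```

A .torrent file for the same content would then be `bencode({"announce": tracker, "info": info})` written to disk.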


> After completing all the challenges on the site, I ran out of things to do. This is when I decided to build a BitTorrent client which was one of the highly voted ideas on the site.

Are you employed by them now?


I'm not an employee of Codecrafters. I worked with them for the BitTorrent challenge as an independent contractor.


This is such a cool result. I would love to see the next step of creating a tool like Resilio sync that builds upon this work!


I don't know if this is feedback on the challenge per se, but I was a little saddened that I couldn't add dependencies to my `Cargo.toml`; I wanted to solve part of the bencode challenge using nom (perhaps overkill, but it was for fun), but I can't.

If this is a concern about load/execution times on a remote builder, it would be cool if I had some way to run the test cases locally to avoid this concern.


We just shipped support for this last week! You should be able to edit the file now. We’ll remove the comments saying those files can’t be edited soon.


Oh! Awesome! I'll have to give this a shot, then :)


Are we gonna get extensions to the challenge?


Better yet, extensions to implement BitTorrent extensions? Such as the mutable torrent BEP and others? It'd be great if this spawned hundreds of top quality BitTorrent clients


Build your own Extension


To limit the scope of the challenge, I had to leave out a lot of cool features you see in modern clients: magnet links, UDP trackers, DHT.

I'm planning to add them over time, starting with magnet links, which is the highest-voted extension idea right now.


Jon Gjengset[1] is currently doing a livestream on the same challenge in Rust, on his YT channel[2].

[1]: https://thesquareplanet.com/

[2]: https://www.youtube.com/watch?v=jSTkEPPiULs


Oh hey, that's me!

A better link is https://www.youtube.com/watch?v=jf_ddGnum_4 which has chapter marks and has the power outage in the middle spliced away :p


Hi Jon, thanks for recording! I'm excited to watch this. Do you happen to know why captions aren't enabled on your videos? Oftentimes the issue is that the video's primary language isn't set. Once this is done, YouTube will probably caption the rest, though I'm not sure if that's true of videos of all lengths.


Hope you enjoy it! I have captions "enabled" (and primary language set) on all my videos, but my experience has been that YouTube is very hit-or-miss with whether it adds auto-caption to longer videos (somewhere around 2h seems to be the limit). Sometimes it appears later, it just takes a while, other times it just never manifests. It's unfortunate, but as far as I can tell there's nothing I can do about it :'(


Hi Jon, challenge author here! First time watching your content, it was fun to see a Rust expert go through the challenge live. Saw the first hour, I noticed that during Bencode parsing, trying to find the most elegant way to implement it slowed you down a bit. (I also have this tendency and I'm sure having so many viewers doesn't help :)) Great progress by the way in 4 hours, hope you get to finish the challenge soon!


Thanks Jon, really enjoyed the process. I always wondered why OCI registries don't BitTorrent the images. Now I understand why they might not have been fond of the approach.


Hey! Came across your videos randomly a few months ago and just wanted to say great content. Funny running into you here.


Thank you for pointing this out. I only got to catch the tail end, but it was really cool to watch.


You can re-watch it using the same URL if you want to see it from the beginning.


There's also a copy of the livestream on Twitch [0], for those who are blocked on YouTube.

[0] https://www.twitch.tv/videos/1954769913


It is amazing to see him code. Are there other expert programmers out there who code in Python (instead of Rust)?


What a fun live-stream


what's with the sign in required, is this a paid tutorial?

Here are some free tutorials:

JS - https://allenkim67.github.io/programming/2016/05/04/how-to-m...

GO - https://blog.jse.li/posts/torrent/

Python - https://markuseliasson.se/article/bittorrent-in-python/


You might also find tutorials here useful https://github.com/codecrafters-io/build-your-own-x


It isn't a tutorial/course but a guided step-by-step project with server-side tests for every step.


Thanks


Why in the world does this want access to my github account, with 0 explanation as to why.


It does say it only asks for your email (read) privileges tbf, but yeah I didn't bother either after that step.


This gives the site read access to your private (hidden) Github email address(es) - I think often developers implementing the Github API think this permission is needed to access your public email, & it gets requested unintentionally.


Not every account sets a public email, so this ensures that the app gets a reachable email address for every user.


I don't see why they can't present it as a simple list of blog articles with a link to the repository.

It achieves the same thing without anyone knowing who I am.


Well, because codecrafters.io would love to see you as a paying customer: https://codecrafters.io/pricing


CodeCrafters co-founder here.

You're right, we could present it as just the articles.

Our overview is publicly accessible and doesn't require a paywall. https://app.codecrafters.io/courses/bittorrent/overview

All the "content" is also available on our GitHub https://github.com/codecrafters-io/build-your-own-bittorrent

If you'd like us to run tests against your code, show you progress, community examples, hints, and so on, then you can do the interactive experience, which requires signing up.


> All the "content" is also available on our GitHub https://github.com/codecrafters-io/build-your-own-bittorrent

In this Git repository, I see various stages of the source code, and Docker files.

The claim "all the 'content' is also available on our GitHub" clearly does not hold, because at least the article texts are missing there.

EDIT: Sorry, it is there, as was pointed out to me in the answer: https://github.com/codecrafters-io/build-your-own-bittorrent...


The course definition file contains all the prompts (including nuances per language) that are viewable on the UI. https://github.com/codecrafters-io/build-your-own-bittorrent...



Looks like one reason is that Codecrafters is a learning site which uses GitHub to store users' code, instead of hosting it themselves. This post is about one project which is at Codecrafters.

They should explain why though


By default, you're not required to publish your code to GitHub (although you can sync with a couple clicks). But you get a repo to work out of, which is hosted on CodeCrafters' git servers.


Looks like they're just leaning on Github as an auth provider without offering any alternatives. Most sites that do this offer a range of options, including normal email signup.


Almost all of our target audience has a GitHub account, so using it for auth keeps things simple and is one less thing to manage :)


It's required for authentication (the original post links to app.codecrafters.io) — codecrafters.io is the marketing page :)


You upload the challenge to your git, and it then runs a few tests on your implementation.


This links to the product's application, the main website clears things up: https://codecrafters.io/


yeah, thanks but no thanks.


Oddly the uploading part is missing. To be a peer, an equal, things have to flow both ways. Bittorrent wouldn't work otherwise.


Hi, course author here! You're right, this was left out initially to scope down the challenge (it's already quite long with 11 stages). We're planning to add it as a challenge extension; uploading is one of the highly voted challenge extension ideas by the community.


If it were structured more closely along the protocol startup process, or along how a real client is structured, more people might end up with a functional client rather than a "bittorrent leecher". Up to "peer handshake" most things look fine. From there it rushes towards a half-finished goal, and if people stop there you're leaving the internet littered with one-sided examples.

After the handshake you'll need state machines to handle multiple peers, piece-sets to track what you've downloaded, the rarest-first algorithm[0], message processing and so on. The final result of having a downloaded file falls out of that almost as a side-product at some point.

[0] http://bittorrent.org/bittorrentecon.pdf
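
A minimal sketch of the rarest-first idea mentioned above, assuming each peer's advertised pieces are kept as a set of piece indexes. The function and argument names are hypothetical, and a real client would typically also randomize among equally rare pieces and track in-flight requests.

```python
from collections import Counter


def rarest_first(have, peer_pieces):
    """Order the pieces we still need by how few peers offer them.

    have:        set of piece indexes already downloaded and verified
    peer_pieces: dict mapping a peer identifier to the set of piece
                 indexes that peer has advertised
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)

    wanted = [index for index in availability if index not in have]
    # Fewest holders first; ties broken by piece index to stay deterministic.
    return sorted(wanted, key=lambda index: (availability[index], index))


# Hypothetical swarm state for illustration.
order = rarest_first(
    have={1},
    peer_pieces={"a": {0, 1, 2}, "b": {1, 2}, "c": {2, 3}},
)
```

Here pieces 0 and 3 are each held by a single peer, so they are requested before piece 2, which every peer has.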


Yep, "Seeding files" is an extension: https://app.codecrafters.io/vote/challenge-extension-ideas?c.... Not supported yet, but will be available soon (we add new extensions based on user votes)


For folks who speak JavaScript, Feross built WebTorrent which brought the protocol into a browser tab.

The code base is delightful to read. A lot of developers are better at writing code than they are reading it. If you're wanting to flex your code-reading muscles, I haven't found many better places to start than the WebTorrent codebase. I put it up there with redis in being fun to read.

https://github.com/webtorrent/webtorrent



I really like this idea! I wanted to learn Erlang some time ago and a friend wanted to learn Crystal, so we set out to be able to share files between each other with completely custom clients! It was so much fun when we were able to exchange files with the base protocol and some... is it called BEP? Enhancements to the protocol?

It's probably my favourite way of learning a new language, as it's simple enough to understand and implement


There's Crystal support on CodeCrafters too! You can Build your own Redis.


As the years go on, I find there are fewer and fewer small project ideas that give me butterflies. But this is one!

Hadn't heard of CodeCrafters before but I love how academic their challenge ideas are (eg build a DB). I'd love to see a compiler build in there too.


Build your own Interpreter is in the works! https://twitter.com/codecraftersio/status/168850373608654028...


Reminds me of building a bittorrent client in go: https://blog.jse.li/posts/torrent/


Thanks! That looks like a treasure and I can’t wait to do my own implementation.


Years ago, one of my previous roles was building and supporting a custom linux live OS that could be used by employees on cheap netbooks (remember those?). To distribute updates, I ended up building our own internal torrent server from scratch and used it to distribute image updates. It was a good learning exercise as one of the first times I had built software to conform to a standard and work with out of the box clients.


Fun times


For anyone who is interested in peer-to-peer systems like this, and completes Sarp's course, I have an open interview challenge you can submit it to if you find that you want to continue building in this space as a profession:

https://gitlab.com/webai-open/network/interview-challenge

Take the guidelines to heart though. We evaluate you on demonstrating understanding of what you did, not that you completed the course.

My advice for standing out would be to continue building on it past the end of the course and do something cool yourself.


This is so cool! Have you already hired people using this method? Or is this a new initiative?


It's new, and an experiment, but I have high hopes for it.


Do y'all really only work M-Th? If so, that's awesome.


Yes.

Our salary bands are competitive outside of FAANG (my previous life was at NFLX), we are fully remote, nomad friendly, and work 4 day weeks.

You can come on full time or as a contractor (your choice).

We give space to learn and do things right. We are comfortable investing in knowledge today to see compounding returns tomorrow. For example, I spent the first 3 weeks of my employment here sitting on my couch reading research papers. That paid dividends: we collapsed a 1.5 year timeline into a 1 month timeline. 3 weeks of reading research papers and 1 week of building got us to a milestone we didn't plan on reaching until a year+ into the project: we trained a model running on a developer's laptop in Grand Rapids, Michigan against a dataset sitting on another laptop in Yorkshire, with a fully auditable CI/CD log of what data was fed into the model.

Another example: the team decided we should use Rust to cross-compile to WASM etc. from day one, so we all took 2 weeks to study the Rust Book and learn together. Now we have a subset of our p2p stuff compiling to WASM and running in headless Firefox during our integration tests from day one.

Pretty flexible in every respect, just need good people who can help build this.


Honestly, sounds awesome.


Does anyone know why BitTorrent-based Linux package managers have not become a thing?


Most linux distributions have a rather robust mirroring operation. This is much faster than bittorrent. Lots of cloud vendors provide a mirror endpoint for traffic within their network, and ISPs typically also have them.

The network is fast enough for lots of small files to not really justify it.


They could scale back their mirroring operation to only a few servers that act as Webseeds? Then you get the best of both worlds.

http://bittorrent.org/beps/bep_0019.html


I think that hundreds of thousands of peer machines would be way faster than a few centralized repositories... right?

It seems you would have essentially unlimited bandwidth.


Finding peers is somewhat slow. When downloading via torrent it takes a while to really ramp up in speed. With a centralized repository you can start downloading basically right away. For many small files the latency is more important than the bandwidth, given that most repositories are not short on bandwidth.
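
A toy model of that trade-off, treating a download as a fixed startup delay plus size divided by bandwidth. All numbers are illustrative assumptions, not measurements of any mirror or swarm.

```python
def download_seconds(size_mb, startup_s, bandwidth_mb_s):
    """Crude model: fixed startup delay plus transfer time at a steady rate."""
    return startup_s + size_mb / bandwidth_mb_s


# Assumed parameters: the mirror starts almost instantly but is slower,
# while the swarm needs time to find peers and ramp up but is much faster.
MIRROR = {"startup_s": 0.1, "bandwidth_mb_s": 5}
SWARM = {"startup_s": 10, "bandwidth_mb_s": 100}

small_pkg_mirror = download_seconds(2, **MIRROR)     # a 2 MB package
small_pkg_swarm = download_seconds(2, **SWARM)
large_iso_mirror = download_seconds(4000, **MIRROR)  # a 4 GB image
large_iso_swarm = download_seconds(4000, **SWARM)
```

Under these assumptions the mirror wins for the small package because the startup delay dominates, and the swarm wins for the large image because the transfer time dominates.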


That seems easy to work around. You could write it in such a way that it would have a live cache of peers ready to go plus also the centralized peers would always be available.

I personally suspect that having centralized repos is a legacy technology from pre-torrent days when disk also wasn't cheap.


Picking a peer you have good bandwidth to is a hard problem.


> The network is fast enough for lots of small files to not really justify it.

Not really true. Try the following: ssh into the host doing the mirroring, with socks proxy. Now socksify apt through the connection.

The speed difference is amazing. Despite double encrypting.


Though with webseeds it shouldn't matter that much, not to mention the added redundancy with BT.



Bittorrent is most useful as a long-running application or background service to make data available. To get any use out of it in short-running contexts (e.g. updaters) you need lots of machines jumping on the content simultaneously.


I wonder if you could insert your own downloader to pacman / whatever without touching the source.. I might try to do this at some point if I have time. I often get over 100MBps from torrents but only like 5MBps from package repositories (though on good days that might be up to 40MBps)


Is this BitTorrent course free (as beer)? I can see a neighbour course about HTTP server "free during beta".


Having done a couple of their courses without paying:

You are expected to complete the project in steps they define (so for their Redis project, step 1 is to bind to a port, step 2 is to respond to a PING command, etc). If you choose not to pay, you can only complete one step per day, even if you submit code which would pass future steps.

This can be quite frustrating, since each step is often very simple, and IMO discourages producing a well-architected solution which anticipates future requirements, as you're left waiting 24 hours to press the submit button for code you've already written.

Still: It's free, and the restricted progress forced me to not use it for procrastination purposes, so there's that.


It's better for me this way. Having things broken down piecemeal like this allows me to avoid overthinking and know when to just produce a solution and accept it.


There's also this repo if you're keen on free resources: github.com/codecrafters-io/build-your-own-x


Do submissions have deadlines?


No deadlines!


I don't think that option is there anymore.

Started the build-your-own-git tutorial and there is a hard paywall after the 3rd stage.


They have some free tier and paid tier and I am not sure what is in what.

https://codecrafters.io/pricing


Like GP I was also confused and tried looking for a pricing page but failed. Seems like there isn't a link to it from https://app.codecrafters.io/catalog which is the site you go to if you click the big CodeCrafters logo in the top left of the page.

There is a "Subscribe" button which takes you to https://app.codecrafters.io/pay but I wasn't savvy enough to notice it or realize what it was.

Only after starting a course did I begin to suspect that I needed to subscribe since there were a bunch of locks all over. However, it was not clear that the locks actually did anything since I could still click on those links. I guess I would have found out after completing the first step and not being able to progress. This appears to be by design, which strikes me as slightly dishonest.


Thank you for pointing this out. It's not by design. We actually revised our marketing pages recently, and had a regression that got rid of the tooltip that explained what was free. We have a Linear task for fixing it back, but we weren't expecting this HN post today and so it wasn't the top of our list to fix :)

I've prioritised it now tho


This is fixed btw


very much paid


Can anyone comment as to how far one can go before paying for the codecrafters service?

The crowd that is interested in these kinds of experiences may also like Protohackers, which is completely free.


The Build your own HTTP Server challenge is currently completely free.

Otherwise, you can do the first 2 stages of any challenge without paying. You can also check out all the prompts and overview for all challenges without a paywall, and you can attempt beyond stage 2 if you have the membership.

You might also find tutorials here useful https://github.com/codecrafters-io/build-your-own-x


Have been exploring this and it is a pretty fun way to learn some of the intrinsics of some products we use daily!

I have also had positive interactions with Sarp so hoping this product takes off!


Does this include the DHT? BT, the protocol itself, is not very interesting without DHT: it's just a very bad file access protocol over HTTP. DHT is what makes it really P2P.


(codecrafters dev here)

It doesn't, but will soon: https://app.codecrafters.io/vote/challenge-extension-ideas?c....

We release a set of "base" stages first, and then work on extensions based on demand. DHT is one of them, magnet links is another that folks have voted for.


That's awesome. Was about to dive into DHT myself.


Great!


I don't think this is a fair description of the BitTorrent protocol. HTTP is only used for discovering peers; the data transfer is P2P using a custom protocol. (I'm ignoring optional extensions like HTTP seeding.)
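
To illustrate the custom peer protocol: per BEP 3, a connection opens with a 68-byte handshake, after which messages are length-prefixed. The sketch below builds the handshake and a `request` message; the function names are my own, and the peer ID used in the usage note is a made-up example.

```python
import struct

PSTR = b"BitTorrent protocol"  # protocol identifier, 19 bytes


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """<1-byte length><protocol string><8 reserved bytes><info_hash><peer_id>."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must each be 20 bytes")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def parse_handshake(message: bytes):
    """Return (info_hash, peer_id) from a 68-byte handshake."""
    if len(message) != 68 or message[0] != len(PSTR) or message[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return message[28:48], message[48:68]


def request_message(index: int, begin: int, length: int) -> bytes:
    """Length-prefixed 'request' message (id 6) asking for one block of a piece."""
    # 4-byte length prefix (13 = 1 id byte + 12 payload bytes), then id 6,
    # then piece index, byte offset within the piece, and block length.
    return struct.pack(">IBIII", 13, 6, index, begin, length)
```

For example, `build_handshake(info_hash, b"-EX0001-123456789012")` would be the first bytes written to a TCP connection to a peer obtained from the tracker.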


No, only uses a tracker.


Wow this looks like a course I am willing to pay for.

And I wanted to learn Rust anyway :)


Also good for other languages :)


This is super fun! I did the same back in uni, it was an awesome project: https://github.com/bbpcr/Yomato


Viewing on mobile.

Not sure what I see. There is no content, nothing clickable. Just decode bencoded strings and integers and some comments.

What's the point?


Yeah the mobile experience could use more work, but tbh it's intended for desktop use (since it requires working with a terminal and code editor)


I love the idea of this site but I don’t understand the pricing model. It says 30$/month but I can only pay 120$ for 3 months. It should be 90$ for 3 months. A monthly payment model would be more accessible. Did I misunderstand something?


I am curious has anyone subscribed to codecrafters.io and if so what was your impression of the service?


I subscribed for a few months last year and enjoyed it. I liked it enough that I’m subscribing for a lifetime now.

There’s a few different learning styles out there, but personally for me, I learn best while doing projects. That’s why this service hits the sweet spot for me.

If you’re like me in this way, then I’d recommend it.


I am a lifetime subscriber. As someone who likes to learn by doing, this platform was perfect for me. I never thought I would look into internals, but this one seduced me into diving deep in a fun way.


> downloading a file from a single peer

The hard part is avoided?


It seems to be failing to create a repository right now


We got a massive inflow of traffic from HN today and that seems to have hit some rate limits with GitHub. We're looking into it.


I think it's breaking some other things as well

[stage-1] Expected "\"orange\"\n" as stdout, got: "Hello World 2!\n"


(codecrafters dev here)

This should only affect C# repositories, and we're working on a fix - C# support is relatively new, and looks like there are some teething issues with the caching mechanism we use to run tests (we aim to keep median response times <3s).


(codecrafters dev here)

Rate limits issue should be fixed now!


What a fun site


I wish more folks distributed Linux ISOs via Bittorrent since it has an integrity check built into the protocol -- messing with PGP is hard and showing me an MD5 sum over a self signed certificate is... just special.
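
The built-in check works roughly like this: the `pieces` field of the torrent's info dictionary is a concatenation of 20-byte SHA-1 digests, one per piece (BEP 3), and each downloaded piece is hashed and compared. A minimal sketch with a hypothetical function name follows; it only protects you if the torrent metadata itself came from a source you trust.

```python
import hashlib


def verify_piece(pieces: bytes, index: int, data: bytes) -> bool:
    """Check downloaded piece data against the SHA-1 stored in the torrent.

    pieces: the raw 'pieces' value from the info dictionary
    index:  zero-based piece number
    data:   the bytes received for that piece
    """
    expected = pieces[index * 20:(index + 1) * 20]
    return len(expected) == 20 and hashlib.sha1(data).digest() == expected


# Illustrative two-piece "torrent" built from made-up content.
piece_a, piece_b = b"a" * 1024, b"b" * 1024
pieces_field = hashlib.sha1(piece_a).digest() + hashlib.sha1(piece_b).digest()
```

A piece that fails the check is discarded and requested again, possibly from a different peer.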


Instead of replacing the md5sum on the download page an attacker could replace the infohash/magnet link/.torrent file.


messing with PGP is hard _FOR YOU_. Your inadequacies are not universal.



