Hi! Sarp here, author of the Build your own BitTorrent challenge on Codecrafters.
Some back story: After being laid off from my FAANG job, I found myself very unmotivated to go back. I started looking for interesting programming projects to revive my interest in coding. While nomading, I discovered Codecrafters on Nomadlist and really liked the workflow of pushing code to git and passing successive stages. The gamification helped me focus, and the projects allowed me to go deeper on software I used (SQLite, Git, Redis, etc.). I even picked up a new language (Go) to do the challenges with. After completing all the challenges on the site, I ran out of things to do. This is when I decided to build a BitTorrent client which was one of the highly voted ideas on the site.
I learned many new things by building a BitTorrent client: the BitTorrent protocol, how torrent files are structured, encoding issues, pipelining network requests, URL-encoding binary values, using channels in Go, etc.
I’d love any feedback on the challenge. Also happy to answer any questions!
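To illustrate the "URL-encoding binary values" point: the tracker announce request carries the 20-byte SHA-1 info hash as a query parameter, and those raw bytes have to be percent-encoded byte by byte rather than treated as text. A minimal sketch in Python, where the helper name `encode_info_hash` and the sample digest are made up for illustration:

```python
from urllib.parse import quote

def encode_info_hash(info_hash: bytes) -> str:
    """Percent-encode a raw 20-byte SHA-1 digest for use in a tracker URL."""
    if len(info_hash) != 20:
        raise ValueError("info hash must be exactly 20 bytes")
    # quote() accepts bytes; safe="" means no reserved character is left unescaped.
    return quote(info_hash, safe="")

# Hypothetical digest: bytes that are unreserved ASCII pass through as-is,
# everything else becomes a %XX escape.
sample = bytes.fromhex("d69f91e6b2ae4c542468d1073a71d4ea13879a7f")
print(encode_info_hash(sample))
```

The common mistake is hex-encoding the digest first and then URL-encoding the hex string, which sends 40 characters instead of 20 bytes.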
It's not uncommon for invite-only trackers to be very prescriptive about which torrent clients can and can't be used with their private tracker. Any ideas on how to overcome this obstacle to wider adoption?
User agents, fingerprinting, etc. There are certainly ways to mask your client, but these would be considered cheating by most private trackers and would be grounds for a ban.
There's a whole ton of bluster in the torrent community. Bittorrent is a simple protocol, to the point of being naive (and therefore a fun toy network app project). Clients identify themselves to the tracker by user agent; there's really nothing else to fingerprint against. Claims to the contrary are almost certainly bullshit to scare people out of editing their user agent.
Clients also self-report the amount of data transferred. That's not great in a community that fetishizes share ratios. I've heard an op say "there's no excuse for having a ratio less than 1", which makes as much mathematical sense as the parent who told my (math) teacher friend "this is [private school], no student should be below average".
You can theoretically verify upload/download numbers because the total amount uploaded in a swarm should equal the total amount downloaded, but there are all kinds of reasons why the numbers wouldn't match. Maybe a client lost connection and couldn't send its final announce. Maybe one client is sending bad data (I'm not sure how that is reported, might be implementation specific). And clients only send transfer total updates when they connect to the tracker to change status or request more peers, so every client will have a different degree of staleness.
Even if you can tell that someone in a swarm is lying, who is your culprit? As long as they're not being egregious, there's no way to tell.
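The accounting argument can be made concrete with toy numbers. This is only a sketch, assuming the tracker sees nothing but each peer's last self-reported totals; the `Report` type and all figures are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Report:
    peer: str
    uploaded: int    # bytes the client claims to have sent
    downloaded: int  # bytes the client claims to have received

def swarm_discrepancy(reports):
    """Total claimed uploads minus total claimed downloads across the swarm."""
    return sum(r.uploaded for r in reports) - sum(r.downloaded for r in reports)

# Everyone honest and fully up to date: the totals cancel out.
honest = [Report("a", 700, 0), Report("b", 300, 500), Report("c", 0, 500)]
# Peer b is honest, but its last announce predates its final 100 bytes of download.
stale_b = [Report("a", 700, 0), Report("b", 300, 400), Report("c", 0, 500)]
# Peer c inflates its upload figure by 100 bytes.
cheat_c = [Report("a", 700, 0), Report("b", 300, 500), Report("c", 100, 500)]

for name, reports in [("honest", honest), ("stale_b", stale_b), ("cheat_c", cheat_c)]:
    print(name, swarm_discrepancy(reports))
```

The stale case and the cheating case produce the same discrepancy, so the swarm-wide total alone does not identify who is responsible.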
There are at least some differences, such as HTTP/2 usage, or maybe algorithm usage/bugs in newer versions. Whether or not most tracker staff actually bother to attempt fingerprinting, IDK.
> "there's no excuse for having a ratio less than 1"
Maybe the context of the quote was in regard to a private tracker?
The "ratio" in terms of private trackers isn't always the real ratio of GBs uploaded or downloaded.
There are for example some private trackers that grant additional credit for longer seed-time or they declare specific torrents "freeleech" so they don't "cost" ratio.
In the end they are just some of the measures private trackers take to strengthen their network, but they lead to a confusing definition of "ratio".
Yes, but this tracker did not offer any such things. The only reason the mean ratio of active members was > 1 was the steady stream of users being banned for low ratios.
How did you learn to build a BitTorrent client? I love the idea of codecrafters and books that walk you through building something but I always struggle if I don't have something to get me started.
My starting point was searching for tutorials and asking ChatGPT to implement a torrent parser :) There are great blog posts [0] for building a BitTorrent client. Along the way, I referenced open-source implementations and the BitTorrent Protocol Specification as well [1].
I've actually just begun work on a project involving BitTorrent. I don't think I will be implementing a client (but maybe), though I'm certainly going to have to implement generation of .torrent files and magnet links, so I've been learning a little bit about the protocol. Interesting to see this pop up; maybe I can use it to help me make headway.
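For anyone in the same spot, generating a single-file .torrent and its magnet link comes down to bencoding an `info` dictionary and hashing it with SHA-1. A rough sketch in Python, assuming the v1 single-file layout; the function names, the 16 KiB piece length, and the tracker URL are placeholders, not anything from the challenge:

```python
import hashlib
from urllib.parse import quote

def bencode(value) -> bytes:
    """Encode ints, bytes, str, lists, and dicts as bencode."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def make_torrent(name: str, data: bytes, announce: str, piece_length: int = 16384):
    """Return (torrent_file_bytes, magnet_link) for a single in-memory file."""
    # "pieces" is the concatenation of the SHA-1 digest of every piece.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {"length": len(data), "name": name,
            "piece length": piece_length, "pieces": pieces}
    # The info hash is the SHA-1 of the bencoded info dictionary.
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    torrent = bencode({"announce": announce, "info": info})
    magnet = f"magnet:?xt=urn:btih:{info_hash}&dn={quote(name)}"
    return torrent, magnet

# Placeholder tracker URL, purely for illustration.
torrent, magnet = make_torrent("hello.txt", b"hello world",
                               "http://tracker.example/announce")
print(magnet)
```

The one detail that tends to bite is key ordering: if the `info` dictionary is not serialized with sorted keys, the info hash will not match what other clients compute.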
> After completing all the challenges on the site, I ran out of things to do. This is when I decided to build a BitTorrent client which was one of the highly voted ideas on the site.
I don't know if this is feedback on the challenge per se, but I was a little saddened that I couldn't add dependencies to my `Cargo.toml`; I wanted to solve part of the bencode challenge using nom (perhaps overkill, but it was for fun), but I can't.
If this is a concern about load/execution times on a remote builder, it would be cool if I had some way to run the test cases locally to avoid this concern.
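For a sense of scale, bencode is small enough to parse by hand without a parser-combinator library. A minimal recursive-descent sketch, written in Python rather than Rust for brevity, and with only basic handling of malformed input:

```python
def decode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<values>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = decode(data, pos)
            value, pos = decode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {pos}")

value, _ = decode(b"d3:foo3:bar5:helloi52ee")
print(value)
```

Strings are kept as `bytes` on purpose, since fields like `pieces` are raw binary and not valid UTF-8.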
We just shipped support for this last week! You should be able to edit the file now. We’ll remove the comments saying those files can’t be edited soon.
Better yet, extensions to implement BitTorrent extensions? Such as the mutable torrent BEP and others? It'd be great if this spawned hundreds of top quality BitTorrent clients
Hi Jon, thanks for recording! I'm excited to watch this. Do you happen to know why captions aren't enabled on your videos? Oftentimes the issue is that the video's primary language isn't set. Once this is done, YouTube will probably caption the rest, though I'm not sure if that's true of videos of all lengths.
Hope you enjoy it! I have captions "enabled" (and primary language set) on all my videos, but my experience has been that YouTube is very hit-or-miss with whether it adds auto-caption to longer videos (somewhere around 2h seems to be the limit). Sometimes it appears later, it just takes a while, other times it just never manifests. It's unfortunate, but as far as I can tell there's nothing I can do about it :'(
Hi Jon, challenge author here! First time watching your content, it was fun to see a Rust expert go through the challenge live. I watched the first hour and noticed that, during Bencode parsing, trying to find the most elegant way to implement it slowed you down a bit. (I also have this tendency and I'm sure having so many viewers doesn't help :)) Great progress by the way in 4 hours, hope you get to finish the challenge soon!
Thanks Jon, really enjoyed the process. I always wondered why OCI registries don't BitTorrent the images. Now I understand why they might not have been fond of the approach.
This gives the site read access to your private (hidden) Github email address(es) - I think often developers implementing the Github API think this permission is needed to access your public email, & it gets requested unintentionally.
If you'd like us to run tests against your code, show you progress, community examples, hints, and so on, then you can do the interactive experience, which requires signing up.
Looks like one reason is Codecrafters is a learning site which uses Github to store users' code, instead of hosting it themselves. This post is about one project which is at Codecrafters.
By default, you're not required to publish your code to GitHub (although you can sync with a couple clicks). But you get a repo to work out of, which is hosted on CodeCrafters' git servers.
Looks like they're just leaning on Github as an auth provider without offering any alternatives. Most sites that do this offer a range of options, including normal email signup.
Hi, course author here! You're right, this was left out initially to scope down the challenge (it's already quite long with 11 stages). We're planning to add it as a challenge extension; uploading is one of the highly voted challenge extension ideas by the community.
If it were structured in a fashion closer to the protocol startup process, or to how a real client is structured, more people might end up with a functional client rather than a "bittorrent leecher".
Up to "peer handshake" most things look fine. From there it rushes towards a half-finished goal and if people stop there you're leaving the internet littered with one-sided examples.
After the handshake you'll need state machines to handle multiple peers, piece-sets to track what you've downloaded, the rarest-first algorithm[0], message processing and so on.
The final result of having a downloaded file falls out of that almost as a side-product at some point.
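As a rough idea of what the piece-set tracking and rarest-first selection mentioned above look like, here is a small Python sketch. It assumes each peer's bitfield has already been decoded into a set of piece indices; the function name and peer labels are made up, and real clients typically randomize among equally rare pieces rather than breaking ties by index:

```python
from collections import Counter

def rarest_first(have: set, peer_pieces: dict) -> list:
    """Order the pieces we still need by how few peers advertise them."""
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)
    wanted = [p for p in availability if p not in have]
    # Fewest holders first; ties broken by piece index to keep the output stable.
    return sorted(wanted, key=lambda p: (availability[p], p))

have = {0}
peer_pieces = {
    "peer-a": {0, 1, 2},
    "peer-b": {1, 2, 3},
    "peer-c": {2},
}
print(rarest_first(have, peer_pieces))
```

Here piece 3 is requested first because only one peer holds it, while piece 2 can wait because every peer has a copy.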
For folks who speak JavaScript, Feross built WebTorrent which brought the protocol into a browser tab.
The code base is delightful to read. A lot of developers are better at writing code than they are reading it. If you're wanting to flex your code-reading muscles, I haven't found many better places to start than the WebTorrent codebase. I put it up there with redis in being fun to read.
I really like this idea! I wanted to learn Erlang some time ago and a friend wanted to learn Crystal, so we set out to be able to share files between each other with completely custom clients! It was so much fun when we were able to exchange files with the base protocol and some... Is it called BEP? Enhancements to the protocol?
It's probably my favourite way of learning a new language, as it's simple enough to understand and implement.
Years ago, one of my previous roles was building and supporting a custom linux live OS that could be used by employees on cheap netbooks (remember those?). To distribute updates, I ended up building our own internal torrent server from scratch and used it to distribute image updates. It was a good learning exercise as one of the first times I had built software to conform to a standard and work with out of the box clients.
For anyone who is interested in peer-to-peer systems like this, and completes Sarp's course, I have an open interview challenge you can submit it to if you find that you want to continue building in this space as a profession:
Our salary bands are competitive outside of FAANG (my previous life was at NFLX), we are fully remote, nomad friendly, and work 4 day weeks.
You can come on full time or as a contractor (your choice).
We give space to learn and do things right. We are comfortable investing in knowledge today to see compounding returns tomorrow. For example, I spent the first 3 weeks of my employment here sitting on my couch reading research papers. That paid dividends: we collapsed a 1.5 year timeline into a 1 month timeline. 3 weeks of reading research papers and 1 week of building got us to a milestone we didn't plan on reaching until a year+ into the project: we trained a model running on a developer's laptop in Grand Rapids, Michigan against a dataset sitting on another laptop in Yorkshire, with a fully auditable CI/CD log of what data was fed into the model.
Another example: the team decided we should use Rust to cross-compile to WASM etc. from day one, so we all took 2 weeks to study the Rust Book and learn together. Now we have a subset of our p2p stuff compiling to WASM and running in headless Firefox during our integration tests from day one.
Pretty flexible in every respect, just need good people who can help build this.
Most linux distributions have a rather robust mirroring operation. This is much faster than bittorrent. Lots of cloud vendors provide a mirror endpoint for traffic within their network, and ISPs typically also have them.
The network is fast enough for lots of small files to not really justify it.
Finding peers is somewhat slow. When downloading via torrent it takes a while to really ramp up in speed. With a centralized repository you can start downloading basically right away. For many small files the latency is more important than the bandwidth, given that most repositories are not short on bandwidth.
That seems easy to work around. You could write it in such a way that it would have a live cache of peers ready to go plus also the centralized peers would always be available.
I personally suspect that having centralized repos is a legacy technology from pre-torrent days when disk also wasn't cheap.
Bittorrent is most useful as a long-running application or background service to make data available. To get any use out of it in short-running contexts (e.g. updaters) you need lots of machines jumping on the content simultaneously.
I wonder if you could insert your own downloader into pacman / whatever without touching the source. I might try to do this at some point if I have time. I often get over 100MBps from torrents but only like 5MBps from package repositories (though on good days that might be up to 40MBps).
Having done a couple of their courses without paying:
You are expected to complete the project in steps they define (so for their Redis project, step 1 is to bind to a port, step 2 is to respond to a PING command, etc). If you choose not to pay, you can only complete one step per day, even if you submit code which would pass future steps.
This can be quite frustrating, since each step is often very simple, and IMO discourages producing a well-architected solution which anticipates future requirements, as you're left waiting 24 hours to press the submit button for code you've already written.
Still: It's free, and the restricted progress forced me to not use it for procrastination purposes, so there's that.
It's better for me this way: having things broken down piecemeal like this allows me to avoid overthinking and to know when to just produce a solution and accept it.
Like GP I was also confused and tried looking for a pricing page but failed.
Seems like there isn't a link to it from https://app.codecrafters.io/catalog which is the site you go to if you click the big CodeCrafters logo in the top left of the page.
There is a "Subscribe" button which takes you to https://app.codecrafters.io/pay but I wasn't savvy enough to notice it or realize what it was.
Only after starting a course did I begin to suspect that I needed to subscribe since there were a bunch of locks all over. However, it was not clear that the locks actually did anything since I could still click on those links.
I guess I would have found out after completing the first step and not being able to progress. This appears to be by design, which strikes me as slightly dishonest.
Thank you for pointing this out. It's not by design. We actually revised our marketing pages recently, and had a regression that got rid of the tooltip that explained what was free. We have a Linear task for restoring it, but we weren't expecting this HN post today and so it wasn't at the top of our list to fix :)
The Build your own HTTP Server challenge is currently completely free.
Otherwise, you can do the first 2 stages of any challenge without paying. You can also check out all the prompts and overview for all challenges without a paywall, and you can attempt beyond stage 2 if you have the membership.
Does this include the DHT? BT, the protocol itself, is not very interesting without it: it's just a very bad file access protocol over HTTP. The DHT is what makes it really P2P.
We release a set of "base" stages first, and then work on extensions based on demand. DHT is one of them, magnet links is another that folks have voted for.
I don't think this is a fair description of the BitTorrent protocol. HTTP is only used for discovering peers; the data transfer is P2P using a custom protocol. (I'm ignoring optional extensions like HTTP seeding.)
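To make the split concrete: the HTTP part only returns a list of peer addresses, which the client then connects to directly over TCP. A small Python sketch of decoding that list, assuming the widely used compact format (6 bytes per peer: a 4-byte IPv4 address followed by a 2-byte big-endian port); the function name and sample addresses are illustrative:

```python
import ipaddress

def parse_compact_peers(blob: bytes) -> list:
    """Split a compact peers string into (ip, port) tuples."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        ip = str(ipaddress.IPv4Address(blob[offset:offset + 4]))
        port = int.from_bytes(blob[offset + 4:offset + 6], "big")
        peers.append((ip, port))
    return peers

# Two made-up peers in the documentation address range 192.0.2.0/24.
blob = bytes([192, 0, 2, 1, 0x1A, 0xE1, 192, 0, 2, 2, 0xC8, 0xD5])
print(parse_compact_peers(blob))
```

Everything after this step (handshake, piece requests, data transfer) happens between peers without the tracker being involved.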
I love the idea of this site but I don’t understand the pricing model. It says $30/month but I can only pay $120 for 3 months. It should be $90 for 3 months.
A monthly payment model would be more accessible. Did I misunderstand something?
I subscribed for a few months last year and enjoyed it. I liked it enough that I’m subscribing for a lifetime now.
There’s a few different learning styles out there, but personally for me, I learn best while doing projects. That’s why this service hits the sweet spot for me.
If you’re like me in this way, then I’d recommend it.
I am a lifetime subscriber. As someone who likes to learn by doing, this platform was perfect for me. I never thought I would look into internals, but this one seduced me into diving deep in a fun way.
This should only affect C# repositories, and we're working on a fix - C# support is relatively new, and looks like there are some teething issues with the caching mechanism we use to run tests (we aim to keep median response times <3s).
I wish more folks distributed Linux ISOs via Bittorrent since it has an integrity check built into the protocol -- messing with PGP is hard and showing me an MD5 sum over a self signed certificate is... just special.
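The built-in integrity check works per piece: the torrent's `pieces` field is a concatenation of 20-byte SHA-1 digests, and a client hashes each downloaded piece and compares it against the matching slice. A minimal Python sketch with made-up data and a hypothetical helper name:

```python
import hashlib

def verify_piece(index: int, piece: bytes, pieces_field: bytes) -> bool:
    """Check a downloaded piece against the SHA-1 digest stored in the torrent."""
    expected = pieces_field[index * 20:(index + 1) * 20]
    return len(expected) == 20 and hashlib.sha1(piece).digest() == expected

# Build a fake two-piece "pieces" field for illustration.
piece0, piece1 = b"first piece of the ISO", b"second piece of the ISO"
pieces_field = hashlib.sha1(piece0).digest() + hashlib.sha1(piece1).digest()

print(verify_piece(0, piece0, pieces_field))
print(verify_piece(1, b"tampered data", pieces_field))
```

This only proves the data matches the .torrent file, so the torrent (or its info hash) still has to come from a source you trust.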