Ask HN: Learning about distributed systems?
35 points by shahrk on July 23, 2020 | 22 comments
I used to love Operating Systems during my undergrad; Modern Operating Systems by Tanenbaum is to date the only academic book I've read in its entirety. I recently read an article by Werner Vogels about how Amazon built Aurora, and I was captivated by it. I want to start reading about Distributed Systems. What would be a good start/road map?



I posted this recently, but MIT's 6.824: Distributed Systems (taught by Robert Morris, of both Morris worm and Viaweb/Y Combinator fame) is completely open and available online. It includes video lectures, notes, readings, and programming assignments from as recently as Spring 2020 (with half of the lectures recorded from home after the pandemic struck). The assignments even include auto-graded testing scripts, so you can verify your solutions.

https://pdos.csail.mit.edu/6.824/


I posted this as well in the last thread about this class, but since we're discussing it again, there is an active study group on reddit doing the labs in Clojure:

https://www.reddit.com/r/mit6824clojure/

Several of the labs have been ported in full (map-reduce, first part of the Raft lab), including test scripts. Please join if this is interesting to you. The more the merrier!


Awesome resource to improve your Go skills at the same time since all assignments are in Go.


Having gone through some of the assignments, I can say that they are pretty helpful.


The book “Designing Data-Intensive Applications” by Martin Kleppmann is a fantastic read with such a concise train of thought. It builds up from basics, adds another thing, and another thing.

I kept asking myself what would happen if I were to extend the feature presented in the chapter I was reading, only to find my answers in the next chapter.

Brilliant book


The problem with this book is what to read after it. The book is really good but leaves you with the feeling "there are so many topics I don't know yet... but all the other books out there suck". Any recommendations for someone who has already read DDIA?


https://github.com/ept/ddia-references

Here is their git repository of all online references, by chapter. Seems it doesn't include papers that don't have a clear, public link. If you have the book, though, you can get the names and search for them.


I'm currently reading Distributed Systems by Tanenbaum [1]. It goes into more detail and it's more extensive. It's more outdated but the fundamentals are there.

[1] https://www.amazon.com/gp/product/1543057381/ref=ppx_yo_dt_b...


Each chapter is extensively and exhaustively referenced with the source papers. How about reading them?


> "Designing Data-Intensive Applications" by Martin Kleppmann: https://dataintensive.net/ https://g.co/kgs/xJ73FS


Yep, came here to say this.


This was truly one of the greatest books on software, if not the greatest, that I have ever read. I have read it twice at this point and fully intend to read it many more times. It is packed full of incredibly interesting information and written in a way that keeps you interested.

Not only is the main content great, but the references are numerous and open up entirely new sets of material as you progress.


And just to expand on the numerous references, this book reminded me what a time saver a good book is compared to lots of googling and blog posts.

I struggled with blog posts on Raft, Byzantine fault tolerance, the CAP theorem, transactions, and serializability, and this book was my enlightenment.


> this book reminded me what a time saver a good book is compared to lots of googling and blog posts

I had the same reaction! The information density is perfect. If anybody else loved this book and has similar recommendations, please share.


* This System Design Primer [1] on GitHub is a decent overview of how large-scale apps are designed, with jumping-off points into many different subjects.

* The Morning Paper blog's distributed systems tag [2] has a lot of good summaries of research on distributed systems, both from academia and industry.

* I maintain a list of assorted resources on distributed system design and operations on GitHub. [3]

* Also, as mentioned, Designing Data-Intensive Applications is a good starting place.

[1] https://github.com/donnemartin/system-design-primer

[2] https://blog.acolyer.org/tag/distributed-systems/

[3] https://github.com/DylanSp/distributed-systems-resources


Stumbled on this which I felt was a very good compendium - https://dancres.github.io/Pages/

I would also recommend reading VLDB papers and database literature, which show how distributed algorithms are applied: http://www.vldb.org/pvldb/vol9.html and http://www.redbook.io/

Disclaimer: I used to work at Couchbase (a distributed NoSQL database) as a PM and launched Eventing.


Speaking of Werner Vogels, have you seen this blog post on DistSys reading? https://www.allthingsdistributed.com/2012/12/paper-readings-...


While I don't see it as a starting point (I think the topics require more context), I'm a big fan of the articles Amazon has published recently as the "Builder's Library"

https://aws.amazon.com/builders-library/?cards-body.sort-by=...


Maarten van Steen has got you covered. He worked with Andrew Tanenbaum on all kinds of things back in the day :)

https://www.distributed-systems.net/index.php/books/ds3/


I found this to be a really good resource, with high-level patterns to use: https://www.oreilly.com/library/view/designing-distributed-s...


From a previous question re: "Ask HN: CS papers for software architecture and design?" (https://news.ycombinator.com/item?id=15778396), and distributed systems we eventually realize were needed in the first place:

> Bulk Synchronous Parallel: https://en.wikipedia.org/wiki/Bulk_synchronous_parallel .

Many/most (?) distributed systems can be described in terms of BSP primitives.
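
To make the BSP claim a bit more concrete, here is a minimal sketch of the model's three phases per superstep (local computation, communication, barrier), simulated sequentially in a single process. The function name `bsp_max`, the ring topology, and the max-propagation task are all illustrative assumptions, not something taken from the linked article or from any real BSP framework.

```python
from typing import Dict, List


def bsp_max(values: List[int], neighbors: Dict[int, List[int]]) -> List[int]:
    """Propagate the global maximum to every worker using BSP-style supersteps.

    Hypothetical illustration: workers are simulated in one process, and the
    "barrier" is simply the point where outgoing messages become the next inbox.
    """
    state = list(values)
    workers = range(len(values))

    # Superstep 0: every worker sends its own value to its neighbors.
    inbox: Dict[int, List[int]] = {w: [] for w in workers}
    for w in workers:
        for n in neighbors[w]:
            inbox[n].append(state[w])

    # Keep running supersteps until no messages are in flight.
    while any(inbox.values()):
        outbox: Dict[int, List[int]] = {w: [] for w in workers}
        for w in workers:
            # Phase 1: local computation on messages received last superstep.
            best = max(inbox[w], default=state[w])
            if best > state[w]:
                state[w] = best
                # Phase 2: communication, only if the local state changed.
                for n in neighbors[w]:
                    outbox[n].append(best)
        # Phase 3: barrier. Messages sent in this superstep are only
        # visible to their recipients in the next one.
        inbox = outbox

    return state


if __name__ == "__main__":
    ring = {0: [1], 1: [2], 2: [3], 3: [4], 4: [0]}
    print(bsp_max([3, 1, 4, 1, 5], ring))
```

In a real deployment each worker would run on its own node and the barrier would be an actual synchronization point; the sequential loop here only mirrors the semantics.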

> Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science) .

> Raft: https://en.wikipedia.org/wiki/Raft_(computer_science) #Safety

> CAP theorem: https://en.wikipedia.org/wiki/CAP_theorem .

Papers-we-love > Distributed Systems: https://github.com/papers-we-love/papers-we-love/tree/master...

awesome-distributed-systems also has many links to theory: https://github.com/theanalyst/awesome-distributed-systems

- Byzantine fault: https://en.wikipedia.org/wiki/Byzantine_fault :

> A [Byzantine fault] is a condition of a computer system, particularly distributed computing systems, where components may fail and there is imperfect information on whether a component has failed. The term takes its name from an allegory, the "Byzantine Generals Problem", developed to describe a situation in which, in order to avoid catastrophic failure of the system, the system's actors must agree on a concerted strategy, but some of these actors are unreliable.
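
As a small worked example of what "some of these actors are unreliable" costs in practice: the classic result for the Byzantine Generals Problem with unsigned messages is that agreement needs at least 3f + 1 participants to tolerate f faulty ones. That bound is not stated in the quote above, and the helper names below are hypothetical; treat this as a sketch of the arithmetic only.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 (assumed bound: unsigned messages)."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def min_nodes_for(f: int) -> int:
    """Smallest cluster size that can tolerate f Byzantine nodes under that bound."""
    if f < 0:
        raise ValueError("fault count cannot be negative")
    return 3 * f + 1


if __name__ == "__main__":
    for n in (3, 4, 7, 10):
        print(n, "nodes tolerate", max_byzantine_faults(n), "Byzantine fault(s)")
```

Under this bound, three nodes cannot tolerate even a single Byzantine node, which is why four is the smallest interesting cluster size in this setting.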

awesome-bigdata lists a number of tools: https://github.com/onurakpolat/awesome-bigdata

Practically, dask.distributed (joblib -> SLURM), dask ML, dask-labextension (a JupyterLab extension for dask), and the Rapids.ai tools (e.g. cuDF) scale from one to many nodes.


Not without a sense of irony, as the lists above list many papers that could be readings with quizzes:

Distributed systems -> Distributed computing: https://en.wikipedia.org/wiki/Distributed_computing

Category: Distributed computing: https://en.wikipedia.org/wiki/Category:Distributed_computing

Category:Distributed_computing_architecture : https://en.wikipedia.org/wiki/Category:Distributed_computing...

DLT: Distributed Ledger Technology: https://en.wikipedia.org/wiki/Distributed_ledger

Consensus (computer science): https://en.wikipedia.org/wiki/Consensus_(computer_science)



