Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: What are some interesting papers in CS for a beginner?
534 points by avinassh on Nov 14, 2017 | hide | past | favorite | 91 comments
I am looking for papers which are easy to understand, papers which are for undergraduates. I often stumble upon papers which require lots of reading to do and soon I have dug into a recursive rabbit hole. I would like to know about papers which have minimal citations/dependencies and only require knowledge of CS basics.

A good example of such paper would be Bitcoin. You don't need to be a CS wizard to understand the paper. Plus, you can ignore the math part and yet appreciate the beauty of Bitcoin.

Some other examples: Bit Torrent, TOTP RFC.




This is the list I used 27 years ago(1) when I taught at Stanford. It's a mix of what the previous guy (Bob Hagmann of Xerox PARC) used, what I read at Wisconsin, with a sprinkling of Sun's stuff. In looking this over, it's missing some of the Sun stuff, here are some links, some good papers here especially those that start with "v" (if the formatting in any of these looks wonky let me know, I have the troff source for a bunch of these and had to format it just now to get pdfs, might have screwed something up):

  http://mcvoy.com/lm/papers/SunOS.vfs_arch.pdf
  http://mcvoy.com/lm/papers/SunOS.vm_arch.pdf
  http://mcvoy.com/lm/papers/SunOS.vm_impl.pdf
  http://mcvoy.com/lm/papers/SunOS.shlib.pdf
  http://mcvoy.com/lm/papers/SunOS.tfs.pdf
and to toot my own horn, some stuff I did at Sun (and after)

  http://mcvoy.com/lm/papers/SunOS.ufs_clustering.pdf
  http://mcvoy.com/lm/papers/SunOS.nvram.pdf (proposal, never built)
  http://mcvoy.com/lm/papers/lmbench-usenix.pdf
  http://mcvoy.com/lm/papers/splice.pdf (hand waving, somewhat built in Linux)
And the Stanford list:

  http://mcvoy.com/lm/papers/stanford.txt
(1) Jesus, I'm old. Feels like yesterday I was teaching there.



Thanks. Does HN let you do inline http? If so, will do better next time.


AFAICT you can just paste in the URL, as long as it's not in a code block (lines that start with 4 spaces).


Yeah so I did that and I ended up with the same thing as sillysaurus, it looks like a bunch of crud jammed together. So I used the code block stuff to make it look reasonable.

Are those my two choices? I googled and HN seems to have very limited formatting.


Try

- foo

- bar

It isn't special formatting, but it might line the links up.

This is an interesting case. It looks fine on mobile, but desktop Chrome looks jammed together.

On the other hand, people can still differentiate between each link, so ultimately it's not so bad.


"Reflections on Trusting Trust" by Ken Thompson is one of my favorites, and fairly self contained.

Most papers by Jon Bentley (e.g. "A Sample of Brilliance") are also great reads and usually pretty short.

I'm a frequent contributor to Fermat's Library (https://fermatslibrary.com), which posts an annotated paper (CS, Math and Physics mainly) every week. In the annotations you will usually find a concise piece of knowledge that helps you understand some part of the paper without having to spend a long time in the "recursive rabbit hole". For instance, in the Bitcoin paper, there is an annotation with a succinct explanation of the essential cryptography concepts (Hash functions, Public Key Cryptography, Signatures) you need to know to understand the paper. And the nice thing is that if you feel so inclined, you can add your own annotations and make the paper easier to grasp for the next person who reads it :)

- Reflections on Trusting Trust (Annotated Version) - http://fermatslibrary.com/s/reflections-on-trusting-trust

- A Sample of Brilliance (Annotated Version) - http://fermatslibrary.com/s/a-sample-of-brilliance

- Bitcoin: A Peer-to-Peer Electronic Cash System - https://fermatslibrary.com/s/bitcoin


Adrian Colyer's "The Morning Paper" is a similar one worth mentioning: https://blog.acolyer.org/


I'm a grad student and I spend a lot of my time reading CS papers.

It is amazing that you want to start reading CS papers and I highly encourage it. However, if you don't have a CS degree I think papers might be remarkably off-putting (they usually are even to early grad students). I suggest you start with textbooks instead. Papers suffer from not being rewritten once they are published, hence from a pedagogical point of view, they are usually not explained as well as they could be after digesting them for many many years. Another problem is that most papers contain multiple ideas some of which rise and shine and the other ones die (often for good reasons). It is not easy to spot which is which without knowing about the wider context, which you naturally lack as a beginner.

If you insist on reading papers, however, at least don't read them in linear order. A good section order is abstract -> intro -> conclusion -> related work (because it uses comparisons which help) -> background -> evaluation (if exists!) -> technical chapters (those are usually best read in linear order). If there are proofs read them only if you must! If the proof is presented in the paper that is a good indication that it contains multiple subtle points which are tough to understand even if you are an experienced researcher.


Could you suggest some textbooks? And if you have the time, a word on why particularly book X and not book Y in the same subject?

There’s so much noise online now about these topics that it’s very hard to figure out where to focus and which resources are worth investing time into.

Thanks in advance!


Sure. Which subjects are you interested in?


Lamport's 'Time, Clocks and the Ordering of Events in a Distributed System'. Very readable and, if you are new to distributed systems, really fast way to get a useful new mental model.

https://www.microsoft.com/en-us/research/publication/time-cl...


On the subject on distributed systems, I like this quote by Lamport:

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." Leslie Lamport


Finally got around to reading that paper after seeing your comment. What a great piece! The logic was clearly laid out and simple to follow. I'll have to read through the proof in the appendix later.


Thanks for the feedback! I've assigned that paper to hundreds of 4th year undergrads, and I always enjoy discussing it with them. I've also had my grad students read it as a model for how to structure a paper.


That is a really wonderful paper, indeed. It's very clearly written (as anything by Lamport), and provides a simple and elegant solution to a complicated problem.


Well, except for Paxos. :P


I'm surprised no one has mentioned "the morning paper" (https://blog.acolyer.org/) yet. Some papers are more advanced but a lot I found are definitely approachable. You can subscribe to the mailing list and get a new paper every day, along with the author's summary!

That aside, I'm very impressed how he can keep up with this rhythm.


IIRC he has a backlog for a couple of weeks to keep stuff consistent.


I like Pugh's paper on skip lists [1], Shannon's "Mathematical Theory of Communication" [2], and these two might be a stretch but I also like Rong's "Word2vec Parameter Learning Explained" [3] and Levy & Goldberg's "Word2vec Explained" [4]. In any case use the recommended papers to learn paper reading skills e.g look at references when you don't understand a concept, find an introductory textbook to clarify a proof method, write a summary to make your understanding concrete. Good Luck!

[1]: http://courses.cs.vt.edu/cs2604/fall05/wmcquain/Notes/Supple...

[2]: http://math.harvard.edu/~ctm/home/text/others/shannon/entrop...

[3]: https://arxiv.org/pdf/1411.2738.pdf

[4]: https://arxiv.org/pdf/1402.3722.pdf


Great suggestions. I wish Fermat's Library would do the Shannon paper -- I suggested it, it seems an obvious choice, but no dice :(


"Why functional programming matters" by John Hughes was recommended reading in my first year undergraduate program in Gothenburg.

https://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.p...

It's more of a "position paper" than a "research paper", I suppose, so there's no theorems or proofs or anything.


I love Fermat's Library http://fermatslibrary.com/

It provides a good selection of classics annotated with the purpose of guiding you through the paper.

There's also the seminal Bitcoin paper http://fermatslibrary.com/s/bitcoin


I really like 'Hints for Computer System Design', Butler Lampson [0]. He passes on useful experience without demanding much in preparation from the reader.

[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...


Yup, came here to post this one. There are many great papers that someone should read but this one is more or less a good read for everyone.


+1 everyone building any sort of system should read this. Often.


1. On Computable Numbers, with an Application to the Entscheidungsproblem by Alan Turing

2. A Mathematical Theory of Communication by Claude Shannon

3. Recursive Functions of Symbolic Expressions and Their Computation by Machine by John McCarthy.

4. Bitcoin: A Peer-to-Peer Electronic Cash System by Satoshi Nakamoto

5. Reflections on Trusting Trust by Ken Thompson

6. Time, Clocks, and the Ordering of Events in a Distributed System by Leslie Lamport

7. As We May Think by Vannevar Bush

8. The Anatomy of a Large-Scale Hypertextual Web Search Engine by Sergey Brin and Lawrence Page


I was about to post your number 3 before I saw it here. Here's a link to it: http://www-formal.stanford.edu/jmc/recursive.html


I suggest Alan Turning's 1950 Computing Machinery and Intelligence. Many of the ideas first written in this paper are commonly referenced. It's good to have read the primary source. Do machines think? If you know what a theoretical Turning Machine is you'll have all the prerequisite knowledge. If not you can skip over that part. Do they still teach Turning Machines in Highschool?

http://www.loebner.net/Prizef/TuringArticle.html


> Alan Turning

> Turning Machine

His name's Turing


Benefit of the doubt. He could just be used to automatically typing turning. I add apostrophes to want (wan't) all the time.


It’s also a pretty common autocorrect. A fair few of us post using our phones and I’m always horrified to see what actually got posted when I look at it 61 minutes later.


Ha, I literally had 'What are Turing machines?' as my Question of the Week. Thanks for the link. Any other good links to help me find the best answer to this question?


https://www.destroyallsoftware.com/screencasts/catalog/intro... is a good resource for understanding Turing Machines. There used to be a way to share a link with non-subscribers but for some reason I can't find it now.


Not particularly scientific but you could also checkout the 2014 movie Imitation Game: http://www.imdb.com/title/tt2084970/

For bit of background about Turing’s life and efforts in WW2.


The original paper can get a bit dense at times. I recommend Petzold's "The Annotated Turing" for a guided tour.


The original paper was the 1936 one. This is a different paper.


Pixar published a lot of great, groundbreaking graphics papers in their early days. (They still do, but as the field is more mature now there's a lot more background reading required.)

For example, I think this one on their rendering pipeline REYES (Render Everything You Ever Saw) is pretty readable, and gives a great overview of how they rendered stuff like Red's Dream, in the years leading up to Toy Story: http://graphics.pixar.com/library/Reyes/paper.pdf

(Edit to add: in fact, just check out the overviews at http://graphics.pixar.com/library/ which are much better than I can describe here)


I suggest that the undergrads be encouraged to go to Arxiv and browse papers, try to work through them and see the range of papers.

The reason for this is that many papers aren't all that well written and well argued. Many are, to be sure, but it will get undergrads to understand what is clear, what is not.

It will also ensure that they are not intimidated by papers and the math on them. They should know they can dive into one and learn something and come out the other end.

But if you want specific recommendations, I find that the HCI (Human Computer Interaction) papers are very readable. Maybe its the people drawn to the field?

Here is just a random example:

https://arxiv.org/pdf/1711.03115.pdf


Duplicated in my other reply, but the Weiser paper on ubiquitous computing is excellent.


Look for survey papers in the fields you are interested in. These papers are written exactly for your situation - to provide a glimpse of the field assuming little prior knowledge.

regarding: >>Plus, you can ignore the math part and yet appreciate the beauty of Bitcoin

you can appreciate the beauty of car and even be a mechanic without physics, chemistry, and mathematics (!), but to understand cars...


Shannon's "A Mathematical Theory of Communication" is one of the most accessible and profound papers in the field: http://math.harvard.edu/~ctm/home/text/others/shannon/entrop...




It’s hard reading random papers like that without deeply understanding the problem they are trying to solve. Sometimes understanding the true problem is more difficult than the solution in the papers.

What I would suggest instead as a learning exercise is to pick a domain you want to tackle and reinvent the wheel by implementing it. while doing so you’ll naturally find yourself digging into research papers. The advantage here is that during implementation you’d have understood the problem and the context much better and can relate to what the authors are discussing and trying to solve. Instead of moving backwards from solution to problem.


>reinvent the wheel, it's how we get better wheels

-john carmack

    -michael scott


Margaret Martonosi at Princeton runs a Great Moments in Computing Course where students read a through a number of papers that describe, well, great moments in computing.

Link to a recent syllabus is here: http://mrmgroup.cs.princeton.edu/cos583/syllabusS15.pdf

The papers in it are all well-worth reading and are mostly accessible (with some effort) to a first-year grad student (i.e., anyone with an undergraduate degree in CS or something related.)

PS. Don't make the mistake of reading easy papers just because they're easy to read.


* Augmenting Human Intellect (Engelbart, 1962) [0]

* Architectural Styles and the Design of Network-based Software Architectures (aka REST from ch. 5) (Fielding, 2000) [1]

[0]: http://www.dougengelbart.org/pubs/augment-3906.html

[1]: https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm


I came here to suggest Fielding’s dissertation as well. It's eminently readable and doesn't demand any background in mathematics.


I feel like battling with complexity is an important part of growing as a programmer, and it doesn't require any domain knowledge. To that end:

No Silver Bullet - for two ways to think of complexity https://blog.acolyer.org/2016/09/06/no-silver-bullet-essence...

Not really a paper, but a turing award lecture The Emperor's old clothes - https://blog.acolyer.org/2016/09/07/the-emperors-old-clothes...


My favorites when I was a undergrad:

Architectural Styles and the Design of Network-based Software Architectures (the paper that invented REST):

http://jpkc.fudan.edu.cn/picture/article/216/35/4b/22598d594...

Solving the Dating Problem with the SENPAI Protocol:

http://sigtbd.csail.mit.edu/pubs/veryconference-paper10.pdf


The Byzantine Generals Problem's only prereq is familiarity with basic mathematical notation and a willingness to read carefully. The paper is seminal; it articulates a major computer security concern, the concern that rogue nodes in a network may lie in their communications.

https://www.microsoft.com/en-us/research/wp-content/uploads/...


Congestion avoidance and control, by Van Jacobson looks at how TCP, the fundamental protocol behind the internet, works (or didn't work to begin with). It's a pretty easy paper to digest without too many dependencies, especially if you skip the maths (which is explained intuitively anyway). And it's a fascinating read. I highly recommend it!

https://nsl.cs.sfu.ca/papers/Jaco88.pdf


Look for older papers. They are for various reasons often much easier to read. Here is one example from 1997: https://cadxfem.org/inf/Fast%20MinimumStorage%20RayTriangle%... It's only seven pages and the math is simple for someone to understand with basic linear algebra knowledge. Despite that, it has been cited thousands of times.


Kind of off-topic, but the world is really hurting for something like a distributed version of Zotero; a standardized format for collecting, distributing, and collaborating on reference lists. It's a shame that the Zotero team has no plans to make the Zotero sync protocol itself distributed or self-hostable, because in my (probably naive) opinion it would be a real killer application for both universities and private researchers. If I had enough money to fund this development myself I'd gladly do it, and if someone else kicked off the project I'd also gladly donate.

The only free-as-in-freedom-ish alternative to Zotero I know of is JabRef[0] plus a browser extension like JabFox[1] and either a manual WebDAV/Rsync synchronization solution or Bibsync[2], which hasn't had been updated in 3 years. And it still lacks the collaboration utility of a Zotero public library.

[0]: https://www.jabref.org/

[1]: https://addons.mozilla.org/en-US/firefox/addon/jabfox/

[2]: https://github.com/minad/bibsync


The Raft paper [0] is a great read but context of distributed systems and the importance of consensus algorithms is probably a prerequisite. Once you understand the context, it's a nice read that it small enough to contain within your mind in one or two reads.

[0]: https://raft.github.io/raft.pdf



Best Paper Awards in Computer Science (since 1996) http://jeffhuang.com/best_paper_awards.html

Papers I Read https://shagunsodhani.in/papers-I-read/


Why do you think best paper award articles make good reading for beginners? It's usually quite the opposite, they require a good knowledge of the field they contribute to.


The general idea is that you better off reading review papers, they are more approachable: https://www.journals.elsevier.com/computer-science-review/mo...

Or you could start with something you already knew, but in the form of a research paper to see if you could figure that out with just reading the paper. It's reading from the source: You understand the quirks and the intent of the original author.

For example, the Needleman-Wunsch algorithm should be something you already knew -- try reading the original paper: http://www.sciencedirect.com/science/article/pii/00222836709...


I would say none! Research papers are usually not written to be understood by an undergraduate student. Besides, older papers may be hard to read because they were written in a very different context. IMHO, it's better to find well-written, pedagogical and contemporary textbooks on the topic you find interesting.


A related question is "If I had the best professor in the world, what gems would they mention as examples of the beauty of computer science". These things maybe should be on the syllabus, but gems aren't necessarily seen as appropriate stepping stones in a learning process, and may only be accessible to the most capable students. Textbooks also, arguably, rarely offer charismatic enthusiasm or a glimpse of the sublime; they have to adopt a workaday attitude and play it safe.

Things that occur in books that might fit this bill of sublimity; Things that the very best practitioners get excited about:

Cormen names Tarjan's analysis of the complexity of the disjoint set union-find algorithm as his favourite thing[1]. Knuth is an appreciator of Tarjan's algorithm for finding strongly connected components[2]. Tarjan emerges as a contender for the algorithmicist's algorithmicist award.

[1] "It's not an algorithm, but a data structure. I've always marveled at the simple tree-based data structure for disjoint-set union, using union by rank and path compression (Section 21.3 in the third edition of CLRS). The code is amazingly simple, the data structure operations take just barely superlinear time, and the analysis (by Bob Tarjan) blows my mind."

[2] "While I was preparing for Volume 4 of TAOCP in the 90s, I wrote several dozen short routines using what you and I know as "literate programming." Those little essays have been packaged into The Stanford GraphBase (1994), and I still enjoy using and modifying them. My favorite is the implementation of Tarjan's beautiful algorithm for strong components, which appears on pages 512–519 of that book." and

"The data structures that he devised for this problem fit together in an amazingly beautiful way, so that the quantities you need to look at while exploring a directed graph are always magically at your fingertips. And his algorithm also does topological sorting as a byproduct."

Finding this kind of enthusiastic recommendation is always a joy.


It sounds very basic, but I highly recommend 'The Annotated Turing'[0] to any beginner in Computer Science. It's a walk through Turing's original 36-page paper on Turing Machines, and requires only high school level math to understand. I picked it up early in my CS undergrad and it blew my mind. I suddenly understood what a computer was.

[0]https://www.amazon.ca/Annotated-Turing-Through-Historic-Comp...


Edsger W. Dijkstra Archive

http://www.cs.utexas.edu/users/EWD/

Classics and very good reading. Some of the papers are hand written.


"Structural Regular Expressions" by Rob Pike: http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf

This is the foundation of the text editor 'sam', see here: http://sam.cat-v.org

Pretty much everything in the cat-v.org rabbit hole is worth digging into IMO.




The dynamodb paper is pretty approachable and provides alot of context.

http://www.allthingsdistributed.com/files/amazon-dynamo-sosp...


The ethereum white paper is also quite straightforward to read. https://github.com/ethereum/wiki/wiki/White-Paper


Harry Lewis at Harvard maintains a master list https://docs.google.com/spreadsheets/d/1wS6O7-ZoFL7Cfjgt-kdh...

He is teaching an online class (for credit) in the Spring that reviews an interesting subset. You don't have to be enrolled to take the class. https://www.extension.harvard.edu/academics/courses/classics...


http://www.sciencedirect.com/science/article/pii/01676423879... It is incredible how so many concepts are clearly explained in so few pages (30). It is the foundation of caml light language that gave birth to ocaml. It explains lambda calculus, semantic of ML family languages, SECD machine, how to perform demonstration in CS. IMHO, it requires at least 5 days to fully understand it if everything is new for you.


2 classics that don't require a whole lot of backgrond are:

- An Axiomatic Basis for Computer Programming by C. A. R. Hoare

- The Next 700 Programming Languages by P. J. Landin

The most recent paper I've read is:

- Storage strategies for collections in dynamically typed languages - Bolz, Diekmann, Tratt

I blog about some papers I've read on my blog [0], and these three papers I've mentioned have their own post, where I summarize key ideas of the papers. Take a look if you're interested.

[0] https://blog.ngzhian.com/


For networking, two classics are:

The Design Philosophy of the DARPA Internet Protocols

http://tdc.iorc.depaul.edu/media/internet-design-philosophy....

End-to-end arguments in system design

http://web.mit.edu/Saltzer/www/publications/endtoend/endtoen...


Turing Awards: http://amturing.acm.org/lectures.cfm. Maybe start with Karp.

Also Jim Gray “why do computers stop and what can be done about it” http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.65...


Of Turing Awards I would recommend John Backus' "Can programming be liberated from the von Neumann style?: a functional style and its algebra of programs"


Easy to read. This article explains the difference between CS Topics such as Artificial Intelligence, Machine Learning, and Deep Learning:

https://medium.com/machinevision/overview-of-artificial-inte...


I'd mention the morning papers (acolyer.org), also mentioned.

But why are you interested in papers? To improve your programming? To attempt to implement? Or for mostly intellectual stimulation?

If it is for programming, I'd strongly suggest certain books instead. e.g. Programming Pearls by Jon Bentley.


What do you consider to be basic CS? e.g. would knowledge of A* or dynamic programming be considered basic CS? And are you just looking for possibly interesting papers (even if written by completely random people nobody's heard of), or more well-known ones?


What paper do you mean for Bitcoin? The original white paper from Satoshi? I loved the bit torrent paper. I'd add the paper on HMMs and NFS. My list of favorite papers :) The oldies used to be better. Kinda sad really.


Google's classic papers on PageRank, MapReduce, BigTable and GFS.

Amazon's Dynamo.

Twitter's Cassandra.


I find WorryDream Refs by Bret Victor to be a very inspiring collection: http://worrydream.com/refs/


a while back on HN someone posted this link https://github.com/papers-we-love/papers-we-love and it's really good, it covers many of the other reccomendations here.

It also has a community built around it so you can meetup with others interested in the same thing http://paperswelove.org/


For databases the redbook (readings in databases) is a great resource - http://www.redbook.io/


Stephen Wolfram, A New Kind Of Science

https://www.wolframscience.com/


A Relational Model of Data for Large Shared Data Banks by Edgar F. Codd [Turing Award Winner]

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98....

This is the seminal paper that laid the foundation for RDBMS systems.

At my university (NUS) the CS program had a research paper reading program, where sophomores were tasked to read and summarize a well known paper and this was one of the most popular.


I think this is an essential read: http://usingcsp.com/


Found this through Rich Hickey, and really enjoyed it:

Out of the Tar Pit – Moseley & Marks 2006


my 2 favorites:

Weiser’s original Ubiquitous Computing paper.

TCP RFC. 86?


oops I was gonna suggest bitcoin


Playing Atari with deep q learning :)


Maybe instead of papers you need textbooks?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: