The Intuitive Guide to Data Structures and Algorithms

Veedrac · on Oct 14, 2017

One thing I find generally worth improving with these sorts of introductions is that they always seem to introduce data structures as a pick-and-mix bag of prebuilt parts, rather than a set of techniques which allow you to enforce properties which allow you to build objects that do what you want efficiently. Teaching the former is like teaching a physicist all the different kinds of gear systems, rather than the math that lets you model them.

Knowing how binary heaps and hash maps work is great, because binary heaps and hash maps are very useful data structures, but knowing the fundamental reasons behind them in a way that composes to building new things is much more useful. What happens when you want properties that someone hasn't already built a data structure for, at least as far as you can find? (Pet example: ordered container with efficient append and the ability to query the maximum of any subslice length k in O(log k). Extension: what are the minimal assumptions you need to do better, eg. amortized O(1)?)

I think this contributes to the attitude where programs are built out of monolithic, ill-fitting parts rather than actually looking at the problem and building the solution which does the right thing, the right way. (Pet example: if you designed websites as data structures and the browser to support doing so efficiently, 90% of requests on a site like Reddit would be a single `memcpy`.)

What I would attempt is rather than saying

"Here's one way we could do it: <explaination of pre-baked hash table>."

talk about the underlying tool first

"One tool that can help us is hashing: <explaination of general technique and why its properties are useful>. In this example we want <these properties> but can't immediately get it because our data has <these properties>. But we can apply hashing to get <technique that looks decent but has caveats>. Handling the caveats by doing <hash table things> gives us a data structure called a hash table."

Yes, this is probably a harder way to handle things, but I'd like to think it'd be worth the trouble.

chrisweekly · on Oct 14, 2017

That was a clearly-written, knowledgable, info-dense and effective bit of writing you just did, @Veedrac. Ever considered doing more of the same, and delivering the kind of educational material you just described?

trishume · on Oct 14, 2017

You can subscribe to Veedrac's HN comments on RSS like I do. I've found a bunch of people like Veedrac regularly write really interesting things in HN comments, so I wrote an app that lets you subscribe to them: http://hnblogs.thume.ca/

Veedrac · on Oct 14, 2017

I was expecting a far more mixed reaction, so I'm glad to hear this engaged people. Writing educational material to takes more work (and expertise) than I'd be willing to guarantee, but I'll keep the sentiment in mind :).

catnaroek · on Oct 14, 2017

> Teaching the former is like teaching a physicist all the different kinds of gear systems, rather than the math that lets you model them.

Most programming work is less like theoretical physics and more like car repairing.

yawboakye · on Oct 14, 2017

Programming is trial and error. Anybody who tells you otherwise is a salesman.

catnaroek · on Oct 15, 2017

Speak for yourself.

Dylan16807 · on Oct 14, 2017

> (Pet example: if you designed websites as data structures and the browser to support doing so efficiently, 90% of requests on a site like Reddit would be a single `memcpy`.)

You'd still want a compressed form, even if you had a good way to bypass parsing.

Veedrac · on Oct 14, 2017

Ah, yes, I botched that description, since there has to be more to support things like encryption. A more accurate claim is that the bulk of the work currently done would be replaced with a memcpy.

amelius · on Oct 14, 2017

> (Pet example: if you designed websites as data structures and the browser to support doing so efficiently, 90% of requests on a site like Reddit would be a single `memcpy`.)

Formatting a webpage cannot be done by a simple memcpy, especially since you don't know the size of the browser window in advance.

But perhaps I didn't understand the point you were trying to make.

Veedrac · on Oct 14, 2017

I mean on the server-side. Right now the process between "I want thread 35va5k" and "I want to render this html" makes extremely inefficient use of both servers and network.

scottLobster · on Oct 14, 2017

I'm all for stuff like this in theory, but in my experience I never use said knowledge for any productive purpose and eventually it degrades... until I read a post like this, feel a surge of professional inadequacy and go re-learn it. :P

petra · on Oct 14, 2017

Can this be done using software ? can i ask some software/expert-system - i need this and that properties, and it can tell me - "use this structure" ?

devonkim · on Oct 14, 2017

I’ve wondered why we don’t typically use algorithms that will self-adjust when we have so many VMs that can do all sorts of statistical analysis. We know that quicksort is inefficient on smaller and mostly-sorted lists so we oftentimes switch to selection sort below 20 items or so, but we typically have libraries that have statically picked this behavior as directed by their developers instead of over time selecting the more appropriate algorithm. Sure, if you’be never seen the data before there’s not a lot you can assume, but if over a lot of executions your VM can perform native code substitution and replace certain sections of code out based upon runtime profiling in HotSpot why can’t we come up with heuristics that say “just use a list instead of a hash table until runtimes to access start getting slower than n, in which case reallocate the data to a hash table, btree, or a heap depending upon which operations are most sensitive to system design parameters.” A lot of problems seem like we’re repeating ourselves writing the base algorithms and gluing things together when we really should be investing problems around resource constraints and system trade-offs and priorities. Then we could run test suites against these systems to determine how well they’d meet certain guarantees programmed in.

petra · on Oct 14, 2017

Yep, great idea.

>> A lot of problems seem like we’re repeating ourselves writing the base algorithms and gluing things together when we really should be investing problems around resource constraints and system trade-offs and priorities

What's the minimal platform that would enable library developers to monetize such things ? and does it or something close to it exist ? because that seems like the fastest way to make this happen.

andrei_says_ · on Oct 14, 2017

A small thing: the slide in subscription popover contains language which is dangerously close to the dark pattern of framing refusals as guilt/shame inducing.

A simple no thanks would do.

Because “they won’t remember what you did but hey will remember how you made them feel” and this made me feel “eww”.

gameguy43 · on Oct 14, 2017

Yeah, agreed. I'm thirsty for that install rate but more importantly I want us to be a force for good.

Fixed in master. Not freaking pushing while we're sitting at #1 unless I have to tho :P

Edit: Thanks for the note!

andrei_says_ · on Oct 14, 2017

You’re very welcome.

MattRix · on Oct 14, 2017

Agree completely! It almost made me leave the site without looking at any of the content, it felt gross.

cbzehner · on Oct 14, 2017

Interview cake is amazing. The way the problems are presented and the hints are given is often better than what you'll get in an actual interview setting.

After using it myself, I could not recommend it any more highly.

irl_zebra · on Oct 14, 2017

Same here, actually helped me in real life a lot, worth the money. No affiliation with them. Their IDE sucks though.

gameguy43 · on Oct 14, 2017

:) New one is on the way! It's taken a while, but it's getting close.

bogomipz · on Oct 14, 2017

Any chance of getting Go added to the list of available languages? It looks like there is almost everything except Go.

gameguy43 · on Oct 14, 2017

It's on our list!

Out of curiosity: is Go actually your most comfortable language? Or are you just interested in using Interview Cake to get some more practice with it?

pdimitar · on Oct 14, 2017

In my eyes is extremely easy to [superficially] statically analyze, plus it has enforced formatting at the language level.

Not the most comfortable language but a very worth addition!

bogomipz · on Oct 14, 2017

Ut would be the latter. I look forward to seeing this, thanks.

irl_zebra · on Oct 14, 2017

Are you "Parker" from the site?

gameguy43 · on Oct 14, 2017

irl_zebra · on Oct 14, 2017

Nice! Actually in all seriousness the site was pretty awesome and helpful. Six different companies are flying me out to SF in a few weeks for final interviews with them all. Can't say what percentage of that was due to your site and what was general study, but I found your questions intuitive and super helpful. For real, thanks!

gameguy43 · on Oct 14, 2017

Awesome! Good luck with the onsites!

When you get stuck, don't get worried or frustrated. Get excited about getting to play around with a hard problem.

brailsafe · on Oct 14, 2017

Grats on the interviews. It's a little weird to read this comment as someone in the same industry but in the polar opposite position. As another commenter asked, I'd be curious to hear about your path up to this point.

irl_zebra · on Oct 14, 2017

Sure, kind of answered below! I'm more than happy to answer additional questions.

kirbypineapple · on Oct 14, 2017

What's the best way to obtain interviews other than personal connections?

irl_zebra · on Oct 14, 2017

Sure, yeah, that's basically where I was coming from. I'm not 100% set on moving yet, just testing the waters. I have some personal connections at large companies that I intend to use to get referrals into the hiring pipeline when I do actually move, because that's apparently the only way to reliably get interviews at the huge places. For now, it was a lot of applying to jobs online with targeted resumes/cover letters at smaller to mid-sized startups in SF/bay area. There really just are a huge amount in the 20-50 person size range.

I found that once I actually got an interview or two, it was like a bit of a waterfall. I applied to companies X, Y, Z. When I got an interview at X, I emailed Y and Z and said I'm interviewing elsewhere and if all goes well plan to be in SF, and would love to interview with you. I think this is confirmation bias at play or whatever, because that really seemed to help get additional interviews.

kirbypineapple · on Oct 15, 2017

Thanks for the info. You're where I'm planning on being in about a month or so. A handful of personal connections that I'm banking on to get my foot in the door. Fill that in with targeted resumes to my top picks and a maybe untargeted to a dozen or so others.

brailsafe · on Oct 14, 2017

That's an interesting strategy. More specifically, when you emailed y,z saying that you had an interview elsewhere, did you have their direct contact information or did you just mention this in the cover letter box on their application page?

victorhooi · on Oct 14, 2017

I was tempted to buy this - howevever, what turned me away was the subscription fee - it says:

> No subscriptions to cancel—this is a 1-time charge. > Access lasts (3 months) 1 year.

So sure - the fee may not automatically renew - but then you lose access to all the material after a year.

Is there not simply a way to pay for the course, and own a copy?

yawboakye · on Oct 14, 2017

I've had the same concerns but I think--this is all my thoughts, no official sanction--Interview Cake started by catering to people who had interviews coming up, and so it didn't look like they will need more than a year's access. Neither would they need the material after they got the job. But I think @gameguy43 will see your comment and hopefully take it up with his team. I'd love to have eternal access, but then the 1-time charge might be bumped up too. Who knows?

gameguy43 · on Oct 13, 2017

Thanks for the post! Original author here. Happy to answer questions.

vvanders · on Oct 14, 2017

Shoutout for including the bit about constants in O(n) notation, 99% of things I see cover it gloss over it.

I've seen radix sort walk all over quicksort for common datasets many times. Knowing your dataset is at least as important as knowing the right algorithm.

danielvaughn · on Oct 14, 2017

I really love this. Frustratingly few articles/posts/courses actually follow through on their promise to make difficult material intuitive. I'm only halfway through your Big-O post but it's already obvious that you've put a lot of care into really making it accessible.

Well done!

gameguy43 · on Oct 14, 2017

That is soooo great to hear because every time we "finish" writing a new piece we end up spending literal /months/ obsessively polishing it before taking it live.

I always worry that we're just spinning our wheels. But it's great to hear that the extra effort is showing through :)

namanyayg · on Oct 13, 2017

This is something I've entered my email into after at least a year or so. Great presentation and design and I'm sure it's good content too -- I'll get to know over the coming week :)

notjackson · on Oct 14, 2017

I really liked how you've added the option to select between many programming languages in the snippets!

Did you write each of them yourself or is there a way to automate such simple code already?

gameguy43 · on Oct 14, 2017

We have a team of like a dozen and a half contractors from around the world who do the translations.

It was important to us to not just translate our Python into C++, but to have C++ that was actually good, C++-style C++. So we hired C++ experts to do all the C++ translations. And same for the other languages.

Now, none of us are C++ experts. So how did we know if the contractors we hired were any good? That's a whoooole blog post for another time.

In short: Noah (an engineer on the team) came up with this clever system where we hired a bunch of people to translate the same code sample (as a paid trial task), then we hired a bunch of people to rate those people's translations (again, as a paid trial task). Then we hired the translator who had the best ratings, and we also hired the /rater/ who was most /thorough/--they do code reviews of the translator's code.

faitswulff · on Oct 14, 2017

Is this actually a blog post? I'd read it.

gameguy43 · on Oct 14, 2017

Not yet!

Edit: but it'd make a good one, right?

faitswulff · on Oct 14, 2017

Absolutely!

foota · on Oct 14, 2017

Curious, how did you measure thorough-ness?

gameguy43 · on Oct 14, 2017

Definitely the least rigorous part of the process :)

Some combination of:

- Did they catch stuff that the other reviewers didn't catch?

- Of the stuff they did catch, does it look to us like that's actually stuff that'll help make our code better, or is it just unhelpful pickiness?

xenihn · on Oct 14, 2017

Happy customer since 2015. Any plans to add Go or Rust translations?

gameguy43 · on Oct 14, 2017

No specific plans yet. But notes like this are how we decide which languages to add next!

Curious: what language do you currently use on the site? And how painful is it for you to not have Go or Rust?

(Also, you're talking about having the /content/ in those languages, right? Or supporting those languages in our code editor?)

xenihn · on Oct 14, 2017

Yep, I'm talking about the content.

I use Swift and Objective-C, but I've consulted InterviewCake content for every other language except PHP at some point. The sample data structures and question answers are nice code samples when I need a quick reminder for how to do basic things in languages that I don't use on a regular basis.

Not having Go or Rust isn't a big deal, if I could choose one it'd be Go. I want to learn Go for myself, and asking for a friend for Rust.

I also never use the editor on the site, I write/run the code on my machine. I never liked using the editor the few times I tried it. I see in another comment you made in this thread that a new one is coming soon, so I'm excited to try it. Any chance I could beta test? :)

gameguy43 · on Oct 14, 2017

Shoot me an email :)

quickben · on Oct 14, 2017

A beautiful, amazing, and informative set of knowledge.

Sadly, an experience tarnished by the three, full page prompts for email subscription, right in the middle of the information flow.

partycoder · on Oct 14, 2017

I think these styles of puzzles borrow a lot from competitive programming (e.g: ACM, IEEE).

And while competitive programming requires skill, what you do most of the time is collaborative programming: create software that interoperates with other software, with a user, that is maintainable, etc. Software without these skills quickly becomes really expensive and hard to maintain.

gregable · on Oct 14, 2017

This is really beautiful. I love that the images are SVGs as well, makes perfect sense for these line drawings. Though it is a little bit of a bummer that all of them have embedded fonts, even those that don't use text.

gameguy43 · on Oct 14, 2017

Oh iiinteresting I guess we are doing that! Probably not too hard to fix. Thanks for pointing that out!

They used to be just inline svgs (instead of img tags), but (at least a couple years ago) our research suggested using img tags was better for SEO and screen readers and even page load time.

edit: (Of course, when they were just inline svgs, they got the fonts for free from the surrounding webpage).

So far I'm thinking we'll just turn off the embedded fonts for images that don't have text. Definitely not expert on this stuff though--open to other advice!

codingninja15 · on Oct 16, 2017

@gameguy43, this is great guide. What resources were most helpful when constructing this guide? Any specific books?

hbk · on Oct 14, 2017

Take it easy with that email stuff. You say "no spam" but its literally spammed all around the page.

nixgeek · on Oct 14, 2017

What kind of 'annual renewal' rate do you get at $199/yr?

Is there a good reason not to just do $199 lifetime?

neeleshs · on Oct 14, 2017

This looks good, but I prefer my copy of Algorithm Design Manual by Skiena.