Data Structures Sketches

photochemsyn · on Oct 9, 2022

These kinds of diagrams are OK for the most basic introduction to data structures, but they're not much help when it comes to creating an efficient implementation. Learning about data structure implementation without using a relatively low-level language that allows pointers seems to be a recipe for confusion.

Hence, one is probably always better off using the built-ins or libraries for any data structure in a higher-level language rather than implementing them from scratch. For example, here's a nice blog post on how Python dicts are implemented using hash tables written in C.

https://www.laurentluce.com/posts/python-dictionary-implemen...

While it's possible to build something like a binary search tree from scratch in Python using classes for nodes and with references instead of pointers, along with recursive algorithms for traversing it, it doesn't seem like that great of an idea to teach students in this manner. If the goal is to really teach students how to do this, C or similar (Rust?) is probably the better option for code examples, and a lot of time would have to be spent on memory management concepts (i.e. have students build their own dynamically allocated stack for the traversal, to avoid blowing through the stack, maybe).

raydiatian · on Oct 10, 2022

Yeah, this website is kind of like porting Windows 95 to the Apple Watch. It dies immediately after its esoteric novelty.

> Learning about data structure implementation without using a relatively low-level language that allows pointers seems to be a recipe for confusion.

All things considered, one should absolutely do data structures in a pointer-based language, eventually. I think learning CS concepts is inherently fraught with false certainty in your grasp on it early on. I’m pretty sure I did data structures in Java first, though, followed by cpp. I personally can’t see pointer jitsu having made it necessarily easier in my first bout with the concepts. Although there is certainly something to be said for the non-magicalness of doing your own memory management.

gfd · on Oct 9, 2022

To save someone a click, "sketch" has a technical definition when it comes to data structures (probabilistic data structures like count min sketch https://en.m.wikipedia.org/wiki/Count%E2%80%93min_sketch)

But this link is using the non technical definition (drawings)

n3t · on Oct 9, 2022

A non-mobile link: https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch

minaguib · on Oct 9, 2022

Cool material, but happy to have been exposed to okso.app as a (newcomer?) free online sketching tool (a-la excalidraw, a bit...)

pelasaco · on Oct 11, 2022

that's was probably the intention behind this post...

threatofrain · on Oct 9, 2022

This is a Very Small sequence of data structures...

sidlls · on Oct 9, 2022

It's uncommon that more than the ones presented will be necessary for the majority of applications, though.

morelisp · on Oct 9, 2022

You could literally only have "hash table" and this would be true for "necessary for the majority". But maybe we should raise the bar a little bit.

marginalia_nu · on Oct 9, 2022

Necessary? Maybe not. Useful? More often than you'd think, in more domains than you'd think.

Basic probabilistic structures like bloom filters and the hyperloglog in particular are severely slept on.

morelisp · on Oct 9, 2022

Forget probabilistic structures, any intro to data structures that doesn't show you a deque built with a ring buffer is shit, and I'll die on that hill.

(That being said, I'm tweaking something in the ASketch family as we speak, so I agree 100%.)

spinlock · on Oct 10, 2022

I wish the icon for hash tables was better. The rest are excellent but the hash table just falls back on a common syntax.

It bugs me how much this bugs me.

quadrifoliate · on Oct 10, 2022

The notation of head ← n for "n is stored in head" was mainly responsible for my confusion in reading CLRS for the first time. Even now, it really breaks the flow of reading this pseudocode for me, and I'm not quite sure why.

Has anyone else struggled with this, and have suggestions on how to rewire my brain so the notation "flows" better?

pasquinelli · on Oct 10, 2022

it's just a different symbol from the usual assignment operator, that is to say "put the value you're referring to as `n` in the variable `head`", just like `head = n` in most languages with assignment. the only reason for the arrow is if you're using `=` for true equality, as in `head` is always the same as the value `n`, or if you wanted to use `=` to mean `equals?`, a boolean test for momentary equality.

i don't know if any of that helps, it all just comes down to recognizing the symbol and interpreting it correctly.

deredede · on Oct 10, 2022

I have no tips, but I share the pain. My intuitive reaction when seeing an arrow is that it changes the text direction, and I do not give semantic meaning to the arrow direction. Iow, my brain essentially treats `a <- b` and `b -> a` identically.

This is the usual way that directional operators work in maths: comparisons, set membership and inclusion, implication (to be fair the symmetric version is rare enough), etc.

derekbaker783 · on Oct 10, 2022

There are programming languages where that syntax is used for assignments to mutable vars (F# being one example).

Maybe play around with a language that uses that syntax?

https://learn.microsoft.com/en-us/dotnet/fsharp/language-ref...

quadrifoliate · on Oct 10, 2022

This is an excellent suggestion! I was going to say that the only place I have seen arrows used are either struct membership in C or the fat arrow function in JS, and neither of those go from right to left. So playing around with F# might help.

minitoar · on Oct 9, 2022

This reminds me of some interview questions I was asked at Looker like 7 years ago. I felt silly drawing a stack and a queue but in hindsight I think it’s a good way to get some insight into a candidate’s intuition on these things.

kergonath · on Oct 10, 2022

I’m glad I am not alone.

morelisp · on Oct 9, 2022

Oh good, another generation of of programmers with their brains stuck thinking about box-and-arrow diagrams and have no clue how to store a binary tree in a way that don't blow your cache or a graph in an adjacency matrix.

educaysean · on Oct 9, 2022

Wait, there are people out there that don't know what you know? Sound the alarms everyone! These new generation of amateurs are going to ruin everything!

morelisp · on Oct 9, 2022

You're making a personal issue out of a pedagogical complaint. Which one of us is approaching this poorly?

It's not the next generation's fault you've utterly failed them.

spinlock · on Oct 10, 2022

I think you’re overestimating the demand for people who can implement a binary tree.

Then again, when I used to work more with node, I would complain daily that those kids were just reimplementing solved problems in computing … poorly. I’m working more with elixir these days and everything is so fast that I’ve never stopped to dig into the implementation.

jltsiren · on Oct 9, 2022

CS is built on abstractions, which make things more modular and easier to think about.

A data structure can be seen as an interface, a logical structure, a physical layout, or an encoding. When you teach them, you have to start from somewhere. The logical structure is usually a good choice, because it contains the key ideas of the structure. If you cram in too many details and too many levels of abstraction, you are just going to confuse the audience.

morelisp · on Oct 10, 2022

A data structure can be seen as a mathematical object with attendant definition of structure and operations, which can be useful for formal reasoning; or a implementation, an actual executable artifact encoding those structures and operations into a particular runtime, which can be useful to, like, write/run some code.

What you're calling a "logical structure" is not really a thing. It's a sort of outline or mnemonic, usually for the mathematical objects but sometimes the program. It's per se insufficient for either mathematical or practical use. When the outline fails to capture the right details (as it fails here) it's also harmful to anyone trying to understand either view of the data structure.

jltsiren · on Oct 10, 2022

That's a narrow point of view. It would make minor variations of the same structure different data structures, which is not very useful approach.

Logical structure is the heart of the data structure. It can be understood as the set of invariants maintained by the structure. Those invariants determine which operations can be supported efficiently by some version of the data structure. Layout and encoding choices affect the performance of those operations. The interface determines which operations are available, but it's often a poor match for the actual structure.

The lines between the logical structure, the layout, and the encoding are vague, but the concepts are useful. For example, if two data structures have the same logical structure and layout, they can often share the same serialization format. That implies that there should be a simple but very efficient conversion algorithm between the two, which can prove useful if you sometimes need to support different operations efficiently.

astrange · on Oct 10, 2022

I don't think "computers have caches that are faster to access than main memory" is just an implementation detail though; if you're designing data structures for a specific kind of storage you should teach the properties of that storage first and the data structures after.

jltsiren · on Oct 10, 2022

If you are designing data structures for a specific kind of storage, you have hopefully studied them beyond the first/second year algorithms and data structures class. Like most introductory classes, that class mostly teaches you concepts and ideas.

My job is largely about designing and implementing new data structures. Those four levels of abstraction are the ones I've personally found useful. None of them deals with implementation details, as they are all abstractions. Implementation details are an orthogonal concept that is relevant at every level.

Valmar · on Oct 9, 2022

> CS is built on abstractions, which make things more modular and easier to think about.

CS also leads to generally garbage performance with its abstractions, because computers simply don't function according to assumptions of CS...

Better to teach people how computers actually work, and then go from there.

Doesn't feel particularly surprising anymore that software has been getting slower far more quickly than hardware can get faster.

jltsiren · on Oct 9, 2022

In the CS curricula I'm familiar with, the introductory algorithms and data structures class is usually a prerequisite to the core systems classes. In order to learn how computers really work, you must first be familiar with the key ideas of organizing and manipulating data. Designing algorithms and data structures for real computers is an advanced topic few students take, because they are more interested in software/systems/theory/ML.

nextos · on Oct 10, 2022

Abstract data types can be zero-cost abstractions...

obventio56 · on Oct 10, 2022

I think you learn the box-and-arrow diagrams in an algorithms class. You learn about cache and memory in a systems class. When you get a real job, if someone asks you to write a performant binary tree in a low-level language, you combine knowledge from both these experiences and make an inference on what to do. Maybe you try, test, and iterate. Maybe you read some blog posts or find a canonical implementation in another language.

Either way, it doesn't seem like a huge knowledge gap.

marginalia_nu · on Oct 9, 2022

Yeah, I was thinking this as well. This box and line diagram style is so far away how you would implement many of these structures that you might not recognize them if you saw them.

woojoo666 · on Oct 9, 2022

What's the preferred way of implementing a binary tree, and how much more efficient is it over just nodes and references?

spinlock · on Oct 10, 2022

The heap implementation in Coran Leaderson and Revests’ Algorithms book is super interesting. But, I used to refer to that book as the cure for insomnia as an undergrad. It won’t be very engaging content to most people.

marginalia_nu · on Oct 10, 2022

I sadly think some of these subjects become sort of a snooze fest because at the time you are learning them, you are so far away from applying the patterns. It's far closer to stamp collection than any actual implementation for the average undergrad.

Back first year when we had linear algebra, I was pretty much the only guy who absolutely aced the exam. Not because I was the smartest or hardest working, but because I applied everything we learned to a game I was building, a 3d spaceship simulator. Vector products, planar projections, rotation matrices, I mainlined all of it, I pored over the book in my spare time with great enthusiasm. For the rest of the class it was just yet concept after abstract concept being piled on with no place to apply them.

vardhanw · on Oct 10, 2022

Its Cormen, Leiserson, Rivest. Was it a spell check error or a deliberate misspell for some reason?

spinlock · on Oct 10, 2022

Lol. That was all Seri. Now, I’m questioning her politics.

marginalia_nu · on Oct 10, 2022

As alluded to in the sibling comment, implicit trees are a good starting point for trees that are to be reasonably dense. Removing explicit pointers reduces the size and increases the data locality. Depending on what manner of tree you've built, you can also iterate over these values in a fast and cache-friendly way.

eps · on Oct 10, 2022

All I see at the link is a loading spinner. Using an iPad.