Ask HN: What overlooked class of tools should a self-taught programmer look into
927 points by nathanasmith on May 13, 2019 | 401 comments
15 years ago I learned Python by studying some O'Reilly books and I have been a hobbyist programmer ever since.

The books went into detail and since reading them I've felt confident writing scripts I needed to scratch an itch. Over time, I grew comfortable believing I had a strong grasp of the practical details and anything I hadn't seen was likely either a minor quibble, domain specific, or impractically theoretical.

This was until last year when I started working on a trading bot. I felt there should be two distinct parts to the bot, one script getting data then passing that data along to the other script for action. This seemed correct as later I might want multiple scripts serving both roles and passing data all around. Realizing the scripts would need to communicate over a network with minimal latency, I considered named pipes, Unix domain sockets, even writing files to /dev/shm but none of these solutions really fit.

Googling, I encountered something I hadn't heard of called a message queue. More specifically, the ZMQ messaging library. Seeing some examples I realized this was important. The step of then plowing through the docs was nothing short of revelatory. Every next chapter introduced another brilliant pattern. While grokking Pub/Sub, Req/Rep, Push/Pull and the rest I couldn't help breaking away, staring in space, struck by how this new thing I had just read could have deftly solved some fiendish memorable problem I'd previously struggled against.

Later, I pondered the meaning of only now stumbling on something so powerful, so fundamental, so hidden in plain sight, as messaging middleware? What other great tools remain invisible to me for lack of even knowing what to look for?

My question: In the spirit of generally yet ridiculously useful things like messaging middleware, what non-obvious tools and classes of tools would you suggest a hobbyist investigate that they otherwise may never encounter?
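For readers who have not met these patterns, here is a rough sketch of the Push/Pull idea mentioned in the post: one producer pushes work items and several workers pull them. It uses only Python's standard library (`queue` and `threading`) rather than ZMQ itself, so it shows the shape of the pattern inside a single process, not the ZMQ API or networked messaging. The names `producer`, `worker`, and `run_pipeline` are made up for illustration.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to shut down


def producer(tasks, items, n_workers):
    """Push every item onto the queue, then one STOP per worker."""
    for item in items:
        tasks.put(item)
    for _ in range(n_workers):
        tasks.put(STOP)


def worker(tasks, results):
    """Pull items until STOP arrives; the 'action' here is just squaring."""
    while True:
        item = tasks.get()
        if item is STOP:
            break
        results.put(item * item)


def run_pipeline(items, n_workers=3):
    """Fan a list of items out to n_workers threads and collect the results."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    producer(tasks, items, n_workers)
    for w in workers:
        w.join()
    # Workers finish in no particular order, so sort for a stable result.
    return sorted(results.get() for _ in range(len(items)))


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4, 5]))
```

The producer never needs to know how many items each worker handles; whichever worker is free pulls the next one. A messaging library extends the same idea across processes and machines.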




Makefiles. I always dismissed them as a C compiler thing, something that could never be useful for Python programming. But nowadays every project I create has a Makefile to bind together all the tasks involved in that project: bootstrapping the dev environment, running checks/tests, starting a devserver, building releases and container images. Makefiles are just such a nice place to put scripts for these common tasks compared to normal shell scripts. The functional approach and dependency resolution of Make allow you to express them with minimal boilerplate, and you get tab completion for free. Sometimes I try to use more native solutions (e.g. Tox, Docker), but I always end up wrapping those tools in a Makefile somewhere down the road, since there are always missing links. And because Make is ubiquitous on nearly every Linux and macOS system, it is all you need to get a project started.

Example: https://gitlab.com/internet-cleanup-foundation/web-security-...

Running the 'make' command in this repo will set up all requirements and then run code checks and tests. Running it again will skip the setup part unless one of the dependency files has changed (or the setup env is removed).


In 2019 Makefiles are a useful tool for automating project-level things. Too often webapps will require you to install X to install Y to run producing artifact Z. Since Make is old and baked and everywhere, specifying "make Z" is a useful project-level development process. It's not tied to a language (e.g. Tox) nor a huge runtime (Docker). Make is small enough in scope to be easy, and large enough to be capable without a lot of incantations.

The big downside of Make, alas, is Windows compatibility.


> The big downside of Make, alas, is Windows compatibility.

GNU Make works fine on Windows. The sources come with a vcproj to build it natively, or you get it from ezwinports. At my dayjob, we have a pretty complicated build with GNU Make for cross-compiling our application to Arm and PowerPC, and it works on Windows, even with special Guile scripts to reduce the number of shell calls which are extremely slow on Windows.


Most popular folder on Windows is "My documents", it has a space at least in some Windows versions. Make doesn't support such paths: http://savannah.gnu.org/bugs/?712

VBScript works better on Windows, IMO. Also works out of the box on all Windows versions since at least 2000 (on Win9x it was shipped with IE).


>Most popular folder on Windows is "My documents"

Not really... not since XP, anyway. Unless you have a space in your username (which is a terrible idea for many other reasons), your "Documents" path is C:\Users\JohnSmith\Documents. "Program Files" is pretty much the only important path which is likely to have spaces, and your makefiles (hopefully!) don't need to touch that.


On Windows, it's not up to me to decide where users will keep my stuff, and where it will work. Users decide.

For a software to work fine on Windows, it must support spaces in files and paths. Also Unicode in files and paths.

VBScript does, GNU Make doesn't.

> which is a terrible idea for many other reasons

If you use make to set up stuff, it's very possible you'll need to access "c:\Users\All Users", which does contain a space in the username. Also "c:\Program Files (x86)\Common Files", which contains more than one.


You can try the 8.3 convention,

DOCUME~1 Documents

or

<SYMLINKD> ALLUSE~1 All Users [C:\ProgramData]


> Make doesn't support such paths

That is entirely correct and really the most glaring downside of Make. In my opinion: If you have spaces in your dependency names, just stay away from Make as far as possible.


You wouldn't install Visual Studio 6 on XP easily, since it didn't support spaces in the "Program Files" directory :)


Long paths were introduced in Windows NT in 1993, VC6 released in 1998.

I’ve just installed Visual C++ 6.0 Professional on a WinXP VmWare machine. Took less than a minute, BTW — modern SSDs are awesome. The default installation path under program files also contain spaces, it’s "C:\Program Files\Microsoft Visual Studio\VC98"

BTW, they have a bug even on the very first welcome screen: http://const.me/tmp/vc6.png


You'll have WSL/WSL2 to work with, too. If not make, then CMake is now supported in Visual Studio (2017/2019) and works well.

>ezwinports

Interesting, hadn't run into this before. What's the advantage over MSYS2?


Eli is one of the few free software veterans who exclusively works on Windows. His ports are excellent and native Windows binaries wherever possible ("native" meaning: no MSYS at all). It's not that "native" is always better, but it is good to have the choice. Especially w.r.t. GNU Make, I found the MSYS version to be very hard to reason with, since the additional MSYS path conversion makes things even more complicated than they already are...


Do makefiles ever actually work that way? In my experience they always require you to install a bunch of packages for libraries, and usually only tell you the package names for Ubuntu, so you have to hunt down what the package is called on your distro or whether that version of the package is even in the repos.


If you write them yourself they certainly can. For small projects, you can leave everything explicit and it works great.

Can you break down your problem into a bunch of rules of the form “To produce this file, I need to run these shell commands, which read those other files over there as input”? If so, Make can take care of figuring out which steps actually need to be run.
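To make the "rules" idea above concrete, here is a toy sketch in Python of what such a rule table and the "figure out which steps need to run" logic could look like. It is an illustration of the idea only, not how Make is implemented; `is_stale`, `build`, `demo`, and the `notes.*` file names are all invented for the example.

```python
import os
import tempfile


def is_stale(target, prerequisites):
    """True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def build(target, rules, log):
    """Bring target up to date, building its prerequisites first."""
    if target not in rules:
        # A plain source file: nothing to run, it just has to exist.
        if not os.path.exists(target):
            raise FileNotFoundError(target)
        return
    prerequisites, recipe = rules[target]
    for p in prerequisites:
        build(p, rules, log)
    if is_stale(target, prerequisites):
        recipe()
        log.append(target)


def demo():
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "notes.txt")
        upper = os.path.join(d, "notes.upper.txt")
        count = os.path.join(d, "notes.count.txt")
        with open(src, "w") as f:
            f.write("hello make\n")

        def make_upper():
            with open(src) as fin, open(upper, "w") as fout:
                fout.write(fin.read().upper())

        def make_count():
            with open(upper) as fin, open(count, "w") as fout:
                fout.write(str(len(fin.read().split())))

        # "To produce this file, run this recipe, which reads those files."
        rules = {upper: ([src], make_upper), count: ([upper], make_count)}

        first, second, third = [], [], []
        build(count, rules, first)    # everything is missing: both recipes run
        build(count, rules, second)   # everything is up to date: nothing runs
        os.remove(count)
        build(count, rules, third)    # only the missing target is rebuilt
        names = lambda log: [os.path.basename(t) for t in log]
        return names(first), names(second), names(third)


if __name__ == "__main__":
    print(demo())
```

Asking for the final file is enough; the intermediate file gets built on the way, and a second request does no work at all.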


Most plain makefiles I've used use a tool like pkg-config to resolve library/header paths.



Hopefully Windows 11 is just a reskinned Ubuntu running all the old Windows programs through Wine.


I'll be surprised if there will ever be a Windows 11.


> The big downside of Make, alas, is Windows compatibility.

You'd have to give me a _very_ compelling reason to support developers who use Windows, when Windows lacks these essential tools. Besides, don't people who develop on Windows live in WSL?


Nope. I develop in Python, Java, and Kotlin on Windows and never touch WSL. Make is available natively through Chocolatey (a Windows package installer), but I prefer Gradle.

(I also write code to run on Linux, but still prefer Gradle.)


Slightly off topic but what would you suggest for someone who is familiar with build systems but who hasn’t used gradle?

I’m just getting into Kotlin and gradle isn’t something I’ve used before since I’m mostly web, .net til now.


Why don't you use WSL?

I can barely understand why you'd want to develop on Windows (ok, for non-Windows-only products) with it, but without it...


If you're already using a Vagrant or Docker-based development workflow, WSL doesn't really add much, and takes some things away. I/O performance, for example.


> If you're already using a Vagrant or Docker-based development workflow, WSL doesn't really add much, and takes some things away. I/O performance, for example.

I've been actively using WSL for over a year along with Docker and set up the Docker CLI in WSL to talk to the Docker for Windows daemon.

Performance in that scenario is no different than running the Docker CLI in PowerShell, or do you just mean I/O performance in general in WSL? In which case once you turn off Windows defender it's very usable. WSL v2 will also apparently make I/O performance 2-20x faster depending on what you're doing.

WSL adds a lot if you're using Docker IMO. Suddenly if you want, you can run tmux and terminal Vim along with ranger while your apps run nice and efficiently in Docker. Before you know it, you're spending almost all of your time on the command line but can still reach into the Windows cookie jar for gaming and other GUI apps that aren't available on Linux and can't be run in a Windows VM.


I find that it depends a lot on what you're doing. The real problem with WSL is I/O latency.

It's acceptable for relatively infrequent file access, but will eat you alive if you're doing anything that involves lots of random file access, or batch processing of large sets of small files, or stuff like that.


I just haven't seen that as a problem in my day to day as a developer working with Flask, Rails, Phoenix and Webpack.

That's dealing with 10k+ line projects spread across dozens of files quite often, and even transforming ~100 small JS / SCSS files through Webpack. It's all really fast even on 5 year old hardware (my source code isn't even on an SSD either).

Fast as in, Webpack CSS recompiles often take 250ms to about 1.5 second depending on how big the project is and all of the web framework code is close to instant to reload on change. Hundreds of Phoenix controller tests run in 3 seconds, etc..


It isn't perfect. The IO performance is currently poor and it doesn't play well with Windows Defender (wastes a lot of CPU). Also, since your IDE would live in Windows, you can sometimes have issues with Windows and Linux both interacting with the same files.


More developers are coding in Windows than any other operating system -- almost more than Mac and Linux combined. The Hacker News filter bubble might lead us to believe otherwise.

https://insights.stackoverflow.com/survey/2019#technology-_-...

Windows 47.5%

MacOS 26.8%

Linux-based 25.6%

BSD 0.1%

87,851 responses

(The Stack Overflow survey is a poor representation of the entire development community, but it's worth something, maybe the best we have.)


I've compiled a thing or two with MSYS2.


Developers in corporate environments?

VS2019 supports Clang and cmake now.


>The big downside of Make, alas, is Windows compatibility.

Isn't the big problem that you have no idea what it's doing to your system? Also that you aren't expected to be able to undo it. You can read the makefiles, of course, but it seems simpler not to have to. (Just update the necessary packages yourself, to the latest version.)

Forgive me if this is naive of me.


>Isn't the big problem that you have no idea what it's doing to your system?

As opposed to what exactly? Any other alternative, e.g. separate shell scripts, "npm run" scripts in package.json, running a Docker image, hell even cmake or other make-like tools - does stuff you don't know about without reading the files either.


With Docker at least everything is contained in the container. Which makes isolating and resetting environments a breeze. Something I worry about often is contaminating my system's 'state'. Which always leads to broken builds or incomplete build systems because a missing dependency is not spotted on your system because it was installed by some other tool some other time.

I tend to write my Makefiles to create as much of a local dev environment as possible for every project. Using Python virtualenv/Pipenv/Poetry, Ruby vendored dirs, custom Gopath per project (using direnv), etc. Most tools support some sort of isolation/localisation, but it's often just not on by default.


I wish more tools did this, I almost always want a local, self-contained environment for everything. The few times I don't actively want I don't see much pain in having one. A couple minutes setup time, maybe?

I have seriously considered hiring someone to audit and prune all the random little libraries and tools I've installed over the years for that one-off time I had to process a weird file format or wanted to try something from HN.


To keep my system clean, I use Darch.

https://godarch.com/

Every boot is a fresh install. Any one-off is only persisted if I add it to my recipes.

https://github.com/pauldotknopf/darch-recipes


Maybe you'd like NixOS?


I'm fascinated by NixOS and am following it, just haven't had a lot of time to dive in yet.

It does sound like the right idea. This is hypocritical as someone who doesn't use it but I hope more people use it.


(I was heavily downvoted). What I was thinking is: as opposed to just running your built-in package manager yourself, to upgrade your system to the latest version of all the packages it might require.


Makefiles are used for more than package management. In fact, it doesn't seem very common to use them for package management. Maybe I'm missing something?


I think they were talking about projects that tell you to run `make install`, which I agree is less than ideal.


If you write your own Makefiles you know what they do. They’re not that hard to grok and even hand-rolled Makefile use is (IMO) underrated.


I always had trouble reading makefiles because the control flow is not very linear. At least with shell files basically everything is explicit.


The dependency graph is one of the better reasons to use makefiles--think of the nonlinearity as a bonus!


I often find the non linear way of working with Make an advantage. Since it allows you to break a big piece of procedural shell code with lots of control flow (if X is installed don't install again, etc) into small self contained functional pieces with clean input and output boundaries which can be run individually. It also greatly improves code reuse as every target/recipe can be considered a function.


> Since it allows you to break a big piece of procedural shell code with lots of control flow (if X is installed don't install again, etc) into small self contained functional pieces with clean input and output boundaries which can be run individually

At that point, I use a scripting -not shell- language (which is not as implicit)

I don't program in C anymore, so my major workflow is all in one language: I write my server in the same language that does the compilation, which is the same language that does utilities like creating network tunnels to my lab nodes.


I hate makefiles.

That said, I wholeheartedly agree with the comment.

It's sad that something so central to a project and so useful and important to so many people seems like it hasn't advanced ... ever.

Developers generally do the minimum with Makefiles and get out. They are similar to 1040 forms in popularity.

I've always had a dream of a redesigned "make" system... with import statements, object oriented rules, clear targets, rules and files clearly separated, structure and organization... sigh.


I can agree with your sentiment. I often sought alternatives to Make because some things are just missing. However, I always end up with a Makefile because Make is just so basic and ubiquitous. Any big change or alternative to Make and you will lose that. That's why I try to avoid newer Make features. So for me Make not advancing is actually a feature, as it is one of the few things I can depend on to stay the same.


I find Nix to be a really nice alternative to Make; although it's also quite heavyweight and "invasive" (it's not just self-contained binary like `make`), so it's more a case of "I'm already using Nix to install dependencies, why not use it for orchestration too?"


Bazel is pretty nice.


Bazel is great until you have to install tensorflow from source into a container and are sitting wondering why you have to put and configure a JVM inside a container-destined-to-be-static-binary, temporarily to get a python program with no Java bindings installed.

As I'm not the most intelligent developer, I'm sure there's a better, more sophisticated way to do this but I got really frustrated and gave up.


Why not use multiple stages in your docker file?


I'm sure bazel is a good tool when it is used properly, but as a greenhorn in the tech field, the constant version mismatching between bazel and tensorflow can become quite the pain when you have to build tensorflow from source.


I will +1 that. I like Bazel a lot because it really forces you into a singular way of building a project which is clean and nice.

That said, mileage may vary. It was originally built by Google so it has quirks. I find it is best suited for projects with compiled dependencies and large repos.

Otherwise, I was going to add that Gradle as a build system is very advanced and improves upon make in many ways.


Oh whoops, I basically wrote the same comment without reading yours. In any case, hear, hear!


> It's awesome that something so popular is so well engineered that it doesn't need changes.

There, fixed it for you. In particular, I'm glad that the OO cancer hasn't spread to something as basic as a build system.


What I meant in this respect is that a lot of makefile rules have a lot of commonality - it would be great to inherit a workhorse rule and tweak an option instead of having to copy/paste a rule or twiddle makefile variables (which aren't normal programming language variables)


> Doesn't know how to write OO code.

> Calls OO cancer.


I really like this tutorial to get into Make files: https://swcarpentry.github.io/make-novice/.


fabfile.py (Fabric) could be used as a Makefile in Python. If you don't ever need to ssh to other machines to run your tasks, you could use the pyinvoke library directly (tasks.py). https://www.fabfile.org/

It is easy to add command line arguments to the tasks, configure them using files (json, yaml), environment variables, to split the task definitions into several modules/namespaces.


Having used fabric in the past, I've always found it just as easy to use a shell script and make files.

There's always some level of bootstrapping a project (installing packages/software, compiling libraries and dependencies) where it's easier to just to write a shell script than to program python to do. E.g. How do you get fabric installed on a system?

There's also been this longevity of sorts that Make seems to have gotten right. People just keep going back to it because it's simple.


I've been moving away from using shell scripts in a tools/ directory to using Python Invoke (http://www.pyinvoke.org), which is the library underlying fabric.

I used bash scripts for years, but for a lot of reasons made the switch:

- It was always painful to create small libraries of functions used across multiple scripts in a project

- It's difficult to consistently configure them with per-user settings. I've written bash implementations of settings management, Invoke handles this for me.

- I'd still have to reach for Python whenever I needed to do anything with arrays or dicts, etc.

- Getting error handling correct can be a chore

Invoke has a lot of nice-to-haves too:

- Autogenerated help output

- Autocomplete out of the box

- Very easy to add tasks, just a Python function

- Easy to run shell code when needed

- Very powerful config management when needed

- Supports namespacing, task dependencies, running multiple tasks at once and deduplicating them

It's not perfect, but it's a lot better than my hand rolled scripts were.
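As a rough illustration of the "tasks are just Python functions, with dependencies that are deduplicated" idea in the list above, here is a toy task runner written with the standard library only. To be clear, this is not Invoke's API; the `task` decorator, the `run` function, and the task names below are invented for this sketch.

```python
import subprocess
import sys

TASKS = {}  # task name -> (names of prerequisite tasks, function)


def task(*deps):
    """Register a function as a named task with optional prerequisite tasks."""
    def register(func):
        TASKS[func.__name__] = (deps, func)
        return func
    return register


def run(names, done=None):
    """Run tasks by name, prerequisites first, each at most once."""
    done = [] if done is None else done
    for name in names:
        if name in done:
            continue  # already ran as someone else's prerequisite
        deps, func = TASKS[name]
        run(deps, done)
        func()
        done.append(name)
    return done


@task()
def clean():
    print("removing build artefacts")


@task("clean")
def build():
    # Shelling out is still available when a task needs it.
    subprocess.run([sys.executable, "-c", "print('building')"], check=True)


@task("build")
def check():
    print("running tests")


@task("build")
def package():
    print("packaging")


if __name__ == "__main__":
    print(run(["check", "package"]))
```

Requesting `check` and `package` together runs `clean` and `build` only once, even though both requested tasks depend on `build`. This toy does not detect dependency cycles.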


Groxx's reply hits the point. I might work with a small number of platforms, but the "super-simple" qualifier is the point. The point at which you need dictionaries (associative arrays) in your install script, not to mention settings management beyond a make include, is the point at which you've outgrown make.


it's also far, far, far easier to make it work predictably on multiple platforms. and easier to understand and change later. that can get nightmarishly hard in make/bash, once you go outside the super-simple realm.


Snakemake is a quite nice improvement on make for data munging stuff.


Wow, I've never seen such a bloated Python project before. It has 10 dependencies with 2 additional optional dependencies, and the introduction/tutorial is absurdly overspecific.

The first example they use to describe the tool is:

> Cufflinks is a tool to assemble transcripts, calculate abundance and conduct a differential expression analysis on RNA-Seq data. This example shows how to create a typical Cufflinks workflow with Snakemake. It assumes that mapped RNA-Seq data for four samples 101-104 is given as bam files.

This is the epitome of non-programmer programming, colour me disappointed.


And yet it's still less crufty than the "by hackers for hackers" GNU Automake and less over-engineered than the "made by real professional programmers at a real big tech company" Luigi. Would love to hear if you have any suggestions for actual alternatives for doing this type of automation beyond what make can neatly deal with, rather than just going "eww.. it has dependencies"; "eww.. it's made by bioinformaticists".


One of the worst problems with using windows (in my opinion) is that there’s no native GNU make.


> One of the worst problems with using windows (in my opinion) is that there’s no native GNU make.

GNU Make even comes with a vcproj file for building a native binary with Visual Studio. Worked fine for me. Building it with Guile support though is difficult, but fortunately Eli Zaretskii provides native binaries through his ezwinports, and they worked pretty much flawlessly for me. Of course you will usually need a shell to execute recipes, but Make itself runs natively. For more information, see README.W32 in the sources.


Scoop has it in their repos as well (the gow package).


There are a number of ports of GNU Make to Windows. MSYS2 [1], for example, provides a reasonable development environment that includes Make.

If you just want a Make, there is [2] which can be installed separately and is part of the GNUWin32 collection.

[1] https://www.msys2.org/ [2] http://gnuwin32.sourceforge.net/packages/make.htm


Isn't non-native development on Windows a solved problem nowadays with WSL(2)?


WSL is currently horrendously (unusably, IMO) slow. WSL2 promises a 20x speed up, but it was already 100x slower than native Linux at some actually-realistic workloads that happen all the time when you're developing (e.g. `git grep`), so it's probably still too slow to be tolerable.

I had the opposite problem of wanting to develop some stuff for Windows from a Linux environment, and I settled on running a linux VM and copying binaries over by scping to WSL, which works reasonably well.


A nice thing with WSL is that you get working make and rsync. But I would like make for coding on native windows. Many FOSS projects use Makefile as the parent post described.


Windows does ship nmake but it is a little different.


I develop on Windows, and I like make as a lazy default you can just type in and as long as you maintain the make file, it will build the thing.

It is also a nice document of what you can build and how.

I also like it because Netlify supports it, so you can get it to run make to deploy your site when you push a commit, giving you a lot of control about your CI, while keeping it simple.


Any good resources for learning about make files that you can recommend?


I don't really know a good all-in-one manual; most things about Make I learned over years of using it, from different sources. And I still sometimes discover new features (and new ones are still added in recent releases, but I tend to avoid them to keep Makefiles compatible with older systems).

But the Make manual is pretty comprehensive as a guide and reference: https://www.gnu.org/software/make/manual/make.html

Also (as with most things) knowing what name some concept has makes it easy to search for references. For example the terminology of rules (target, prerequisite, recipe): https://www.gnu.org/software/make/manual/make.html#Rule-Synt...

Things I tend to google often because I forget them (some are used more often than others): automatic variables, implicit rules, conditionals and functions.

One trick that really helps making Make complete is making your own pseudo state files and understanding the dependency system. One of the best features of Make is its dependency resolving. You generally write rules because you want a target (a file or directory) to be created, based on prerequisites (dependencies) according to a recipe (shell scripts). Make figures out that if the prerequisites didn't change, it doesn't need to run the recipe again and it will reuse the target. Greatly saving on build time.

Because Make relies on file timestamps to do its dependency resolving magic if you don't have a file there is not much Make can do. So what you can do instead is create a pseudo target output yourself. For example: https://github.com/aequitas/macos-menubar-wireguard/blob/mas... Here a linter check is run which creates no output. So instead a hidden file .check is created as target. Whenever the sources change the target is invalidated and Make will run this recipe again updating the timestamp of .check. Also note the prerequisites behind the pipe (order-only prerequisites). These don't count toward the timestamp checking, but only need to be there. Ideal for environment dependencies, like in this case the presence of the swiftlint executable.
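The pseudo-target trick described above can be sketched in Python as follows. This is only a model of the timestamp logic, not Make itself and not the linked Makefile; `run_check`, `demo`, and the file names are hypothetical, and the Python interpreter stands in for the linter executable so the example runs anywhere.

```python
import os
import shutil
import sys
import tempfile


def run_check(stamp, sources, required_tools, check):
    """Run `check` only if `stamp` is missing or older than a source file.

    `required_tools` play the role of order-only prerequisites: they must
    exist, but their timestamps never trigger a re-run.
    """
    for tool in required_tools:
        if shutil.which(tool) is None:
            raise RuntimeError(f"missing tool: {tool}")
    if os.path.exists(stamp):
        stamp_mtime = os.path.getmtime(stamp)
        if all(os.path.getmtime(s) <= stamp_mtime for s in sources):
            return False              # up to date, skip the check
    check(sources)
    with open(stamp, "a"):            # create the stamp file if needed
        pass
    os.utime(stamp, None)             # and refresh its timestamp ("touch")
    return True


def demo():
    calls = []
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "main.swift")
        stamp = os.path.join(d, ".check")
        with open(src, "w") as f:
            f.write("// source\n")
        os.utime(src, (1000, 1000))           # pretend the source is old
        tools = [sys.executable]              # stand-in for the linter binary
        lint = lambda files: calls.append(list(files))
        first = run_check(stamp, [src], tools, lint)    # no stamp yet
        second = run_check(stamp, [src], tools, lint)   # stamp is newer
        newer = os.path.getmtime(stamp) + 10
        os.utime(src, (newer, newer))         # simulate editing the source
        third = run_check(stamp, [src], tools, lint)
    return first, second, third, len(calls)


if __name__ == "__main__":
    print(demo())
```

The check produces no output of its own, so the empty stamp file exists purely to carry a timestamp that can be compared against the sources.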


Matt Might's article is really good:

http://matt.might.net/articles/intro-to-make/


Worth noting that that's an introduction to GNU make, which, while the most common implementation, isn't the only one out there.


The GNU Make manual is excellent.

For learning advanced techniques: "The GNU Make Book" by John Graham-Cumming.




This is a nice video. The only thing I'm missing that should be covered imho (as you will encounter it even if you don't use it) is implicit/pattern rules: https://www.gnu.org/software/make/manual/html_node/Pattern-R...


"GNU Make Book" by John Graham-Cumming

https://nostarch.com/gnumake


We're doing this, and I mostly love it. I haven't found a great way to do code re-use across projects yet, and I'm not super happy with the Make function syntax (but, maybe if it needs a function, I should turn it into a shell script that itself is called by the Make command...).

All in all tho, it's a fantastic place to write down long CLI commands (ex: launching a dev docker container with the right networking and volume configurations) that you use a lot when working on the project.

Our Jenkins pipeline also relies on the Makefiles, literally just invoking `make release`, which is also pretty awesome.


When using it in multiple projects and CI you also tend to develop some kind of Developer-API with common commands/targets. No matter what kind of project you run, you always use the same target names to get started. No remembering which tool is used for this language, just clone it, run `make` and you're off, `make test` to test, etc.

Make does support includes (https://www.gnu.org/software/make/manual/html_node/Include.h...) which allow for some form of code reuse across projects. But then you encounter the balance between DRY and clarity. There are always exceptions, so you try to make stuff too universal, but then it's hard to grok the code. And I feel that if I start to use functions I'm using Make wrong and that kind of logic better fits in shell scripts called from the Makefile. Makefiles (the way I use them at least) should be simple to read and explain themselves. But it's often hard to balance this with the features Make provides, like implicit rules and automatic variables. And if I ever turn to generating Makefiles (other than for C projects where it's kind of expected) I will probably retire.


> common commands

Oh absolutely. It's fantastic for that. Our build pipeline actually relies on that; every project has a "release" target that is basically for the CI to use.

> Make includes

Yeah, I looked into that, and I think I had the same conclusion.

> scripts called from the Makefile

That's what I'm thinking is the way to level up this kind of system. Although then, why have `make init` instead of just `./bin/init` ?


The biggest reason I use Make is the dependency resolving.

In the `make init` example, it doesn't matter how many intermediate steps are involved; `init` is the end state I want to achieve. So in most of my Makefiles the `init` target will fan out into requirements as wide and deep as it needs, including running apt to install missing system dependencies. But then the good part: if a dependency is already fulfilled, Make won't have to run it again. Although sometimes it's hard or clunky to convert some dependencies into 'files' so Make can do its dependency resolving work properly.


Have you ever considered using Rakefiles instead?

https://github.com/ruby/rake


Never wrote them myself but have encountered them sometimes. Had no major issues with them then I believe. However I would probably write a Makefile to manage the Ruby environment and install Rake as they don't come installed by default.


self taught dev here too. I have never used make files, but pretty sure I'm using NodeJs in a similar role. I use it to automate all my "scripting", including deploying of my SaaS product to the cloud and running unit tests.

If it sounds interesting, check out https://www.npmjs.com/package/shelljs

PS: I do my primary development on Windows, but my production environment is Ubuntu. Node apps "just work" on both environments. Truly cross-platform.


I put Make in the same class as Vi. I hate using them but I have to learn them because they're the least of N evils, the most pragmatic way out of a hole.


I second this... a lot of times, broken make files are standing between you and victory, so it would be good to at least have some familiarity with them.


I also use similar Makefiles in my projects. I use "make release" to generate the docker container.


I love Make in concept and kind of hate it in practice. There is sooo much incidental complexity and so many warts to work around. I think it's a concept that is ripe for a new approach that thoughtfully keeps the good, ditches the bad, and maybe even adds some useful capabilities that aren't already there.

But of course I'm immediately skeptical of this idea a la https://xkcd.com/927/ (Standards). For instance, maybe this is what npm and all the rest thought they were doing. Certainly Rake in the ruby world tried to do this, and I never really liked it, so clearly they missed the mark somehow, at least for me. But then when I feel discouraged about the ability to improve on things, I think about how I felt this way when I first heard about Git. Why would you implement a new source control system when we already have subversion? Sure, svn has its frustrations and warts, but this new thing is just gonna have its own frustrations and warts and now we'll just have another frustrating warty thing and we haven't really gained anything. And this is totally true! Git is super frustrating and warty. Except that it's also way better than subversion, much faster and far more flexible. It was a revelation when I started using it. So I think back to Linus when he was thinking about creating git and think that he probably didn't have this discouraged uncertainty about improving things; he just had ideas for a better way and he went out and did it. (And yes, I know it was influenced by bitkeeper and other DVCs exist, so it's not like he invented the concept, but my point stands.)

So maybe someone could make a better Make?


On Windows there is a great PowerShell module, Invoke-Build.


Makefiles are so old and quaint, why not use "{flavorofthemonth}".format(flavorofthemonth=np.random.choice(frameworks)) ?


Read the curriculum of an undergraduate computer science course and read up on the things you haven't heard of. Some courses will even have lecture notes available.

E.g. these four pages are the University of Cambridge master's in computer science:

https://www.cl.cam.ac.uk/teaching/1819/part1a-75.html

https://www.cl.cam.ac.uk/teaching/1819/part1b-75.html

https://www.cl.cam.ac.uk/teaching/1819/part2-75.html

https://www.cl.cam.ac.uk/teaching/1819/part3.html

(Or a MOOC, but the links above are easy to browse text, syllabuses and lecture notes, not a load of videos.)


I support this 100%. I worked for years as a self-taught programmer. When I went back for my CS degree, I was shocked at how much I didn't know that I didn't know.

Numerous times I'd be sitting in a class and we'd go over a solution to some theoretical problem, and I'd realize that this solved a problem that had taken me days to discover on my own (and this solution was usually better than what I'd come up with).

If you are the kind of person who can work through everything on your own (including what may seem like the boring parts), I highly recommend doing so.


Could you give an example of one of the times something theoretical helped you solve a real world problem?

I've thought about going back for my CS degree a lot but can't really justify the cost and time investment vs self teaching. But it's something that's always been in the back of my mind.


Not the OP, but I too was a self-taught programmer as a teen who got a CS degree in my 20s. I independently came up with the idea of binary search in sorted data structures. But the first time I encountered hash tables in the course of getting my CS degree my reaction was "That's impossible! You can't get O(1) efficiency!"

(Sadly though this exposure was not in the context of a theoretical course on data structures, but rather in the context of reading the docs for HashMap as my university dropped older courses and languages to jump on the bandwagon of becoming a "Java school".)


Sure. It's been a while, but the first one that comes to mind is when we went over Floyd's Tortoise and Hare cycle detection algorithm. I realized it was a much cleaner solution to detecting cycles in a linked list than a solution I developed on my own over several days.
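For anyone who hasn't seen it, here is a minimal sketch of the tortoise-and-hare idea in Python. The Node class is a hypothetical stand-in for whatever linked list you actually have; the only thing taken from the algorithm itself is the two pointers moving at different speeds.

```python
class Node:
    """Singly linked list node (hypothetical minimal type for this sketch)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def has_cycle(head):
    """Return True if following .next from head eventually loops back."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next        # tortoise: one step
        fast = fast.next.next   # hare: two steps
        if slow is fast:        # the two can only meet inside a cycle
            return True
    return False                # the hare ran off the end, so no cycle

# Build a -> b -> c, then point c back at b to create a loop.
a, b, c = Node("a"), Node("b"), Node("c")
a.next, b.next = b, c
print(has_cycle(a))
c.next = b
print(has_cycle(a))
```

It uses constant extra memory, which is usually the point of choosing it over keeping a set of visited nodes.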

Another example: the automata class I took went over pushdown automata, and I immediately saw that it would solve the issues I'd been having with a finite state machine I was using to handle input for a game.

Oh and recently I needed to put different sections of a screen on different layers so that no 2 adjacent sections were on the same layer. I realized that this was basically just graph coloring, so I was able to find a solution in minutes instead of hours.
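A rough sketch of what the layering problem looks like once it is framed as graph coloring. The section names and adjacency below are made up for illustration, and greedy coloring is just one simple approach: it always gives a valid assignment but not necessarily the fewest layers.

```python
def greedy_coloring(adjacency):
    """Give each node the smallest layer number not used by an already-placed neighbor."""
    layers = {}
    # Visiting high-degree nodes first is a common heuristic, not a guarantee of optimality.
    for node in sorted(adjacency, key=lambda n: -len(adjacency[n])):
        taken = {layers[nb] for nb in adjacency[node] if nb in layers}
        layer = 0
        while layer in taken:
            layer += 1
        layers[node] = layer
    return layers

# Hypothetical screen sections and which ones touch each other.
SECTIONS = {
    "header": ["sidebar", "content"],
    "sidebar": ["header", "content", "footer"],
    "content": ["header", "sidebar", "footer"],
    "footer": ["sidebar", "content"],
}
layers = greedy_coloring(SECTIONS)
print(layers)
```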

I'm sure there are people who can get through most of a CS curriculum on their own, but I'm not that disciplined. I've also never met anyone who was. It has been immensely helpful.


To clarify: the first three links are for each year of the (three-year) undergrad program, the fourth is for the Masters.

The Cambridge course isn't perfect, but they do a very good job of making as much teaching material as possible publicly available.


FWIW, I've found many undergraduate computer science courses to lag behind on tooling, so take the recommendations they have with a grain of salt.


The Cambridge course is much more theoretical than most others, afaik. Tooling on programming language semantics, for example, doesn't change that much.


Do Cambridge courses not have labs/projects? I looked at the course materials on a few of the courses and couldn't find any. Or are they given out to students separately?


There are hardware and software labs, which are administered on paper by PhD students. These include(d): ML (the functional programming language), FPGA/soft core development, Java tasks, breadboarding some logic, prolog and probably some different ones now (looks like some machine learning tasks?). Some of them are referenced and described on the links above. There's also a group project in year 2, a dissertation individual project in year 3, and a small holiday project between 1 and 2. Overall, a few students get through it without being able to properly program, but most basically self teach.


There is a system of supervisions, that is a bit like doing homework and going over it in a private (1/2/3 students to one prof) lesson once every two weeks. Sometimes the questions would be standard for a course, sometimes the professors chose their own. They are not necessarily directly tied to the course as lectured.


Thank you very much. This is very valuable.


Does an offline copy of this exist? Do you think it will go down when the term probably ends soon?


I would expect the material to stay up: at the moment, everything back to 1998/1999 is still accessible:

https://www.cl.cam.ac.uk/teaching/material.html


Do you know about curl/wget? Each one does pretty much the same thing as the other, but you can start a religious war by suggesting that one is preferable.

Anyway, either of them will let you mirror a website so you never have to worry about it going down.


And, since every Unix command line tool inevitably gets mined and turned into a web service, you could always submit those urls to archive.org instead of or as well as curl-ing/wget-ing them.


https://github.com/ArchiveTeam/grab-site if you're super serious. Also archive.org will probably accept those output warcs.


1. Profiler. There's a standard tool that tells you what part of your code is slow. Over half the time it'll find something dumb and easy to fix instead of whatever you expected.

2. SQL / relational database schemas. Persistence opens up a lot of capabilities. And databases themselves are very well-optimized; if you do any nontrivial data manipulation it's likely that whatever the query planner comes up with will be faster than your first idea of how to do it by hand.

3. Graph searches. An awful lot of problems can be solved by knowing how to turn a problem into a graph search. Make sure not to fall into the trap of thinking a graph search is limited to paths through space - you can solve problems like "get through this dungeon with keys and doors" by adding duplicate nodes for the different states.

4. Sequential Bayesian Filters. These are almost as useful as graphs, but aren't in a standard CS curriculum so you'll look like a wizard. These solve the problem of "I want to know a thing and I know how it changes over time, but I only can get rough estimates for its current state." Kalman Filters are simple and give great results when applicable. Particle Filters have lower quality but are applicable to more problems and dirt simple to code.
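To make point 3 concrete, here is a small sketch of the "keys and doors" trick: the search runs over (room, keys held) pairs instead of rooms alone, so the same room shows up as a different node once you are carrying a key. The dungeon layout and names are invented for the example.

```python
from collections import deque

# Hypothetical dungeon: each edge is (neighbor, key_needed_or_None).
EDGES = {
    "start": [("hall", None)],
    "hall": [("start", None), ("closet", None), ("vault", "red")],
    "closet": [("hall", None)],
    "vault": [("hall", "red")],
}
KEYS_IN_ROOM = {"closet": "red"}  # entering the closet picks up the red key

def shortest_path(start, goal):
    """Breadth-first search over (room, keys_held) states."""
    first = (start, frozenset())
    queue = deque([(first, [start])])
    seen = {first}
    while queue:
        (room, keys), path = queue.popleft()
        if room == goal:
            return path
        for neighbor, needed in EDGES[room]:
            if needed is not None and needed not in keys:
                continue  # locked door and we lack the key
            new_keys = keys
            if neighbor in KEYS_IN_ROOM:
                new_keys = keys | {KEYS_IN_ROOM[neighbor]}
            state = (neighbor, new_keys)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [neighbor]))
    return None  # goal unreachable

print(shortest_path("start", "vault"))
```

Note that the hall appears twice on the resulting path, which a search over rooms alone would never allow.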


Support for 4! Yet, my understanding is that particle filters are superior but computationally more demanding. For nonlinear problems, the extended Kalman Filter linearizes the task, whereas particle filters don't and work with many point estimates instead.

I loved this book: https://users.aalto.fi/~ssarkka/pub/cup_book_online_20131111...

and also Thomas Schoen's group does great work on Sequential Monte Carlo (SMC), MCMC for sequential data :) http://user.it.uu.se/~thosc112/index.html

They are also building a probabilistic programming language for sequential data! https://github.com/lawmurray/Birch


Regular old Kalman Filters are the best (literally perfect) when your problem fits all their requirements. They also have a lot of nice properties if you're dealing with a problem that mostly fits their requirements. But the linear-Gaussian requirement is pretty steep, so they don't always work.
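For readers who have never seen one, a scalar Kalman filter fits in a few lines. This is only a sketch under simple assumptions: a single value that drifts as a random walk, with noise variances chosen by hand. The function name and parameters are mine, not from any library.

```python
import random

def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter: x is the estimate, p is the variance of that estimate."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the value is assumed to stay put on average, but uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement, weighted by their variances.
        k = p / (p + meas_var)   # Kalman gain, between 0 and 1
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

random.seed(0)
true_value = 5.0
noisy = [true_value + random.gauss(0, 1.0) for _ in range(200)]
est = kalman_1d(noisy, process_var=1e-4, meas_var=1.0)
print(round(est[-1], 2))
```

The linear-Gaussian requirement shows up here as the assumption that both the drift and the measurement noise are Gaussian with known variances.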

I don't like the EKF much and prefer the UKF. The core filtering code is a little more complex but they're much easier to actually work with; you can give them arbitrary functions like a particle filter.

Particle filters have the advantage of being able to handle arbitrarily wacky distributions. But they are random and do some wacky things in edge cases. They'll behave much more poorly in low-evidence situations than other filters will. And they fall over spectacularly if you switch from low-evidence to high-evidence (there's a workaround for this but it's still counterintuitive). Finally they're just more computationally expensive than the others.

Birch sounds interesting, I'll take a look.


strongly agree on profiler and SQL.

Horror story re SQL: in my SaaS I skipped SQL and went with a cloud datastore (NoSQL) and regret it. Basically (to simplify) you can't query your data without doing a full table scan (i.e. slow).


NoSql is not no sql though..


Unit testing, mocking, and various other testing techniques.

Why? Any project of sufficient complexity is very hard to test. If all you're doing is code -> build -> run to debug your code, you can very easily break something that's not in your immediate attention.

The problem is that good unit testing is hard, and time consuming. It can be so time consuming, that unless you can really plan in advance how you test, you could spend more time writing test code than real application code. (This is what happens when writing professional, industrial-strength code.)

So, when a hobby project becomes sufficiently interesting, such that the code is so complicated that your code -> build -> run loop won't hit most of it, you should think about how to have automated tests. They don't have to be "pure, by the book" unit tests, but they should be an approach that can hit most of your program in an automated manner.

You don't need to do "pure" mocking either. If you're writing something that calls a webservice, you could write a mock webserver and redirect your program to it. If you're writing something that works with pipes, you could have a set of known files with known results, and always compare them.
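The "known files with known results" approach can be sketched like this. The normalize function is a hypothetical stand-in for the code under test, and the .in / .golden naming is just one convention I picked for the example.

```python
import tempfile
from pathlib import Path

def normalize(text):
    """Code under test (hypothetical): trim each line and drop blank ones."""
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line) + "\n"

def check_against_golden(case_dir):
    """Run normalize on every *.in file and compare with the matching *.golden file."""
    failures = []
    for input_path in sorted(Path(case_dir).glob("*.in")):
        expected = input_path.with_suffix(".golden").read_text()
        actual = normalize(input_path.read_text())
        if actual != expected:
            failures.append(input_path.name)
    return failures

# Demo: set up one case in a temporary directory and run the check.
with tempfile.TemporaryDirectory() as d:
    Path(d, "basic.in").write_text("  hello \n\n world\n")
    Path(d, "basic.golden").write_text("hello\nworld\n")
    print(check_against_golden(d))  # an empty list means every case matched
```

In a real project the case directory would live in the repository, and adding a regression test becomes a matter of dropping in two more files.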

The goal is that you should cover most of your program with code -> build -> tests; and only do code -> build -> run for experimentation.


Let me second this. And in particular, I strongly encourage every developer to try starting a new project in a test-driven fashion (by which I mean that you advance the code by writing a bit of test and making it pass, and then doing that over and over.)

There's a qualitative difference between working in a well-tested code base that's very hard to describe convincingly. A lot of my early development experience was in code bases that had little or no testing. Experiencing a well-tested code base totally changed things for me. Instead of work being a death-of-a-thousand-cuts experience, it became pleasant, steady progress.


> Experiencing a well-tested code base totally changed things for me. Instead of work being a death-of-a-thousand-cuts experience, it became pleasant, steady progress.

I had the luxury of taking a well known data process and rewriting it with integration tests (input in, matching output with a golden file). It changed my professional life. Whereas before our deployment process included a 3 day wait and manual data checking on stage, after I was able to do deploys multiple times a day with confidence.

Made a believer out of me.


Unit tests can give you false positives (test failed but code is correct) and false negatives (test passed but code failed).

And TDD seems to create so many tests that you get huge false positive rates. I recently jumped on a project and I made a couple of fairly small code changes (a couple of hours) which caused 100 tests to fail. I then spent the next two days going through and correcting all 100 tests none of which found an issue in my code.


If you're saying that it's possible to do testing badly, I agree, just like it's possible to write production code badly. Sometimes teams new to unit testing do it ritualistically, without really understanding the purpose. That can lead to all sorts of bad outcomes. E.g., lots of tests that look impressive and even generate good coverage numbers, but don't really test what matters. Or tests that are highly duplicative, such that changing one thing in the code requires changing a lot of things in tests.

I have definitely dealt with code bases like that, and that sucks. But I have also dealt with code bases where the tests were great, and that's an amazing experience.

To do TDD well, I think it's important to release early and often and to reflect on one's experience (e.g., with weekly team retrospectives). That way if people are doing something unhelpful, like writing very duplicative tests, pretty soon they'll become an impediment to progress. The team will learn to write the useful tests, while skipping the ones that might fit some hypothetical pattern. It also helps people learn to design for testability; often, painful tests are a sign of bad design of the production code.


What are some resources for “good testing”, test boundaries, and possibly antipatterns?

(Ruby, Rails)


I've read a couple TDD books and this definitely seems to be a big blind spot. How to deal with the maintenance issues of unit tests.

They all seem a little fanatical in their pro unit test talk and don't discuss the downsides.


I find https://www.youtube.com/watch?v=EZ05e7EMOLM describes my own experiences with automated testing quite well.

tl;dr:

- Focus on "automated testing", don't get obsessed with philosophising about "the true nature of a 'unit'", or other such dogma.

- Be empirical: base your rules on what works; don't base your work on "the rules".

- The goal of testing is to expose problems in our program: "test failure" is a success, because we've found a problem (even if that problem is with the test!). Anything else is secondary (e.g. isolating the location of failures, documenting our API, etc.). Avoiding this goal defeats the point (e.g. choosing to ignore edge cases).

- Focus on functionality rather than implementation details, e.g. 'changing a user's email address' rather than 'the setEmail method of the User class'. This improves reliability and makes failures more useful/meaningful (i.e. "this feature broke" vs "this calling convention has changed").

- Mocking is a crutch: it works-around problems that can usually be avoided entirely during design; it can still be very useful when a design can't be changed (e.g. adding tests to a legacy system).

- Testing a real thing is objectively better than testing a fake thing; we should only mock if testing the real thing is unacceptable.

- If two components always exist together, pretending that they're independent is a waste of time and complexity.

- Having some poor tests is better than having no tests. Tests can be added, removed and improved over time, just like anything else.

- "Property checking" is a quick way to find edge-cases and scenarios we wouldn't have thought of.

- Fast feedback loops are important. Reducing I/O and favouring pure calculation usually speeds up testing more than reducing the number or size of tests (e.g. "unit" vs "end-to-end"). Incidentally, this is also how we avoid having to mock.
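As an illustration of the "property checking" bullet, here is a hand-rolled sketch using only the standard library. Dedicated libraries such as Hypothesis do this far better (including shrinking failing inputs); the run-length encoder below is a made-up example chosen because it has an obvious round-trip property.

```python
import random

def rle_encode(s):
    """Run-length encode a string into (char, count) pairs."""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1][1] += 1
        else:
            out.append([ch, 1])
    return [(ch, n) for ch, n in out]

def rle_decode(pairs):
    """Inverse of rle_encode."""
    return "".join(ch * n for ch, n in pairs)

def check_round_trip(trials=500, seed=1234):
    """Property: decoding an encoded string gives back the original, for any input."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        # A tiny alphabet makes repeated runs (the interesting case) likely.
        s = "".join(rng.choice("ab ") for _ in range(length))
        assert rle_decode(rle_encode(s)) == s, f"round trip failed for {s!r}"
    return trials

print(check_round_trip(), "random cases passed")
```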


The type of engineers who would screw up 100 unit tests independently are exactly the kind of engineers who should be forced to write tests for their code. Can you imagine the integration tests had they not been doing any testing at all?


Does that indicate that the tests were not written correctly in the first place?


I don't think so. They probably could have been written better, but they weren't written poorly. It's just really hard to write 200 unit tests for a feature that don't break when the feature is updated.


This is the gospel truth. It does take discipline though, because writing tests sucks. I like to have a policy of never committing a non-trivial function without a test. That way, I can never put it off and wind up with a huge chunk of untested code.


Are there any resources out there you would recommend for learning testing techniques in a Python context?



Is that useful if you’re never writing django apps?


It depends on your background. Having written a web app before lets you quickly grasp the ideas laid out in the book.

To me, the most important chapters are

- https://www.obeythetestinggoat.com/book/chapter_mocking.html

- https://www.obeythetestinggoat.com/book/chapter_purist_unit_...

Having said that, the concepts are universal.


Brian Okken's "Python Testing with pytest"[1]. More recent than Harry Percival's book.

[1] https://pragprog.com/book/bopytest/python-testing-with-pytes...


You are completely wrong.

Mocking is a huge design smell. The more mocks or integration tests your project requires to get full coverage, the less modular your program is. A program that uses many mocks is a sign of very very poor design. You will find the code more complex to reason about and much harder to reuse without necessitating a lot of glue code to make things work together. Without proper knowledge you won't even know the program is poorly designed.

I will grant you that 90% of programmers out there don't know how to design programs in a truly modular way, so most engineering projects will require extensive mocking. In fact most engineers can go through their entire career without knowing that they are making their programs more complex and less modular than they need to be. Following certain design principles I have seen incredibly complex projects require nearly zero mocking (very very rare though).

Mocking indicates a module is dependent on something. Dependency is different from composition.

     Dependencies                                Composition


           C                                        C
 +---------------------+
 |                     |       +----------------+       +-----------------+
 |     A               |       |                |       |                 |
 |                     |       |                |       |                 |
 |        +----------+ |       |                |       |                 |
 |        |          | |    in |                |       |                 |  out
 |        |          | |    -->+       A        +------>+         B       +-->
 |        |    B     | |       |                |       |                 |
 |        |          | |       |                |       |                 |
 |        |          | |       |                |       |                 |
 |        |          | |       |                |       |                 |
 |        +----------+ |       |                |       |                 |
 +---------------------+       +----------------+       +-----------------+
What's going on here? Both examples involve the creation of module C from A and B.

left: 'A' exists as wrapper code around B and is useless on its own. To unit test A you must mock B.

right: every module is reuseable on its own. Nothing needs to be mocked during unit testing. No dependencies.

The only exception to the right example where you MUST mock is a function that does IO. IO functions cannot be unit tested period, they can only be tested with integration tests.

There's a name for the left approach. It's called object-oriented programming using inheritance or composition (the OOP version of composition, not functional composition) as a design pattern. (Both are bad.)

There's also a name for the right approach. It's called functional programming using function composition.

I don't advocate that you strictly follow either style. Just know that when you go left you lose modularity and when you go right you gain it. All functional programming does is force your entire program to be modular down to the smallest primitive unit. Extensive mocking in your program means you went too far to the left.

tangent: Another irony around this world is that a lot of functional programmers (javascript and react developers especially) don't even know about the primary benefit of functional programming. They harp about things like "immutability" or how it's more convenient to write a map reduce rather than a for loop without truly ever knowing the real benefits of the style. They're just following the latest buzzword.


Forgive me, if I'm being dense, but doesn't either of these cases depend on how the composed objects are being used?

In your functional example A is an input to B (or vice versa?), how do you propose testing one of the modules without first instantiating the other one?


I'll give you two examples. One functional and the other OOP. Both programs aim to simulate driving given an input of 10 energy units to find the final output energy.

  # oop

  class Engine:
      def __init__(self, energy):
          self.energy = energy

  class Car:
      def __init__(self, engine):
          self.engine = engine

      def ignite(self):
          self.engine.energy -= 1  # igniting costs one energy unit

      def run(self):
          self.engine.energy -= 1  # running costs one energy unit

      def drive(self):
          self.ignite()
          self.run()
          return self.engine.energy

  engine = Engine(10)
  car = Car(engine)
  car.drive()  # result 8

  # ignite not testable without engine
  # run not testable without engine
  # drive not testable without engine and a car
  # ignite, run, and drive are not modular; they cannot be used without engine
  # engine testable with any integer
  # Car useless without engine
  # engine useless without car




  # functional

  def composeAnyFunctions(a, b):
      # returns function C from A and B (see illustration above); b runs first, then a
      return lambda x: a(b(x))

  def ignite(total_energy):
      return total_energy - 1

  def run(total_energy):
      return total_energy - 1

  drive = composeAnyFunctions(run, ignite)
  drive(10)  # result 8

  # compose testable with any pair of functions
  # run testable with any integer
  # ignite testable with any integer
  # drive testable with any integer
  # all functions importable and reuseable with zero dependencies
  # input_energy -> ignite -> run -> output_energy


"I think the lack of reusability comes in object-oriented languages, not functional languages. Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." - Joe Armstrong

you don't necessarily need the car or engine to simulate the energy output of driving.


I've been using static methods in Java that follow the pure function approach, and it has proved very easy to maintain, even for those who inherited my code later on.


That's mainly just namespacing. The only point of using an object in object-oriented programming is to unify functions and state, combining them into a single primitive. This combination breaks composability.

Static functions avoid state. You put them in an object in java because java has to have everything in an object. In any other language these would just be top level functions namespaced into a package or something. You are basically using java in a more functional way. Which is fine.


Thank you so much for a concrete example. I need to think about this some more. Clearly the code makes sense, but in a wider context, can you have a banana without a jungle? I'm dabbling with some functional programming but I definitely have more experience with OOP, so what you're saying is difficult for me to grasp, but the benefits are hard to ignore.


There are downsides to FP as well. I am not advocating one over the other. But there is a concrete theoretical reason why FP is more modular, reuseable and organized than OOP code.

Smalltalk is possibly the only OOP language that lets objects be compose-able and modular. Check out Pharo if you're interested. If you learn smalltalk well enough, you could apply its principles to traditional OOP languages and gain the modularity benefits.


> There are downsides to FP as well.

I've seen Java 8 functional stuff get unreadable.

But other than that, is there any other concrete downside?

For using pure functions, I don't see any downside, aside from them being unusable for side effects outside the program, like IO to a device.


I mostly agree with what you're saying, but I will add that it is also possible to write well-designed, modular, easy-to-test (minimal mocks) OOP code. It does provide more guns to shoot yourself in the foot with, I will admit.


Yes, you are correct. Check out Smalltalk, it fits the paradigm you describe. It was actually rated the most productive programming language in the world according to Namcook. Ironically, it's definitely one of the least popular languages as well.


Your argument appears to be, in TL;DR form: OOP and dependencies are bad and wrong, you must use Functional Programming or you will be wrong.

Isn't that a little extreme?


No. You are putting words in my mouth and accusing me of being extreme. I am NOT promoting one paradigm over the other. TLDR? I hope you read my stuff. I find it rude if someone just comments with a one liner and summarizes everything I said into a catchphrase that is a perversion of the truth. I feel like a presidential candidate.

Anyway, this is what I am saying:

If you use functional programming your code will be more modular and reusable because the paradigm forces you to be that way.

If you use Object Oriented Programming your program will automatically be less reusable and less modular but more object oriented.

This is all I am saying. Your mistaken statement that I am promoting one style over the other is based off of this assumption: Modular programs are better than less modular programs. This is not True.

Something like a physics engine is a better fit for OOP than it is for functional. Although your program will be less modular as a result, OOP is still a better fit because physical objects are easily modelled with OOP objects.

Trees, graphs and algorithms involving things of that nature are a better fit for object-oriented programming than functional because many of these algorithms involve mutating nodes. Again, if you follow this style your program will become less modular overall as a result.

The ideal program is one that spans the spectrum of both OOP and functional. When it calls for it use functional or OOP depending on context. Overall for complex web applications that most startups make, in my opinion, the program should be more functional than it is OOP. A web request is basically a function that takes in a request as an input and outputs a response. The form factor of a function is a better fit for this, and you get high modularity as a side benefit. There is no point in simulating the request/response paradigm in a stateful object while losing modularity in the process.

For a game, OOP is better in my opinion. Gaming entities involve constant mutation of things with state so OOP is a better fit. UI is a better fit for OOP as widgets are better represented by objects (FRP aka react&redux, imo, works well but is an awkward abstraction).

There is one exception to this rule. In general, objects in object-oriented programming are not composable. However, Smalltalk is an object-oriented language where objects ARE composable. Smalltalk is the language that coined the term "object oriented" and although it is no longer as popular as it once was, it is still a very robust language and learning from it has huge benefits.


Thank you for the clarification!


Learning your various options for persisting data in depth is very useful, since most applications have to deal with persistence in some form, and increasingly in a distributed manner. Go beyond simply skimming the surface of SQL vs. NoSQL and the marketing claims different databases make about their scalability and consistency. Learn what ACID and CAP stand for and the tradeoffs involved in different persistence strategies. Learn SQL really well. Learn how to read a query plan, which is the algorithm your SQL query gets compiled into. Learn about the tradeoffs of row-based vs column-based storage. Learn how indexes work, and what a B-tree is. Learn the MapReduce pattern. Think about the tradeoffs between sending code to run everywhere your data is stored vs. moving your data to where your code is running.
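A quick way to start reading query plans is SQLite, which ships with Python. The sketch below uses a made-up users table and index name, and shows the plan for the same query before and after adding an index. The exact wording of the plan text varies between SQLite versions, so treat the printed output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

query = "SELECT name FROM users WHERE email = ?"
before = plan(query, ("user42@example.com",))   # expect a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, ("user42@example.com",))    # expect a search using the index
print("before:", before)
print("after: ", after)
```

The index here is a B-tree under the hood, which is why an equality lookup on email no longer has to touch every row.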


Two great resources I've been going through are

- https://dataintensive.net - Really deep dives into different types of data storage solutions, their history, and how they actually work.

- http://www.cattell.net/datastores/Datastores.pdf - Good paper that helps differentiate similar but different datastores. Really helpful when you're trying to pick a modern data solution.


Designing Data-Intensive Applications is probably the best O'Reilly (if not overall technology) book of the past decade.


The talk on "Turning the database inside-out" [0][1] by the author, Martin Kleppmann, is a fantastic intro to these dynamics, and it's something I'll always recommend to both experienced and inexperienced data modelers and backend developers.

It goes pedagogically through the way things are typically done in a relational database in such a clear way that word-for-word it's one of the best tutorials I've seen... but it also weaves a narrative of "how can this be done better/more scalably/more reliably/more flexibly-to-business-needs" in pointing to a streaming/event-sourcing architecture. You may or may not need the latter right away, but it's a fantastic tool to have in your toolbox to be able to say "ah, this new requirement feels like it would benefit hugely from this architecture."

Especially for OP who's starting to think about the "why" of messaging queues, this could be a fantastically valuable first step.

[0] https://www.youtube.com/watch?v=fU9hR3kiOK0

[1] https://www.confluent.io/blog/turning-the-database-inside-ou...


Another good resource that guides one through both the philosophy (why) and technical details (how) of building a web application is Software Engineering for Internet Applications:

https://philip.greenspun.com/seia/


Learning how to use dtrace / bpftrace [0] is very valuable if you ever need to get into serious systems profiling.

There are some really cool data structures out there you might not know about. One of my favorite basic ones that I get a lot of use out of is the trie [1] (a.k.a. prefix tree). Very useful for IP calculations.
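A bare-bones trie over strings might look like the sketch below. It is only meant to show the shape of the structure. For IP work you would typically key on address bits rather than characters and look for the longest matching prefix, which this example does not do.

```python
class Trie:
    """Prefix tree over strings; each node maps one character to a child node."""
    def __init__(self):
        self.children = {}
        self.is_end = False   # True if a stored word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_end = True

    def with_prefix(self, prefix):
        """Return every stored word that starts with prefix, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_end:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)

words = Trie()
for w in ["car", "card", "care", "dog"]:
    words.insert(w)
print(words.with_prefix("car"))
```

Lookup cost depends on the length of the prefix, not on how many words are stored.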

Also look into probabilistic data structures [2], very amazing things can be done with them.

[0] https://en.wikipedia.org/wiki/DTrace

[1] https://en.wikipedia.org/wiki/Trie

[2] https://en.wikipedia.org/wiki/Category:Probabilistic_data_st...


DTrace is life-altering.

I keep hoping that someone will build a dtrace(1) CLI that transpiles to bpftrace.


Profiling, period


Approximate Windows equivalent of DTrace is sysinternal process monitor, freeware. Very useful sometimes.


The Windows equivalent of DTrace is.. DTrace. [0] DTrace is about far, far more than snooping the filesystem. At best, Process Monitor is an equivalent of Brendan Gregg's DTrace utility, opensnoop. The true power of DTrace is to correlate events across subsystem boundaries. Like, graphing the top quartile of latencies from network accesses initiated via a given function in your application.

[0] https://techcommunity.microsoft.com/t5/Windows-Kernel-Intern...


Bloom Filters are awesome.
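For the curious, a toy Bloom filter is only a few lines. This sketch assumes string items and derives its hash positions from salted SHA-256, which is convenient rather than fast; the sizes are arbitrary defaults, not tuned values.

```python
import hashlib

class BloomFilter:
    """Set-like structure: no false negatives, but false positives are possible."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a counter byte.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen = BloomFilter()
seen.add("alice@example.com")
print(seen.might_contain("alice@example.com"))
```

The trade is memory for certainty: the filter never stores the items themselves, so it cannot list them or delete them.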


Shell scripting for processing text. You can often get so much done with so little code and effort.

Also on a semi-related note, I think as a self taught programmer, it's easy to get stuck on things that seem cool but are just procrastination enablers (I know, I've been guilty of it for 20 years). Like, if you're about to start a new project and you want to flesh out what it's about, you really don't need to spend 5 hours researching which mind map tool to use. Just open a text document and start writing, or get a piece of paper and a pen. It won't even take that long.

I spent about 1.5 hours the other day planning a substantially sized web app. All I did was open a text file and type what came into my head. For fun I decided to record the whole process too[0]. I wish more people recorded their process for things like that because I find the journey more interesting than the destination most of the time. Like your journey of eventually finding message queues must have been quite fun and you probably learned a ton (after all, it led you to message queues, so it was certainly time well spent).

[0]: https://nickjanetakis.com/blog/live-demo-of-planning-a-real-...


These days it might be better to just learn Python. It's cleaner and scales better to complex code. And it's available out of the box on most modern systems where shells are available too. Shells are still good for simple one-liners and for knotting multiple processes together, but text processing involves so many different commands, each with their own quirks, that a consistent simple language is IMHO superior.


For processing text, using Python doesn't really make sense in a ton of cases.

If I want to search a text file for a specific string, why wouldn't I just use `grep "hello" myfile.text`, or if I wanted to do it on a directory of text it's a minor change of `grep -R "hello" .`.

Why would I go through the trouble of opening a Python interpreter, or writing out a Python script to do the equivalent in Python?

Or if I wanted to grab the third column of a CSV, I would for sure just use the `cut` command or maybe `awk` (depending on what I'm doing).

For more complex parsing you can often pipe together a few commands and maybe convert it into a 5 line Bash script to make it a little easier to create variables, etc.. It becomes something you can whip up in 1 minute.

Then there's also more involved text parsing that doesn't require piping a bunch of commands together or shell script glue, in which case it comes back to using grep with its various flags and potentially a regexp. It's a natural fit for the problem and you can iterate on it so quickly.


Honestly this applies to Ruby & Javascript/Typescript as well and not just Python. I really don't see the value of learning shell scripting anymore when the newer languages are just as easy to learn, terse, and you can adapt better to changing conditions when needed with libraries.


I often find multi-line Python scripts with `import os` and others that could be a fraction in size (and just as clear) in bash. Even more ridiculous are the times I find a node script (published to npm, even) that is little more than a wrapper on a shell script.

Inevitably someone will read these arguments and think “those are just bad programmers”, but your point was that you “don't see the value of learning shell scripting”. The value is in not spewing absurd code like that. Shell commands are fast and efficient. There isn’t an emphasis on libraries because instead you use tools. Is `grep` not enough? Try `the silver searcher`[1] or `ripgrep`[2].

Are shell scripts the best instrument for every job? No, but no tool is.

[1]: https://github.com/ggreer/the_silver_searcher

[2]: https://github.com/BurntSushi/ripgrep


Just because someone knows shell scripting I'm not going to consider them "bad programmers". I know it myself and I've used it extensively before I learned python & ruby.

My point is that the cost for learning and using shell scripts is just too high compared to just using a modern language that's just as terse and a lot more powerful and flexible. Context switching from one language to another isn't free either.

imo the only time shell scripting was practical was when the only major programming languages were C, C++, and Java. imo even Perl5 is more practical than shell scripting.

Also I doubt that the python program you mentioned was that much bigger than a shell script


This 1000x. I put up with just getting by in shell script for some years, and when I finally decided to get deeper into it, it was magical.

A good series of piped commands with tools available basically everywhere can solve problems you had no idea could be so simple to solve.


have any good resources for this?


  man bash  
  man awk  
  man sed  
  man grep
You can also do most of this stuff with Perl one liners if that suits your fancy.

sed + awk - https://www.amazon.com/sed-awk-Dale-Dougherty/dp/1565922255

awk - https://ia802309.us.archive.org/25/items/pdfy-MgN0H1joIoDVoI...

general *nix text processing - https://www.tldp.org/LDP/abs/html/textproc.html


Perhaps these are a good start:

awk:

- GNU Awk User Guide - https://www.gnu.org/software/gawk/manual/gawk.html#Getting-S...

- Grymoire guide - http://www.grymoire.com/Unix/Awk.html

sed:

- Grymoire guide - https://www.grymoire.com/Unix/Sed.html

- Official docs - http://sed.sourceforge.net/#docs


The IBM developerworks articles about this are old (2000!), but still incredibly useful and well written. Start here:

https://www.ibm.com/developerworks/library/l-sed1/index.html

https://developer.ibm.com/tutorials/l-awk1/


There's a free O'Reilly book from last year: https://www.datascienceatthecommandline.com/

awk is pure magic.


Also, just get to know your shell.

big timesaver: control-r

basic but useful all over:

    for i in *.c; do cp "$i" "$i.bak"; done
etc...


What do the double quotes around the dollar variable do in that command?


Ensures filenames with spaces in them get passed as a single argument, instead of being inadvertently expanded into several.

Quoting and expansion issues are a pain in shell languages...


$i is the variable i declared in the for loop. The quotes just wrap it, so that it's (somewhat) safe if the file has a space in the name.


What do you mean, "somewhat"? This looks safe for all I can tell.


You know how you should almost always eat your vegetables?

You should almost always quote your Bash variables too. Here's why: https://nickjanetakis.com/blog/here-is-why-you-should-quote-...


It prevents filenames with spaces from expanding into two arguments to the command.
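To see the splitting concretely without touching any files, here is a small sketch using Python's shlex module, which approximates POSIX shell word splitting (it does not do globbing or variable expansion, so the filename is substituted by hand). The filename is a made-up example.

```python
import shlex

filename = "my notes.c"  # hypothetical file with a space in its name

# Like `cp $i $i.bak`: the expanded text is split on the space.
unquoted = shlex.split(f"cp {filename} {filename}.bak")

# Like `cp "$i" "$i.bak"`: each expansion stays a single argument.
quoted = shlex.split(f'cp "{filename}" "{filename}.bak"')

print(unquoted)  # ['cp', 'my', 'notes.c', 'my', 'notes.c.bak']
print(quoted)    # ['cp', 'my notes.c', 'my notes.c.bak']
```

In the unquoted case cp receives four path arguments instead of two, which is exactly the failure the quotes prevent.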


I'd split your first point in two:

- Shell scripting for running commands and managing files

- Unix utilities for processing text

These happen to complement each other nicely. It's not that bash is better at manipulating text than Python, it's that Python makes it painful to invoke commands (like those Unix utilities) and pipe data between them (e.g. https://news.ycombinator.com/item?id=17733865 ).
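As a rough illustration of that friction, this is approximately what the shell pipeline `grep hello | wc -l` looks like with Python's subprocess module. It is a sketch that assumes grep and wc are on the PATH, and the input text is made up.

```python
import subprocess

text = "hello world\ngoodbye\nhello again\n"  # made-up input

# Equivalent of: printf '...' | grep hello | wc -l
grep = subprocess.Popen(
    ["grep", "hello"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
)
wc = subprocess.Popen(["wc", "-l"], stdin=grep.stdout, stdout=subprocess.PIPE)
grep.stdout.close()  # so grep gets SIGPIPE if wc exits first

# Feed the input and signal end-of-file (fine for small inputs like this one).
grep.stdin.write(text.encode())
grep.stdin.close()

count = int(wc.communicate()[0].decode().strip())
grep.wait()
print(count)  # 2
```

Each `|` in the shell becomes a Popen call plus manual plumbing of file handles, which is why short pipelines tend to stay in the shell.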


https://dataintensive.net/

I can't recommend this book enough. I have a CS background, and still had quite a few "I can't believe this thing has been hiding in plain sight!" moments while reading it.


It's great. Incredibly dense with useful information and it just blows my mind how much knowledge Martin has about the topic. I recommend watching this talk from him to give a little glimpse of the book: https://www.youtube.com/watch?v=5ZjhNTM8XU8 This is just about a little part of one of the chapters.


Oh man, this is good, thank you.


I'm now torn between reading this one first or the Architecture of Enterprise Applications.


I loved Designing Data-Intensive Applications. It gives you the reasons why NoSQL databases exist and the problems they solve. Moreover it gives you reasons to select one over another. It's really excellent and one of my top two CS books


Your other top CS book out of interest?


If it helps, IMO "Designing Data-Intensive Applications" is a better bang-for-the-buck. Enterprise-scale applications are a world unto themselves.


Edit: I meant Patterns of Enterprise Application Architecture by Fowler in my comment above. Recommended by DHH.


My advice would be to skip it completely. It's just packed full of standard GoF OO dogma.


Thanks. So, what you're saying it is redundant if you've read GoF?


But this is mainly for distributed (web) systems, right?

Are there good books for data intensive desktop apps? Like games or CAD design tools?


Debuggers and property based testing. Only a select few people can actually use print statements productively (by more than their own metrics) for debugging. Learning how to craft repro scenarios and adequately capture state in a debugging session can enable junior devs to easily surpass senior devs.

Property based tests aren't quite formal methods, but I think they are a good stepping stone. They also somewhat force your code into an algebraic/functional style, which makes it amenable to refactoring, easier to test, and easier to understand.
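A hand-rolled sketch of the idea, using only the standard library rather than a real property-based testing framework (which would also generate smarter inputs and shrink failing cases). The run-length encoder is just a stand-in for the code under test; all names here are illustrative.

```python
import random
from itertools import groupby


def rle_encode(text):
    """Run-length encode: 'aaab' -> [('a', 3), ('b', 1)]."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs):
    """Inverse of rle_encode."""
    return "".join(char * count for char, count in pairs)


def check_roundtrip(trials=500, seed=0):
    """Property: decoding an encoding gives back the original, for any input."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(trials):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab") for _ in range(length))
        assert rle_decode(rle_encode(text)) == text, f"failed for {text!r}"
    return trials


print(check_roundtrip(), "random cases passed")
```

Instead of listing example inputs by hand, you state a rule that must hold for every input and let random generation hunt for a counterexample.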

Design tools like Swagger can help one think through services w/o diving into code. Code itself is a liability and should be thought of as "spending" not creating. Code is a debt.

Refactoring and code understanding tools. If you use PyCharm (you should, it is free in all senses), learn how to navigate into your libraries. Read your libraries.


This x1000. Debuggers are seriously undervalued by many developers. It’s like a super power.


What are some good resources to learn about debugging patterns and tips/tricks? My preferred language, Julia, recently introduced a nice set of tools related to debugging. I feel like there's probably things that would make me more productive but I think the techniques would be more broadly applicable than a specific language.


One thing I always do these days is I step through any new code I've written the first time I run it. This usually weeds out some bugs that might take a while to find because they are easy to miss. It also ensures that you actually go through each line of code you write, doing a forced code review on yourself early on.


I highly recommend learning PROLOG & understanding how to write your own simple planner system. The hairiest real problems are hairy because they're best suited to a declarative style (and programs written declaratively can be made much more efficient through more clever solvers -- given naive code, a clever solver has a much bigger efficiency boost over a dumb solver than an optimizing compiler does over a non-optimizing one -- although PROLOG itself leaks too much abstraction for many of these techniques to be viable in it).

I also recommend understanding message routing systems used in file sharing, like CHORD.

If you don't have a strong background in the math behind theoretical computer science, you might benefit a lot from an understanding of the formal rules around boolean logic, symbolic logic, & state machines -- especially, rules about when certain kinds of things are equivalent (since rules like De Morgan's laws are used for simplifying and optimizing expressions a lot, and rules for state machines are used to prove upper limits on resource usage).
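For boolean expressions, equivalence can be checked by brute force over the truth table. A small sketch (the helper name is mine):

```python
from itertools import product


def equivalent(f, g, arity):
    """True if f and g agree on every combination of boolean inputs."""
    return all(
        f(*values) == g(*values)
        for values in product([False, True], repeat=arity)
    )


# De Morgan's laws: not (a and b) == (not a) or (not b), and the dual.
law_and = equivalent(lambda a, b: not (a and b), lambda a, b: (not a) or (not b), 2)
law_or = equivalent(lambda a, b: not (a or b), lambda a, b: (not a) and (not b), 2)

# A tempting but wrong "simplification" is caught by the same check.
wrong = equivalent(lambda a, b: not (a and b), lambda a, b: (not a) and (not b), 2)

print(law_and, law_or, wrong)  # True True False
```

The check is exhaustive, so it scales as 2^n in the number of variables, which is fine for the handful of variables in a typical condition.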

If you don't already, learn to use awk. It's a much more powerful language than it seems, and fits extremely well into the gap between command-line prototyping in shell one-liners & porting a prototyped tool to python or perl, and so it's a huge time saver: it is faster to write many kinds of tools in a mix of shell and awk and then rewrite them in python than it is to write them in python in the first place.


I've never used Prolog in the 15 years since I learned it in college. It's an interesting take on programming, for sure, and I appreciated the mind-expanding exercise, but it hasn't helped me in my career at all.

Totally agree on awk. I use it almost every day for quick little one-liners. Big time saver.

Also agree on state machines, because from there it is a short hop to understanding formal grammars and the foundation of compilers and languages, which has been immensely useful in my career.
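For readers who have not seen one written down, here is a sketch of a tiny deterministic state machine: it reads a binary string left to right and accepts it when the number is divisible by 3. The example is mine, chosen because the whole machine is three states and a lookup table.

```python
# State = remainder (mod 3) of the bits read so far.
# Reading a bit b moves remainder r to (2 * r + b) % 3.
TRANSITIONS = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 2, (1, "1"): 0,
    (2, "0"): 1, (2, "1"): 2,
}


def divisible_by_three(bits):
    """Run the machine over a string of '0'/'1' characters."""
    state = 0
    for bit in bits:
        state = TRANSITIONS[(state, bit)]
    return state == 0  # state 0 is the only accepting state


print(divisible_by_three("110"))  # True  (6)
print(divisible_by_three("111"))  # False (7)
```

However long the input, the machine only ever remembers one of three states, which is the kind of resource bound state machines let you argue about.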


Learning about automata theory was one of the most mind-expanding experiences I had in college. It was certainly not something I would've stumbled upon without guidance. I believe that describes most of the value I derived from a degree, being nudged in the right directions toward solutions and problems that a lot of smart people have thought about for a while.

Understanding how different languages, or inputs more generally, can be transformed into meaningful outputs is pretty satisfying. It's a topic that almost seems to transcend the realm of computer science.


The best thing you can do for your career is learn things that don't apply to your career. After all, it's impossible to predict unknown unknowns, which makes accidentally having already learned something nobody else valued enough to pick up the most valuable skill.


Formal methods.

It took me nearly a decade of working in distributed systems to be introduced to TLA+ and other tools in this space. Until then my knowledge had been built from textbooks describing the fundamental data structures, algorithms, and protocols... but those texts take an informal approach with the mathematics involved. And since I was self-taught I was reading those texts with an eye more for practical applications than for theoretical understanding. I had no idea that a tool existed that would let me specify a design or find potential flaws in systems and protocols, especially concurrent or parallel systems, with such ease.

I think type theory and category theory have also been great tools to have... but I think mathematics in general is probably the more useful tool. Being able to think abstractly about systems in a rigorous way has been the single-biggest booster for me as a practitioner.


> the single-biggest booster for me as a practitioner.

Can you instantiate this claim with an example? I'm somewhat knowledgeable in both math and computer science theory but have yet to feel as though my math background has helped me in practical CS.


About 3-4 years ago I was working on an open source cloud platform for a company deploying a public cloud. There was a particular error that sometimes happened in our production environment where a rebooted VM would come up but couldn't connect to the network.

We tracked it down to a race between two services in the data plane. It turns out the VM controller wouldn't wait for the network controller to unplug a virtual interface before requesting the interface to be plugged back in. There was a lack of co-ordination happening. However it only happened when the network component was under heavy enough load that it would take too long to respond before the VM service finished rebooting the VM -- usually it was fast enough and the error wouldn't appear.

I managed to model this interaction at a high level in TLA+. From there I had suspected that the error was in the mutex locking code in the async library this system depended on so we refined the model pretty close to that implementation. As I recall we found that the mutex code wasn't the culprit -- a fine result. We ended up implementing some light-weight co-ordination mechanism to ensure that the VM service waits to acknowledge the progress of the network service.
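To give a flavor of what such a model does, here is a toy sketch in Python, not TLA+ and not the actual spec: it enumerates every ordering of four hypothetical events and checks whether the interface ends up plugged. The event names and the "coordinated" rule are assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical events: the VM service sends requests, the network service applies them.
EVENTS = ("send_unplug", "send_plug", "apply_unplug", "apply_plug")


def allowed(order, coordinated):
    """Is this ordering of events possible under the given rules?"""
    pos = {event: i for i, event in enumerate(order)}
    ok = (
        pos["send_unplug"] < pos["send_plug"]         # VM service issues requests in order
        and pos["send_unplug"] < pos["apply_unplug"]  # a request is applied only after it is sent
        and pos["send_plug"] < pos["apply_plug"]
    )
    if coordinated:
        # The fix: wait for the unplug to be applied before requesting the plug.
        ok = ok and pos["apply_unplug"] < pos["send_plug"]
    return ok


def ends_plugged(order):
    plugged = True  # the interface is plugged before the reboot
    for event in order:
        if event == "apply_unplug":
            plugged = False
        elif event == "apply_plug":
            plugged = True  # plugging an already plugged interface changes nothing
    return plugged


def bad_interleavings(coordinated):
    """All possible orderings that leave the VM without a network interface."""
    return [
        order
        for order in permutations(EVENTS)
        if allowed(order, coordinated) and not ends_plugged(order)
    ]


print(bad_interleavings(coordinated=False))  # one ordering where the slow unplug lands last
print(bad_interleavings(coordinated=True))   # []
```

A real model checker explores the reachable states of a much richer specification, but the principle is the same: state the invariant, then let the tool try every interleaving you would never think to test by hand.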

Since then I've continued to use TLA+. I find that programming languages are insufficiently expressive to describe high-level interactions between other processes, network events, and humans.


I'm sorry but I don't see why the TLA+ modeling was necessary. You said that you noticed the lack of coordination before that. From your description, it seems like the mutex thing was a diversion. Anyway, a mutex would not necessarily be adequate for coordination (and many types of mutexes will not work between processes at all).

So to me it seems that you could have gone straight to the lightweight coordination mechanism without the TLA+ model. And anyway, if there was a problem with the mutex, you could test that theory by doing additional logging or an experiment around the mutex functionality.


Sorry for what? It was my first time using such a tool. I found it useful for understanding the system. I've improved my understanding of TLA+ since then and it has been valuable.


The premise that self-taught programmers necessarily have core holes in their knowledge and skills that would have been filled if they had a CS degree is entirely false.

Start with the example you gave of messaging middleware. There are many BS CS curricula that do not address this at all. Also he mentioned that he had already learned about named pipes on his own. For many applications, named pipes could be a perfectly valid alternative to some external message queue system.

Looking at the items submitted, the vast majority are core skills that would necessarily be picked up by people who need to work with them. The idea that someone would not know about Makefiles or debugging or profiling or SQL just because they were a hobbyist or self-taught is ludicrous. If you are serious about C programming, whether it's a job or your hobby, you are going to learn about Makefiles. Likewise anyone seriously working on a data-centric application is likely to become well-versed in some database technology, up until a few years ago that would have automatically been relational.

And one other thing. Some of the most important skills in programming are in the domain of software engineering. Software engineering is very poorly addressed by many BS CS programs. So again, whether they have good SE skills is often not going to be determined by whether they have a BS degree or not. It's not even necessarily determined by whether they are working in a professional environment. It's mainly going to be a factor of their motivation to self learn and above all practical experience.


My experience has been quite different. True that some of the most technically skilled programmers I know of had no degree, but the polished ones, the ones I find easier to work with, tend to have one. Further it's pretty easy for me to tell if a person's degree was a CS degree or not just by talking to someone about the problems they have and how such problems might be solved with code.

That's not to say it's required; some of the best professionals I know have non-CS degrees (one in fine art -- painting) or no degree. But if you're still young, I submit that a CS degree is totally worth your time.


The argument that I was making was clearly stated at the beginning of my comment. It is significantly different from the supposed argument that you seem to be refuting.

Notice how I did not say that "the developers that certain degree holders find to be most 'polished' and easiest to work with will on average not have a degree".

Notice how I also did not say that a CS degree was not worth people's time.


We all have holes in our knowledge.

We all, generally, specialize. I have never worked with 'big data', implemented any part of a commercial web site, or worked on HPC (just to give a few examples). What may be useful to one may be useless to another.

Definitely think about Software Engineering. Maintainability, debuggability, extensibility. Don't get lost in stupid details. Don't nitpick the coding style or try to optimize code that isn't currently meeting the requirements (if there are requirements, and if there aren't you should try to define them).


SQL (even if just SQLite) as databases open up a lot of power.

Vim or Emacs for powerful text editing.

A low level language. Sometimes Python doesn't cut it, or it is pretty suboptimal. If you're writing a trading bot, is speed of execution not important?

Operating System knowledge can be helpful at times. I bought one of the No Starch books "How Linux Works" and it is very helpful.

The command line and by that I guess you should know the common Linux commands (cat, grep, sort, uniq, head, tail, ls, top) if you use Linux and how to chain them together via pipes. To give some context, I can write one command which would require 8 lines of Python (saves you valuable time). If you use Windows, learn enough Powershell to be comfortable with it. On occasion I'll use Powershell over Python even though it is dirt slow for reading files.
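For comparison, here is a sketch of what a typical counting pipeline looks like in Python. The log lines and the file name in the comment are made up; the shell version assumes the first space-separated field is the thing being counted.

```python
from collections import Counter

# Made-up stand-in for the contents of a hypothetical access.log.
log_lines = [
    "10.0.0.1 GET /",
    "10.0.0.2 GET /a",
    "10.0.0.1 GET /b",
    "10.0.0.3 GET /",
    "10.0.0.1 GET /c",
    "10.0.0.2 GET /",
]

# Shell: cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head -2
counts = Counter(line.split()[0] for line in log_lines)
top_two = counts.most_common(2)

for address, hits in top_two:
    print(hits, address)
```

The Python version is not long, but once you add opening the file and handling arguments, the one-line pipeline is hard to beat for a quick look at data.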


I thought this course did a good job at getting SQL to stick in my brain, largely due to the relational algebra section. https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/abou...


The W3Schools site is what helped me

https://www.w3schools.com/sql/

It has a database you can query and add to and what not.


Firmly agree -- I'm pretty sure I've said it elsewhere, this was a transformational course for me.

I got my start in product support where a lot of our problems could be solved in SQL. We were encouraged to learn it, but most people were happy with the most basic syntax for DML. I worked through the entirety of Dr. Widom's course and it gave me the fundamental data understanding to be a real value to the teams I've worked for, and I used that opportunity to transition into being a full time developer.


Value/message/actor/event oriented programming like you mentioned is useful for building distributed systems. I am a huge fan of going a step further and learning this model:

Imperative shell, functional core.

The external shell of code in a project is responsible for network connections, console IO, etc. But the internal guts of a program should be largely functional, that is, instead of mutating (changing) values, consider returning different forms of the same value.

Decisions (branches, logic) are made at one level, data dependencies at another.

The talk Boundaries by Gary Bernhardt describes this model in detail: https://www.destroyallsoftware.com/talks/boundaries
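A very small sketch of the split in Python. The order dictionary and the discount rule are invented for illustration and are not from the talk.

```python
def apply_discount(order, percent):
    """Functional core: returns a new order and leaves the input untouched."""
    factor = 1 - percent / 100
    return {**order, "total": round(order["total"] * factor, 2)}


def main():
    """Imperative shell: all input and output happens out here."""
    order = {"id": 42, "total": 80.0}  # in real code this would come from I/O
    discounted = apply_discount(order, 25)
    print(order["total"], "->", discounted["total"])


main()
```

Because apply_discount takes values in and gives values out, it can be tested without mocks, sockets, or a database, and the shell stays thin enough to check by eye.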


I'm also a fan of "imperative shell, functional core, imperative implementation of that core," where within a function that you promise to have no side effects or very specific side effects (say, a component's render function, or a transformation from one complex data representation to another), you should still feel free to use loops, imperative-style control flow, even network requests, etc. I find this puts folks used to "imperative everything" at ease, while still maintaining almost all of the benefits of functional-core provided that execution is scheduled properly.


It's really easy to do this with pure functional programming in impure languages like Scala. You can use arbitrary impure code within lazy IO values. Outside, the functional purity makes reasoning easy. If, as recommended, there is only a single point in your program where the IO values are actually executed, then the execution order can be reasoned about statically.


* Regular expressions. The most valuable "not a language" that I know.

* SQL. For me, it was a trip coming from a procedural language background. I kept trying to figure out how to do loops.

* Command line - DOS and unix

* Batch languages for those.

* HTML / CSS / Java (please don't kill me) Script and the DOM
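On the "how do I do loops" point, a sketch using Python's built-in sqlite3 module with a made-up table: the per-customer total that would be a loop with an accumulator in procedural code is a single GROUP BY in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("bob", 5.0), ("ann", 2.5)],  # made-up rows
)

# No explicit loop: describe the result and let the database work out how.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(totals)  # [('ann', 12.5), ('bob', 5.0)]
conn.close()
```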


Second regular expressions. I use it all the time to edit large bundles of code.
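A small sketch of that kind of edit with Python's re module. The function names are invented; the word boundary `\b` is what keeps the rename from touching longer names that merely end in the same text.

```python
import re

# Made-up source text; only calls to get_user should be renamed.
source = "user = get_user(uid)\nold = forget_user(uid)\n"

# \b matches at a word boundary, so "forget_user(" is left alone.
renamed = re.sub(r"\bget_user\(", "fetch_user(", source)

print(renamed)
```

Most editors and tools like sed accept a very similar pattern, so the skill transfers well beyond Python.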


I'm surprised nobody's mentioned spreadsheets - specifically Google Sheets or something scriptable and hosted. Recently I've built up a small system which sucks in data from a few places (fitness, task management, calendar,..) and analyses it against several goals I've set. This means I can see how I'm progressing towards what I want without even touching it anymore.

They're a really nice UI for bootstrapping projects, and even some small databases. A current project of mine for a client uses sheets as a backing store for an email collection list. Since there's only a few hundred rows, it makes sense at this scale and works really well for non-technical users.


Similarly Microsoft Access. I learned Access for my first job (100% of my job was moving tasks into Access, 3 years later I helped some interns transition to a SQL server+web application). 10 years later I'm still using Access for some aspects of my new job. The ability to quickly prototype something and receive almost immediate value is vastly underestimated (I know of one company which runs all of their tasks out of Access).


I suppose what I've described is just a little more of an unstructured version of this. Mixing data and computation in such a way is definitely a tech debt tradeoff though.


Excel & Google Sheets are probably the best enterprise software development platform ever made.


Your IDE, its refactoring tools, but especially its debugger.

VSCode or PyCharm (assuming you are still a Python developer) could be a good place to start. I'm always surprised when I see professional developers coding in Sublime Text and debugging with print statements (or their equivalent). Usually you have better options than that, especially for statically-typed languages - but even for JS and Python.


I don't know how people could work on large scale projects without a step through debugger.


Honestly I still find a lot of engineers don't know git properly. Like they know enough to commit and push but that's about it. It really helps to understand everything git has to offer.


Absolutely. When I get people into learning Git, I start with the obvious (checkout, commit) move to the essential (branches, forks) and then usually say "When you're comfortable with that, start playing around with rebase. When you get into trouble with that, come talk to me again, because that's the key learning moment."

I've learned more messing up when using fancy git rebase stuff than any tutorial.


O'Reilly have a free book on Git that is amazing. I find this to be the perfect level of detail. Easy enough to read in a short time, detailed enough to grasp the magic under the hood.

https://www.oreilly.com/library/view/git-pocket-guide/978144...


Yes, and in that vein, git became way less scary after I learned how it works. This course helped a lot (unfortunately paid, but a lot of businesses have a licence - https://www.pluralsight.com/courses/how-git-works).

I also use magit (https://magit.vc/) - inside spacemacs (http://spacemacs.org/)


Ha, I just did a talk and asked how many people knew about git bisect. No one.

I found this zine (not free) to be helpful:

http://ohshitgit.com/


The three most important basic git operations to know (in my opinion)

    git checkout -b
    git log
    git rebase -i


I strongly prefer git merge over git rebase.

Using rebase results in a cleaner history and simplified workflow in many cases. However it also means that when you have a disaster, it can be truly unrecoverable. I hope you have an old backup because you told your source control system to scramble its history, and you don't have any good way to back it out later.

For those who don't know what I mean, the funny commit ids that git produces are a hash signifying the current state of the repository AND the complete history of how you got there. Every time you rebase you take the other repository, and its history, and then replay your new commits as happening now, one after the other. Now suppose that you rebased off of a repository. Then the repository is rebased by someone else. Now there is no way to merge your code back except to --force it. And that means that if your codebase is messed up, you're now screwed up with history screwed up and no good way to sort it out.
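A simplified sketch of why that is, in Python. This is not git's real object format (real commits also hash a tree id, author, timestamps, and a header); it only shows that when the parent id is part of what gets hashed, replaying the same change on a new parent necessarily produces a new id.

```python
import hashlib


def commit_id(parent_id, snapshot, message):
    """Toy commit id: a hash over the parent id, the content, and the message."""
    payload = f"parent {parent_id}\nsnapshot {snapshot}\n\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()


root = commit_id("", "v1", "initial")
mine = commit_id(root, "v1+mine", "my change")

# Someone else commits on top of root, and I replay my change after theirs.
theirs = commit_id(root, "v1+theirs", "their change")
rebased = commit_id(theirs, "v1+theirs+mine", "my change")

print(mine[:8], rebased[:8])
print(mine != rebased)  # True: same message, different history, different id
```

Anyone who still holds the old id is pointing at a commit that is no longer part of the rewritten branch, which is where the trouble with shared history starts.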

That result is impossible if you're using a merge based result. The cost is, though, that the history is accurately complicated. And the existence of a complex history is a huge problem for useful tools like git bisect.


I don't know. Since git uses content-based addressing, you can't actually alter any commit, only create new ones. And orphaned commits don't get garbage collected for like 30 days even if you explicitly tell git to clean things up. So, the original stuff will still be there. It might just not be obvious how to access it. Part of the commit is the reference to zero or more parent commit object ids. So, if you find the old commit, it still has its history intact. `git log -g` is a handy command to see a composite git log that travels across branch changes.

I do get what you mean though. You effectively create new commits with an alternate view of history. I don't get quite why/how that causes a situation in which the code can't be merged? I don't rebase much, I prefer merging. Is there any resource that can explain why rebasing might be dangerous like that?

In general if branches diverge too far then you have difficulties merging no matter which strategy you use and sometimes if it diverges too far it just becomes hopeless. Mostly though if you are working in a team, commit daily and merging/rebasing frequently it should present fairly few problems.

I find I never run the actual command git bisect. I just do `git log --decorate --oneline --graph` and eyeball a good commit to start from and then basically do it by hand using commit messages to aid in making reasonable guesses as to where to try but following the basic binary search philosophy. Works well enough even with a complex history.


Here is an example of how to create a problem.

You rebase your private branch off of a shared master and pull in other people's commits. Someone else pushes out their rebased version using force. More commits are made on top of the other people's commits, including reversing some bad commits. You try to rebase off of the shared master.

In your last rebase, you are trying to replay all of the commits in your history that are not in the remote history. However git does not understand which local commits are from you and not pulled in on the previous rebase. It therefore tries to play them on top of the remote master if it can make sense of them. Which means that you bring back the reversed commits. You might find conflicts in code that you have not touched. You resolve them as best you may. And now you've got the definitive version of what happens, and no way with the screwed up history to figure out why it is going to go wrong. Then you force commit because that is how a rebase flow works... and everyone is screwed.

I agree on branches diverging too far. Merge early, merge often.

If you never run the command to git bisect, you should try it. What it's for is finding the random commit that recently broke a piece of functionality that nobody realized would break. Because nobody realized it, the log messages will say nothing useful. And you don't need to figure out where the change is - just write a test program for the breakage, run git bisect, and look at the offending commit.
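The search itself is plain binary search over the commit list. Here is a sketch in Python with a made-up history and a stand-in test function, assuming the first commit is good, the last is bad, and everything after the breaking commit stays broken.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    tests_run = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tests_run += 1
        if is_bad(commits[middle]):
            bad = middle    # the breakage is at or before middle
        else:
            good = middle   # the breakage is after middle
    return commits[bad], tests_run


# Made-up history of 100 commits where commit c73 introduced the bug.
history = [f"c{i}" for i in range(100)]
culprit, tests_run = first_bad(history, lambda commit: int(commit[1:]) >= 73)

print(culprit, "found after", tests_run, "test runs")
```

With 100 commits that is roughly seven test runs instead of up to a hundred, and no log message has to be trusted along the way.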


> Then you force commit because that is how a rebase flow works

Absolutely not. Force pushing a shared master is probably the worst sin one can commit with git. I guess you already have come upon the 'why' of it.

A "Rebase workflow" works so that devs use rebase to 'move' their work on an updated master after a pull/remote update, resolve potential conflicts locally, and do a fast-forward push to origin/master. This also works on copying work between different feature branches just as well.


Ah OK. Hmm my policy is to always disable force push on master. Force pushing to master should never be allowed.


> Since git uses content-based addressing, you can't actually alter any commit, only create new ones. And orphaned commits don't get garbage collected for like 30 days even if you explicitly tell git to clean things up. So, the original stuff will still be there. It might just not be obvious how to access it.

SmartGit does a splendid job with this. In the Log window where you see all your commits and how they are related to each other, there is a Recyclable Commits checkbox. Turn that on and everything in the reflog shows up in the log tree, just as if it were any ordinary commit. You can right click one of these commits and add a branch there, or do any of the other operations you can do in the commit log window.

Same thing for stashes. Did you know they are just commits too? I didn't, until I clicked one of the stash checkboxes in SmartGit. Then the stash showed up in the tree just like any other commit.

I don't understand why so many developers are resistant to the idea of using a powerful Git GUI like SmartGit. For me it is like having a superpower compared to the meager options the Git command line gives you.

Even if you like the command line, it's not like you have to choose one or the other. You can use SmartGit and the Git command line, whichever makes any task easier for you.



I work on a small open source project. Our master branch is protected from force pushes, and I only use rebase on personal branches for features or bug fixes. It's very nice if I'm planning to merge several commits but I need to fix something in one or more of them.

I agree that rebasing should not happen in the main branch(es) of a repo.


I'll have to add `git reflog` for instances where you've felt that you've completely screwed up something as one can always move back to a previous state. I think this is essential.

A useful one that I'll add is what I call the sword command: `git log -S<word>`. It lists the commits that added or removed a particular string, which has been useful for tracking down old changes.


I know you're probably just trolling but I'll take the bait — I'd put all of these six before any of those three:

    git pull
    git clone
    git status
    git commit -a
    git push
    git diff

I mean, the ones you mentioned are pretty useful, but if you don't have a repo in the first place, even git-log isn't going to be very useful; and if you're branching and rebasing, you probably have to commit first.

(I actually prefer Magit, to the point where I sometimes run Emacs just for Magit when I'm using a different IDE.)


I'm not trolling, and to assume that I am doesn't exactly "assume good faith." [0]

The comment I replied to says:

> Like they know enough to commit and push but that's about it.

In that vein, I was suggesting what I consider to be the most basic git commands outside of the "clone, commit, push" workflow.

[0]: https://news.ycombinator.com/newsguidelines.html


Oh, I didn't know you meant to imply that qualifier; I was responding only to what you said, which was absurd, not what you meant, which I now see was reasonable.


Things that really helped me understand git as a tree of references.

Git reset --soft and git reset --hard

Git cherry-pick

Git rebase

Git reflog

Git stash

Understanding this lets you manipulate branches and commits like a wizard. Once I learnt those, I could get myself out of the hairiest git problems.


I also find git blame extremely useful, along with code exploration tools like DeepGit [0]

[0] https://www.syntevo.com/deepgit/


`git add -p` changed the way I think about commits


I love that one too! It's great for creating commits that address one specific thing.


When I went from amateur Python programmer to Google Cloud developer support, I remember being completely blown away by the technology and design patterns I'd never heard of that go into modern web/enterprise architecture. I had to learn it all the hard way, but these days there are great free (or mostly free) courses you can take to learn this stuff.

For Google, check out the study guides for their certifications, specifically Cloud Architect (basic overview), Cloud Developer, and Data Engineer.

https://cloud.google.com/certification/

Be sure to follow along with the recommended Coursera and Qwiklabs tutorials and do the exercises. You'll learn about all kinds of neat stuff, like scalable application design, container technology, monitoring+metrics, various types of database technologies, data pipelines (including Pub Sub messaging), SRE best practices, networking+security, and machine learning.

I currently work on AWS, and don't find it a good starting point for diving in to these things quickly, but most companies use it so it wouldn't hurt to learn I guess. I still recommend GCP over AWS to start with, as their technology is far more interesting and focused, and quicker/easier to work with.


A new version of "The Pragmatic Programmer" recently came out. [EDIT: not available yet, only preorder at amazon, beta version available at pragprog.com.] That book is all about tools and methods that a self-taught programmer should look into:

https://www.amazon.com/Pragmatic-Programmer-journey-mastery-...


For me, that Amazon page is listing it as a pre-order, without any release date. And all the other versions (Kindle, Paperback) are the 1st edition instead of the 2nd.

Very frustrating, as I considered the first edition to be essential and upon reading your comment, instantly went to purchase the 2nd edition.

Edited to add: Found a date, Amazon is listing it as October 21, 2019.


Sorry, I thought I'd read a review of it already, so just didn't look closely to see it wasn't available yet.

It looks like you can get a DRM-free beta version of the ebook on their website, with free upgrades to published version once it's finalized: https://pragprog.com/book/tpp20/the-pragmatic-programmer-20t...


Learn Emacs. Stick with it until you get comfortable. Text editing is a transferable skill that you can reuse again and again, in different projects and in different languages.

Org-mode in Emacs. My poor man's project management for a number of projects consists of a todo-<project>.org file listing all the planned features, the pending TODO items, the DONE items for the current working release, and release notes describing features and changes for each of the released version. In one place, I have the future features, immediate todo items, current completed item, and the history of all past releases in an org file, making things simple to access and manage.

For theoretical stuff, learn about transactions.


Satisfiability Modulo Theory (SMT) and a nice friendly implementation like Z3. When you have a difficult algorithmic problem and can't be bothered to dream up a good solution, why not try chucking it into a solver and see if a computer can solve it for you? Surprising what it can find - not just Sudoku puzzles and "Mr Green lives next door to Mr Black, the baker lives across the street to the butcher, ... " type stuff either.


Here’s a great talk by Raymond Hettinger on those things and more: https://m.youtube.com/watch?v=_GP9OpZPUYc


Develop a project that has two local processes that have to communicate, one local database, and one remote service that the two processes have to communicate with.

Doesn't matter what the project is. Could just be tossing around a string of data that eventually gets dumped into the db and sent to the server.

Research different methods of architecting such a system. Code up a few.

By doing this, you will actually find the answer to the question you posed. And, arguably, have fun doing it.


I love it. The best way to learn is by doing, and the suggested project idea is simple enough as a starting point, and complex enough to require deep thinking and exploration in various aspects to figure out a good solution.

Many of the recommendations in other comments could tie into architecting such a study project too: data structures, algorithms, communication among distributed processes..


Excel. Seriously. For more things than you imagine

Let's say you want to make a property assignment for some class;

this.a=x.a this.b=x.b ...

While you probably would want to do this some other way to start with, and while you of course can solve it using some emacs wizardry, I can whip that up in Excel using some formulas in a matter of a minute. Moreover, I can keep adding to it when I realize something was missing.

I can make a diff, join or union of the results of two queries from different databases (or even database engines). Not to mention calculation and design mockups.


Alternatively, use Jupyter Notebooks & Pandas for this type of work. It has the same interactivity & visualization as Excel, but is far more powerful and you can reasonably move the code you create into your final product, rather than rewriting Excel formulas into a proper programming language.

I've heard that Airtable occupies a similar space, but I haven't used it enough to recommend it.


I'm not following. What good is it to have equivalent of "this.a=x.a this.b=x.b ..." in Excel, when what I need is to have this in code in my editor? Are you saying you use Excel to create the code, then copy/paste it into your editor? Or what?


I think the parent's using Excel essentially to do repeated template expansion: e.g. for a given set of member variable names ([a, b, c...]), give me the assignment statements I'd need to use those in a constructor.

Which I could do pretty trivially in Excel... but could also do trivially in about two lines of Python:

    names = ["a", "b", "c"]
    statements = ["this.{0} = x.{0}".format(name) for name in names]
    print("\n".join(statements))

Spreadsheets can be an incredible tool: as an interactive environment that allows non-programmers to express domain knowledge and quickly automate parts of their workflow, they're really unparalleled. But this isn't a very good example of something that Excel is particularly well-suited for.


Yeah you got my example.

But regarding your conclusion: my entire point is that it IS more useful than you might think. Disclaimer: I often use Emacs (e.g. regex-replace) and other means of code generation as well, both a couple of internal DSLs that I've built and direct one-off programming (normally in Clojure).

To just expand on the code generation part: The fact that it is a large "cell based" structure means that I can move stuff around manually, quickly see the entire new structure, make overrides and so on. If I have some class that's supposed to match a csv with 80 fields, it's really hard to view that in any good way in source code.

I won't argue this with anyone. I realize many people vehemently hate spreadsheets, and Excel in particular, and disagree with anything positive said about them. I'll just end with two final points.

1. The spreadsheet is a general purpose functional programming language with very large adoption and pretty much unseen "code editing" tools (outside of emacs), such as duplicate removal, sorting, user editable conditional formatting, and so on.

2. This was a tip about an often overlooked tool to have in one's arsenal; I wasn't looking for a debate.


Those can often be done in a decent text editor with regexp-like search and replace.


Took me 21 seconds with Emacs keyboard macros; no search and replace involved.


The next obvious step after message queues and distributing work would be streams.

Here is an excellent introduction to unified logs and stream processing by the author of Kafka at LinkedIn:

https://engineering.linkedin.com/distributed-systems/log-wha...


Here is a list of technologies/tools that I have found quite enabling (in addition to what has already been covered):

1. Earlier use of VMs and now Docker containers -- It was around 2004 that I got introduced to VMware tools to create, configure and run VMs. Life was never the same after that. No more fretting about installing pre-beta software for the fear of hosing my perfectly working system. There was a time when I had multiple servers running VMWare hypervisor, each running multiple systems. Then around 2014 I switched to using Docker containers for similar purposes and haven't had a need to use VMs.

2. Jupyter notebooks/pandas/plotly -- Can't imagine how one can explore data without this.

3. SQLite -- Perfect for writing unit tests for code that deals with SQL databases.
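To illustrate point 3, here is a minimal sketch of unit-testing SQL code against an in-memory SQLite database using only the Python standard library. The `users` table and the `add_user` / `count_users` functions are made up for the example. One caveat worth stating: SQLite's SQL dialect differs from PostgreSQL or MySQL, so this fits best when the code under test sticks to portable SQL.

```python
import sqlite3
import unittest


def add_user(conn, name):
    """Insert a user and return its row id (hypothetical function under test)."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


class UserStoreTest(unittest.TestCase):
    def setUp(self):
        # ":memory:" gives every test a fresh, private database.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def tearDown(self):
        self.conn.close()

    def test_add_user_increases_count(self):
        add_user(self.conn, "ada")
        add_user(self.conn, "grace")
        self.assertEqual(count_users(self.conn), 2)

    def test_null_name_is_rejected(self):
        with self.assertRaises(sqlite3.IntegrityError):
            add_user(self.conn, None)


# Run the tests programmatically so the snippet works when pasted into a REPL.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserStoreTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

No server to install, nothing to clean up afterwards, and each test starts from an empty schema.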


If you're a self taught hobbyist you may not have had much structured exposure to fundamental data structures and algorithms and complexity analysis. I think that type of thing is easier to learn when you already have some experience so you can relate it back to real world problems you have encountered as you describe doing here. Now might be a good point in your development to dig into some of those fundamentals if you have not done so much in the past.


If you're strictly asking about things a self-taught hobbyist programmer may have missed, then I second the suggestions here to skim through a CS degree curriculum and dig into anything that's unfamiliar. As one possible example, maybe you're solving a problem trying to parse some text, and you're in over your head with ad-hoc regexes and conditionals and type casts and exceptions all over the place. If you've seen the concept of a grammar (likely to come up in CS programs, though not all) and of generating parsers / validators for them, you can eliminate a whole lot of programming by specifying a grammar in some common format (for instance, ABNF) and running it through a program that generates a parsing program for you. The general category of programs writing programs is worthwhile to look into, and playing with languages where such a feature is first-class (like Common Lisp, which apart from having macros also has a compile-file function you can invoke at runtime) can be enlightening.
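To give a feel for what "write the grammar first" buys you, here is a small sketch in Python. Note that this is a hand-written recursive descent parser, not the output of a parser generator; a generator would produce code of roughly this shape from the grammar for you. The grammar in the comment and all names are made up for the example.

```python
import re

# Grammar (informal EBNF):
#   expr   = term   { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | "(" expr ")" ;

TOKEN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each returns the value of what it parsed."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token[0].isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return float(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))  # 11.0
```

Operator precedence and error handling fall out of the grammar's structure instead of from a pile of regexes and special cases.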

A lot of the comments here are highlighting things that not even most CS degree holders will know about, nor many professional programmers. Those can be useful for a hobbyist too, so maybe the lesson is that regardless of your current level of knowledge it's important to keep in mind: "There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy."

Your comment about feeling comfortable in the practical details reminded me of how I used to feel as a fresh self-taught PHP-wielding teenager after a few years of it. As an "expert" PHP coder, I could do anything! Even an OS if I wanted! Well I learned better eventually. :)


"If you've seen the concept of a grammar (likely to come up in CS programs, though not all)"

Really? That was one of the very first topics in my first semester. How do you teach CS without grammars and the Chomsky hierarchy?


I wondered if someone would ask about that... It's just my own recollection from years ago when I looked at a bunch of CS curricula from public and private schools. There was a lot of patchy variation. If the topics were covered at all they'd typically be done in optional electives like a compilers course, or "theory of computation" course, taken in junior or senior years. Also depending on the school such an elective might not actually be available anymore (I looked at current-semester/last-semester offerings to try and identify those) and is only part of the catalog for historical reasons. Maybe it was offered once, but not anymore, in part because students having the choice would rather take the new data science course or Advanced Networking or whatever to fulfill that elective instead.

I do think the problem is mostly a function of having lots of choice. Everyone's got their own list of what a CS degree absolutely must cover, but there are going to be differences, and schools are incentivized to let students carve their own path. Let's look at this list: http://matt.might.net/articles/what-cs-majors-should-know/ I've looked at a lot of intern resumes over the past few years and broadly they've been impressive from a hire-to-BigCo perspective but I would be surprised if many of them knew much if anything about a large majority of topics from that list. Even narrowing to the "better" schools. I also don't even agree with that list, I just don't think it's realistic to cram all that into a 4 year program on top of all the other STEM courses, humanities courses, and project courses...

Going back further there's the continuing problem of "dumbing down". It ties into a school's incentive to offer more choices to fulfill the graduation requirements (creating an "easy" path), but as a problem in itself it's worth considering. https://www.joelonsoftware.com/2005/12/29/the-perils-of-java... is a classic rant about it, not a very good one but it points at one consequence of the problem. I would bet there's still a sizable bunch of current-year CS graduates who never had to deal with pointers. Maybe I'll do some data exploring some time with the resumes I still have to see what fraction put C or C++ somewhere to make a weak proxy measure.


"they'd typically be done in optional electives like a compilers course, or "theory of computation" course"

Okay, maybe it's because here in Germany, we have a separate line of education for "mere" programmers and system administrators as well as a separate branch of higher education for more "practical" skills. But if you study CS at a university, both of these courses would be mandatory, together with a lot of math and some physics (to know what actually happens in a circuit).


The best tools you can teach yourself are arguably more fundamental than any specific language or library, but instead computer science problem solving techniques. I've written books on this subject specifically targeted towards self-taught programmers: https://classicproblems.com/

Sorry for the self-plug, but it's super relevant I think.


I used to read High Scalability [1], which has write-ups of real-life architectures of many of the largest web sites. I found it great for learning about different architectures. I'd definitely recommend it as a way to find out about things like message queues, asynchronous vs synchronous processing, etc.

On a similar note but more theoretical, my life changed when I learned about design patterns. Martin Fowler's books/site are a good place to start [2].

[1] http://highscalability.com

[2] https://www.martinfowler.com/eaaCatalog/index.html


https://ocw.mit.edu/courses/electrical-engineering-and-compu...

In my experience, most hobbyist programmers are fine writing scripts and other small programs but have no exposure to the sort of ideas that are necessary for making large programs. This course at least exposes you to a lot of those concepts.


The latest course is available here: http://web.mit.edu/6.031/www/sp19/


State machines. They can be a helpful abstraction for UI or business logic, and they also make regular expressions make a lot more sense.


Yeah, far too often people create horrific spaghetti code of if/else statements which could be neatly coded as a state machine. This is a very important concept to grasp.
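A minimal sketch of the idea in Python, with a made-up "job" that can be started, paused, resumed, cancelled and finished. The states and events are invented for the example; the point is that the rules live in one table instead of being scattered across if/else branches.

```python
# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "finish"): "done",
    ("running", "cancel"): "idle",
    ("paused", "cancel"): "idle",
}


class Job:
    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            # Anything not listed in the table is rejected in one place.
            raise ValueError(f"event {event!r} is not valid in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state


job = Job()
for event in ["start", "pause", "resume", "finish"]:
    print(event, "->", job.handle(event))
```

Adding a new state or rule means adding a row to the table, and invalid combinations fail loudly instead of silently falling through a forgotten else branch.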


Learn to use a profiler for your programming language. It will tell you what’s worth optimizing in your programs. For example, if your program spends only 10% of its time talking to other processes, then 10% is about the most you can save by choosing a fancy communication pattern.
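For Python, a minimal sketch using the standard library's `cProfile` and `pstats` could look like this. The functions `slow_sum`, `fast_part` and `main` are placeholders standing in for your real program.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part():
    return sum(range(1000))


def main():
    slow_sum(200_000)
    fast_part()


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

For a whole script you can skip the boilerplate and run `python -m cProfile -s cumulative your_script.py` instead.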


Some simple things with good ROI:

1. Ergonomics tooling: Set up your screen at an appropriate height and your keyboard at an appropriate distance, so that you aren't in pain. If you use a laptop often, try setting it on a shoebox. Use a work timer to take breaks(I've been using Workrave on its default settings lately, which is quite harsh but has worked for my productivity).

2. Diary keeping: Note what you did, what you plan to do, and the date and time. Note as often as you feel necessary. Record notes both in source comments and in general purpose diaries. Use the date to eliminate or rewrite notes that are stale.

3. File management, process management and editing. Take a little time to learn things about your operating system and editor. You don't have to master it, or do elaborate configuration(in fact, having a complex config makes it hard to transfer to other environments). What you want are little things like knowing a few handy shortcut keys or a few built-in tools.

4. Gaining familiarity with writing simple "skeleton" or mock-up code, and waiting patiently for it to mature. When first building a system it may be tempting to apply the biggest algorithmic hammer or design pattern you know of. This is the kind of trick you are learning in learning about message queues. But the most likely outcome of trying to use a trick, no matter how well intentioned, is that this will get you a wrong result more slowly, because the full shape of any problem tends to come into view slowly, progressing from blurry and unclear with a rapidly changing design into something sharp, with well-defined boundaries. As such every new system demands a beginner's mindset, and some ability to refuse engineering in-depth until you absolutely must do so to progress. Done properly, you build a system that leverages existing tools well, has some kind of value now(even if it's limited or lacking a critical feature), and then can harvest the learned lessons into a more complete form later. Trying to get all the features into one pass creates a messy soup: iterating on a subset of them naturally leads towards development of tricks like the message queuing architecture, without any prompting.

(edit)

5. Read c2 wiki when bored - it covers a lot of recurring discussions in programming: http://wiki.c2.com If you know the stuff in it you'll be reasonably prepared to think about any unusual ideas and compare them with existing examples.


Godbolt Compiler Explorer: https://godbolt.org/

DTrace: http://dtrace.org/


'The Imposters Handbook' is good for foundational knowledge, it was written for exactly your use case.

https://bigmachine.io/products/the-imposters-handbook/


Came here to recommend this too. Its goal is very much aligned with the OP.


A few decades ago, a new programming environment exploded: the web. Looking for a ridiculously useful tech stack? Look no further: HTML, HTTP architecture, SQL backend... a guide written at the time:

http://philip.greenspun.com/panda/

Paul Graham also wrote about why the web was such a big deal, IIRC in the Beating the Averages essay. In particular: you can use whatever tools you want and avoid deploying to client machines.


PostgreSQL + PostgREST + react-admin == fantastic stack.

You can write an entire application in SQL and PL/pgSQL while using an HTTP JSON API as the interface, with a static and responsive BUI.

This allows you to be extremely agile in development and ops (because, e.g., you get to use logical replication).

I can't say enough good things about this approach.


This reminds me of how we used to create thick client desktop software, connecting directly to the database. Doing most of our business logic on the front end and leaning on the database for auth, relational constraint enforcement and (via stored procedures) transactional consistency and validation. It felt weird at the time when the whole world seemed to move to a three layer model with app servers sitting in the middle - now it's hard to imagine it any other way.


Indeed, if your app is about managing your personal DVD collection, or variants thereof, such anemic UI-to-database tools work very well. Not so when there is complicated domain logic involved.


Having built such an application (w/ complex business logic), I have to disagree. But I'd like to hear what problems you've run into.

I would agree that react-admin is a bit too simple, and this stack really calls for a BUI to be built specifically to fit the PostgREST model.


I would recommend learning the SOLID [1] design principles. I've found them to be a very helpful guide when designing software components.

[1] https://en.wikipedia.org/wiki/SOLID


Particularly the S, the Single Responsibility Principle. So much messy code is convoluted because it lacks a singular, clear purpose and bundles multiple responsibilities into one section of code, be that a module, class, or whatever is appropriate to your language.

They're all good, and you'll get good insight from them all, but I think that first one is more important and has provided me more value than all the rest. I think Liskov substitutability would be my second pick.
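A small before/after sketch of the single responsibility idea in Python. The CSV-ish input and all function names are invented for the example.

```python
# Before: one function parses input, computes the total, and formats output,
# so a change to any of the three concerns touches the same code.
def report_before(csv_text):
    total = 0.0
    for line in csv_text.strip().splitlines():
        name, amount = line.split(",")
        total += float(amount)
    return f"TOTAL: {total:.2f}"


# After: each function has one reason to change.
def parse_amounts(csv_text):
    """Turn 'name,amount' lines into a list of floats."""
    return [float(line.split(",")[1]) for line in csv_text.strip().splitlines()]


def total_amount(amounts):
    """Pure calculation, testable without any text handling."""
    return sum(amounts)


def format_total(total):
    """Presentation only."""
    return f"TOTAL: {total:.2f}"


def report_after(csv_text):
    return format_total(total_amount(parse_amounts(csv_text)))


data = "coffee,3.50\nbagel,2.25\n"
print(report_after(data))  # TOTAL: 5.75
```

Both versions give the same answer; the difference shows up later, when the input format or the output format changes and only one small function has to move.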


Agree completely.


Not nearly as big a deal as some of the other tools and techniques being mentioned, but tmux/screen are lifesavers when you need them.


> Googling, I encountered something I hadn't heard of called a message queue.

Former high frequency trader here. Messaging middleware is a god-send for distributed systems. It used to be quite commonplace in the late 1990s/early 2000s for creating trading systems.

By mid-to-late 2000s - and especially post GFC (great financial crisis), volatility lowered, technology improved and middleware systems were never again put in the critical path.

The important measure here is "tick-to-trade" - from the moment a market-data message comes off the wire to the moment you send an action to buy/sell/cancel back on the wire. A middleware system just slows this down considerably. As a result, you want "tick-to-trade" to be in the same process, preferably single-threaded and boost your thread affinity to minimize context switching.

To answer your actual question: I would say learn about the concepts that the big cloud providers are pushing. If you open the AWS app drop down - there are dozens of concepts that have been encapsulated in managed or serverless frameworks. They are all worth learning IMHO - as they define the current and next generation of computing.


Message queues are really cool, but to paraphrase an old joke: some people see a distributed systems problem and think, "I know, I'll use a message queue." Now they have two problems.


Don't go too overboard with message queues. There's nonzero development and operational overhead incurred when part of your application takes its input in a weird binary format, and when the data in your queue is thrown away after processing, and when you need to think about scaling of workers and concurrency. If you're not working with real "big data" – and, let's be honest, almost nobody is – I would advise using an HTTP-based service (REST, SOAP, whatever, take your pick) for communication and a SQL/NoSQL/NewSQL database for state.


ZeroMQ is not really a message queue, it's more of a networking library. It takes TCP sockets and adds other concepts on top, like request/reply or publish/subscribe.


Doing a task queue in an RDBMS is not as simple as some might think, though. https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...


I don’t think there’re many ridiculously useful things that are universally useful.

Just because some stuff is domain specific or has a lot of theory involved doesn’t mean it isn’t considered a fundamental concept in some areas of work.

Likewise, just because some other stuff was very useful for you doesn’t mean it’s universally applicable. For example, if you had worked on a high-frequency trading bot, I don’t think you would have used Python, ZMQ or other general-purpose messaging middleware, or even the OS-provided TCP/IP stack: they all cause too much latency.

I’ve been programming for a living since 2000 and have worked on a lot of different stuff, from web dev and enterprise to videogames, embedded, robotics and GPGPU. Yet I can name many huge areas which I haven’t seen closely enough, or at all, along with the libraries and tools used by people working there.

Every time I start working in a new area, or when I resume working in an area after a long (years) pause, I read a lot of relevant stuff. Continuous learning is the key to stay good, IMO.


Learning the basics about how programming languages work - parsers, interpreters/compilers. I've heard good things about Writing An Interpreter In Go [1]. Related, I've enjoyed Martin Fowler's DSL book [2].

  - [1] https://interpreterbook.com
  - [2] https://www.martinfowler.com/books/dsl.html


Just noting that the URLs weren't formatted as links but code


More important than any specific technique or tool is that you get an experienced mentor to look at your work once in a while and point out the obvious things relevant to the specific project that you don't know you don't know. Seriously, you'll probably save a few months of work in the first ten minutes of talking to a senior developer about your hobby project.

Talk to someone with many years of experience, though, otherwise you'll most likely get sent on a quest for beauty of implementation that has nothing to do with the goals you're trying to achieve. Sadly, that is a lesson that takes a long time to internalize.


(Learning all the things recommended in this thread would take a few years, and maybe half of them would be useful given you know them, a lot fewer useful enough to justify the price of learning them)


Get good at math. It'll serve you well and never go out of style.


Math is a bit too broad. From personal experience, the most relevant topics for programmers are Linear Algebra and Discrete and Combinatorial Algebra.


Eh I mean, I think there's a certain discipline that comes with studying any branch of math to a certain degree of rigor. But yes, linear algebra and discrete math are probably the most useful. I think control theory and optimization are under appreciated amongst programmers though.


I'd recommend statistics specifically. Comes up everywhere and it's easy to be wrong about if you don't dig into it.


Second this, if you only work through the Khan Academy stuff on the various things like scatter plots, standard deviation, the normal distribution, etc. you'll be much better off for it.


One minute ago I discovered that HN comment threads are collapsible by tapping/clicking the "[-]" on the right side of a comment header. I think that could fall under the class of non-obvious tools (maybe not for a hobbyist, but as a long time HN reader I'm shocked I never noticed it before)


At some point, it helps to broaden your programming language exposure, even if you stick to mostly one language for most of your work. You'll find ways to apply ideas from other languages/communities to your work.

Try to spend some time learning idiomatic programming from one of the Lisp family (Scheme, Racket, CL, Emacs Lisp, and Clojure all have different thinking, but a lot of overlap). Play a bit with Smalltalk or a similar descendant, even if you're already doing OOP elsewhere. At some point you should learn a textual expansion language, like one of the Unix shell scripting ones, or Tcl (and learning basic Bash scripting will probably be useful in tech work). Try a logic programming language, like Prolog, or one that's a minilanguage within another, like Mini-Kanren. Maybe buckle down for hardcore functional programming (e.g., Haskell, OCaml, or discipline yourself to do it in a Lisp?). You should also get comfortable with C or at least an assembly language at some point, to have a better idea of what other languages are and aren't giving you, and also C is just a really useful thing to know when you need to write a little fast code, FFI to a native library, or get into languages/IRs for newer target architectures.

(Disclosure: I've been especially involved with Racket, an energetic close descendant of Scheme, and have some interest in promoting it, but I'd list a Lisp as one of the first in any case.)


Racket is a very good Lisp for beginners, DrRacket makes it easy to get up and running with little effort.


Agreed. Racket is from a particular school of thought, and you won't get all Lisp family ideas from it, but it's great.

You can start up DrRacket (a simple IDE for students that can also be used for professional work, and has some powerful features in it), or just use your favorite editor and the `racket` command-line program and REPL. There's way too much documentation at: https://docs.racket-lang.org/

You can also do the old MIT introductory CS textbook, SICP, using Racket.


> You can also do the old MIT introductory CS textbook, SICP, using Racket.

    #lang sicp

https://github.com/sicp-lang/sicp


I'm not sure if 'hoarding' such knowledge would be practical.

Sometimes, having a limited toolset helps you focus on the problem at hand. Then, once the challenge is clarified, searching for alternative ways to architect and implement it becomes practical.

If just for fun of exploring something new, pick whatever interests you in general, language, framework, a domain. Async processing and idioms?


For me a lot of programming involves some amount of data wrangling. E.g., getting input data ready. Or generating and understanding results of some technical experiment. I recently came across VisiData and adore it. It has a steep learning curve, but I've found it very much worth it: http://visidata.org/


Not one in particular besides the obvious. If it is really fundamental, it is not overlooked.

Every programmer has their favorite tools. Some will use debuggers, others will prefer logging. Some will use class diagrams, others will grep. Some will use IDEs and GUIs, others will use text editors and shells.

Same thing for programming languages, techniques, libraries, frameworks, etc...

So you are very likely to get a lot of answers. It seems that you got an epiphany when you learned about message passing. All programmers had similar experience as they discovered the one thing they really needed.

In reality it depends on the project. Great programmers simply have a lot of experience with many, many things.

My suggestion: continue what you are doing.

Try new things, don't blindly follow other people's lists. In the process that led you towards that messaging library, you learned about named pipes, Unix sockets and shared memory, all useful, and maybe they will be essential to your next project. Had someone else served you that library on a platter, you would have lacked that insight.


Heh, funny that you mention that ZMQ is overlooked. While often thought of as a message passing library or serverless queue, ZeroMQ has some pretty severe limitations that implementers fail to consider.

To be clear, I think it's an amazing library which is unmatched in its performance, but it comes at a cost: reduced reliability.

ZeroMQ will drop messages in a number of situations. The library does not handle delivery guarantees, which means that the application must do it itself. Whether or not this works for you is an application level concern. However, having used it at two companies now: both times it ended up being thrown out for a more reliable queue (kafka).

http://zguide.zeromq.org/py:all#Missing-Message-Problem-Solv...

So maybe the reason you haven't stumbled upon it sooner is because it's overhyped? Definitely useful but with a grain of salt.


The "MQ" part of the name is unfortunate, but apparently came about because of the original idea to come up with a "better" implementation of AMQP (http://zeromq.org/docs:welcome-from-amqp).

But you're right -- ZeroMQ doesn't do queueing (except in some very limited circumstances), and if you need reliable delivery you must implement that yourself "on top of" ZeroMQ. I've done that, and while it's not a simple task, it is certainly possible.

You can get reliable delivery "out of the box" with other software, that in fact does do queueing. (kafka may be one, but I don't know enough to say).

But what you give up when you do that is performance -- ZeroMQ can easily be orders of magnitude faster than those other solutions, and for some applications (e.g., real-time market data) the work to provide a custom reliability solution on top of ZeroMQ is worthwhile.
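
To give a flavor of what "on top of" can mean, here is a minimal sketch of just one ingredient: spotting dropped messages with sequence numbers. It is plain Python with no ZeroMQ calls at all, the `GapDetector` name is made up for illustration, and what to do about a gap (re-request, resync, give up) is left to the application.

```python
class GapDetector:
    """Tracks per-publisher sequence numbers to notice dropped messages.

    Hypothetical helper: assumes the sender stamps every message with a
    sequence number that increases by one.
    """

    def __init__(self):
        self.expected = None  # next sequence number we hope to see

    def receive(self, seq):
        """Return the sequence numbers that were skipped before `seq`."""
        if self.expected is not None and seq < self.expected:
            # Duplicate or late arrival: nothing new is missing.
            return []
        if self.expected is None:
            missing = []  # first message seen, nothing to compare against
        else:
            missing = list(range(self.expected, seq))
        self.expected = seq + 1
        return missing


detector = GapDetector()
for seq in (1, 2, 5, 6):
    lost = detector.receive(seq)
    if lost:
        print("missing messages:", lost)
```

Detection is the easy half; the retransmission protocol is where the real work is.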


Yeah, every new user has to learn that ZMQ is not a proper queue, but simply a network library and protocol. How is it even possible to bungle the name so badly and then do nothing about it for twelve years?


An IT automation framework like Ansible (my favorite), Chef, Puppet, Salt. Highly recommended if your work includes doing repetitive remote server tasks or any sort of infrastructure / devops stuff.


At the job I'm at, I've picked up three tools either for the first time, or in a very new way:

1. Makefiles. See @aequitas' comment for more.

2. Terraform. Seriously, just using this tool taught me [a lot of] devops. It's fantastic!

3. Docker (as a tool!)

I'm going to go into the third one a bit - I feel like Docker is mostly thought of as useful for deploying things to the 'net (kubernetes, ECS, etc), but I think it's also amazing for local development and build pipelines. I actually have no idea one way or the other how much other people use it this way as well, so maybe it's just me that's finding it unexpectedly awesome for this.

Put together the right CLI command + Dockerfile, and you can hand someone a repo and they can launch a complete, reliable development environment in a single command without any other system prep. No more worrying about which dependencies need to be installed in what way; it's like `rvm` + `bundle exec` but for EVERYTHING. No more dealing with whatever custom system modifications someone has going on. `git clone`, `make dev`, move on with life.

And then you can also have Dockerfiles that are specifically for producing your build artifacts, and then completely ignore the container for execution. This is how I'm using both AWS EMR and Lambda.


I'm also occasionally using Docker to generate build artifacts (so +1 for that) - how do you pull the built blob out of the image? I've used `docker exec` plus `docker cp`, but it feels a little clunky.


Depends on the nature of the build artefacts? Gems, jars and NPM packages get posted to a jFrog artefactory. Go and Cocoapods just use straight Git, so Jenkins actually makes commits for those (and in the case of Go, version tags) from inside the Docker container. For the EMR clusters, there's a step in the build script that uploads the files to S3.

Another way to do it would be to have the `docker run` / `docker build` script be given a mounted volume that maps to the system's, and then it would write the blobs to that volume, where (I think?) they'd then be accessible to the outer system.


You can just mount a volume and write the blob into it, very easy and convenient!


Ooh, right, you can mount in `docker build`? That'd solve it, thanks!


You mount it when you run it, using `-v hostfolder:containerfolder`.

I'd recommend using docker-compose though; somehow everything becomes much saner than with the normal "docker" executable.



Event-driven, asynchronous, programming

Here's a valuable addition to your toolkit of mental models for programming: event-driven, asynchronous programming (in the style of ES6 JavaScript or a similar language).

Some suggestions about how to learn the basics? Try tutorials on...

-- building a so-called "single page web app" with a framework like vue.js or even jQuery.

-- node.js (to build a complex back-end server without using a threading model)

-- React (to build an interactive program to run in a browser)
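
For someone coming from Python, the same mental model can be tried out with the standard library's `asyncio` before touching JavaScript. This is only a minimal sketch; `fetch` and its delays are made-up stand-ins for real network calls.

```python
import asyncio


async def fetch(name, delay):
    # Hypothetical stand-in for a network request. Awaiting hands control
    # back to the event loop so other tasks can make progress meanwhile.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Both coroutines run concurrently on a single thread. gather() returns
    # results in the order the awaitables were passed, not completion order.
    return await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))


results = asyncio.run(main())
print(results)
```

The whole thing takes roughly as long as the slowest call rather than the sum of both, with no threads involved.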


Tools that have changed how I think about things, and yes some are hyped things you have probably heard of. But the opposite effect of people ignoring it because it's too hyped might come into play:

Docker - figuring out what you really need to get something running and having a reproducible way of doing it. I did a blog post on it: https://stackabuse.com/how-docker-can-make-your-life-easier-...

Node JS - much derided, but nothing compares in terms of the speed you can quickly hack up a tool in Node JS and the ecosystem you have access to. I don't use it as production server, but just as a quick way to hack up tools.

Pandoc - been using this tool to convert markdown to PDF, and it does a nice job. Uses Latex & friends so you can find a nice template somewhere and base off that.

Markdown - I love using markdown formats. Especially with the tooling, things like Pandoc and Github are enough to justify learning a bit of MD.

Touchtyping - Makes typing a bit nicer and a bit faster. I could type without looking at the keyboard before, but now I don't lose my hands, just feel the bumps!


I’m not sure if I count as self taught or not since I first learned programming outside of formal education, only later getting a degree in IT (for which I really didn’t learn anything new except graph theory). But we never talked about queues in school, in fact I only learned about them when I started working professionally. If you want to be exposed to all kinds of interesting problems, I would suggest working at some mid-stage startup (i.e. around the B or C rounds).

They’ll be starting to get in good people who are fixing up the mess the early employees made* and can help you learn why certain patterns are anti-patterns and how to fix them. At this stage they’re still small enough that you can learn a lot just by paying attention to what everyone else is working on.

* Before you downvote me I’ve contracted for pre-seed startups, am currently working at a seed stage startup and have been at companies all the way from a series A to a series E. So yeah I’ve been the one making a mess (because of the whole minimum part of MVP) and cleaning up said mess.


Tools you can use to recover from the loss or destruction of your laptop/desktop as fast as possible. Or having to hand over your password to the authorities. This involves compressing all of the state and data and removing and/or sending it somewhere, and then being able to recover it quickly. And it must be simple, robust and automated.


This was recommended in a blog post I read. I haven't read it myself but the table of contents looks promising:

https://www.educative.io/collection/5668639101419520/5649050...

Designing a URL Shortening service like TinyURL

Designing Pastebin

Designing Instagram

Designing Dropbox

...

Key Characteristics of Distributed Systems

Load Balancing

Caching

Sharding or Data Partitioning

...


What you want to do is ponder over how common functionality around you must work - if you can't come up with a reasonable solution, it's time to google it and learn it. If you can come up with a reasonable solution, google to check if you're correct.

This is the most pragmatic way I've found to learn and stay in touch.


DEBUGGING! And I don't just mean echo'ing out statements to check the values, I mean an actual attached debugger to your code.

GDB for C / compiled languages, Xdebug for PHP, etc.

The actual act of stepping through your code and looking at the values, datatypes and their transitions massively increases your productivity if something is tricky!


If you're coming from Python you should start looking into how other languages handle concurrency. CPython has a GIL (global interpreter lock) that lets only one thread execute Python bytecode at a time, so under normal circumstances threads don't give you parallelism for CPU-bound code. Learn about threads, locking, mutexes, semaphores, green threads, race conditions...
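
Worth noting that the GIL does not protect you from race conditions in your own code: `value += 1` is a read-modify-write, and a thread switch can land in the middle of it. A minimal sketch of the usual fix with a mutex, using only the standard library (the `Counter` class and `run` helper are made up for illustration):

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write below a critical section,
        # so no two threads can interleave inside it.
        with self._lock:
            self.value += 1


def run(n_threads=4, n_increments=10_000):
    counter = Counter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run())
```

With the lock in place the total always equals `n_threads * n_increments`; remove it and the result may or may not come up short depending on interpreter version and timing, which is exactly what makes races nasty.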


Caching, logging, centralized configuration, Security, and Design Patterns are all probably easy to overlook.


Monitoring

From the beginning of my career in web development I worked in companies that had good to great monitoring capabilities in place to log and measure performance, resource usage, availability, errors, etc. in all parts of the system. So I took it as something self-evident that you would add that first thing, before deploying a new project.

Only recently, as I started mentoring people in other companies, did I realize that many developers are not aware of what exists in this space.

In two cases I got asked for help to deal with performance issues, and in neither case did they have any idea what was causing them, because they had pretty much no tooling to track it down, so they resorted to speculation. We installed an application monitoring solution and they were able to fix the problem in no time once it was identifiable.


Can you recommend books/tools? Thanks!


This chapter of Google's SRE book gives you a good overview.

https://landing.google.com/sre/sre-book/chapters/monitoring-...

But I think it is less about what specific tool to use, and more about just getting started with one and learning how to understand your production system behavior and dependencies from metrics and graphs.

An easy way to get going for monitoring an application is using a hosted solution like newrelic.com. Their free version should be sufficient for a very long time.

If you want to run it yourself there are opensource solutions like prometheus.io or riemann.io among many others.
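
If you want a feel for what these tools collect before picking one, a toy version fits in a few lines of standard-library Python. To be clear, this is a sketch and not how any of the products above work internally; `TIMINGS`, `ERRORS`, `monitored` and `handle_request` are all made-up names.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process metric stores, keyed by function name.
TIMINGS = defaultdict(list)  # durations in seconds
ERRORS = defaultdict(int)    # count of raised exceptions


def monitored(func):
    """Record how long each call takes and how often it fails."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            ERRORS[func.__name__] += 1
            raise
        finally:
            # Runs on success and on failure, so every call is timed.
            TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper


@monitored
def handle_request(n):
    if n < 0:
        raise ValueError("negative input")
    return sum(range(n))


handle_request(1000)
print(len(TIMINGS["handle_request"]), "call(s) recorded")
```

A real setup ships these numbers somewhere you can graph them, which is the part the hosted and open-source tools take care of.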


It's absolutely critical to learn object-oriented programming in a mature language like C#. (Java's OO model is slightly broken, as well as fundamentally underdeveloped; it's better tackled after learning a correct, functional object-oriented language like C#.)


Books: Operating Systems/Database/Networking/Computer Security/Computer Architecture textbooks, Software Engineering textbooks (Clean Code, Design Patterns, Designing Data Intensive Applications, Domain Driven Design as a short list off the top of my head)


"Designing Data Intensive Applications" is an absolute goldmine for things like message queues but also going beyond understanding the full implications in database selection and other common, distributed-oriented engineering decisions modern software engineers may come across.


By the same token "documentation". Today I explained to a colleague how they could enable port-forwarding across their already-open-SSH-session via "~C".

Every now and again I pick a tool I use a lot, and read the man-page. Things like "less", "bash", "ssh". Complex enough to contain surprises, but simple enough that you take them for granted.

Almost always this has been time well-spent.


I've extensively used Diff tools as an analytical aid across many different domains. It has uses far beyond code change tracking!

Specifically, I like vim's interactive diff'ing capabilities (although any interactive diff tool with sufficiently powerful text-editing capabilities should suffice).

So much of the troubleshooting that we do in programming is asking "this thing used to work, what changed?". Don't rely on your eyes to find the changes, let the computer do the work for you. The ability to load up two different log files in a diff session, regex-substitute the noisy portions away (dates, process/thread id), and view the key functional changes really helps me go from noise to signal in an optimal way.
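
The same workflow can be scripted when the logs are too big to eyeball. A rough sketch with Python's `difflib` and `re`, assuming a made-up log format of `YYYY-MM-DD HH:MM:SS [pid] message`; the sample lines are invented for illustration.

```python
import difflib
import re

# Assumed noise: a leading timestamp and process id, e.g.
# "2019-05-13 10:00:01 [4121] ". Adjust the pattern to your own logs.
NOISE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[\d+\] ")


def normalize(lines):
    return [NOISE.sub("", line) for line in lines]


def changed_lines(old, new):
    diff = difflib.unified_diff(normalize(old), normalize(new), lineterm="", n=0)
    # Keep only added/removed lines; drop the "---"/"+++" file headers
    # and the "@@" hunk markers.
    return [
        d for d in diff
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))
    ]


good_run = [
    "2019-05-13 10:00:01 [4121] loading config",
    "2019-05-13 10:00:02 [4121] connected to db",
]
bad_run = [
    "2019-05-14 09:30:11 [9877] loading config",
    "2019-05-14 09:30:15 [9877] db connection timed out",
]

for line in changed_lines(good_run, bad_run):
    print(line)
```

Without the normalization step every line would show up as changed, since the dates and pids differ between runs.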


You touched on it briefly but I'd like to highlight regular expressions in day to day editing and log delving. It's a massive time saver in my experience at least.

Coworkers often come to me to help them write a quick regex for something, or to have me double check their work.

If you need a playground to get comfortable, https://regex101.com/ is a great resource. Dump some examples you'd like to match and some you don't in the bottom section, and try to write a regex that matches in the top. It will dynamically match as you type, and the right side shows a token by token breakdown of what your regex does.
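
The "examples that should match, examples that shouldn't" habit carries over nicely into code. A small sketch with Python's `re` module; the date pattern and the sample strings are just an illustration, not a complete date validator.

```python
import re

# Illustrative pattern: ISO-style dates such as 2019-05-13.
# It checks month and day ranges but not days-per-month.
DATE = re.compile(r"\b(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b")

should_match = ["2019-05-13", "released on 2020-12-31."]
should_not_match = ["2019-13-01", "19-05-13", "2019-05-32"]


def check(pattern, positives, negatives):
    """Return (missed positives, unwanted matches) for a pattern."""
    missed = [s for s in positives if not pattern.search(s)]
    false_hits = [s for s in negatives if pattern.search(s)]
    return missed, false_hits


missed, false_hits = check(DATE, should_match, should_not_match)
print("missed:", missed, "false hits:", false_hits)
```

Keeping the two lists around as a regression test is handy when a coworker later asks for "one small tweak" to the pattern.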


* Parser generator tools like ANTLR https://www.antlr.org/, or even lex and yacc, useful for parsing languages/config files and probably generating a better/more robust parser than you'd cobble together by hand

* Dynamic programming https://en.wikipedia.org/wiki/ -- great for relatively-quickly (in computation time) coming up with good-enough solutions to some hard (like NP hard) recursive problem that would take forever (almost literally) to technically find the absolute best solution to, but not something you'll use every day.


I was under the impression DP was usually used for getting the exact answer to certain classes of problems faster than exhaustive search. I usually see approximations compared to the ground truth dynamic programming answer.

Do you see the reverse in your work?
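
For a concrete case of the "exact answer, faster than exhaustive search" reading, here is a small sketch using 0/1 knapsack as the example problem (my choice, not something from the parent comment). The DP version is checked against brute force over all subsets; the item list is made up.

```python
from itertools import combinations


def knapsack_dp(items, capacity):
    """Exact 0/1 knapsack for positive integer weights.

    best[c] holds the highest value reachable with total weight <= c.
    """
    best = [0] * (capacity + 1)
    for weight, value in items:
        # Go downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


def knapsack_brute_force(items, capacity):
    """Exhaustive search over every subset, exponential in len(items)."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best


items = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]  # (weight, value)
print(knapsack_dp(items, 15), knapsack_brute_force(items, 15))
```

Both give the same optimum; the DP does it in time proportional to `len(items) * capacity` instead of `2 ** len(items)`.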


That's weird. DP, though better than naive brute force, is still very slow since it looks only for an exact, optimal solution.


IDEs - A proper work-grade IDE like Visual Studio will have some tools built into it that will save you tons of time. Features like "Edit & Continue" have saved my bacon many times by making intractably difficult bugs in algorithms much easier to understand because you can experiment with your code on the fly.

There's also some more common features in most IDEs like being able to jump between symbols that saves a lot of time day-to-day.

If I'm interviewing you and you say you don't like using an IDE or a debugger, that speaks to your work experience, your productivity, your self awareness about your productivity, and really puts an upper limit on the difficulty of the problems you've had to solve.


I find that it is usually the opposite. Devs who don't use big IDEs like visual studio are much more likely to know why things fail in a build pipeline etc.

If you can only build a project by bumbling through menus and pressing a big green button at the end, that is worrying.

If you can only debug by immediately jumping into the debugger and single stepping that is also worrying.

Devs who reach for the tools appropriate to a given situation inspire much more confidence in their abilities.

Knowing the time saving features of your editor is a huge boost. However, the old editors have much more of these ;)


IDEs aren’t useless, you just don’t want them to be a crutch. For this reason, I usually recommend new developers use Nano or a notepad-esque editor until they understand why they might want vi keybindings, then use Vim until they understand why they might want an IDE or something like Emacs. Starting with the IDE hides layers and layers of both junk and useful tools, while experienced developers know which layer of the stack to work on at which time.


> If I'm interviewing you and you say you don't like using an IDE or a debugger, that speaks to your work experience, your productivity, your self awareness about your productivity, and really puts an upper limit on the difficulty of the problems you've had to solve.

Or you have surpassed the limit on the difficulty of the problems your debugger & IDE can help with.


I would say studying any of the AWS paas offerings. Not saying you have to use them, but they cover a large segment of the system component space.

They can help you answer questions like "When would I use an in memory cache vs a rdbms vs a key value store?"


If you happen to program in C/C++ you should absolutely familiarize yourself with the sanitizers of gcc and clang, most notably address sanitizer. (However there are good reasons why you shouldn't program in C/C++ to begin with.)


Low level unix tools. Learning how to read a file, tail a file as it grows (like a logfile), or look at just the beginning, or just the number of lines in the file, or merge a bunch of files into one, etc...

How to get a file from one machine to another with scp. How to use more advanced features of SSH like agents, forwarding, your config file, SOCKS proxying, etc. How to debug system issues, find where config files should be, find out why your app won't compile. Learn how to install code from source, using configure and make. Learn how to operate your own basic network services like HTTP servers, mail servers, local file sharing with NFS or SMB, etc...


As a programmer, learning how to use a user analytics tool really shifted my perspective on things. Profilers and debuggers are tools that tell you where to fix the code. A user analytics tool tells you where to fix the product.

Being able to hypothesize ideas, measure conversion, engagement, retention, A/B test, slice by cohorts and validate whether an experiment validates your hypothesis. It makes you tackle problems very systematically.

Being able to measure both impact and effort. Always evaluating whether the impact was worth the effort and fine tuning it.

The product version of the tree falling in the forest joke: If you build something and no one uses it, does it really exist?


Devops; being able to stand up your entire stack in an automated way. Some programmers dismiss this as "yaml/bash engineering", and that's true, but it also challenges all of your assumptions in your stack. If you can't stand up a complete duplicate programmatically, you almost certainly have implicit assumptions that you haven't verified, which will make it much harder to recover from disaster or scale.

Put another way, devops is a mix of declarative programming (YAML) and writing idempotent, imperative code in the presence of large side effects (bash). Learning to handle both is very educational.


Did you know there are document storage/management systems out there? It's useful when one needs to save a lot of documents (duh) or other too-large-to-save-in-rdbms type of content with metadata. Examples are FileNet or Alfresco.

Did you know there is a protocol called CMIS [1] which you can use to query content from these systems? It's similar to a subset of SQL.

[1] https://en.wikipedia.org/wiki/Content_Management_Interoperab...



SQL


You think a self-taught programmer might never encounter SQL?


Speaking as a technical development manager, I can say that many people have far less than adequate exposure to SQL, and that training people to use SQL effectively and safely is all too often a common requirement before letting them loose on the database.

There are so many ways people misunderstand and misuse SQL and relational databases, it's honestly staggering.

So, I second the OP. Learn SQL, and you'll stand out.


Yeah, learning real sql is a very useful skill. At one employer, I became the "sql expert" because I knew the difference between inner and outer joins. Which, if you know sql, means you know next to nothing. But knowing next to nothing was better than knowing nothing at all, so...
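
For anyone who wants to see that difference rather than read about it, Python ships with `sqlite3`, so no server is needed. A minimal sketch with two made-up tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (1, 1, 'First Book');
""")

# Inner join: only authors that have at least one matching book.
inner = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    JOIN books b ON b.author_id = a.id
    ORDER BY a.id
""").fetchall()

# Left outer join: every author, with NULL (None) where no book matches.
left_outer = conn.execute("""
    SELECT a.name, b.title
    FROM authors a
    LEFT JOIN books b ON b.author_id = a.id
    ORDER BY a.id
""").fetchall()

print(inner)
print(left_outer)
```

Bob has no books, so he disappears from the inner join and shows up with a `None` title in the outer one.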


This might not be for a hobbyist since the setup is difficult but validating your frontend work with automation is very interesting. I didn't realize for a long time that I could use visual regression tools and Selenium at the end of development to catch bugs. I'm surprised that there aren't more tools in this space targeted for developers. Selenium Grid with WebdriverIO is incredibly helpful. If you put together a visual regression system for validating your daily work you could save some serious time.


There are so many specialized areas that are useful, like common design patterns, basic data structures, various algorithms, transactions, multithreading, parallel processing, etc.

Too many things to list, but I agree with a poster below that suggested looking at the course offerings for a computer science degree. You might be able to find outlines of the courses to get specific topics.

On a day to day basis, code coverage is something that more people should use when writing unit tests to ensure they have tests that execute all their code.


Learn to use Proxy Servers for Troubleshooting

Whenever your web app, mobile app, etc. depends on a network response, Charles Proxy (https://www.charlesproxy.com/) can be super helpful. It sits in between and captures all the HTTP requests/responses and allows you to manipulate them. So, you could capture an API response (or request), manipulate it, then let it continue. It also lets you interrogate the requests easily.


Design patterns were enlightening to me. The idea of expressing concepts in software design which can be reused was what got me out of the stone age of programming. You can describe Pub/Sub or the Builder Pattern, quickly implement it in a project, and know its advantages and limitations. Definitely worth checking out if you haven't already.

http://wiki.c2.com/?CategoryPattern


If understanding how Linux works interests you: http://www.linuxfromscratch.org/


The concept of monitoring. This simple, obvious concept changed a lot how I think of software maintenance. I usually use slack but that's an implementation detail.


Anything that automatically generates pictures from text: mscgen, imagemagick, graphviz, SVG, povray, gnuplot, R, etc.


Use GitHub trending for each language to look at new things each month.

Build something and think "how would I scale this project to X capacity?", then rinse and repeat.


Visual Studio, especially its visual UI designer, build & reference management, and scaffolded website development (ASP.Net MVC with Entity Framework from scaffolding). Night and freaking day.

Also, WCF for all your inter-computer communication needs. I wouldn't be developing software today if I weren't introduced to low-trouble application development with Visual Studio


Not tools but more of a mindset: Application Security, OWASP top 10, Web Security. Learn what is XSS, CSRF, what is a CSP, why HSTS exists, why CORS allowing * can be dangerous. Also what is OAuth, OIDC, SAML, etc. How to store passwords securely, how to add security to your devops cycle. Having a good security mindset can be a great asset to any team.
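
On the "how to store passwords securely" point, the core idea (a random per-user salt plus a deliberately slow hash, never the plain password) can be sketched with the standard library alone. Treat it as an illustration of the idea rather than a vetted implementation: the iteration count below is an arbitrary illustrative value, and in practice many teams reach for a dedicated library such as bcrypt or argon2 instead.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative value only; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest); store both, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```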


Automated deployment. Even for smaller projects. Ansible for example is a python product.

Online schema change tools if you use RDBMS in production.


Just a few that I've found liberating over the past few years:

1. In memory caches for small projects / one-offs.

Redis and memcache are sometimes not needed. Wrapping your cache in a class whose storage is dependency injected is a nice way to keep moving along.

---

2. grep

Not much to say here other than it's a great search tool. Admittedly, I'm still working on wielding it even better.

---

3. Redirecting output to a file: [CMD] > myoutput.txt

Lifesaver for large output.
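
To make point 1 a bit more concrete, here is roughly what I mean, as a sketch rather than a finished library. The `Cache` class, its TTL handling and the injected `clock` are all my own illustration; the storage only needs to behave like a dict, so a plain `dict` works today and a Redis-backed mapping could be swapped in later.

```python
import time


class Cache:
    """Tiny TTL cache whose backing storage is injected.

    `storage` must support item assignment, deletion and .get(),
    which a plain dict does out of the box.
    """

    def __init__(self, storage=None, clock=time.monotonic):
        self._storage = {} if storage is None else storage
        self._clock = clock  # injectable so tests can control time

    def set(self, key, value, ttl=60.0):
        self._storage[key] = (value, self._clock() + ttl)

    def get(self, key, default=None):
        entry = self._storage.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._storage[key]  # lazily evict expired entries
            return default
        return value


cache = Cache()
cache.set("greeting", "hello", ttl=30)
print(cache.get("greeting"))
```

Because nothing outside the class knows where the data lives, moving to Redis later is a change in one constructor call.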


It depends on what you want to achieve. I can tell you to read about security but that is only relevant if you want to secure your software. I can tell you about pub/sub, but that is only relevant when you need it.

The things you’ve learned were mostly covered in my computer science curriculum. So I am going to second Robin_messsage his message.

Skim through a CS curriculum.


SQL Stored procedures and triggers. It's some of the most useful stuff you may not have considered if you're just a hobbyist. It's not for every project but if you do need a proper database for some reason, there's often great ways to use stored procs or database triggers that greatly simplifies your system.
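
A quick way to play with triggers without installing anything is SQLite via Python's `sqlite3` module (SQLite has triggers but no stored procedures, so this sketch only covers the trigger half). The tables and the audit-log use case are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE audit_log (
        account_id INTEGER, old_balance INTEGER, new_balance INTEGER
    );

    -- The database itself records every balance change, so no
    -- application code path can forget to write the audit row.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO accounts VALUES (1, 100);
""")

conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
audit = conn.execute("SELECT * FROM audit_log").fetchall()
print(audit)
```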


Corporations have been pushing hard on marketing their vendor-lock-in as-a-service solutions so you don't hear about simple free open source tools anymore; all developer channels are saturated with corporate marketing. So much so that you don't even realize that it's marketing.


For learning what’s going on in compiled programs strace can be very helpful.

https://www.tecmint.com/strace-commands-for-troubleshooting-...


The design of everyday things. It's a design book that will make you a better programmer too.


I use ack (https://beyondgrep.com/) many times per day, and it has saved me a great deal of time. That said, if you use IDEs (I don't), it may be somewhat redundant.


* relational databases (e.g. sqlite, postgres)

* regular expressions

* lisp / scheme / clojure

* emacs

* mini kanren / datalog / prolog

* neural networks and deep learning

* state machines

* caches


If you're into python, look up decorators and coroutines. They will blow your mind.


This is still my favorite mind-blowing introduction to them, from 2009: http://www.dabeaz.com/coroutines/Coroutines.pdf Don't have hardware interrupts or threads but still want a multi-tasking operating system? Not A Problem.
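
For a taste of both ideas at once, here is a small sketch of my own, written in the spirit of that material rather than copied from it: a decorator that "primes" a generator-based coroutine, and a coroutine that receives lines via `send()`.

```python
from functools import wraps


def coroutine(func):
    """Decorator: create the generator and advance it to its first yield."""
    @wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # a generator must reach a yield before send() works
        return gen
    return start


@coroutine
def grep(pattern, matches):
    # Suspends at the yield until a caller pushes a line in with send().
    while True:
        line = yield
        if pattern in line:
            matches.append(line)


found = []
g = grep("python", found)
g.send("no snakes here")
g.send("python rocks")
g.close()
print(found)
```

The function body stays paused between `send()` calls with all its local state intact, which is the trick the multi-tasking examples build on.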


Macros, interpreters and embedded languages [0].

[0] https://github.com/codr7/g-fu/tree/master/v1


So many of the problems are “people problems” so in addition to the excellent technical suggestions in this thread I’d add books like “Thanks for the Feedback”, “How to Win Friends and Influence People”, “Getting to Yes”


"start with no"


Perl one-liners, the ones that overflow to several lines of code in your terminal.


Shell scripting (trust me, this is way more useful than you might think).


SQL


Is this trading bot you’re writing available online?


Design Patterns


Just do yourself a favor and read it the way it's intended: As a dictionary.

Design patterns give you a vocabulary to describe design choices you made. They don't give you a set of things that inform design.


I can't remember who, but someone's said that design patterns really would be more appropriately called something like "palliatives for static manifestly typed languages."

Knowing idioms that relate to certain expressive / organizational problems is a good thing (especially if you're primarily working in a static manifestly typed language), but there's a weird overcelebrated status to them, and I'm not sure I'd encourage a developer I was training to become familiar with a full catalogue of them.


It seems to me that you could regard something like the Y Combinator as a design pattern that is a palliative for dynamic implicitly typed languages.

All kinds of languages have patterns. Whatever kind of language you use, learn the patterns that are relevant/useful for it.


A design pattern is a well-known way to do something. It's important so you don't reinvent the wheel.

They're different than external libraries, because usually the details of your application are closely integrated with how you implement the design pattern.


What are some resources outside of gang of four?



Head First Design patterns


Patterns of Enterprise Application Architecture

Enterprise Integration Patterns

Just start reading some Martin Fowler books, you will be up to your ears in patterns in no time :)


tldr: redis

Hey man, I went down an eerily similar path as you. Self-taught and building a trading system. Wading thru sockets and pipes before discovering zmq and having my whole paradigm for programming completely shifted. Absolutely love zmq, and enjoy thinking of new projects to use with it.

More towards your actual question, I don't think it's as ground shifting of a discovery as zmq, but Redis has also helped me a lot with the trading bot. I use Interactive Brokers, and given how their API examples are set up, I haven't figured out how to design my system using only message requests like you see in micro-service style programs. Also when requesting a price for any given contract, I don't like the idea of having to request it once I need the price and then waiting on the network for the brokerage to receive my request and then the price to come back to me. By that point the price or whatever data you're requesting can get stale depending on the speed and timeframe you're wanting to play. So what I did was just request ahead of time streaming data for contracts I'm interested in, and I write that continuously to a localhost redis server. And then whenever any of my bots need information, they check to pull the latest value from that redis server first before going out to the brokerage platform to request the data directly. Basically cuts the full round trip time in half if it can find the data locally first. I believe this is a similar programming paradigm as using redux if you have experience with that, but I've never personally used redux so don't quote me on that.


Absolutely agreed. Redis is a godsend for small time traders like us. I've scaled up to where I'm trading on about 20 crypto exchanges in addition to dabbling in stocks and prior to redis I was using flat json files to store all my data so every time a new price came over the wire the bot would read the file, parse the json, append the price then write it all back out. That was quick and easy to put together and it worked great at first but later when I had all the exchanges going even with an 8700k and Optane drive the computer would just fall over when volume got high. Got sick of that, converted it all over to redis with rejson, added another 32 gigs of RAM and problem is utterly solved. Redis' built-in queue has been extremely useful too as a job system.


Nobody mentions what self taught programmers miss the most.

Theory. And not just algorithm theory.


Could you expand on that if you don't mind? I'm a self-taught programmer currently in the process of realising just how much I miss from theory, my list so far on theory fundamentals includes:

• Mathematics and probabilities, and their applications to CS (e.g. formal methods)

• Design patterns (OOP, functional programming)

• Data structures

• Algorithms, time complexity

• System architectures

• Software strategies: CI, CD

• Database principles: SQL

It may sound naive, but I'm kind of overwhelmed by all this and it's not helping my impostor syndrome. I may make a repo of the list with links to resources I've identified for learning, as it seems like a common struggle; maybe it'll help someone.


None of what you mentioned is theory except for Mathematics, probability, formal methods and algorithms.

Stuff like design patterns, system architectures and software strategies are like flavor of the week stuff. Opinions basically. Patterns like microservices are bad or good depending on opinion, but theory is always correct. Theory gets less bang for the buck but it's always what many programmers especially self taught ones are missing.

Theory is so hard that it will be hard to see applicability until you're a more seasoned programmer. Many seasoned programmers get by without ever knowing theory. But you will be a better programmer if you know it.

If I were to recommend one theory to study it would be category theory. If there was any true axiomatic theory for design patterns or how to design programs... categories are it. The study of morphisms is the study of the simplest form of a compose-able module. Knowing this theory you will begin to understand why some design patterns don't work and why it's sometimes hard to reuse patterns in code that was not properly designed. Theory doesn't answer all questions but for the questions it does answer you will get a definitive answer and not an opinionated one.


The canonical undergrad CS theory text is Sipser's Introduction to the Theory of Computation.

The prereqs are a comfort with discrete mathematics and proofs and a course in algorithms and their analysis.



