How to fuck up software releases (drewdevault.com)
318 points by zdw on Oct 14, 2019 | hide | past | favorite | 92 comments



A thousand ways. One really good idea I read was to write an empty script that prints the instructions for each step; you press enter and it prints the next step. At first it can only print instructions, then later you can write code to automate (and maybe even validate) some of the steps. Something like this:

    def git_tag():
        print("1. git tag the repository")
        input("Press enter to continue")

    def git_push():
        print("2. git push the repository")
        input("Press enter to continue")

    def pip_deploy():
        print("3. Deploy, run: python setup.py sdist upload")
        input("Press enter to continue")

    def main():
        git_tag()
        git_push()
        pip_deploy()

    if __name__ == "__main__":
        main()
As simple as that! Then later, you could write code for every step as you have time.


I think this might be the article you read:

https://blog.danslimmon.com/2019/07/15/do-nothing-scripting-...


Yes! Thanks! I couldn't find it.


> print("3. Deploy, run: python setup.py sdist upload")

The setup.py upload command is actually deprecated and has been replaced with "twine upload".

https://setuptools.readthedocs.io/en/latest/setuptools.html#... https://pypi.org/project/twine/
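In the do-nothing-script spirit, the deploy step above could just print the modern commands instead. A sketch (the `prompt` parameter is a testability convenience, not part of the original script):

```python
def pip_deploy(prompt=input):
    # "setup.py upload" is deprecated; build an sdist, then upload with twine.
    msg = "3. Deploy, run: python setup.py sdist && twine upload dist/*"
    print(msg)
    prompt("Press enter to continue")
    return msg
```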


At Braintree, we developed a framework for this:

https://github.com/braintree/runbook

One of our biggest use cases is for including ample sanity checks as part of our releases to make sure we don't mess it up.


Another "fun" one: typo in git tag names.

The rakudo project has monthly releases with tags like 2019.07. Once I mixed up the year and month in a release, so I created a tag 2018.06 instead of 2016.08 (I don't remember the exact numbers), and pushed it.

I deleted the wrong tag as soon as I noticed, and pushed the correct one, but of course when the date of the typo'd release month came around some years later, things blew up for everybody who still had the wrong tag in their local repo (which turned out to be quite a few developers).
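A cheap guard against exactly this mistake is to derive the expected tag from the current date and compare before pushing. A sketch, assuming strict YYYY.MM monthly tags (real projects may allow off-schedule releases):

```python
from datetime import date

def expected_tag(today=None):
    """Expected monthly release tag in the YYYY.MM style described above."""
    d = today or date.today()
    return f"{d.year}.{d.month:02d}"

def check_tag(tag, today=None):
    """Refuse to proceed when the tag doesn't match today's year and month."""
    want = expected_tag(today)
    if tag != want:
        raise ValueError(f"tag {tag!r} does not match expected {want!r}")
```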


This is one thing I really loved about subversion. I just used the VCS revision as the tag and version therefore avoiding this issue. Didn’t even need to create tags.

That was until the version overflowed the 16-bit field it was stored in within the assembly manifest. Kaboom. :(

People spend way too much time arguing about semantic versioning when 99% of the time the only important thing is that the next number is bigger than the previous number.
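The "next number is bigger" property is easy to check mechanically, as long as you compare dotted components numerically rather than as strings. A sketch:

```python
def is_newer(old: str, new: str) -> bool:
    """True when `new` sorts after `old`, comparing each dotted component
    as an integer -- so "1.10" correctly beats "1.9"."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(new) > parse(old)
```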


I agree, it is such a cluster to get human-usable version numbers out of git. I use a convoluted Rube Goldberg contraption based on tags and git-version[1]. It's so much less convenient than old SVN revision numbers.

[1] https://gitversion.readthedocs.io/en/latest/


I find `git describe` pretty human-usable. (I use it in a lot of automated processes.) You still need to tag every now and then, since the format is based on the most recent tag, but it packs a lot of useful information in compactly and is almost semver compliant (in many cases you want to swap the first hyphen for a plus for true semver compliance, if your tags are semver releases). Its format:

  {mostRecentAnnotatedTag}-{numberOfCommitsAfter}-g{headCommitShortHash}
Such as: v2.0.3-3-g1cafe9

That has enough information to get you back to exactly which branch was built (3 commits after v2.0.3 with a hash starting 1cafe9). Of course if you git describe when you've checked out the tagged commit itself you just get the tag back v2.0.3.

It's not as simple as a plain "revision ID", but it is pretty simple.

In some of my CI pipelines I've been considering swapping my simple `git describe` tricks for `git-version` to make it easier to get more semver-like build metadata in order to CD them, but I've also considered just regexing that first hyphen to a plus and calling it a day.
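The hyphen-to-plus trick mentioned above fits in one regex. A sketch that assumes the describe suffix has the standard `-N-g<hash>` shape, so a hyphen inside the tag itself is left alone:

```python
import re

def describe_to_semver(desc: str) -> str:
    """Turn `git describe` output like "v2.0.3-3-g1cafe9" into semver
    build-metadata form "v2.0.3+3-g1cafe9"; exact-tag output is unchanged."""
    return re.sub(r"-(\d+)-g([0-9a-f]+)$", r"+\1-g\2", desc, count=1)
```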


My favorite is the complexity of having an in-tree changelog that lists (shortlog) commits under appropriate tag headings - checked in under appropriate tags.

It roughly amounts to:

1) commit fix (git commit -m "fixes #foo bug")

2) get hash/shortlog, add to changelog

3) commit changelog referencing commit in 1)

4) tag hash in 3) with new minor version 1.1.2 to 1.1.3

5) realize you forgot that 1) warranted a bump in minor version, edit changelog to reflect new minor version

6) commit new changelog

7) move tag from 4) to 6) - hope you didn't push yet...

And people say RCS $Id:$ was a hack...

Now, the real issue is actually to find a nice way to tag actual releases, with the current tag and hash. I realize this is mostly tricky when a project is "checkout and run" - without a build step (e.g. a Ruby on Rails project). It's still somewhat painful to make sure there's an up-to-date global variable that correctly reflects the running version.


I clone the project into a temp directory, then inject the version number into the source.


SVN revision numbers are the only thing I miss after switching to git.


I miss knowing what I'm doing :)

Git has so many footguns it's unreal.


Does it, though? It's immutable. As soon as I understood that, I no longer felt that git has _any_ footgun. Whatever dumb thing you do - you can just run `git reflog`, find the last known-good hash, go back to it, and it fixes everything. Just don't push to github (or whatever git server you use) & don't run garbage collection while you're in the middle of a botched rebase :)


Pushing messed-up history is the biggest footgun. Pushing a wrong tag is a close second.

Two smart git strategies mitigate the problems: don't push things, and don't pull things.


> Pushing messed-up history is the biggest footgun.

But this one is a footgun only in the sense that you may mess up things for others, and embarrass yourself. Not in the sense of "permanent data loss", right?

"don't pull things" is definitely bad advice (even if it was ironic) - you don't mess up anything by pulling things; you can easily go back to how things were.

Instead of downvotes, maybe clarify how you mess up things with git? Maybe I can help. I really find it very hard to mess things up. Like, how do you mess up pushing the wrong tag? Unless you force-push it, but why would you do that? Or you mean, you've tagged something wrong? Where's the (permanent) harm in that? I mean, I can imagine "harm" in the sense of "inconvenience", but nothing more serious, unless the mistake goes unnoticed for a very long while (and then, how would any other version control system help you?)


> I really find it very hard to mess things up.

In my experience the difference between my trouble-free use of git and inexperienced users' troubles with git amount to: I avoid doing the wrong thing in the first place, inexperienced users do the wrong thing, then try to fix it.

For example, I just never commit a gigabyte binary file to the repo. But if an inexperienced user does by mistake, and that makes .git/objects/pack big and everyone's checkouts become slow? And then they notice a few days later, after other developers have pulled and pushed other changes? And they try to fix it? Disaster zone.
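One way to stop the gigabyte-binary mistake before it reaches history is a pre-commit size check. The core of one might look like this (a sketch; the 5 MB limit and the idea of feeding it staged `(path, size)` pairs are assumptions, and a real hook would gather those from `git diff --cached`):

```python
MAX_BYTES = 5 * 1024 * 1024  # arbitrary limit, for illustration

def oversized(staged):
    """Return the paths among (path, size_in_bytes) pairs that exceed the
    limit; a pre-commit hook would abort the commit if this is non-empty."""
    return [path for path, size in staged if size > MAX_BYTES]
```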


How do you use git without push and pull?


Have you replied to the right comment? I didn't say I use git without push and pull.


Don't push things. When you push messed-up history to a public target, you now have messed up history forever. Whoops. You will never publish the correct history now.

Pushing messed-up tags can lead to really annoying consequences because tags are a global namespace (except when they aren't), but this comes up much less frequently, especially since tagging a release is a bit ceremonial.

Don't pull things. This is not "ironic" (what is that supposed to mean?) and it will avoid the other side of git footguns. Pulling is a maintainer's shortcut operation (i.e., Linus Torvalds or one of the subsystem vice-Linuxes) and messes up history forever if used on public targets.

I go back and forth depending on mood on what the right thing to do about git's footguns is, but you were right that there's no way to permanently screw up a git repository - if you don't push things and don't pull things. (Seriously, don't! When you publish things, publish them.)


> This is not "ironic" (what is that supposed to mean?)

I thought you were being ironic, in the sense of "if you only use git locally you don't mess stuff up". You definitely pull the shared branches; you just don't force-push to them, and that makes it ok.

> Seriously, don't! When you publish things, publish them.

By "publish" I suppose you mean "push, when there is no upstream branch"? Well, yes, but I also force-push a lot and it's quite ok to do so when you work alone on a branch (e.g. to keep it constantly rebased on top of master). I wouldn't say "don't push, ever" - that's an overkill. I'd just say "protect you master branch so that people can't force-push on it".


> When you publish things, publish them.

What does that mean operationally?


If you only do the things you can also do with svn, the footguns disappear.


The reason why I prefer Mercurial to Git, whenever I can.


Typos are the worst when it comes to things like that.

Especially when you're the type of person who always seems to have typos after you proof read something a few times. But then when you proof it again AFTER you post it, suddenly they all jump out. I can't count the number of times I've gone and edited a HN comment even after proof reading it a few times.

Needless to say dealing with tags or writing a tweet often includes a silly amount of triple checking.


Also typos in container names.


I keep a release TODO list, and in about 50% of releases I fuck up and need to add something to the list. It's about 40 items long now (containing various terminal commands and activities), which includes making sure manual entries, documentation, the website, social media, dependencies, etc. are handled correctly.


Every time I "fuck up", I add a few commands to a shell script, so I don't mess up the next time.

Over time that has grown into over 1000 lines of shell, e.g.

https://github.com/oilshell/oil/blob/master/devtools/release...

And this is why I'm writing a shell -- because it's good for automating things that you are otherwise too lazy to automate!

I find that "semi-automation" is a good middle ground. This is where shell is better than "real" languages. You don't have to commit to writing a polished program -- you just incrementally improve things and it makes problems go away.

The time you put in is saved immediately, as opposed to other languages, where you might spend 30 minutes writing something that will save you 2 minutes in the next year, which isn't necessarily worth it.

If you keep a text file with a bunch of terminal commands, then you might as well chmod +x it and put #!/bin/sh on the front. That's what a shell script is!

----

Here is the output of the release process, which has tarballs, change logs, documentation, test results, benchmarks, etc.

https://www.oilshell.org/release/0.7.pre5/

It's way too much to do by hand.


What kinds of things do you need to check for that aren’t automated? Maybe I’m spoiled by our CI pipeline.

I’d love to take a look at your list


Here's a somewhat redacted version. `m` is my alias for `make`.

    - Finalize source
     - Legal check `LICENSE-dist.txt`
     - Update `CHANGELOG.md`
     - Bump version in `Makefile` and `Core.json`
     - Commit "Bump version".
    - Build (three OS's can be done in parallel)
     - `m clean`
     - `git pull`
     - Make sure you have the latest `Fundamental.zip` package in source root.
     - `m dist`
     - Manually test installer and fragile features (audio drivers, patch loading) for ~10 minutes.
     - `m notarize` (on Mac)
     - `m upload`
     - `git tag vX.Y.Z`
     - `git push --tags`
    - Release
     - Update version title and URLs in `Rack.pug`
      - `m upload`
      - At this point, normal users have access to new version.
     - Update server version in `config.coffee`
      - `m restart`
      - At this point, normal users will swarm to download new version. Keep an eye on server bandwidth.
    - Publicize
     - Twitter https://twitter.com/home
     - Facebook https://www.facebook.com/vcvrack/
      - Share on group
     - Forum https://community.vcvrack.com/c/announcements


Interesting list, thanks for sharing. I'm the PM for CI/CD at GitLab, and I don't mean to hijack the topic, but wanted to give a heads up that we are looking at building a feature for these kinds of procedures that aren't quite pipelines, based on Jupyter Runbooks. Would love to hear your feedback: https://gitlab.com/gitlab-org/gitlab/issues/9427


Currently the most difficult and most time consuming step in the release checklist is finding space on my desk for three laptops so I can use them all simultaneously. Releasing only takes 1-3 hours, and I unfortunately wouldn't want to spend hours learning and setting up automated methods to perhaps shave 10 minutes off the process.

Manual releasing allows me to iteratively improve the process and spot possible issues, whereas any automation would reduce build reliability.


> whereas any automation would reduce build reliability.

I guess this depends on what your release process is; for example, most npm libraries benefit from CI that runs the build script and runs `npm version $TAG && npm publish`. There's not much that could be improved here, especially if the build script is just an alias for running `webpack`.


There's a method I read about on lobste.rs for this. You write down the list as you did and then turn pieces into "Do X by running Y" and over time more things get automated.

That's a hefty deploy protocol, though, and automating cross-platform builds may be more trouble than it's worth if you're releasing rarely.

It's pretty neat you follow a deploy checklist.


Related to the process of "write down what you did", I'm reminded of this article about "literate sysadmin/devops" which used Emacs' org-mode as a way of combining prose, commands and command output. http://howardism.org/Technical/Emacs/literate-devops.html


I read you should write it as an interactive script e.g.:

    input("Build X")
    input("Build Y")
    input("Zip X & Y")
    input("Upload release")
Then run the script and press enter as you do the steps. It means if you're interrupted as you proceed, you can resume exactly where you were before.

If you implement it as functions:

    def buildX():
        input("Build X")
    def buildY():
        input("Build Y")
    def ZipUp():
        input("Zip X & Y")
    def Upload():
        input("Upload release")
    if __name__ == '__main__':
        buildX()
        buildY()
        ZipUp()
        Upload()

Then you can automate parts piecemeal.
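A small helper makes the piecemeal upgrade mechanical: each step is a prompt until you attach a command to it. A sketch (command names are illustrative, and `prompt` is injectable only so the manual path is testable):

```python
import subprocess

def step(description, command=None, prompt=input):
    """Run `command` if one has been written for this step; otherwise fall
    back to printing the instruction and waiting for manual confirmation."""
    if command is not None:
        print(f"Running: {description}")
        return subprocess.run(command, check=True).returncode
    prompt(f"{description} -- press enter when done")
    return None
```

Automating "Build X" then just means changing `step("Build X")` to something like `step("Build X", ["make", "build-x"])`.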


This list is a very simple algorithm. Why don't you turn it into a deploy.sh script?


Some of the steps look like they'd be hard to do that for. Like for example the one about making sure the license is correct, and the one about testing parts that apparently don't have good automated tests / interact with the real world.


> Some of the steps look like they'd be hard to do that for.

Then those steps must be removed. I thought the mantra "deploy is ONE step" was a more or less universally acknowledged truth.


I mean unless a computer can check that all dependencies have compatible licenses, that is unrealistic. It's a necessary step for publicly released software to avoid legal issues


Linux distributions (though I can only speak to openSUSE's process) have automated scripts (which I believe are written by our lawyers) that check whether the license of a package matches the license of the files inside the package. It's how most package legal review gets done (and if the script can't figure it out, it gets escalated to our actual lawyers to review). You cannot submit a package to any one of the distributions we ship without the legal review being approved.

So it is clearly possible to do -- and there are all sorts of tools which figure out what SPDX license entries apply for every dependency (or vendored dependency) of a given project.


I took the parent posts to mean that step involved checking the project license, not its dependencies.

For the latter though, Fossology, Scancode etc. can help.


[flagged]


Unfortunately, not everyone acknowledges this truth;)


…or has a sense of humour about it! ;-)


Because it involves inherently manual processes.


No it's not.


But it almost is, already! Half of the lines are literally shell commands. And except maybe for the legal check, the rest of the steps can also be written as shell one-liners. For example, to update the version on a README file you can "sed -i" with an appropriate replacement, and so on.
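The same "sed -i" idea in the thread's Python register, with a sanity check that exactly one line changed. A sketch; the `version = "..."` pattern is an assumed file format:

```python
import re
from pathlib import Path

def bump_version(path, new_version):
    """Rewrite the version assignment in `path`, refusing to guess when the
    pattern matches zero or several times."""
    text = Path(path).read_text()
    updated, count = re.subn(
        r'version = "[^"]*"', f'version = "{new_version}"', text)
    if count != 1:
        raise ValueError(f"expected exactly 1 version line, found {count}")
    Path(path).write_text(updated)
```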


Can you explain how your ./deploy.sh script forks onto three laptops (Windows/Mac/Linux) and "plays around" with the installers/build a bit to make sure it works? I obviously can't release builds that don't even run because I didn't take a few minutes to test them.


Sounds like you are mixing release & QA, which makes a lot of sense in a small shop.

In a larger team you’d have a small agent on each of your N (likely >> 3) machines, CI pushes to the agent for build/release/automated tests.

Then if those pass on a given system, it fires off a message to start manual QA validation for performance and other “intangibles”.


Why don't you use virtual machines?


Good question. 1) A MacOS virtual machine is probably illegal. 2) I need to test the builds on all three machines, and VMs don't handle OpenGL, USB drivers (e.g. for MIDI controllers), and audio drivers very well or at all. 3) I need to have a rough idea of performance of the build. VMs affect performance by non-constant/unpredictable factors. For example, CPU performance might be 1x the speed of a bare-metal OS, but graphics performance might be 0.25x. 4) I don't know how to write a shell script on a host computer that launches Windows 10 and runs commands with an MSYS2 Mingw64 shell.


> A MacOS virtual machine is probably illegal.

true. And that is a very sad state of affairs. I use the travis osx hosts for that, but it's not ideal. There's no interface, but at least you can check that the code compiles, runs, and passes automated tests. That's already huge!

For linux and windows hosts, it seems to me that it is a solved problem, as pointed elsewhere.


> A MacOS virtual machine is probably illegal.

It's not, as long as you run it on genuine Apple hardware. We use a bunch of macOS VMs that are 100% legal.

The problem with macOS VMs is that there are a lot of compatibility issues, and in my experience it's a lot of effort to set up macOS VMs. If you have the space and the money, real machines are a lot easier to deal with.


> I don't know how to write a shell script on a host computer that launches Windows 10 and runs commands with an MSYS2 Mingw64 shell.

Launching a VM via shell script is trivial. And you can install OpenSSH on Windows 10: https://github.com/PowerShell/Win32-OpenSSH


We use a combination of real machines and VMs for building software, and dealing with macOS VMs is a major hassle. There are always compatibility issues that are hard to debug (eg. don't try to configure your VM to use more than 2 vCPUs if you want to run macOS 10.12). There is a lot of manual work that goes into maintaining VMs, and often automating those tasks doesn't pay off. If your app has any GUI components, those are often buggy in VMs (anything using the GPU is extremely unreliable).

Our main build machine is an actual Mac, because it's just so much easier to keep it running.

VMs are nice if you need a lot of different setups (eg. one app we distribute has components that need to be built on different versions of macOS, and VMs are nice for that).


How have you come to that conclusion?


I can do every step on your list with any CI platform. I do something extremely similar to build apps for ios (macos), android (linux), and web (linux). I think it's so important that it's one of the first things I do on any project.

I wanted to give you an example, but it all seems so easy, I'm not sure where you're getting stuck.


Can you really do a legal review from a CI platform?


If it can be automated, then obviously, yeah.

If not fully automated, many CI platforms allow you to pause the pipeline by requiring manual confirmation from an authorized user. I know Jenkins and CircleCI support this off the top of my head. You could have the CI perform any relevant searches or diffs, then display that to the user to get manual confirmation. It’s still not “perfect”, but it does allow you to reliably get someone to look at the data and say “yes” or “no.”


Yeah, that's one approach. Another might be to commit some kind of `./legal/sign-off-v1.0.0` so you can see who did it, and so travis can know if it should deploy the build or not


The Scala release checklist [1] is a good example of a release checklist with both automated and manual steps.

[1]: https://github.com/scala/scala-dev/issues/645


It's so relieving to read this thread. Last year I took over management of some legacy scripts and was tasked with updating them (move from Python 2 to 3, interfacing with some new APIs, etc). One of the biggest stresses I had was little fuck ups I kept doing with the releases. It was never anything huge, but every few releases I would end up taking 15-30 minutes longer than I planned, and would have to update the release process slightly.


Sounds like a great potential SaaS offering


Manifest.ly is designed for this workflow - https://www.manifest.ly/


A couple of practical solutions here.

1) Automate your releases and run that on a CI server if you can. "It works on my laptop" is not ideal for releasing stuff.

2) Have a release branch, and only allow stuff to be merged that passes your CI tests and trigger the automation under step 1 after a merge to your release branch. This works for continuous deployment but can also work for releases of other stuff. If you use semantic versioning, the CI server should tag the release and publish the artifacts. That's one less thing that can go wrong. If you forgot to bump the version number on your master branch, the build should fail.

3) Have a release checklist. People forget stuff, so keep a small list: "Are the release notes and readme updated? Is our CI not complaining? Have all PRs been merged?" Etc. Even with the above, premature releases through an early merge to your release branch could happen. A checklist can prevent that. And of course, if you can, integrate those checks into your CI: if your release notes are unchanged since the last version, fail the build.

4) Release often. Small deltas are less work to test and less risky to impose on users and you get feedback on stuff you did earlier.
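The "build should fail if you forgot to bump" rule from point 2 reduces to a membership check against already-published versions. A sketch; in a real pipeline the published set would come from your git tags or package registry:

```python
def assert_version_bumped(repo_version, published):
    """Abort the pipeline when the version in the repo has already been
    released, so a forgotten bump fails fast instead of clobbering a tag."""
    if repo_version in published:
        raise SystemExit(
            f"version {repo_version} already released; bump it first")
```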


The big lessons from this article (which I love and resonate with)

- You discover more bugs with your shins than your foresight

- As long as you keep fixing it in code, you will eventually win - it's like a video game, as long as you can respawn, you will always defeat the game. Just keep buggering on.


> I am doomed to creatively outsmart my tools in releases.

That’s a very honest admission, and one I think lots of “why don’t you automate everything”-crowd seems to fail to recognise as a real-world factor.


I would say the article actually supports the "automate everything" position. The author prevents future mistakes by adding more automation rather than less.


and yet, future mistakes are still happening. A lot of work has gone into automation but the outcome is often the same. So is it worth it?


Obviously the effort isn’t worthless, but it’s not a silver bullet which will magically “fix everything” either.


I agree. While I like to automate things away, both to save time and to get consistent results, we have a tendency to throw good old thinking out the window. Have I covered everything? Did I check all the outputs? What am I missing? What step that I did last time have I missed this time?

Sometimes it feels like we throw out good old proper thinking and proper work, and then blame automation, claim the result is a natural outcome of all the complexity, or say we're pressed for time. Sometimes it's just a matter of taking responsibility and doing a good job.


I love the humble nature of this post and how fallible we are.

I have spent many days playing whack-a-mole with issues and hacking out solutions. What is equally great is the community sharing their gotchas and how they solved them.

A wholesome and organic thread, thanks for taking the time to write it!


Don't do it on Friday.

On a serious note, I have found that the only way to not fuck up is to remove as much human interaction from the release process as possible, which means scripting. Even with scripting, if the inputs are bad, the output will also be bad.


Seriously, though, don't release on Friday. No matter how good your CI/CD pipeline is, and how confident you are in your testing, you'll let some bad releases through, and if you release on Friday, some poor bastard, whether you or your coworker, is going to be left holding the bag, and have their weekend ruined scrambling around to fix what is broken.

It's just inconsiderate. Things can wait until Monday.


I'm waiting for the next "disruptor" to make it a sales pitch: "Move fast, break things, and only release on Friday!"


In fact it's standard practice in the financial industry to deploy everything on the week-end, and indeed starting from Friday night, because that's when most markets are closed.

Obviously don't do that if you don't have to, but sometimes it's actually the best time to deploy stuff.


That's the point of this post. Automation is hard and probably won't ever be perfect. You can't remove human interaction from software development.


Automating builds is fun! On the one hand I'd be worried if I joined a team or contributed to a project which still had manual processes for release more complex than 'run this script' or 'push to the x branch'. On the other hand, that's a fun initial project.

When developers aren't (or are prevented from) improving their own environment, that's a bad sign...


Nice heads up! Working on a similar script to help with updating versions and packaging across five platforms. At least I have a (mostly) monorepo. Which is great until it’s not.


For bumping the version there's also bumpversion: https://github.com/peritus/bumpversion


Two things should be noted:

1. `bumpversion` is more general and has more functionality than `semver` in the article.

2. `bump2version` is the currently maintained version[0].

I currently use `bumpversion` to manage versioning in applications with many different versioning schemes as well as managing the versioning of their deployment environments (such as updating Terraform files, etc).

[0] https://github.com/c4urself/bump2version


I really like how the OpenStack project does releases - it has a ton of tooling, but for people who are deciding what to release, it is super simple.

e.g. - the latest RC of a project I maintain is lines 10-14 of https://opendev.org/openstack/releases/src/branch/master/del...

by committing that file, the tooling chooses the right commit, does a tag, and builds the python tarballs, and puts them on releases.openstack.org, and for the client libraries, it pushes them to pypi.

it avoids a lot of the footguns of special scripts in repos, while allowing an easy release process.


I came here expecting to read about how Apple has been (mis)treating macOS lately, or how Microsoft has managed to fuck up their Windows update process, but instead I found a few good tips on building and numbering software.


This reads like the "Deploying on a Friday Afternoon Greatest Hits".


> Update the signing script to save the tarball to disk (previously, it lived in a pipe) and upload these alongside the releases…

This should have been done from the start.

GitHub does not guarantee checksums for the generated source archives to be stable, so they can change when GitHub updates their software (and yes, this has happened).
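Saving the tarball also lets you publish your own checksums instead of trusting GitHub's generated archives. A minimal sketch:

```python
import hashlib

def sha256_of(path):
    """Stream a release artifact through SHA-256 and return the hex digest,
    suitable for a published CHECKSUMS file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()
```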


For Python projects I use the excellent Versioneer:

https://github.com/warner/python-versioneer

It means you only specify the version as a git tag, and all the other things that need a version number get it from that.


Phft, amateur. I've forgotten more ways to screw stuff up than this person ever knew.


A lot of these can be avoided by making all the steps repeatable. To do that, you should run everything in a Docker container that's built solely from version controlled files, and run it on a remote system that gets blown away regularly. You can use Jenkinsfiles to specify exactly what to do with different branches, and to really simplify things, you can have a different job for every kind of release. Build in a dry-run option so you can run a whole release without actually publishing anything. Each of your software projects can have its own specific release steps in a script run by your release container, while still following the same basic release process for all your projects.


Can dang or sctb fix the title of this to the correct one?

"How to fuck up software releases"


[flagged]


That is literally the title of the post.


Some people find profanity inflammatory in a way that could reasonably be considered linkbait, in which case the site guidelines would require a title change (https://news.ycombinator.com/newsguidelines.html). There's no solution here that will satisfy everybody.

Generally we find that it's better for HN not to edit away profanity. If there's some other reason to edit a title, that's different. But profanity per se is often just energetic and direct language, and if you take it away you run the risk of Bowdlerizing or bringing in euphemisms, which is not in the spirit of this site.

https://news.ycombinator.com/newsguidelines.html


For some prior art in this space, it may be useful looking at ESR's shipper:

http://www.catb.org/esr/shipper/

And some blog posts he's written on it:

http://esr.ibiblio.org/?s=shipper&submit=Search



