
How are you computing upgrade paths? This seems impossible to do accurately, especially for Ruby, since you can't simulate the dependency resolution of the user's build given that Gemfiles are dynamic.



Can you expand a little? Here's some technical background on what we're doing:

We have our own database of every version of every rubygems package alongside its runtime dependencies (like you see at https://rubygems.org/gems/pundit).

Then we parse your Gemfile and Gemfile.lock. We use the Gemfile to figure out gem groups and pinned requirements (we turn your Gemfile into a Ruby AST since Gemfiles can be arbitrary Ruby code; we use Bundler's APIs to parse your Gemfile.lock). This gives us all of the dependencies you rely on.
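
To make that concrete, here's a stripped-down sketch of the idea (illustrative only, not our actual code): the lockfile goes through Bundler::LockfileParser to get exact resolved versions, and the Gemfile goes through a Ruby parser (stdlib Ripper here) to find the "gem" calls and their requirements. It ignores groups, conditionals, and the non-string arguments a real Gemfile can have.

  require "ripper"
  require "bundler"

  # Resolved versions come straight from the lockfile via Bundler's parser.
  lockfile = Bundler::LockfileParser.new(File.read("Gemfile.lock"))
  locked = lockfile.specs.to_h { |spec| [spec.name, spec.version] }

  # Parse the Gemfile into an AST and collect every `gem ...` call
  # along with its string arguments (name and version requirements).
  ast = Ripper.sexp(File.read("Gemfile"))

  gem_calls = []
  walk = lambda do |node|
    next unless node.is_a?(Array)
    if node[0] == :command && node[1].is_a?(Array) && node[1][1] == "gem"
      strings = []
      collect = lambda do |n|
        next unless n.is_a?(Array)
        strings << n[1] if n[0] == :@tstring_content
        n.each { |child| collect.call(child) }
      end
      collect.call(node[2])
      gem_calls << strings
    end
    node.each { |child| walk.call(child) }
  end
  walk.call(ast)

  gem_calls.each do |name, *reqs|
    puts "#{name} #{reqs.join(', ')} (locked at #{locked[name]})"
  end

The lockfile gives exact resolved versions for everything (direct and transitive); the Gemfile AST tells us which of those are direct dependencies and what requirements they're pinned to.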

Then we let you choose one or more packages that you want to upgrade and the version you want to target (let's say Rails 7.0.4.3).

Now we have [your dependencies and their current versions], [target rails version], [all of the runtime dependency constraints of these gems]. We run this through a dependency resolution algorithm (pubgrub). If it resolves then you're good to upgrade to that version of Rails without changing anything.
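
To give a toy example of that check, here's a sketch using the pub_grub gem (the PubGrub implementation recent Bundler versions build on). The package data is made up and the API details are an approximation, not our production code.

  require "pub_grub"

  # Toy universe: the app currently locks some_gem 1.5.0, which only
  # allows rails < 7.0, while the upgrade target is rails 7.0.4.3.
  source = PubGrub::StaticPackageSource.new do |s|
    s.add "rails", "7.0.4.3"
    s.add "rails", "6.1.7"
    s.add "some_gem", "2.0.0", deps: { "rails" => ">= 7.0" }
    s.add "some_gem", "1.5.0", deps: { "rails" => "< 7.0" }

    # Root requirements: the target rails plus the current locks held fixed.
    s.root deps: { "rails" => "= 7.0.4.3", "some_gem" => "= 1.5.0" }
  end

  begin
    PubGrub::VersionSolver.new(source: source).solve
    puts "resolves: upgrade without touching anything else"
  rescue PubGrub::SolveFailure => e
    puts e.message # some_gem has to move before rails can
  end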

If this fails to resolve, it's because one or more of your current dependencies has a runtime restriction on rails (or another indirect gem being pulled in by the new rails version). This is where the optimization part comes in. The problem becomes "what is the optimal set of versions of all your dependencies that would resolve with the next version of Rails". Currently we solve for this set by optimizing for the fewest upgrades. As our dataset of breaking changes gets better we'll change that to optimizing for the "lowest effort".
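
To spell out "fewest upgrades": find the smallest set of currently locked gems that are allowed to move such that everything resolves against the target. Here's a brute-force sketch of just that objective, where resolves? is a hypothetical helper wrapping a check like the one above (the real search is much smarter than this exponential loop):

  # resolves?(pins) is hypothetical: it runs the pubgrub check with the
  # given exact pins (gem name => version requirement) plus the target.
  def fewest_upgrades(current_versions, target)
    names = current_versions.keys

    # Allow 0 gems to move, then 1, then 2, ... and return the first
    # (smallest) set of gems that makes the target resolvable.
    (0..names.size).each do |k|
      names.combination(k).each do |relaxed|
        pins = current_versions.reject { |name, _| relaxed.include?(name) }
        return relaxed if resolves?(pins.merge(target))
      end
    end
    nil # no resolution exists even if every gem is allowed to move
  end

  # fewest_upgrades({ "some_gem" => "= 1.5.0", "other_gem" => "= 3.2.1" },
  #                 { "rails" => "= 7.0.4.3" })
  # => ["some_gem"]  (only some_gem has to be bumped)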

Happy to elaborate.


Sure. There are a couple of stumbling blocks here:

> We use the Gemfile to figure out gem groups and pinned requirements (we turn your Gemfile into a Ruby AST since Gemfiles can be arbitrary Ruby code [...]

Here's an example Gemfile:

  if RUBY_VERSION < "3"
    gem "minitest", ">= 5.15.0", "< 5.16"
  else
    gem "minitest", ">= 5.15.0"
  end

This cannot be statically analyzed. And this is not a made-up example either! It comes from the Rails project here: https://github.com/rails/rails/blob/fdad62b23079ce1b90763cb5...

This makes it impossible to statically determine the direct dependency requirements of a project.

> Now we have [your dependencies and their current versions], [target rails version], [all of the runtime dependency constraints of these gems]. We run this through a dependency resolution algorithm (pubgrub). If it resolves then you're good to upgrade to that version of Rails without changing anything.

> If this fails to resolve, it's because one or more of your current dependencies has a runtime restriction on rails (or another indirect gem being pulled in by the new rails version).

This actually isn't that big of a problem for Bundler, since it uses pubgrub, which to my understanding is deterministic. A deterministic algorithm means you can take a set of requirements and simulate the build. There are two places where I would be hesitant:

1. This determinism only works if you also know what the universe of possible dependencies looks like. In many corporate environments, this is not true! Many corporate environments use private registries that mirror public dependencies, and may have a set of dependencies available that looks different from the public registry, which means your simulated builds will resolve to incorrect dependency versions.

2. As you move to support other languages, many other tools use non-deterministic dependency resolution algorithms. In particular, NPM is famously non-deterministic, which makes it impossible to simulate an NPM build.

---

When you're just trying to determine that a resolvable build exists, these issues aren't particularly painful. At $DAYJOB (and, I suspect, what you'll want in the future), we are often trying to predict the exact build of a user given a new set of dependency requirements (e.g. so we can predict their new vulnerabilities), which means doing accurate build simulations is much more important.


This is just an excellent comment, thank you.

One note - we're not as concerned with predicting the user's exact build as I think your $DAYJOB might be. We need to scan the space of valid resolutions that achieve the big framework upgrade, then pick an optimal (least effort) option from that space and chart your path towards it. Concretely, that'll mean opening incremental PRs against your codebase where we provide lockfiles that are individually valid and cumulatively get your upgrade done.


> where we provide lockfiles that are individually valid

Providing lockfiles is a really interesting idea! That certainly solves the "we need your non-deterministic build tool to reproduce an exact build that we found" problem.

We haven't explored this route yet because a lot of our customers use tools that don't support lockfiles (e.g. Maven - Java in general has a lot of legacy stuff).

If you want to build off of our work, our dependency analysis bit is open source: https://github.com/fossas/fossa-cli


Maven POM dependencies act as version pins until you run versions:resolve-ranges to bump them. The only exceptions would be using a SNAPSHOT version (where each build gets a timestamp and is only cached briefly) or your Maven repo admin replacing the contents of a pre-existing artifact (in which case you'll see a lot of cached checksum warnings about it).



