I am in the process of migrating a client’s CI to GA and have seen rough edges.
One example is that there is no native way to serialize workflow execution. This means that if two commit-triggered workflows run in quick succession, you can have tests pass that should not. (They run against the wrong checked-out code!)
I had this happen in front of the client, where it appeared that a unit test of assertTrue(False) passed! It undermined confidence in GA so badly that I had to chase down exactly what had happened and prove to the client that we could avoid it (using Turnstyle). I wrote an email to the GA team specifically telling this story and expressing my concern.
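To make the serialization idea concrete, here is a minimal sketch of "wait until every earlier run has finished before starting". It is not Turnstyle's actual implementation. The `fetch_runs` callable is hypothetical (you would back it with whatever lists your workflow's runs), and the `run_number` and `status` fields are assumptions modeled loosely on what the GitHub API reports for workflow runs.

```python
import time
from typing import Any, Callable, Dict, List

Run = Dict[str, Any]


def blocking_runs(runs: List[Run], current_run_number: int) -> List[Run]:
    """Return runs that started before the current one and have not completed."""
    return [
        r for r in runs
        if r["run_number"] < current_run_number and r["status"] != "completed"
    ]


def wait_for_turn(
    fetch_runs: Callable[[], List[Run]],  # hypothetical: returns the latest run list
    current_run_number: int,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no earlier run is still active.

    Returns True when it is safe to proceed, False if the timeout expired
    while an earlier run was still queued or in progress.
    """
    waited = 0.0
    while blocking_runs(fetch_runs(), current_run_number):
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return True
```

A job would call `wait_for_turn(...)` as its first step and fail (or skip) when it returns False. Polling is a crude substitute for a real queue, but it is about all you can do from inside a job.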
Another is scheduled actions, where even if you specify a cron schedule, there is a window of several minutes within which a GitHub runner will pick the action up, and there are cases where the window is missed and the action is not run at all!
This isn’t documented, though GitHub recently made the docs community-editable, possibly realizing there is too much to do.
All that said, the product is still relatively new, and even with these rough edges it is __really__ good.
Having the CI definition alongside the code, with free-ish VMs available at all times (except when it goes down), and results all inside GitHub, is amazing.
Also, the self-hosted runners are amazing, as they allow some really simple matrix definition to run CI on custom HW and network environments.
If it weren’t for all the recent negative attention, I would have said GH was doing really well, inclusive of GitHub Actions.
The fact that you can’t:
1) Apply any security policy for runners (e.g. require a label before running a PR)
2) Have runners quit after a single job so you can build ephemeral runners
3) Build your own runner against an open API
... means that self-hosted runners are a non-starter for anything open source. It’s like they had to try hard to make the architecture that obtuse and closed. It’s unclear if it’s really poor design or an active attempt to somehow drive the business.
I've been disappointed overall with GH Actions. Their YAML format lacks the ability to reuse common bits of configuration, so you just need to copy/paste it all over. Our project has minitest tests, rspec tests, and another type of tests. All the setup/boilerplate needs to be copied into each job section and kept in sync.
Each test run depends on a huge number of 3rd-party services being up. It pulls in Docker images (DockerHub) and Ubuntu packages (wherever the Ubuntu packages come from). It requires various Azure services to be running (to use their cache and artifacts actions). It depends on GitHub being up, of course. Test runs fail multiple times per month because one of these services is down at the time the tests run.
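One workaround for the copy/paste problem is to generate the job definitions from a single shared list of setup steps. The sketch below is only an illustration: the shared steps, the test commands, and the job names are assumptions, not our real configuration, and it prints JSON because that needs no third-party library. Turning the output into the workflow file GitHub expects is left out.

```python
import copy
import json
from typing import Any, Dict, List

# Setup shared by every job (illustrative values, not a real project's config).
SHARED_STEPS: List[Dict[str, Any]] = [
    {"uses": "actions/checkout@v2"},
    {"name": "Install dependencies", "run": "bundle install"},
]

# One entry per test suite; the commands are assumptions.
TEST_COMMANDS: Dict[str, str] = {
    "minitest": "bundle exec rake test",
    "rspec": "bundle exec rspec",
}


def build_job(test_command: str) -> Dict[str, Any]:
    """Build one job: a copy of the shared setup followed by the suite's own command."""
    steps = copy.deepcopy(SHARED_STEPS)
    steps.append({"name": "Run tests", "run": test_command})
    return {"runs-on": "ubuntu-latest", "steps": steps}


def build_workflow(commands: Dict[str, str]) -> Dict[str, Any]:
    """Build a workflow mapping with one job per test suite."""
    return {
        "name": "CI",
        "on": ["push"],
        "jobs": {name: build_job(cmd) for name, cmd in commands.items()},
    }


if __name__ == "__main__":
    print(json.dumps(build_workflow(TEST_COMMANDS), indent=2))
```

The setup now lives in one place, at the cost of a generation step and a generated file that has to be committed and kept out of hand-editing.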
Until recently their cache action was often slower than just rebuilding the assets from scratch without the cache.
The only compelling reason to switch to GH Actions from any other CI service I've used is to reduce costs. You get what you pay for.
I agree that the YAML format can be a pain, both because of the copy/paste stuff you mentioned and because just validating syntax can be time consuming.
There is an offline workflow tester I've tried but it is not close enough to the real thing to add that much value.
I've also struggled at times with how data moves around. There are docs and even example actions, but the overall state of education on doing cool stuff with GA is pretty weak.
I've also struggled with some of the packages and services, although some of the pain I've experienced is just software and service dependency management in general, and that's a big challenge unrelated to GA.
Speed is a real consideration, though self-hosted runners can potentially solve that problem for you.
Rather than a "you get what you pay for" take, I feel more like it is still early for the product. The visibility is high because it is GitHub, but it is still getting a lot of things worked out.
For me, GA is too conveniently integrated with GitHub to ignore.
I can confirm that the scheduled actions definitely don’t always run on time. Sometimes they are up to 10 mins late. According to support, this happens because the system gets overloaded sometimes.
I had some content slip through into my main site a day early because the job that rebuilds the site at the end of the day was 10 mins late, and in the interim new content was added, so when the job ran it got lumped in.
I will probably have to re-architect the whole workflow at some point but it’s good enough for now.
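One way to guard against a late run, sketched below, is to filter content by the time the job was supposed to run and not by the time it actually started. This assumes a once-a-day schedule in UTC. The `created_at` and `title` fields are hypothetical, and `scheduled_cutoff` is an illustrative helper, not something GitHub provides.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def scheduled_cutoff(actual_start: datetime, hour: int, minute: int) -> datetime:
    """Return the most recent daily schedule time at or before actual_start."""
    candidate = actual_start.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate > actual_start:
        # The job started after midnight, so the intended time was yesterday.
        candidate -= timedelta(days=1)
    return candidate


def items_to_publish(items: List[Dict[str, Any]], cutoff: datetime) -> List[Dict[str, Any]]:
    """Keep only items created at or before the intended schedule time."""
    return [item for item in items if item["created_at"] <= cutoff]


if __name__ == "__main__":
    # Scheduled for 23:50 UTC, but the job actually started 10 minutes late.
    started = datetime(2021, 1, 2, 0, 0, tzinfo=timezone.utc)
    cutoff = scheduled_cutoff(started, hour=23, minute=50)
    items = [
        {"title": "on time", "created_at": datetime(2021, 1, 1, 23, 40, tzinfo=timezone.utc)},
        {"title": "too new", "created_at": datetime(2021, 1, 1, 23, 55, tzinfo=timezone.utc)},
    ]
    print([item["title"] for item in items_to_publish(items, cutoff)])
```

With this in place a delayed run builds the same site an on-time run would have built, and anything added during the delay waits for the next scheduled rebuild.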
GH keeps rolling out more features for Actions well after the initial release, adding things like organization-wide secrets. Overall I'm quite happy with the progress.
I've run about ten thousand builds now and find the biggest issues are the mysterious cancelling of jobs (especially on Windows/macOS hosts), crashing of jobs across the board during GH outages, and the intermittent long pauses (e.g. 15 min) between jobs in a multi-job build when nothing appears to happen (scheduling delays). GH could give more clarity on the status board, since I'll see many jobs fail, then check the status page and see everything is green. :/