Improve ARC workflows for autoscaling runner sets
Date: 2023-03-17
Status: Done
Context
In the actions-runner-controller repository we essentially have two projects living side by side: the "legacy" actions-runner-controller and the new one GitHub is supporting (gha-runner-scale-set). To hasten progress we relied on existing workflows and added some of our own (e.g. end-to-end tests). We have now reached a point where it is confusing what does what and why, not to mention the increased running times of some of those workflows, with GHA-related flaky tests getting in the way of legacy ARC and vice versa. The three main areas we want to cover are: Go code, Kubernetes manifests / Helm charts, and E2E tests.
Go code
At the moment we have three workflows that validate Go code:
- golangci-lint: this is a collection of linters that currently runs on all PRs and push to master
- Validate ARC: this is a bit of a catch-all workflow; other than Go tests it also validates Kubernetes manifests and runs `go generate`, `go fmt` and `go vet`
- Run CodeQL
Proposal
I think having one Go workflow that collects everything Go-related would help a ton with reliability and with understanding what's going on. This shouldn't be limited to the GHA-supported mode, as changes made outside the GHA code base could still affect us (such as a dependency update).
This workflow should only run on changes to `*.go` files, `go.mod` and `go.sum`.
It should have these jobs, aiming to cover all existing functionality and
eliminate some duplication:
- test: run all Go tests in the project. We currently use the `-short` and `-coverprofile` flags: while `-short` is used to skip old ARC E2E tests, `-coverprofile` adds to the test time without really giving us any value in return. We should also start using `actions/setup-go@v5` to take advantage of caching (it would speed up our tests by a lot), or enable it on `v3` if we have a strong reason not to upgrade. We should keep ignoring our E2E tests too as those will be run elsewhere (either use `Short` there as well or keep ignoring the package like we currently do). As a dependency for tests this needs to run `make manifests` first: we should fail there and then if there is a diff.
- fmt: we currently run `go fmt ./...` as part of Validate ARC but do nothing with the results. We should fail in case of a diff. We don't need caching for this job.
- lint: this corresponds to what's currently the `golangci-lint` workflow (this also covers `go vet`, which currently happens as part of Validate ARC too).
- generate: the current behaviour for this is actually quite risky: we generate our code in the Validate ARC workflow and use the results to run the tests, but we don't validate that up-to-date generated code is checked in. This job should run `go generate` and fail on a diff.
- vulncheck: EDIT: this is covered by CodeQL. The Go team maintains `govulncheck`, a tool that recursively analyzes all function calls in Go code and spots vulnerabilities on the call stack.
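To make this more concrete, here is a minimal sketch of what the consolidated Go workflow could look like. Workflow and job names, action versions and the exact flags are illustrative assumptions, not a final definition:

```yaml
# Hypothetical consolidated "Go" workflow; names, versions and flags are
# illustrative, not the final definition.
name: Go
on:
  push:
    branches: [master]
    paths: ["**.go", "go.mod", "go.sum"]
  pull_request:
    paths: ["**.go", "go.mod", "go.sum"]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
          cache: true                     # built-in module/build caching
      - name: Check manifests are up to date
        run: |
          make manifests
          git diff --exit-code            # fail if generated manifests were not committed
      - name: Run tests (skipping E2E)
        run: go test -short ./...         # keep -short, drop -coverprofile

  fmt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
          cache: false                    # no caching needed for formatting
      - name: Check formatting
        run: |
          go fmt ./...
          git diff --exit-code            # fail on any formatting diff

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - uses: golangci/golangci-lint-action@v3   # also covers go vet
        with:
          version: latest

  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
      - name: Check generated code is up to date
        run: |
          go generate ./...
          git diff --exit-code            # fail if go generate output was not committed
```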
Kubernetes manifests / Helm charts
We have recently separated Helm chart validation into its own workflow, and manifest up-to-dateness is validated as part of Go / test (see above).
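For reference, a standalone chart validation job can stay very small; the sketch below is only illustrative and the chart paths and action versions are placeholders, not the actual workflow:

```yaml
# Hypothetical standalone chart-lint job; chart paths and action versions are
# placeholders, not the actual workflow.
lint-charts:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: azure/setup-helm@v4        # installs the helm CLI
    - name: Lint charts
      run: helm lint charts/*          # fail the job on any chart lint error
```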
End to end tests
These tests give us really good coverage and should be one of the main things we rely on when it comes to trusting our releases. Two improvements that could be made here are:
- renaming the workflow to GHA E2E: since renaming our resources, the `gha` prefix has been used to identify things related to the mode GitHub supports, and these jobs strictly validate that mode only. A shorter name also makes the various scenarios more readable (e.g. `GHA E2E / single-namespace-setup`).
- the test currently monitors and validates the number of pods spawned during the workflow, but not the outcome of the workflow itself. While we don't need to look at pod specifics, we should at least guarantee that the workflow run concludes successfully (see the sketch after this list).
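One way to check the outcome could be a final step that waits for the triggered workflow run and fails if it did not succeed. This is only a sketch; the workflow file name and the way the run is looked up are assumptions:

```yaml
# Hypothetical final step for the E2E job; the workflow file name and the way
# the run is looked up are assumptions made for illustration.
- name: Ensure the test workflow run concluded successfully
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    # Grab the most recent run of the workload workflow triggered by this test...
    run_id=$(gh run list --workflow e2e-test-workload.yaml --limit 1 --json databaseId --jq '.[0].databaseId')
    # ...and block until it completes, failing this step if it did not succeed.
    gh run watch "$run_id" --exit-status
```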