Housekeeping: move `adrs/` to `docs/` and update status (#2443)

Co-authored-by: Francesco Renzi <rentziass@github.com>
2023-03-27 16:38:27 +02:00 · 2023-03-27 16:38:27 +02:00 · c23e31123c
parent 56e1c62ac2
commit c23e31123c
10 changed files with 87 additions and 3 deletions
--- a/docs/adrs/2022-10-17-runner-image.md
+++ b/docs/adrs/2022-10-17-runner-image.md
--- a/docs/adrs/2022-10-27-runnerscaleset-lifetime.md
+++ b/docs/adrs/2022-10-27-runnerscaleset-lifetime.md
--- a/docs/adrs/2022-11-04-crd-api-group-name.md
+++ b/docs/adrs/2022-11-04-crd-api-group-name.md
--- a/docs/adrs/2022-12-05-adding-labels-k8s-resources.md
+++ b/docs/adrs/2022-12-05-adding-labels-k8s-resources.md
--- a/docs/adrs/2022-12-27-pick-the-right-runner-to-scale-down.md
+++ b/docs/adrs/2022-12-27-pick-the-right-runner-to-scale-down.md
--- a/docs/adrs/2023-02-02-automate-runner-updates.md
+++ b/docs/adrs/2023-02-02-automate-runner-updates.md
@ -2,7 +2,7 @@

 **Date**: 2023-02-02

-**Status**: Proposed
+**Status**: Done

 ## Context

--- a/docs/adrs/2023-02-10-limit-manager-role-permission.md
+++ b/docs/adrs/2023-02-10-limit-manager-role-permission.md
@ -2,7 +2,7 @@

 **Date**: 2023-02-10

-**Status**: Pending
+**Status**: Done

 ## Context

--- a/docs/adrs/2023-03-17-workflow-improvements.md
+++ b/docs/adrs/2023-03-17-workflow-improvements.md
@ -0,0 +1,84 @@
+# Improve ARC workflows for autoscaling runner sets
+
+**Date**: 2023-03-17
+
+**Status**: Done
+
+## Context
+
+In the [actions-runner-controller](https://github.com/actions/actions-runner-controller)
+repository we essentially have two projects living side by side: the "legacy"
+actions-runner-controller and the new one GitHub is supporting
+(gha-runner-scale-set). To hasten progress we relied on existing workflows and
+added some of our own (e.g.: end-to-end tests). We now got to a point where it's
+sort of confusing what does what and why, not to mention the increased running
+times of some those workflows and some GHA-related flaky tests getting in the
+way of legacy ARC and viceversa. The three main areas we want to cover are: Go
+code, Kubernetes manifests / Helm charts and E2E tests.
+
+## Go code
+
+At the moment we have three workflows that validate Go code:
+
+- [golangci-lint](https://github.com/actions/actions-runner-controller/blob/34f3878/.github/workflows/golangci-lint.yaml):
+  this is a collection of linters that currently runs on all PRs and push to
+  master
+- [Validate ARC](https://github.com/actions/actions-runner-controller/blob/01e9dd3/.github/workflows/validate-arc.yaml):
+  this is a bit of a catch-all workflow, other than Go tests this also validates
+  Kubernetes manifests, runs `go generate`, `go fmt` and `go vet`
+- [Run CodeQL](https://github.com/actions/actions-runner-controller/blob/a095f0b66aad5fbc8aa8d7032f3299233e4c84d2/.github/workflows/run-codeql.yaml)
+
+### Proposal
+
+I think having one `Go` workflow that collects everything-Go would help a ton with
+reliability and understandability of what's going on. This shouldn't be limited
+to the GHA-supported mode as there are changes that even if made outside the GHA
+code base could affect us (such as a dependency update).
+This workflow should only run on changes to `*.go` files, `go.mod` and `go.sum`.
+It should have these jobs, aiming to cover all existing functionality and
+eliminate some duplication:
+
+- `test`: run all Go tests in the project. We currently use the `-short` and
+  `-coverprofile` flags: while `-short` is used to skip [old ARC E2E
+  tests](https://github.com/actions/actions-runner-controller/blob/master/test/e2e/e2e_test.go#L85-L87),
+  `-coverprofile` is adding to the test time without really giving us any value
+  in return. We should also start using `actions/setup-go@v4` to take advantage
+  of caching (it would speed up our tests by a lot) or enable it on `v3` if we
+  have a strong reason not to upgrade. We should keep ignoring our E2E tests too
+  as those will be run elsewhere (either use `Short` there too or ignoring the
+  package like we currently do). As a dependency for tests this needs to run
+  `make manifests` first: we should fail there and then if there is a diff.
+- `fmt`: we currently run `go fmt ./...` as part of `Validate ARC` but do
+  nothing with the results. We should fail in case of a diff. We don't need
+  caching for this job.
+- `lint`: this corresponds to what's currently the `golanci-lint` workflow (this
+  also covers `go vet` which currently happens as part of `Validate ARC too`)
+- `generate`: the current behaviour for this is actually quite risky, we
+  generate our code in `Validate ARC` workflow and use the results to run the
+  tests but we don't validate that up to date generate code is checked in. This
+  job should run `go generate` and fail on a diff.
+- `vulncheck`: **EDIT: this is covered by CodeQL** the Go team is maintaining [`govulncheck`](https://go.dev/blog/vuln), a tool to recursively
+  analyzing all function calls in Go code and spot vulnerabilities on the call
+  stack.
+
+## Kubernetes manifests / Helm charts
+
+We have [recently separated](https://github.com/actions/actions-runner-controller/commit/bd9f32e3540663360cf47f04acad26e6010f772e)
+Helm chart validation and we validate up-to-dateness of manifests as part of `Go
+/ test`.
+
+## End to end tests
+
+These tests are giving us really good coverage and should be one of the main
+actors when it comes to trusting our releases. Two improvements that could be
+done here are:
+
+- renaming the workflow to `GHA E2E`: since renaming our resources the `gha`
+  prefix has been used to identify things related to the mode GitHub supports
+  and these jobs strictly validate the GitHub mode _only_. Having a shorter name
+  allows for more readability of the various scenarios (e.g. `GHA E2E /
+  single-namespace-setup`).
+- the test currently monitors and validates the number of pods spawning during
+  the workflow but not the outcome of the workflow. While not necessary to look
+  at pods specifics, we should at least guarantee that the workflow can
+  successfully conclude.
--- a/docs/adrs/2023-04-14-adding-labels-k8s-resources.md
+++ b/docs/adrs/2023-04-14-adding-labels-k8s-resources.md
@ -86,4 +86,4 @@ Or for example if they're having problems specifically with runners:
 This way users don't have to understand ARC moving parts but we still have a
 way to target them specifically if we need to.

-[^1]: [ADR 2022-12-05](2022-12-05-adding-labels-k8s-resources.md)
+[^1]: Supersedes [ADR 2022-12-05](2022-12-05-adding-labels-k8s-resources.md)
--- a/docs/adrs/yyyy-mm-dd-TEMPLATE.md
+++ b/docs/adrs/yyyy-mm-dd-TEMPLATE.md