* Removed `modprobe` Script
I was able to find out that this script originates from https://github.com/docker-library/docker/blob/master/modprobe.sh but our image does not have `lsmod` nor `modprobe` installed. Hence, if it were in use, it would fail every time. 🤔
* fix: correct command order
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
* Improved Bash Logger
This is a first step towards having robust Bash scripts in the runner images. The changes _could_ be considered breaking, depending on our backwards compatibility definition.
* Fixed Log Formatting Issues
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
* feat: copy dotfiles from asset to service dir
* Fixed `UNITTEST` Condition
* Load `/etc/environment`
See https://github.com/actions/runner/issues/1703 for context on this change.
* Add env variable to configure `disablupdate` flag
* Write test for entrypoint disable update
* Rename flag, update docs for DISABLE_RUNNER_UPDATE
* chore: bump runner version in makefile
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
The unit tests are simulating a run for entrypoint. It creates some
dummy config.sh and runsvc.sh and makes sure the logic behind
entrypoint.sh is correct.
Unfortunately the entrypoint.sh contains some sections that are not
mockable so I had to put some logic in there too.
Testing includes for now:
- the normal scenario
- the normal non-ephemeral scenario
- the configuration failure scenario
Also tested the entrypoint.sh on a real runner, still works as expected.
Adding a basic retry loop during configuration. If configuration fails,
the runner will just straight into a retry loop and will continuously
fail until it dies after a while.
This change will retry 10 times and will exit if the configuration
wasn't successful.
Also, changed the logging format, adding a bit of color in the event of
success or failure.
This add support for two upcoming enhancements on the GitHub side of self-hosted runners, ephemeral runners, and `workflow_jow` events. You can't use these yet.
**These features are not yet generally available to all GitHub users**. Please take this pull request as a preparation to make it available to actions-runner-controller users as soon as possible after GitHub released the necessary features on their end.
**Ephemeral runners**:
The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (default in actions-runner-controller).
`--once` has been suffering from a race issue #466. `--ephemeral` fixes that.
To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`.
Please read the section `Ephemeral Runners` in the updated version of our README for more information.
Note that ephemeral runners is not released on GitHub yet. And `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub!
**`workflow_job` events**:
`workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` a solid foundation to improve our webhook-based autoscale.
Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events has included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial.
In contrast, a `workflow_job` event payload contains `labels` of runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by `labels` included in the webhook payload. So all you need to use webhook-based autoscale will be to enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet.
Note that the current implementation of `workflow_job` support works in two ways, increment, and decrement. An increment happens when the webhook server receives` workflow_job` of `queued` status. A decrement happens when it receives `workflow_job` of `completed` status. The latter is used to make scaling-down faster so that you waste money less than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut `.
Please read the section `Example 3: Scale on each `workflow_job` event` in the updated version of our README for more information on its usage.
* feat: RunnerSet backed by StatefulSet
Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods with required envvars and registration tokens.
Resolves#613
Ref #612
* Upgrade controller-runtime to 0.9.0
* Bump Go to 1.16.x following controller-runtime 0.9.0
* Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup
* Fix startup failure due to missing LeaderElectionID
* Fix the issue that any pods become unable to start once actions-runner-controller got failed after the mutating webhook has been registered
* Allow force-updating statefulset
* Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false
* Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec
* Enable running acceptance tests against arbitrary kind cluster
* RunnerSet supports non-ephemeral runners only today
* fix: docker-build from root Makefile on intel mac
* fix: arch check fixes for mac and ARM
* ci: aligning test data format and patching checks
* fix: removing namespace in test data
* chore: adding more ignores
* chore: removing leading space in shebang
* Re-add metrics to org hra testdata
* Bump cert-manager to v1.1.1 and fix deploy.sh
Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Callum James Tait <callum.tait@photobox.com>
- Adds `ephemeral` option to `runner.spec`
```
....
template:
spec:
ephemeral: false
repository: mumoshu/actions-runner-controller-ci
....
```
- `ephemeral` defaults to `true`
- `entrypoint.sh` in runner/Dockerfile modified to read `RUNNER_EPHEMERAL` flag
- Runner images are backward-compatible. `--once` is omitted only when the new envvar `RUNNER_EPHEMERAL` is explicitly set to `false`.
Resolves#457
This is an attempt to support scaling from/to zero.
The basic idea is that we create a one-off "registration-only" runner pod on RunnerReplicaSet being scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding those.
GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline` would prevent GitHub actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desired to scale to/from zero, hence we retain `offline` runners.
In this change, I enhanced the runnerreplicaset controller to create a registration-only runner on very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline`, until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler.
Related to #447
* Fix acceptance helm test not using newly built controller image
* Locally build runner image instead of pulling it
* Revert runner controller image pull policy to always
and add a line to the test deployment to use IfNotPresent
* Change runner repository from summerwind/action-runner to the owner of actions-runner-controller.
Also fix some Makefile formatting.
* Undo renaming acceptance/pull to docker-pull
* Some env var cleanup
Rename USERNAME to DOCKER_USER(is still used for github too tho)
Add RUNNER_NAME var(defaults to $DOCKER_USER/actions-runner)
Add TEST_REPO(defaults to $DOCKER_USER/actions-runner-controller)