I am about to revisit #1517, #1454, #1561, and #1560 as a part of our on-going effort for a major enhancement to the runner entrypoints being made in #1759.
This change updates and reintroduces #1517 contributed by @CASABECI in a way it becomes applicable to today's code-base.
* feat: allow to discover runner statuses
* fix manifests
* Bump runner version to 2.289.1 which includes the hooks support
* Add feedback from review
* Update reference to newRunnerPod
* Fix TestNewRunnerPodFromRunnerController and make hooks file names job specific
* Fix additional TestNewRunnerPod test
* Cover additional feedback from review
* fix rbac manager role
* Add permissions to service account for container mode if not provided
* Rename flag to runner.statusUpdateHook.enabled and fix needsServiceAccount
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* runner: Remove the ability to use the deprecated `--once` flag
Ref #1196
* runner: Ability to opt-out of using --ephemeral
Although we are going to eventually remove the ability to use the legacy --once flag as proposed in #1196, there might be folks still using legacy GHES versions 3.2 or earlier.
This commit removes the existing feature flag to opt-in for --ephemeral, while adding another feature flag RUNNER_FEATURE_FLAG_ONCE to opt-in for --once so that folks stuck in legacy GHES versions
can still use ARC.
Since this change every user starts using --ephemeral by default. If they see any issues on legacy GHES instance, RUNNER_FEATURE_FLAG_ONCE=true can be set to opt-in to keep using --once, which gives one more ARC release until they upgrade their GHES instance.
But beware, we won't support legacy GHES instances forever as it's going to be a maintenance nightmare. Please upgrade!
Ref #1196
* Improved Bash Logger
This is a first step towards having robust Bash scripts in the runner images. The changes _could_ be considered breaking, depending on our backwards compatibility definition.
* Fixed Log Formatting Issues
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
* feat: copy dotfiles from asset to service dir
* Fixed `UNITTEST` Condition
* Load `/etc/environment`
See https://github.com/actions/runner/issues/1703 for context on this change.
* Add env variable to configure `disablupdate` flag
* Write test for entrypoint disable update
* Rename flag, update docs for DISABLE_RUNNER_UPDATE
* chore: bump runner version in makefile
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
The unit tests are simulating a run for entrypoint. It creates some
dummy config.sh and runsvc.sh and makes sure the logic behind
entrypoint.sh is correct.
Unfortunately the entrypoint.sh contains some sections that are not
mockable so I had to put some logic in there too.
Testing includes for now:
- the normal scenario
- the normal non-ephemeral scenario
- the configuration failure scenario
Also tested the entrypoint.sh on a real runner, still works as expected.
Adding a basic retry loop during configuration. If configuration fails,
the runner will just straight into a retry loop and will continuously
fail until it dies after a while.
This change will retry 10 times and will exit if the configuration
wasn't successful.
Also, changed the logging format, adding a bit of color in the event of
success or failure.
This add support for two upcoming enhancements on the GitHub side of self-hosted runners, ephemeral runners, and `workflow_jow` events. You can't use these yet.
**These features are not yet generally available to all GitHub users**. Please take this pull request as a preparation to make it available to actions-runner-controller users as soon as possible after GitHub released the necessary features on their end.
**Ephemeral runners**:
The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (default in actions-runner-controller).
`--once` has been suffering from a race issue #466. `--ephemeral` fixes that.
To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`.
Please read the section `Ephemeral Runners` in the updated version of our README for more information.
Note that ephemeral runners is not released on GitHub yet. And `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub!
**`workflow_job` events**:
`workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` a solid foundation to improve our webhook-based autoscale.
Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events has included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial.
In contrast, a `workflow_job` event payload contains `labels` of runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by `labels` included in the webhook payload. So all you need to use webhook-based autoscale will be to enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet.
Note that the current implementation of `workflow_job` support works in two ways, increment, and decrement. An increment happens when the webhook server receives` workflow_job` of `queued` status. A decrement happens when it receives `workflow_job` of `completed` status. The latter is used to make scaling-down faster so that you waste money less than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut `.
Please read the section `Example 3: Scale on each `workflow_job` event` in the updated version of our README for more information on its usage.
- Adds `ephemeral` option to `runner.spec`
```
....
template:
spec:
ephemeral: false
repository: mumoshu/actions-runner-controller-ci
....
```
- `ephemeral` defaults to `true`
- `entrypoint.sh` in runner/Dockerfile modified to read `RUNNER_EPHEMERAL` flag
- Runner images are backward-compatible. `--once` is omitted only when the new envvar `RUNNER_EPHEMERAL` is explicitly set to `false`.
Resolves#457
This is an attempt to support scaling from/to zero.
The basic idea is that we create a one-off "registration-only" runner pod on RunnerReplicaSet being scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding those.
GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline` would prevent GitHub actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desired to scale to/from zero, hence we retain `offline` runners.
In this change, I enhanced the runnerreplicaset controller to create a registration-only runner on very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline`, until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler.
Related to #447
Prevents arguments from being split when e.g. the RUNNER_GROUP variable contains spaces (which is legit. One can create such groups in GitHub).
I've seen that all workers with group names that contain no spaces can register successfully, while all workers with groups that contain spaces will not register.
Furthermore, I suppose also other chars can be used here to inject arbitrary commands in an unsupported way via e.g. pipe symbol.
Quoting the vars correctly should prevent that and allow for e.g. group names and runner labels with spaces and other bash reserved characters.
* so far, runner group parameter is only set for runners with org scope
* now set group for enterprise runners as well
* removed null check for org scope as either org or enterprise will be set
* runner/controller: Add externals directory mount point
* Runner: Create hack for moving content of /runner/externals/ dir
* Externals dir Mount: mount examples for '__e/node12/bin/node' not found error
* Adds RUNNER_GROUP argument to the runner registration
Adds the ability to register a runner to a predefined runner_group
Resolves#137
* Update README with runner group example
- Updates the README with instructions of how to add the runner to a
group
- Fix code fencing for shell and yaml blocks in the README
- Use consistent bullet points (dash not asterisk)
* Use self update ready entrypoint
* Add --once support for runsvc.sh
Run `cd runner; NAME=$DOCKER_USER/actions-runner TAG=dev make docker-build docker-push`,
`kubectl apply -f release/actions-runner-controller.yaml`,
then update the runner image(not the controller image) by updating e.g. `Runner.Spec.Image` to `$DOCKER_USER/actions-runner:$TAG`, for testing.
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>