It's probably worth highlighting it's ver 0.X.X by design and that breaking changes are possible until we move it to 1.0.0
Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
The unregister timeout of 1 minute (no matter how long it is) can negatively impact availability of static runner constantly running workflow jobs, and ephemeral runner that runs a long-running job.
We deal with that by completely removing the unregistaration timeout, so that regarldess of the type of runner(static or ephemeral) it waits forever until it successfully to get unregistered before being terminated.
This will work on GHES but GitHub Enterprise Cloud due to excessive GitHub API calls required.
More work is needed, like adding a cache layer to the GitHub client, to make it usable on GitHub Enterprise Cloud.
Fixes additional cases from https://github.com/actions-runner-controller/actions-runner-controller/pull/1012
If GitHub auth is provided in the webhooks controller then runner groups with custom visibility are supported. Otherwise, all runner groups will be assumed to be visible to all repositories
`getScaleUpTargetWithFunction()` will check if there is an HRA available with the following flow:
1. Search for **repository** HRAs - if so it ends here
2. Get available HRAs in k8s
3. Compute visible runner groups
a. If GitHub auth is provided - get all the runner groups that are visible to the repository of the incoming webhook using GitHub API calls.
b. If GitHub auth is not provided - assume all runner groups are visible to all repositories
4. Search for **default organization** runners (a.k.a runners from organization's visible default runner group) with matching labels
5. Search for **default enterprise** runners (a.k.a runners from enterprise's visible default runner group) with matching labels
6. Search for **custom organization runner groups** with matching labels
7. Search for **custom enterprise runner groups** with matching labels
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
* Add env variable to configure `disablupdate` flag
* Write test for entrypoint disable update
* Rename flag, update docs for DISABLE_RUNNER_UPDATE
* chore: bump runner version in makefile
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
The webhook "workflowJob" pass the labels the job needs to the controller, who in turns search for them in its RunnerDeployment / RunnerSet. The current implementation ignore the search for `self-hosted` if this is the only label, however if multiple labels are found the `self-hosted` label must be declared explicitely or the RD / RS will not be selected for the autoscaling.
This PR fixes the behavior by ignoring this label, and add documentation on this webhook for the other labels that will still require an explicit declaration (OS and architecture).
The exception should be temporary, ideally the labels implicitely created (self-hosted, OS, architecture) should be searchable alongside the explicitly declared labels.
code tested, work with `["self-hosted"]` and `["self-hosted","anotherLabel"]`
Fixes#951
A lot of people have issues with private GKE clusters and it seems they are all solved by setting up a firewall policy. I think it would be relevant to include this in a troubleshooting-section since so many people are searching around issues for it. I myself just spent most of my day trying to figure it out.
Issues where this is the solution:
* #293
* #335
* #68
* doc: Add GitHub App Permission for Workflow Job Webhook
I think we've missed documenting about the app permission for receiving workflow_job events via the app webhook
* docs: clarification on webhook events
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
* docs: minor correction to autoscaling description
* docs: make pull metrics plural
* docs: remove redundant words
* docs: make plural and expand anti-flapping
The README had some examples of RunnerDeployment with wrong/outdated
spec.
This caused some pain when trying to create them based on the examples.
=> Fixed using the spec declared in: api/v1alpha1/runner_types.go
* docs: clean up of autoscaling section
* docs: clarifying anti-flapping
* docs: more improvements
* docs: more improvements
* docs: adding duration details and cavaets
* docs: smaller english and better structure
* docs: use consistent wording
* docs: adding limitation cavaet for RunnerSets
* docs: correct helm uprgade order
* docs: lines helm upgrade command with help switch
* docs: use existing limitations section
* docs: fix table of headers and contents
* docs: add link to runnersets on first mention
* docs: adding runnerset limitation
* chore: use new enterprise permission for PAT
* docs: bump example deploy to latest version
* docs: adding oauth apps link
* docs: adding cavaet to the oauth apps doc
* docs: updating to reflect the latest release
* docs: further updates
* docs: format updates
* docs: correcting title sizes
* docs: correcting links
* docs: correcting the grammar of multi controllers
* docs: slightly less awkward English
This add support for two upcoming enhancements on the GitHub side of self-hosted runners, ephemeral runners, and `workflow_jow` events. You can't use these yet.
**These features are not yet generally available to all GitHub users**. Please take this pull request as a preparation to make it available to actions-runner-controller users as soon as possible after GitHub released the necessary features on their end.
**Ephemeral runners**:
The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (default in actions-runner-controller).
`--once` has been suffering from a race issue #466. `--ephemeral` fixes that.
To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`.
Please read the section `Ephemeral Runners` in the updated version of our README for more information.
Note that ephemeral runners is not released on GitHub yet. And `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub!
**`workflow_job` events**:
`workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` a solid foundation to improve our webhook-based autoscale.
Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events has included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial.
In contrast, a `workflow_job` event payload contains `labels` of runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by `labels` included in the webhook payload. So all you need to use webhook-based autoscale will be to enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet.
Note that the current implementation of `workflow_job` support works in two ways, increment, and decrement. An increment happens when the webhook server receives` workflow_job` of `queued` status. A decrement happens when it receives `workflow_job` of `completed` status. The latter is used to make scaling-down faster so that you waste money less than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut `.
Please read the section `Example 3: Scale on each `workflow_job` event` in the updated version of our README for more information on its usage.
Sets the privileged flag to false if SELinuxOptions are present/defined. This is needed because containerd treats SELinux and Privileged controls as mutually exclusive. Also see https://github.com/containerd/cri/blob/aa2d5a97c/pkg/server/container_create.go#L164.
This allows users who use SELinux for managing privileged processes to use GH Actions - otherwise, based on the SELinux policy, the Docker in Docker container might not be privileged enough.
Signed-off-by: Jonah Back <jonah@jonahback.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
This allows using the `runtimeClassName` directive in the runner's spec.
One of the use-cases for this is Kata Containers, which use `runtimeClassName` in a pod spec as an indicator that the pod should run inside a Kata container. This allows us a greater degree of pod isolation.