Commit Graph

39 Commits

Author SHA1 Message Date
Yusuke Kuoka 94c089c407
Revert docker.sock path to /var/run/docker.sock (#2536)
Starting ARC v0.27.2, we've changed the `docker.sock` path from `/var/run/docker.sock` to `/var/run/docker/docker.sock`. That resulted in breaking some container-based actions due to the hard-coded `docker.sock` path in various places.

Even `actions/runner` seem to use `/var/run/docker.sock` for building container-based actions and for service containers?

Anyway, this fixes that by moving the sock file back to the previous location.

Once this gets merged, users stuck at ARC v0.27.1, previously upgraded to 0.27.2 or 0.27.3 and reverted back to v0.27.1 due to #2519, should be able to upgrade to the upcoming v0.27.4.

Resolves #2519
Resolves #2538
2023-04-27 13:06:35 +09:00
Nikola Jokic aa6dab5a9a
Changes to folder structure to allow multigroups and changed go mod name (#2105)
* Changed folder structure to allow multi group registration

* included actions.github.com directory for resources and controllers

* updated go module to actions/actions-runner-controller

* publish arc packages under actions-runner-controller

* Update charts/actions-runner-controller/docs/UPGRADING.md

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-12-28 09:38:34 +09:00
Yusuke Kuoka 3ae9f09532
e2e: Do honor the runner graceful stop timeout also in the dockerd sidecar prestop hook (#2044)
The runner graceful stop timeout has never been propagated to the dind sidecar due to configuration error in E2E. This fixes it, so that we can verify that the dind sidecar prestop can respect the graceful stop timeout.

Related to #1759
2022-11-27 11:13:56 +09:00
Yusuke Kuoka 154fcde7d0
runner: Make WAIT_FOR_DOCKER_SECONDS configurable and working (#1999)
* runner: Make WAIT_FOR_DOCKER_SECONDS configurable and working

Ref #1830
Ref #1804

* Update acceptance/testdata/runnerdeploy.envsubst.yaml

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>

* Update docs/detailed-docs.md

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-11-22 12:08:54 +09:00
Claudio Vellage 3b36a81db6
Allow to set docker default address pool (#1971)
* Allow to set docker default address pool

* fixup! Allow to set docker default address pool

Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com>

* Revert unnecessary chart ver bump

* Update docs for DOCKER_DEFAULT_ADDRESS_POOL_*

* Fix the dockerd default address pool scripts to actually work as probably intended

* Update the E2E testdata runnerdeployment to accomodate the new docker default addr pool options

* Correct default dockerd addr pool doc

Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com>
Co-authored-by: Claudio Vellage <claudio.vellage@pm.me>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-11-05 14:46:32 +09:00
Yusuke Kuoka c74ad6195f
Fix runners to do their best to gracefully stop on pod eviction (#1759)
Ref #1535
Ref #1581

Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-11-01 20:30:10 +09:00
Yusuke Kuoka e4879e7ae4 Tweak E2E and documentation about MTU configuration 2022-09-25 07:50:12 +09:00
Yusuke Kuoka f8e07c7fe4 e2e: Update RunnerSet template for rootless-dind test 2022-08-27 07:12:55 +00:00
Yusuke Kuoka ebcd838501 e2e: Continuous rolling-update of runners while workflow jobs are running
This should help revealing issues like https://github.com/actions-runner-controller/actions-runner-controller/issues/1535 if any.
2022-08-26 01:28:08 +00:00
Yusuke Kuoka 6ef276b239 e2e: Custom RBAC resources for make test success reporting work when k8s container mode or runner update hook is enabled 2022-08-26 01:28:08 +00:00
Yusuke Kuoka ea94b3cc5b
e2e: Add new option to test rootless docker (#1742)
Related to #1644

Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com>

Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-08-24 10:42:45 +09:00
Yusuke Kuoka b77489d098
Fix E2E to not fail due to missing storageclass for RunnerDeployment w/ kubernetes container mode (#1649) 2022-07-17 19:43:13 +09:00
Yusuke Kuoka 7e4b6ebd6d chart: Add rbac.allowGrantingKubernetesContainerModePermissions 2022-07-10 16:16:32 +09:00
Yusuke Kuoka b5d1a63bdf
Enhance the acceptance runnerset yaml template for manual E2E (#1587)
The primary goal of this change is to let the tester know about the config difference between the explicitly configured ephemeral work volume vs the automatically configured work volume with workVolumeClaimTemplate+containerMode=kubernetes.
2022-06-29 22:15:50 +09:00
Yusuke Kuoka 63be0223ad
fix: Avoid duplicate volume and mount name error for generic ephemeral volume as "work" (#1471)
* fix: Avoid duplicate volume and mount name error for generic ephemeral volume as "work"

While manually testing configurations being documented in #1464, I discovered that the use of dynamic ephemeral volume for "work" directory was not working correctly due to the valiadation error.

This fixes the runner pod generation logic to not add the default volume and volume mount for "work" dir, so that the error disappears.

Ref #1464

* e2e: Ensure work generic ephemeral volume to work as expected
2022-05-22 10:25:50 +09:00
Yusuke Kuoka b5194fd75a
Enhance RunnerSet to optionally retain PVs accross restarts (#1340)
* Enhance RunnerSet to optionally retain PVs accross restarts

This is our initial attempt to bring back the ability to retain PVs across runner pod restarts when using RunnerSet.
The implementation is composed of two new controllers, `runnerpersistentvolumeclaim-controller` and `runnerpersistentvolume-controller`.
It all starts from our existing `runnerset-controller`. The controller now tries to mark any PVCs created by StatefulSets created for the RunnerSet.
Once the controller terminated statefulsets, their corresponding PVCs are clean up by `runnerpersistentvolumeclaim-controller`, then PVs are unbound from their corresponding PVCs by `runnerpersistentvolume-controller` so that they can be reused by future PVCs createf for future StatefulSets that shares the same same StorageClass.

Ref #1286

* Update E2E test suite to cover runner, docker, and go caching with RunnerSet + PVs

Ref #1286
2022-05-16 09:26:48 +09:00
Yusuke Kuoka 3b67ee727f
e2e: Fix wrong scale trigger configuration used in test (#1434) 2022-05-12 09:19:58 +09:00
Yusuke Kuoka dabbc99c78
refactor(controller): stop auto-setting RUNNER_FEATURE_FLAG_EPHEMERAL (#1385)
This feature flag was provided from ARC to runner container automatically to let it use `--ephemeral` instead of `--once` by default. As the support for `--once` is being dropped from the runner image via #1384, we no longer need that.

Ref #1196
2022-05-11 11:42:55 +01:00
Yusuke Kuoka c4b24f8366 Prevent static runners from terminating due to unregister timeout
The unregister timeout of 1 minute (no matter how long it is) can negatively impact availability of static runner constantly running workflow jobs, and ephemeral runner that runs a long-running job.
We deal with that by completely removing the unregistaration timeout, so that regarldess of the type of runner(static or ephemeral) it waits forever until it successfully to get unregistered before being terminated.
2022-03-13 07:26:36 +00:00
Yusuke Kuoka 14a878bfae refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
Yusuke Kuoka 3115d71471 acceptance,e2e: Enhance deploy.sh to support more types of runnersets 2022-03-02 19:03:20 +09:00
Yusuke Kuoka 1463d4927f acceptance,e2e: Let capacity reservation expired more later 2022-02-21 00:07:49 +00:00
Yusuke Kuoka d4a9750e20 acceptance,e2e: Enhance E2E test and deploy.sh to support scaleDownDelaySeconds~ and minReplicas for HRA 2022-02-20 13:45:42 +00:00
Yusuke Kuoka 4e6bfd8114 e2e: Add ability to toggle dockerdWithinRunnerContainer 2022-02-20 04:37:15 +00:00
Yusuke Kuoka ba4bd7c0db
e2e,acceptance: Cover enterprise runners (#1124)
Adds various code and changes I have used while testing #1062
2022-02-17 09:16:28 +09:00
Yusuke Kuoka fabead8c8e
feat: Workflow job based ephemeral runner scaling (#721)
This add support for two upcoming enhancements on the GitHub side of self-hosted runners, ephemeral runners, and `workflow_jow` events. You can't use these yet.

**These features are not yet generally available to all GitHub users**. Please take this pull request as a preparation to make it available to actions-runner-controller users as soon as possible after GitHub released the necessary features on their end.

**Ephemeral runners**:

The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (default in actions-runner-controller).

`--once` has been suffering from a race issue #466. `--ephemeral` fixes that.

To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`.

Please read the section `Ephemeral Runners` in the updated version of our README for more information.

Note that ephemeral runners is not released on GitHub yet. And `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub!

**`workflow_job` events**:

`workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` a solid foundation to improve our webhook-based autoscale.

Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events has included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial.

In contrast, a `workflow_job` event payload contains `labels` of runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by `labels` included in the webhook payload. So all you need to use webhook-based autoscale will be to enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet.

Note that the current implementation of `workflow_job` support works in two ways, increment, and decrement. An increment happens when the webhook server receives` workflow_job` of `queued` status. A decrement happens when it receives `workflow_job` of `completed` status. The latter is used to make scaling-down faster so that you waste money less than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut `.

Please read the section `Example 3: Scale on each `workflow_job` event` in the updated version of our README for more information on its usage.
2021-08-11 09:52:04 +09:00
Yusuke Kuoka 7a305d2892
e2e: Install and run workflow and verify the result (#661)
This enhances the E2E test suite introduced in #658 to also include the following steps:

- Install GitHub Actions workflow
- Trigger a workflow run via a git commit
- Verify the workflow run result

In the workflow, we use `kubectl create cm --from-literal` to create a configmap that contains an unique test ID. In the last step we obtain the configmap from within the E2E test and check the test ID to match the expected one.

To install a GitHub Actions workflow, we clone a GitHub repository denoted by the TEST_REPO envvar, progmatically generate a few files with some Go code, run `git-add`, `git-commit`, and then `git-push` to actually push the files to the repository. A single commit containing an updated workflow definition and an updated file seems to run a workflow derived to the definition introduced in the commit, which was a bit surpirising and useful behaviour.

At this point, the E2E test fully covers all the steps for a GitHub token based installation. We need to add scenarios for more deployment options, like GitHub App, RunnerDeployment, HRA, and so on. But each of them would worth another pull request.
2021-06-28 08:30:32 +09:00
Yusuke Kuoka acb906164b
RunnerSet: Automatic-recovery from registration timeout and deregistration on pod termination (#652)
Ref #629
Ref #613
Ref #612
2021-06-24 20:39:37 +09:00
Yusuke Kuoka 98da4c2adb
Add HRA support for RunnerSet (#647)
`HRA.Spec.ScaleTargetRef.Kind` is added to denote that the scale-target is a RunnerSet.

It defaults to `RunnerDeployment` for backward compatibility.

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: myhra
spec:
  scaleTargetRef:
    kind: RunnerSet
    name: myrunnerset
```

Ref #629
Ref #613
Ref #612
2021-06-23 20:25:03 +09:00
Yusuke Kuoka 9e4dbf497c
feat: RunnerSet backed by StatefulSet (#629)
* feat: RunnerSet backed by StatefulSet

Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods with required envvars and registration tokens.

Resolves #613
Ref #612

* Upgrade controller-runtime to 0.9.0

* Bump Go to 1.16.x following controller-runtime 0.9.0

* Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup

* Fix startup failure due to missing LeaderElectionID

* Fix the issue that any pods become unable to start once actions-runner-controller got failed after the mutating webhook has been registered

* Allow force-updating statefulset

* Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false

* Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec

* Enable running acceptance tests against arbitrary kind cluster

* RunnerSet supports non-ephemeral runners only today

* fix: docker-build from root Makefile on intel mac

* fix: arch check fixes for mac and ARM

* ci: aligning test data format and patching checks

* fix: removing namespace in test data

* chore: adding more ignores

* chore: removing leading space in shebang

* Re-add metrics to org hra testdata

* Bump cert-manager to v1.1.1 and fix deploy.sh

Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Callum James Tait <callum.tait@photobox.com>
2021-06-22 17:10:09 +09:00
Yusuke Kuoka 3cd124dce3
chore: Add debug logs for scheduledOverrides (#540)
Follow-up for #515
Ref #484
2021-05-11 17:30:22 +09:00
Yusuke Kuoka 0510f19607 chore: Enhance acceptance test to cover webhook-based autoscaling for repo and org runners
Adds what I used while verifying #534
2021-05-11 15:36:02 +09:00
Yusuke Kuoka e00b3b9714
Make development cycle faster (#508)
Improves Makefile, acceptance/deploy.sh, acceptance/testdata/runnerdeploy.yaml, and the documentation to help developers and contributors.
2021-05-03 13:03:17 +09:00
Yusuke Kuoka dbd7b486d2
feat: Support for scaling from/to zero (#465)
This is an attempt to support scaling from/to zero.

The basic idea is that we create a one-off "registration-only" runner pod on RunnerReplicaSet being scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding those.

GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline` would prevent GitHub actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desired to scale to/from zero, hence we retain `offline` runners.

In this change, I enhanced the runnerreplicaset controller to create a registration-only runner on very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline`, until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler.

Related to #447
2021-05-02 16:11:36 +09:00
ToMe25 ba175148c8
Locally build runner image instead of pulling it (#473)
* Fix acceptance helm test not using newly built controller image

* Locally build runner image instead of pulling it

* Revert runner controller image pull policy to always

and add a line to the test deployment to use IfNotPresent

* Change runner repository from summerwind/action-runner to the owner of actions-runner-controller.

Also fix some Makefile formatting.

* Undo renaming acceptance/pull to docker-pull

* Some env var cleanup

Rename USERNAME to DOCKER_USER(is still used for github too tho)
Add RUNNER_NAME var(defaults to $DOCKER_USER/actions-runner)
Add TEST_REPO(defaults to $DOCKER_USER/actions-runner-controller)
2021-05-01 15:10:57 +09:00
callum-tait-pbx f2680b2f2d
Bumping runner to Ubuntu 20.04 (#438)
Images for `actions-runner:v${VERSION}` and `actions-runner:latest` tags are upgraded to Ubuntu 20.04.

If you would like not to upgrade Ubuntu in the runner image in the future, migrate to new tags suffixed with `-ubuntu-20.04` like`actions-runner:v${VERSION}-ubuntu-20.04`.

We also keep publishing the existing Ubuntu 18.04 images with new `actions-runner:v${VERSION}-ubuntu-18.04` tags. Please use it when it turned out that you had workflows dependent on Ubuntu 18.04.

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-04-17 17:02:03 +09:00
Yusuke Kuoka 156e2c1987
Fix MTU configuration for dockerd (#421)
Resolves #393
2021-03-31 09:29:21 +09:00
Yusuke Kuoka 1658f51fcb
Make Runner{Deployment,ReplicaSet} replicas actually optional (#186)
If omitted, it properly defaults to 1.

Fixes #64
2020-11-14 22:06:33 +09:00
Yusuke Kuoka bbfe03f02b
Add acceptance test (#168)
To ease verifying the controller to work before submitting/merging PRs and releasing a new version of the controller.
2020-11-14 20:07:14 +09:00