Commit Graph

228 Commits

Author SHA1 Message Date
Yusuke Kuoka 4e7b8b57c0
edge: Enable scaling from zero with PercentageRunnersBusy (#524)
`PercentageRunnersBusy`, in combination with a secondary `TotalInProgressAndQueuedWorkflowRuns` metric, enables scale-from-zero for PercentageRunnersBusy.

Please see the new `Autoscaling to/from 0` section in the updated documentation about how it works.

Resolves #522
2021-05-05 14:27:17 +09:00
Yusuke Kuoka e00b3b9714
Make development cycle faster (#508)
Improves Makefile, acceptance/deploy.sh, acceptance/testdata/runnerdeploy.yaml, and the documentation to help developers and contributors.
2021-05-03 13:03:17 +09:00
Thejas N 588872a316
feat: allow ephemeral runner to be optional (#498)
- Adds `ephemeral` option to `runner.spec` 
    
    ```
      ....
      template:
         spec:
             ephemeral: false
             repository: mumoshu/actions-runner-controller-ci
      ....
    ```
- `ephemeral` defaults to `true`
- `entrypoint.sh` in runner/Dockerfile modified to read `RUNNER_EPHEMERAL` flag
- Runner images are backward-compatible. `--once` is omitted only when the new envvar `RUNNER_EPHEMERAL` is explicitly set to `false`.

Resolves #457
2021-05-02 19:04:14 +09:00
Yusuke Kuoka 0901456320
Update README with more detailed test instructions (#503)
- You can now use `make acceptance/run` to run only a specific acceptance test case
- Add note about Ubuntu 20.04 users / snap-provided docker
- Add instruction to run Ginkgo tests
- Extract acceptance/load from acceptance/kind
- Make `acceptance/pull` not depend on `docker-build`, so that you can do `make docker-build acceptance/load` for faster image reload
2021-05-02 16:31:07 +09:00
Yusuke Kuoka dbd7b486d2
feat: Support for scaling from/to zero (#465)
This is an attempt to support scaling from/to zero.

The basic idea is that we create a one-off "registration-only" runner pod on RunnerReplicaSet being scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding those.

GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline` would prevent GitHub actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desired to scale to/from zero, hence we retain `offline` runners.

In this change, I enhanced the runnerreplicaset controller to create a registration-only runner on very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline`, until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler.

Related to #447
2021-05-02 16:11:36 +09:00
ToMe25 ba175148c8
Locally build runner image instead of pulling it (#473)
* Fix acceptance helm test not using newly built controller image

* Locally build runner image instead of pulling it

* Revert runner controller image pull policy to always

and add a line to the test deployment to use IfNotPresent

* Change runner repository from summerwind/action-runner to the owner of actions-runner-controller.

Also fix some Makefile formatting.

* Undo renaming acceptance/pull to docker-pull

* Some env var cleanup

Rename USERNAME to DOCKER_USER(is still used for github too tho)
Add RUNNER_NAME var(defaults to $DOCKER_USER/actions-runner)
Add TEST_REPO(defaults to $DOCKER_USER/actions-runner-controller)
2021-05-01 15:10:57 +09:00
callum-tait-pbx 358146ee54
docs: adding note on cloud tooling (#492)
* docs: adding note on cloud tooling

* docs: better grammar
2021-04-30 10:20:01 +09:00
callum-tait-pbx 1ba4098648
docs: updating to reflect new ownership (#491) 2021-04-30 10:11:58 +09:00
callum-tait-pbx 05fb8569b3
docs: updating helm install command (#485) 2021-04-27 09:12:30 +09:00
Rolf Ahrenberg 81dd47a893
Document dockerMTU and dockerRegistryMirror (#482) 2021-04-26 09:52:09 +09:00
Rolf Ahrenberg 6b77a2a5a8
feat: Docker registry mirror (#478)
Changes:

- Switched to use `jq` in startup.sh
- Enable docker registry mirror configuration which is useful when e.g. avoiding the Docker Hub rate-limiting

Check #478 for how this feature is tested and supposed to be used.
2021-04-25 14:04:01 +09:00
callum-tait-pbx dc4cf3f57b
docs: better enterprise runner documentation (#477)
* docs: better Enterprise runner documentation

* docs: adding more detail
2021-04-25 13:33:47 +09:00
callum-tait-pbx aad2615487
docs: improved details on authentication (#472) 2021-04-23 09:42:29 +09:00
callum-tait-pbx 03d9b6a09f
docs: slightly better wording about support (#471) 2021-04-23 09:41:08 +09:00
callum-tait-pbx 5d280cc8c8
docs: adding scaling configuration detail (#469) 2021-04-23 09:40:23 +09:00
callum-tait-pbx 133c4fb21e
docs: clean up Enterprise and fsGroup docs (#460)
* docs: cleaning up Enterprise docs

* docs: better wording in various areas

* docs: improving enterprise runner docs

* docs: using American English

* docs: removing superfluous paragraph

* docs: improving grammar

* docs: better grammar

* docs: better wording

* docs: updated to reflect comments

* docs: spelling correction
2021-04-20 10:26:10 +09:00
callum-tait-pbx 3b2d2c052e
chore: adding Helm app version back (#412)
* chore: adding Helm app version back

* chore: removing redundant values entry

* chore: bumping to newer version

* chore: bumping app version to latest

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-04-18 13:58:54 +09:00
callum-tait-pbx 2eeb56d1c8
docs: removing superfluous title reference (#459) 2021-04-18 09:45:28 +09:00
ToMe25 a612b38f9b
Cache docker images in acceptance test (#463)
* Cache docker images locally

Cache dind, runner, and kube-rbac-proxy docker image on the host and copy onto the kind node instead of downloading it to the node directly.

* Also cache certmanager docker images
2021-04-18 09:44:59 +09:00
callum-tait-pbx 325c2cc385
docs: correct and simplify example (#450)
* docs: correct and simplify example

* docs: removing alternatives

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-04-17 17:08:57 +09:00
asoldino 7b44454d01
Add documentation of dockerVolumeMount (#453) 2021-04-17 17:04:38 +09:00
callum-tait-pbx f2680b2f2d
Bumping runner to Ubuntu 20.04 (#438)
Images for `actions-runner:v${VERSION}` and `actions-runner:latest` tags are upgraded to Ubuntu 20.04.

If you would like not to upgrade Ubuntu in the runner image in the future, migrate to new tags suffixed with `-ubuntu-20.04` like`actions-runner:v${VERSION}-ubuntu-20.04`.

We also keep publishing the existing Ubuntu 18.04 images with new `actions-runner:v${VERSION}-ubuntu-18.04` tags. Please use it when it turned out that you had workflows dependent on Ubuntu 18.04.

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-04-17 17:02:03 +09:00
Gabriel Dantas Gomes 0783ffe989
some readme typos (#423) 2021-03-29 10:08:21 +09:00
Balazs Gyurak 264cf494e3
Fix "pole" typo in README (#394)
I think these should be "poll".
2021-03-17 06:34:01 +09:00
callum-tait-pbx a6270b44d5
docs: fix typos and add PR link (#379)
* docs: fix typos and add PR link

* docs: changes based on feedback

* docs: fixing numbers in list

* docs: grammer

* docs: better wording
2021-03-12 08:52:34 +09:00
Rob Whitby dbda292f54
fix typo in examples (#373) 2021-03-08 09:18:10 +09:00
Mike Perry f220fefe92
Update README.md (#370) 2021-03-05 09:17:32 +09:00
callum-tait-pbx 9c7372a8e0
docs: styling fixes (#359)
* docs: styling fixes

* docs: grammer fixes
2021-03-01 09:44:35 +09:00
callum-tait-pbx f987571b64
Improve docs (#303) 2021-02-26 09:32:18 +09:00
Johannes Nicolai 6cce3fefc5
Add project to awesome-runners list (#319) 2021-02-17 09:14:42 +09:00
Johannes Nicolai 34c6c3d9cd
Pod eviction policy examples (crashed nodes) (#308)
* ... otherwise it will take 40 seconds (until a node is detected as unreachable) + 5 minutes (until pods are evicted from unreachable/crashed nodes)
* pods stuck in "Terminating" status on unreachable nodes will only be freed once #307 gets merged
2021-02-15 09:33:01 +09:00
Yusuke Kuoka ab1c39de57
feat: HorizontalRunnerAutoscaler Webhook server (#282)
* feat: HorizontalRunnerAutoscaler Webhook server

This introduces a Webhook server that responds GitHub `check_run`, `pull_request`, and `push` events by scaling up matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting next sync period to add insufficient runners.

This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, where actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs.

On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for the single target to scale up. The webhook server tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun|Push|PullRequest] against the event and if it finds only one HRA, it is the scale target. If none or two or more targets are found for repository-wide runners, it does the same on organizational runners.

Changes:

* Fix integration test
* Update manifests
* chart: Add support for github webhook server
* dockerfile: Include github-webhook-server binary
* Do not import unversioned go-github
* Update README
2021-02-07 17:37:27 +09:00
Jesse Haka 28e80a2d28
Add support for enterprise runners (#290)
* Add support for enterprise runners

* update docs
2021-02-05 09:31:06 +09:00
Yusuke Kuoka a2690aa5cb Update README.md
Follow-up for #275
2021-01-29 09:29:26 +09:00
Clément da020df0fd
docs: fix install installation method (#275) 2021-01-29 09:28:34 +09:00
Johannes Nicolai 42493d5e01
Adding --name-space parameter in example (#259)
* when setting a GitHub Enterprise server URL without a namespace, an error occurs: "error: the server doesn't have a resource type "controller-manager"
* setting default namespace "actions-runner-system" makes the example work out of the box
2021-01-22 10:12:04 +09:00
Johannes Nicolai cbb41cbd18
Updating custom container example (#260)
* use latest instead of outdated version
* use sudo for package install (required)
* use sudo for package meta data removal (required)
2021-01-22 09:57:42 +09:00
Johannes Nicolai 64a1a58acf
GitHub runner groups have to be created first (#261)
* in contrast to runner labels, GitHub runner groups are not automatically created
2021-01-22 09:52:35 +09:00
ZacharyBenamram 0dadddfc7d
Update README for "PercentageRunnersBusy" HRA metric type (#237)
* adding readme for new hpa scheme

* callum's comments

Co-authored-by: Zachary Benamram <zacharybenamram@blend.com>
2020-12-17 10:21:27 +09:00
Yusuke Kuoka dfffd3fb62
feat: EKS IAM Roles for Service Accounts for Runner Pods (#226)
One of the pod recreation conditions has been modified to use hash of runner spec, so that the controller does not keep restarting pods mutated by admission webhooks. This naturally allows us, for example, to use IRSA for EKS that requires its admission webhook to mutate the runner pod to have additional, IRSA-related volumes, volume mounts and env.

Resolves #200
2020-12-08 17:56:06 +09:00
Reinier Timmer ee8fb5a388
parametrized working directory (#185)
* parametrized working directory

* manifests v3.0
2020-11-25 08:55:26 +09:00
Shinnosuke Sawada 6ce6737f61
add dockerEnabled document (#193)
Follow-up for #191
2020-11-17 09:31:34 +09:00
Yusuke Kuoka 3d531ffcdd
refactor/sync-period (#188)
* refactor: adding explicit sync-period flag

* docs: adding mroe detail for sync period config

* docs: spelling 🙃

Co-authored-by: Callum Tait <callum.tait@PBXUK-HH-05772.photobox.priv>
2020-11-14 22:07:22 +09:00
Callum Tait c623ce0765 docs: spelling 🙃 2020-11-14 12:32:14 +00:00
Callum Tait 04dde518f0 docs: adding mroe detail for sync period config 2020-11-14 11:22:09 +00:00
Yusuke Kuoka bbfe03f02b
Add acceptance test (#168)
To ease verifying the controller to work before submitting/merging PRs and releasing a new version of the controller.
2020-11-14 20:07:14 +09:00
Callum Tait 492ea4b583 docs: adding a common errors section 2020-11-13 08:38:25 +00:00
Callum Tait 3446d4d761 docs: reordering to be more logical 2020-11-13 08:27:18 +00:00
Javier de Pedro López 1f82032fe3
Improvements in the documentation for permissions and replication (#175) 2020-11-13 09:36:22 +09:00
Elliot Maincourt 5b2272d80a
Add read-only permissions to actions for being able to autoscale (#180) 2020-11-13 09:27:01 +09:00
Dan Webb dcf8524b5c
Adds RUNNER_GROUP argument to the runner registration (#157)
* Adds RUNNER_GROUP argument to the runner registration

Adds the ability to register a runner to a predefined runner_group

Resolves #137

* Update README with runner group example

- Updates the README with instructions of how to add the runner to a
  group
- Fix code fencing for shell and yaml blocks in the README
- Use consistent bullet points (dash not asterisk)
2020-11-10 17:15:54 +09:00
Juho Saarinen 40c5050978
Added support for other than public GitHub URL (#146)
Refactoring a bit
2020-10-28 22:15:53 +09:00
Yusuke Kuoka faaca10fba
Rename Runner.Spec.dockerWithinRunnerContainer to docker"d"WithinRunnerContainer (#134)
* Rename Runner.Spec.dockerWithinRunnerContainer to dockerdWithinRunnerContainer

Ref https://github.com/summerwind/actions-runner-controller/pull/126#issuecomment-712501790
2020-10-21 21:32:40 +09:00
Juho Saarinen 92920926fe
Configurable "runner and DinD in a single container" (#126) 2020-10-20 08:48:28 +09:00
Yusuke Kuoka be2e61f209
Add "alternatives" section to README (#124)
Grow together :)
2020-10-15 08:40:02 +09:00
John Wiebalk acb1700b7c
Fix a couple typos in readme (#107) 2020-10-05 08:59:34 +09:00
Moto Ishizawa e10637ce35
Merge pull request #66 from summerwind/org-runner-autoscale
feat: Organizational RunnerDeployment Autoscaling
2020-07-28 19:17:18 +09:00
Yusuke Kuoka ae30648985 feat: Use HorizontalRunnerAutoscaler for autoscaling 2020-07-27 20:33:44 +09:00
David Liao c0914743b0 add config to respect image pull policy 2020-07-08 23:53:52 -07:00
KUOKA Yusuke 5bb2694349
feat: Repository-wide RunnerDeployment Autoscaling (#57)
* feat: Repository-wide RunnerDeployment Autoscaling

This adds `maxReplicas` and `minReplicas` to the RunnerDeploymentSpec. If and only if both fields are set, the controller computes and sets desired `replicas` automatically depending on the demand.

The number of demanded runner replicas is computed by `queued workflow runs + in_progress workflow runs` for the repository. The support for organizational runners is not included.

Ref https://github.com/summerwind/actions-runner-controller/issues/10
2020-06-27 17:26:46 +09:00
Vitali Raikov 905ed18824
Add an extra permission for organization self hosted runners 2020-06-02 16:16:26 +03:00
Moto Ishizawa 390f2a62d9
Merge pull request #50 from summerwind/runner-validation-webhook
Add validation webhooks
2020-05-08 22:39:13 +09:00
naka-gawa 1555651325 fix typo readme 2020-05-06 12:25:30 +09:00
Reinier Timmer 79655989d0 Add additional software packages to runner image 2020-05-04 06:55:16 +00:00
Moto Ishizawa 55323c3754 Update installation section 2020-05-02 16:57:19 +09:00
Reinier Timmer 966e0dca37 fix merge conflict on README 2020-04-28 11:24:59 +02:00
Reinier Timmer 8c42b317ec updated documentation 2020-04-28 11:15:41 +02:00
Varun Priolkar df5eed52d0
Specify language in additional tweaks code block 2020-04-23 12:57:56 +05:30
Varun Priolkar 5f271a3050
Fix code block on additional tweak section 2020-04-23 12:57:15 +05:30
Varun Priolkar 6997cc97c6
Improve readme 2020-04-23 12:53:46 +05:30
Moto Ishizawa 0bb6f64470 Add section to use GitHub App 2020-04-10 15:42:27 +09:00
Adam Jensen 934ec7f181
Clarify instructions for getting a token (#18)
* Clarify instructions for getting a token

* Fix typo
2020-03-25 21:22:19 +09:00
Jorge Arco d3aa21f583
Readme fix requirements 2020-03-17 11:48:41 +01:00
Yusuke Kuoka c19a1b3ffe Rename RunnerSet to RunnerReplicaSet
To hand over the name `RunnerSet` to the new StatefulSet-based implementation of that being developed at #4
2020-03-10 09:14:34 +09:00
Yusuke Kuoka d12eca268d Fix validation error on nil for optional slice field (runner.spec.env)
I had observed athe exact issue seen for the 4th option described in https://github.com/elastic/cloud-on-k8s/issues/1822, which resulted in actions-runner-controller is unable to create nor update runners. This fixes that.

I've also updated README to introduce RunnerDeployment and manually tested it to work after the fix.

---

`actions-runner-controller` has been failing while creating and updating runners:

```
2020-03-05T11:05:16.610+0900    ERROR   controllers.Runner      Failed to update runner {"runner": "default/example-runner", "error": "Runner.actions.summerwind.dev \"example-runner\" is invalid: []: Invalid value: map[string]interface {}{\"apiVersion\":\"actions.summerwind.dev/v1alpha1\", \"kind\":\"Runner\", \"metadata\":map[string]interface {}{\"creationTimestamp\":\"2020-03-05T02:05:16Z\", \"finalizers\":[]interface {}{\"runner.actions.summerwind.dev\"}, \"generation\":2, \"name\":\"example-runner\", \"namespace\":\"default\", \"resourceVersion\":\"911496\", \"selfLink\":\"/apis/actions.summerwind.dev/v1alpha1/namespaces/default/runners/example-runner\", \"uid\":\"48b62d07-ff2c-42d6-878c-d3f951202209\"}, \"spec\":map[string]interface {}{\"env\":interface {}(nil), \"image\":\"\", \"repository\":\"mumoshu/actions-runner-controller-ci\"}}: validation failure list:\nspec.env in body must be of type array: \"null\""}
github.com/go-logr/zapr.(*zapLogger).Error
        /Users/c-ykuoka/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
github.com/summerwind/actions-runner-controller/controllers.(*RunnerReconciler).Reconcile
        /Users/c-ykuoka/p/actions-runner-controller/controllers/runner_controller.go:88
```

This seems like the exact issue seen in the 4th option in https://github.com/elastic/cloud-on-k8s/issues/1822

I also observed the same issue is failing while the runnerset controller is trying to create/update runners:

```

Also while creating runner in the runnerset controller:

2020-03-05T11:15:01.223+0900    ERROR   controller-runtime.controller   Reconciler error        {"controller": "runnerset", "request": "default/example-runnerset", "error": "Runner.actions.summerwind.dev \"example-runnersetgp56m\" is invalid: []: Invalid value: map[string]interface {}{\"apiVersion\":\"actions.summerwind.dev/v1alpha1\", \"kind\":\"Runner\", \"metadata\":map[string]interface {}{\"creationTimestamp\":\"2020-03-05T02:15:01Z\", \"generateName\":\"example-runnerset\", \"generation\":1, \"name\":\"example-runnersetgp56m\", \"namespace\":\"default\", \"ownerReferences\":[]interface {}{map[string]interface {}{\"apiVersion\":\"actions.summerwind.dev/v1alpha1\", \"blockOwnerDeletion\":true, \"controller\":true, \"kind\":\"RunnerSet\", \"name\":\"example-runnerset\", \"uid\":\"e26f7d01-3168-496d-931b-8e6f97b776ea\"}}, \"uid\":\"4ee490f5-9a8c-4f30-86f9-61dea799b972\"}, \"spec\":map[string]interface {}{\"env\":interface {}(nil), \"image\":\"\", \"repository\":\"mumoshu/actions-runner-controller-ci\"}}: validation failure list:\nspec.env in body must be of type array: \"null\""}
github.com/go-logr/zapr.(*zapLogger).Error
```

and while the runnerdeployment controller is trying to create/update runners.

I've fixed it so that the new `RunnerDeployment` example added to README just works.
2020-03-09 22:03:07 +09:00
Moto Ishizawa f5c8a0e655 Update README 2020-02-03 21:52:28 +09:00
Moto Ishizawa 6d8f833d4f Update README 2020-02-03 10:10:40 +09:00
Moto Ishizawa 1cd3bdbe25 Add README.md 2020-01-29 23:19:23 +09:00