Commit Graph

431 Commits

Author SHA1 Message Date
Thejas N 588872a316
feat: allow ephemeral runner to be optional (#498)
- Adds `ephemeral` option to `runner.spec` 
    
    ```
      ....
      template:
         spec:
             ephemeral: false
             repository: mumoshu/actions-runner-controller-ci
      ....
    ```
- `ephemeral` defaults to `true`
- `entrypoint.sh` in runner/Dockerfile modified to read `RUNNER_EPHEMERAL` flag
- Runner images are backward-compatible. `--once` is omitted only when the new envvar `RUNNER_EPHEMERAL` is explicitly set to `false`.

Resolves #457
2021-05-02 19:04:14 +09:00
Yusuke Kuoka a0feee257f
Add .dockerignore for controller to accelerate image rebuild in local dev env (#504)
Previously any non-go changes resulted in `make docker-build` rerunning time-consufming `go build`. This fixes that by adding clearly unnecessary files .dockerignore
2021-05-02 16:47:07 +09:00
Christoph Brand a18ac330bb
feature(controller): allow autoscaler to scale down to 0 (#447) 2021-05-02 16:46:51 +09:00
Yusuke Kuoka 0901456320
Update README with more detailed test instructions (#503)
- You can now use `make acceptance/run` to run only a specific acceptance test case
- Add note about Ubuntu 20.04 users / snap-provided docker
- Add instruction to run Ginkgo tests
- Extract acceptance/load from acceptance/kind
- Make `acceptance/pull` not depend on `docker-build`, so that you can do `make docker-build acceptance/load` for faster image reload
2021-05-02 16:31:07 +09:00
Yusuke Kuoka dbd7b486d2
feat: Support for scaling from/to zero (#465)
This is an attempt to support scaling from/to zero.

The basic idea is that we create a one-off "registration-only" runner pod on RunnerReplicaSet being scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding those.

GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline` would prevent GitHub actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desired to scale to/from zero, hence we retain `offline` runners.

In this change, I enhanced the runnerreplicaset controller to create a registration-only runner on very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline`, until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler.

Related to #447
2021-05-02 16:11:36 +09:00
callum-tait-pbx 7e766282aa
ci: updating paths-ignore (#496)
* chore: updating paths-ignore

* chore: adding more path-ignores
2021-05-01 21:36:45 +09:00
ToMe25 ba175148c8
Locally build runner image instead of pulling it (#473)
* Fix acceptance helm test not using newly built controller image

* Locally build runner image instead of pulling it

* Revert runner controller image pull policy to always

and add a line to the test deployment to use IfNotPresent

* Change runner repository from summerwind/action-runner to the owner of actions-runner-controller.

Also fix some Makefile formatting.

* Undo renaming acceptance/pull to docker-pull

* Some env var cleanup

Rename USERNAME to DOCKER_USER(is still used for github too tho)
Add RUNNER_NAME var(defaults to $DOCKER_USER/actions-runner)
Add TEST_REPO(defaults to $DOCKER_USER/actions-runner-controller)
2021-05-01 15:10:57 +09:00
callum-tait-pbx 358146ee54
docs: adding note on cloud tooling (#492)
* docs: adding note on cloud tooling

* docs: better grammar
2021-04-30 10:20:01 +09:00
callum-tait-pbx e9dd16b023
chore: adding stale config (#487)
* chore: adding stale config

* chore: adding more labels

* chore: adding more exempt labels
2021-04-30 10:14:13 +09:00
callum-tait-pbx 1ba4098648
docs: updating to reflect new ownership (#491) 2021-04-30 10:11:58 +09:00
callum-tait-pbx 05fb8569b3
docs: updating helm install command (#485) 2021-04-27 09:12:30 +09:00
callum-tait-pbx db45a375d0
chore: bump runner (#486)
* chore: bump runner

* chore: bumper runner in ci
2021-04-27 08:38:40 +09:00
Rolf Ahrenberg 81dd47a893
Document dockerMTU and dockerRegistryMirror (#482) 2021-04-26 09:52:09 +09:00
Rolf Ahrenberg 6b77a2a5a8
feat: Docker registry mirror (#478)
Changes:

- Switched to use `jq` in startup.sh
- Enable docker registry mirror configuration which is useful when e.g. avoiding the Docker Hub rate-limiting

Check #478 for how this feature is tested and supposed to be used.
2021-04-25 14:04:01 +09:00
callum-tait-pbx dc4cf3f57b
docs: better enterprise runner documentation (#477)
* docs: better Enterprise runner documentation

* docs: adding more detail
2021-04-25 13:33:47 +09:00
Yusuke Kuoka d810b579a5
Update RELEASE_NOTE_TEMPLATE.md 2021-04-25 13:02:15 +09:00
Yusuke Kuoka 47c8de9dc3
Rename RELEASE_NOTE_TEMPLATE to RELEASE_NOTE_TEMPLATE.md 2021-04-25 13:01:20 +09:00
Yusuke Kuoka 74a53bde5e
Add release note template (#481)
So that everyone can contribute enhancements and fixes to the release notes structure :)
2021-04-25 13:00:25 +09:00
callum-tait-pbx aad2615487
docs: improved details on authentication (#472) 2021-04-23 09:42:29 +09:00
callum-tait-pbx 03d9b6a09f
docs: slightly better wording about support (#471) 2021-04-23 09:41:08 +09:00
callum-tait-pbx 5d280cc8c8
docs: adding scaling configuration detail (#469) 2021-04-23 09:40:23 +09:00
callum-tait-pbx 133c4fb21e
docs: clean up Enterprise and fsGroup docs (#460)
* docs: cleaning up Enterprise docs

* docs: better wording in various areas

* docs: improving enterprise runner docs

* docs: using American English

* docs: removing superfluous paragraph

* docs: improving grammar

* docs: better grammar

* docs: better wording

* docs: updated to reflect comments

* docs: spelling correction
2021-04-20 10:26:10 +09:00
callum-tait-pbx 3b2d2c052e
chore: adding Helm app version back (#412)
* chore: adding Helm app version back

* chore: removing redundant values entry

* chore: bumping to newer version

* chore: bumping app version to latest

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-04-18 13:58:54 +09:00
Manuel Jurado 37c2a62fa8
Allow to configure runner volume size limit (#436)
Enable the user to set a limit size on the volume of the runner to avoid some runner pod affecting other resources of the same cluster

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-04-18 13:56:59 +09:00
callum-tait-pbx 2eeb56d1c8
docs: removing superfluous title reference (#459) 2021-04-18 09:45:28 +09:00
ToMe25 a612b38f9b
Cache docker images in acceptance test (#463)
* Cache docker images locally

Cache dind, runner, and kube-rbac-proxy docker image on the host and copy onto the kind node instead of downloading it to the node directly.

* Also cache certmanager docker images
2021-04-18 09:44:59 +09:00
callum-tait-pbx 1c67ea65d9
ci: fix latest tag push logic (#462)
* ci: fix latest tag push logic

* ci: use better job names
2021-04-18 09:41:22 +09:00
ToMe25 c26fb5ad5f
Make acceptance use local docker image (#448)
load the local docker image to the kind cluster instead of pushing it to dockerhub and pulling it from there
2021-04-17 17:13:47 +09:00
callum-tait-pbx 325c2cc385
docs: correct and simplify example (#450)
* docs: correct and simplify example

* docs: removing alternatives

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-04-17 17:08:57 +09:00
Agoney Garcia-Deniz 2e551c9d0a
Add hostAliases to the runner spec (#456) 2021-04-17 17:04:52 +09:00
asoldino 7b44454d01
Add documentation of dockerVolumeMount (#453) 2021-04-17 17:04:38 +09:00
callum-tait-pbx f2680b2f2d
Bumping runner to Ubuntu 20.04 (#438)
Images for `actions-runner:v${VERSION}` and `actions-runner:latest` tags are upgraded to Ubuntu 20.04.

If you would like not to upgrade Ubuntu in the runner image in the future, migrate to new tags suffixed with `-ubuntu-20.04` like`actions-runner:v${VERSION}-ubuntu-20.04`.

We also keep publishing the existing Ubuntu 18.04 images with new `actions-runner:v${VERSION}-ubuntu-18.04` tags. Please use it when it turned out that you had workflows dependent on Ubuntu 18.04.

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-04-17 17:02:03 +09:00
asoldino b42b8406a2
Add dockerVolumeMounts (#439)
Resolves #435
2021-04-06 10:10:10 +09:00
Javi Polo 3c125e2191
Fix helm webhook ingress error: `spec.rules[0].http.paths[0].backend: Required value: port name or number is required` (#437) 2021-04-02 06:34:45 +09:00
Christoph Brand 9ed245c85e
feature(controller): remove dockerd executable (#432) 2021-04-01 08:50:48 +09:00
Florian Braun 5b7807d54b
Quote vars in entrypoint.sh to prevent unwanted argument split (#420)
Prevents arguments from being split when e.g. the RUNNER_GROUP variable contains spaces (which is legit. One can create such groups in GitHub).

I've seen that all workers with group names that contain no spaces can register successfully, while all workers with groups that contain spaces will not register.

Furthermore, I suppose also other chars can be used here to inject arbitrary commands in an unsupported way via e.g. pipe symbol.

Quoting the vars correctly should prevent that and allow for e.g. group names and runner labels with spaces and other bash reserved characters.
2021-03-31 10:09:08 +09:00
Yusuke Kuoka 156e2c1987
Fix MTU configuration for dockerd (#421)
Resolves #393
2021-03-31 09:29:21 +09:00
Yusuke Kuoka da4dfb3fdf
Add make target `test-with-deps` to ease setting up dependent binaries (#426) 2021-03-31 09:23:16 +09:00
Gabriel Dantas Gomes 0783ffe989
some readme typos (#423) 2021-03-29 10:08:21 +09:00
Yusuke Kuoka 374105c1f3
Fix dindWithinRunnerContainer not to crash-loop runner pods (#419)
Apparently #253 broke dindWithinRunnerContainer completely due to the difference in how /runner volume is set up.
2021-03-25 10:23:36 +09:00
Yusuke Kuoka bc6e499e4f
Make logging more concise (#410)
This makes logging more concise by changing logger names to something like `controllers.Runner` to `actions-runner-controller.runner` after the standard `controller-rutime.controller` and reducing redundant logs by removing unnecessary requeues. I have also tweaked log messages so that their style is more consistent, which will also help readability. Also, runnerreplicaset-controller lacked useful logs so I have enhanced it.
2021-03-20 07:34:25 +09:00
Yusuke Kuoka 07f822bb08
Do include Runner controller in integration test (#409)
So that we could catch bugs in runner controller like seen in #398, #404, and #407.

Ref #400
2021-03-19 16:14:15 +09:00
Hidetake Iwata 3a0332dfdc
Add metrics of RunnerDeployment and HRA (#408)
* Add metrics of RunnerDeployment and HRA

* Use kube-state-metrics-style label names
2021-03-19 16:14:02 +09:00
Yusuke Kuoka f6ab66c55b
Do not delay min/maxReplicas propagation from HRA to RD due to caching (#406)
As part of #282, I have introduced some caching mechanism to avoid excessive GitHub API calls due to the autoscaling calculation involving GitHub API calls is executed on each Webhook event.

Apparently, it was saving the wrong value in the cache- The value was one after applying `HRA.Spec.{Max,Min}Replicas` so manual changes to {Max,Min}Replicas doesn't affect RunnerDeployment.Spec.Replicas until the cache expires. This isn't what I had wanted.

This patch fixes that, by changing the value being cached to one before applying {Min,Max}Replicas.

Additionally, I've also updated logging so that you observe which number was fetched from cache, and what number was suggested by either TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy, and what was the final number used as the desired-replicas(after applying {Min,Max}Replicas).

Follow-up for #282
2021-03-19 12:58:02 +09:00
Yusuke Kuoka d874a5cfda
Fix `status.lastRegistrationCheckTime in body must be of type string: \"null\"` errors (#407)
Follow-up for #398 and #404
2021-03-19 11:15:35 +09:00
Yusuke Kuoka c424215044
Do recheck runner registration timely (#405)
Since #392, the runner controller could have taken unexpectedly long time until it finally notices that the runner has been registered to GitHub. This patch fixes the issue, so that the controller will notice the successful registration in approximately 1 minute(hard-coded).

More concretely, let's say you had configured a long sync-period of like 10m, the runner controller could have taken approx 10m to notice the successful registration. The original expectation was 1m, because it was intended to recheck every 1m as implemented in #392. It wasn't working as such due to my misunderstanding in how requeueing work.
2021-03-19 11:02:47 +09:00
Yusuke Kuoka c5fdfd63db
Merge pull request #404 from summerwind/fix-reg-update-runner-status-err
Fix `Failed to update runner status for Registration` errors
2021-03-19 08:56:20 +09:00
Yusuke Kuoka 23a45eaf87 Bump chart version 2021-03-19 08:37:17 +09:00
Yusuke Kuoka dee997b44e Fix `Failed to update runner status for Registration` errors
Fixes #400
2021-03-19 07:02:00 +09:00
Yusuke Kuoka 2929a739e3
Merge pull request #398 from summerwind/fix-status-last-reg-check-time-type-err
Fix `status.lastRegistrationCheckTime in body must be of type string: \"null\"` error
2021-03-18 10:36:44 +09:00