Commit Graph

351 Commits

Author SHA1 Message Date
Tristan Keen 5e3f89bdc5
Correct test to append docker container (#837)
Fixes #835
2021-09-24 09:18:20 +09:00
Tristan Keen 1eb135cace Correct default image logic 2021-09-14 17:00:57 +09:00
Tarasovych 7008b0c257
feat: Organization RunnerDeployment with webhook-based autoscaling only for certain repositories (#766)
Resolves #765

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-08-31 09:46:36 +09:00
Yusuke Kuoka fabead8c8e
feat: Workflow job based ephemeral runner scaling (#721)
This add support for two upcoming enhancements on the GitHub side of self-hosted runners, ephemeral runners, and `workflow_jow` events. You can't use these yet.

**These features are not yet generally available to all GitHub users**. Please take this pull request as a preparation to make it available to actions-runner-controller users as soon as possible after GitHub released the necessary features on their end.

**Ephemeral runners**:

The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (default in actions-runner-controller).

`--once` has been suffering from a race issue #466. `--ephemeral` fixes that.

To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`.

Please read the section `Ephemeral Runners` in the updated version of our README for more information.

Note that ephemeral runners is not released on GitHub yet. And `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub!

**`workflow_job` events**:

`workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` a solid foundation to improve our webhook-based autoscale.

Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events has included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial.

In contrast, a `workflow_job` event payload contains `labels` of runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by `labels` included in the webhook payload. So all you need to use webhook-based autoscale will be to enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet.

Note that the current implementation of `workflow_job` support works in two ways, increment, and decrement. An increment happens when the webhook server receives` workflow_job` of `queued` status. A decrement happens when it receives `workflow_job` of `completed` status. The latter is used to make scaling-down faster so that you waste money less than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut `.

Please read the section `Example 3: Scale on each `workflow_job` event` in the updated version of our README for more information on its usage.
2021-08-11 09:52:04 +09:00
Rolf Ahrenberg 14564c7b8e
Allow disabling /runner emptydir mounts and setting storage volume (#674)
* Allow disabling /runner emptydir mounts

* Support defining storage medium for emptydirs

* Fix typos
2021-07-15 06:29:58 +09:00
Sebastien Le Digabel 7f2795b5d6
Adding a default docker registry mirror (#689)
* Adding a default docker registry mirror

This change allows the controller to start with a specified default
docker registry mirror and avoid having to specify it in all the runner*
objects.

The change is backward compatible, if a runner has a docker registry
mirror specified, it will supersede the default one.
2021-07-15 06:20:08 +09:00
Yusuke Kuoka f858e2e432
Add POC of GitHub Webhook Delivery Forwarder (#682)
* Add POC of GitHub Webhook Delivery Forwarder

* multi-forwarder and ctrl-c existing and fix for non-woring http post

* Rename source files

* Extract signal handling into a dedicated source file

* Faster ctrl-c handling

* Enable automatic creation of repo hook on startup

* Add support for forwarding org hook deliveries

* Set hook secret on hook creation via envvar (HOOK_SECRET)

* Fix org hook support

* Fix HOOK_SECRET for consistency

* Refactor to prepare for custom log position provider

* Refactor to extract inmemory log position provider

* Add configmap-based log position provider

* Rename githubwebhookdeliveryforwarder to hookdeliveryforwarder

* Refactor to rename LogPositionProvider to Checkpointer and extract ConfigMap checkpointer into a dedicated pkg

* Refactor to extract logger initialization

* Add hookdeliveryforwarder README and bump go-github to unreleased ver
2021-07-14 10:18:55 +09:00
Yusuke Kuoka 6f130c2db5
Fix dockerdWithinRunnerContainer for Runner(Deployment) not working in the main branch (#696)
Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/674#issuecomment-878600993
2021-07-13 18:14:15 +09:00
Yusuke Kuoka f19e7ea8a8
chore: Upgrade go-github to v36 (#681) 2021-07-04 17:43:52 +09:00
Yusuke Kuoka acb906164b
RunnerSet: Automatic-recovery from registration timeout and deregistration on pod termination (#652)
Ref #629
Ref #613
Ref #612
2021-06-24 20:39:37 +09:00
Yusuke Kuoka 98da4c2adb
Add HRA support for RunnerSet (#647)
`HRA.Spec.ScaleTargetRef.Kind` is added to denote that the scale-target is a RunnerSet.

It defaults to `RunnerDeployment` for backward compatibility.

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: myhra
spec:
  scaleTargetRef:
    kind: RunnerSet
    name: myrunnerset
```

Ref #629
Ref #613
Ref #612
2021-06-23 20:25:03 +09:00
Yusuke Kuoka 8b90b0f0e3
Clean up import list (#645)
Resolves #644
2021-06-22 17:55:06 +09:00
Jonathan Gonzalez V a277489003
Added support to enable and disable enableServiceLinks. (#628)
This option expose internally some `KUBERNETES_*` environment variables
that doesn't allow the runner to use KinD (Kubernetes in Docker) since it will
try to connect to the Kubernetes cluster where the runner it's running.

This option it's set by default to `true` in any Kubernetes deployment.

Signed-off-by: Jonathan Gonzalez V <jonathan.gonzalez@enterprisedb.com>
2021-06-22 17:27:26 +09:00
Yusuke Kuoka 9e4dbf497c
feat: RunnerSet backed by StatefulSet (#629)
* feat: RunnerSet backed by StatefulSet

Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods with required envvars and registration tokens.

Resolves #613
Ref #612

* Upgrade controller-runtime to 0.9.0

* Bump Go to 1.16.x following controller-runtime 0.9.0

* Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup

* Fix startup failure due to missing LeaderElectionID

* Fix the issue that any pods become unable to start once actions-runner-controller got failed after the mutating webhook has been registered

* Allow force-updating statefulset

* Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false

* Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec

* Enable running acceptance tests against arbitrary kind cluster

* RunnerSet supports non-ephemeral runners only today

* fix: docker-build from root Makefile on intel mac

* fix: arch check fixes for mac and ARM

* ci: aligning test data format and patching checks

* fix: removing namespace in test data

* chore: adding more ignores

* chore: removing leading space in shebang

* Re-add metrics to org hra testdata

* Bump cert-manager to v1.1.1 and fix deploy.sh

Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Callum James Tait <callum.tait@photobox.com>
2021-06-22 17:10:09 +09:00
Jonah Back 8c42f99d0b
feat: avoid setting privileged flag if seLinuxOptions is not null (#599)
Sets the privileged flag to false if SELinuxOptions are present/defined. This is needed because containerd treats SELinux and Privileged controls as mutually exclusive. Also see https://github.com/containerd/cri/blob/aa2d5a97c/pkg/server/container_create.go#L164.

This allows users who use SELinux for managing privileged processes to use GH Actions - otherwise, based on the SELinux policy, the Docker in Docker container might not be privileged enough. 

Signed-off-by: Jonah Back <jonah@jonahback.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-06-04 08:59:11 +09:00
Ameer Ghani 7523ea44f1
feat: allow specifying runtime class in runner spec (#580)
This allows using the `runtimeClassName` directive in the runner's spec.

One of the use-cases for this is Kata Containers, which use `runtimeClassName` in a pod spec as an indicator that the pod should run inside a Kata container. This allows us a greater degree of pod isolation.
2021-06-04 08:56:43 +09:00
Yusuke Kuoka cb14d7530b
Add HRA printer column "SCHEDULE" (#561)
Adds a column to help the operator see if they configured HRA.Spec.ScheduledOverrides correctly, in a form of "next override schedule recognized by the controller":

```
$ k get horizontalrunnerautoscaler
NAME                            MIN   MAX   DESIRED   SCHEDULE
actions-runner-aos-autoscaler   0     5     0
org                             0     5     0         min=0 time=2021-05-21 15:00:00 +0000 UTC
```

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/484
2021-05-22 08:29:53 +09:00
Yusuke Kuoka 0b88b246d3
Fix additionalPrinterColumns (#556)
This fixes human-readable output of `kubectl get` on `runnerdeployment`, `runnerreplicaset`, and `runner`.

Most notably, CURRENT and READY of runner replicasets are now computed and printed correctly. Runner deployments now have UP-TO-DATE and AVAILABLE instead of READY so that it is consistent with columns of K8s deployments.

A few fixes has been also made to runner deployment and runner replicaset controllers so that those numbers stored in Status objects are reliably updated and in-sync with actual values.

Finally, `AGE` columns are added to runnerdeployment, runnerreplicaset, runnner to make that more visible to users.

`kubectl get` outputs should now look like the below examples:

```
# Immediately after runnerdeployment updated/created
$ k get runnerdeployment
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
example-runnerdeploy   0         0         0            0           8d
org-runnerdeploy       5         5         5            0           8d

# A few dozens of seconds after update/create all the runners are registered that "available" numbers increase
$ k get runnerdeployment
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
example-runnerdeploy   0         0         0            0           8d
org-runnerdeploy       5         5         5            5           8d
```

```
$ k get runnerreplicaset
NAME                         DESIRED   CURRENT   READY   AGE
example-runnerdeploy-wnpf6   0         0         0       61m
org-runnerdeploy-fsnmr       2         2         0       8m41s
```

```
$ k get runner
NAME                                           ENTERPRISE   ORGANIZATION                REPOSITORY                                       LABELS                      STATUS    AGE
example-runnerdeploy-wnpf6-registration-only                                            actions-runner-controller/mumoshu-actions-test                               Running   61m
org-runnerdeploy-fsnmr-n8kkx                                actions-runner-controller                                                    ["mylabel 1","mylabel 2"]             21s
org-runnerdeploy-fsnmr-sq6m8                                actions-runner-controller                                                    ["mylabel 1","mylabel 2"]             21s
```

Fixes #490
2021-05-21 09:10:47 +09:00
Yusuke Kuoka 3cd124dce3
chore: Add debug logs for scheduledOverrides (#540)
Follow-up for #515
Ref #484
2021-05-11 17:30:22 +09:00
Yusuke Kuoka 25f5817a5e Improve debug log in webhook-based autoscaling
Adds some helpful debug log messages I have used while verifying #534
2021-05-11 15:49:03 +09:00
Yusuke Kuoka 4e7b8b57c0
edge: Enable scaling from zero with PercentageRunnersBusy (#524)
`PercentageRunnersBusy`, in combination with a secondary `TotalInProgressAndQueuedWorkflowRuns` metric, enables scale-from-zero for PercentageRunnersBusy.

Please see the new `Autoscaling to/from 0` section in the updated documentation about how it works.

Resolves #522
2021-05-05 14:27:17 +09:00
Yusuke Kuoka e7020c7c0f
Fix scale-from-zero to retain the reg-only runner until other pods come up (#523)
Fixes #516
2021-05-05 12:13:51 +09:00
Yusuke Kuoka 0e0f385f72
Experimental support for ScheduledOverrides (#515)
This adds the initial version of ScheduledOverrides to HorizontalRunnerAutoscaler.
`MinReplicas` overriding should just work.
When there are two or more ScheduledOverrides, the earliest one that matched is activated. Each ScheduledOverride can be recurring or one-time. If you have two or more ScheduledOverrides, only one of them should be one-time. And the one-time override should be the earliest item in the list to make sense.

Tests will be added in another commit. Logging improvements and additional observability in HRA.Status will also be added in yet another commits.

Ref #484
2021-05-03 23:31:17 +09:00
Yusuke Kuoka 469b117a09
Foundation for ScheduledOverrides (#513)
Adds two types `RecurrenceRule` and `Period` and one function `MatchSchedule` as the foundation for building the upcoming ScheduledOverrides feature.

Ref #484
2021-05-03 22:03:49 +09:00
Thejas N 588872a316
feat: allow ephemeral runner to be optional (#498)
- Adds `ephemeral` option to `runner.spec` 
    
    ```
      ....
      template:
         spec:
             ephemeral: false
             repository: mumoshu/actions-runner-controller-ci
      ....
    ```
- `ephemeral` defaults to `true`
- `entrypoint.sh` in runner/Dockerfile modified to read `RUNNER_EPHEMERAL` flag
- Runner images are backward-compatible. `--once` is omitted only when the new envvar `RUNNER_EPHEMERAL` is explicitly set to `false`.

Resolves #457
2021-05-02 19:04:14 +09:00
Christoph Brand a18ac330bb
feature(controller): allow autoscaler to scale down to 0 (#447) 2021-05-02 16:46:51 +09:00
Yusuke Kuoka dbd7b486d2
feat: Support for scaling from/to zero (#465)
This is an attempt to support scaling from/to zero.

The basic idea is that we create a one-off "registration-only" runner pod on RunnerReplicaSet being scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding those.

GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline` would prevent GitHub actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desired to scale to/from zero, hence we retain `offline` runners.

In this change, I enhanced the runnerreplicaset controller to create a registration-only runner on very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline`, until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler.

Related to #447
2021-05-02 16:11:36 +09:00
Rolf Ahrenberg 6b77a2a5a8
feat: Docker registry mirror (#478)
Changes:

- Switched to use `jq` in startup.sh
- Enable docker registry mirror configuration which is useful when e.g. avoiding the Docker Hub rate-limiting

Check #478 for how this feature is tested and supposed to be used.
2021-04-25 14:04:01 +09:00
Manuel Jurado 37c2a62fa8
Allow to configure runner volume size limit (#436)
Enable the user to set a limit size on the volume of the runner to avoid some runner pod affecting other resources of the same cluster

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-04-18 13:56:59 +09:00
Agoney Garcia-Deniz 2e551c9d0a
Add hostAliases to the runner spec (#456) 2021-04-17 17:04:52 +09:00
asoldino b42b8406a2
Add dockerVolumeMounts (#439)
Resolves #435
2021-04-06 10:10:10 +09:00
Christoph Brand 9ed245c85e
feature(controller): remove dockerd executable (#432) 2021-04-01 08:50:48 +09:00
Yusuke Kuoka 156e2c1987
Fix MTU configuration for dockerd (#421)
Resolves #393
2021-03-31 09:29:21 +09:00
Yusuke Kuoka 374105c1f3
Fix dindWithinRunnerContainer not to crash-loop runner pods (#419)
Apparently #253 broke dindWithinRunnerContainer completely due to the difference in how /runner volume is set up.
2021-03-25 10:23:36 +09:00
Yusuke Kuoka bc6e499e4f
Make logging more concise (#410)
This makes logging more concise by changing logger names to something like `controllers.Runner` to `actions-runner-controller.runner` after the standard `controller-rutime.controller` and reducing redundant logs by removing unnecessary requeues. I have also tweaked log messages so that their style is more consistent, which will also help readability. Also, runnerreplicaset-controller lacked useful logs so I have enhanced it.
2021-03-20 07:34:25 +09:00
Yusuke Kuoka 07f822bb08
Do include Runner controller in integration test (#409)
So that we could catch bugs in runner controller like seen in #398, #404, and #407.

Ref #400
2021-03-19 16:14:15 +09:00
Hidetake Iwata 3a0332dfdc
Add metrics of RunnerDeployment and HRA (#408)
* Add metrics of RunnerDeployment and HRA

* Use kube-state-metrics-style label names
2021-03-19 16:14:02 +09:00
Yusuke Kuoka f6ab66c55b
Do not delay min/maxReplicas propagation from HRA to RD due to caching (#406)
As part of #282, I have introduced some caching mechanism to avoid excessive GitHub API calls due to the autoscaling calculation involving GitHub API calls is executed on each Webhook event.

Apparently, it was saving the wrong value in the cache- The value was one after applying `HRA.Spec.{Max,Min}Replicas` so manual changes to {Max,Min}Replicas doesn't affect RunnerDeployment.Spec.Replicas until the cache expires. This isn't what I had wanted.

This patch fixes that, by changing the value being cached to one before applying {Min,Max}Replicas.

Additionally, I've also updated logging so that you observe which number was fetched from cache, and what number was suggested by either TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy, and what was the final number used as the desired-replicas(after applying {Min,Max}Replicas).

Follow-up for #282
2021-03-19 12:58:02 +09:00
Yusuke Kuoka c424215044
Do recheck runner registration timely (#405)
Since #392, the runner controller could have taken unexpectedly long time until it finally notices that the runner has been registered to GitHub. This patch fixes the issue, so that the controller will notice the successful registration in approximately 1 minute(hard-coded).

More concretely, let's say you had configured a long sync-period of like 10m, the runner controller could have taken approx 10m to notice the successful registration. The original expectation was 1m, because it was intended to recheck every 1m as implemented in #392. It wasn't working as such due to my misunderstanding in how requeueing work.
2021-03-19 11:02:47 +09:00
Yusuke Kuoka 3cccca8d09 Do patch runner status instead of update to reduce conflicts and avoid future bugs
Ref https://github.com/summerwind/actions-runner-controller/pull/398#issuecomment-801548375
2021-03-18 10:31:17 +09:00
Yusuke Kuoka 7a7086e7aa Make error logs more helpful 2021-03-18 10:26:21 +09:00
Yusuke Kuoka 3f23501b8e
Reduce "No runner matching the specified labels was found" errors while runner replacement (#392)
We occasionally encountered those errors while the underlying RunnerReplicaSet is being recreated/replaced on RunnerDeployment.Spec.Template update. It turned out to be due to that the RunnerDeployment controller was waiting for the runner pod becomes `Running`, intead of the new replacement runner to have registered to GitHub. This fixes that, by trying to Runner.Status.Phase to `Running` only after the runner in the runner pod appears to be registered.

A side-effect of this change is that runner controller would call more "ListRunners" GitHub Actions API. I've reviewed and improved the runner controller code and Runner CRD to make make the number of calls minimum. In most cases, ListRunners should be called only twice for each runner creation.
2021-03-16 10:52:30 +09:00
Yusuke Kuoka 5530030c67
Disable metrics-based autoscaling by default when scaleUpTriggers are enabled (#391)
Relates to https://github.com/summerwind/actions-runner-controller/pull/379#discussion_r592813661
Relates to https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793266609

When you defined HRA.Spec.ScaleUpTriggers[] but HRA.Spec.Metrics[], the HRA controller will now enable ScaleUpTriggers alone and insteaed of automatically enabling TotalNumberOfQueuedAndInProgressWorkflowRuns. This allows you to use ScaleUpTriggers alone, so that the autoscaling is done without calling GitHub API at all, which should grealy decrease the change of GitHub API calls get rate-limited.
2021-03-14 11:03:00 +09:00
Yusuke Kuoka 8d3a83b07a
Add CheckRun.Names scale-up trigger configuration (#390)
This allows you to trigger autoscaling depending on check_run names(i.e. actions job names). If you are willing to differentiate scale amount only for a specific job, or want to scale only on a specific job, try this.
2021-03-14 10:21:42 +09:00
Brandon Kimbrough 2273b198a1
Add ability to set the MTU size of the docker in docker container (#385)
* adding abilitiy to set docker in docker MTU size

* safeguards to only set MTU env var if it is set
2021-03-12 08:44:49 +09:00
Yusuke Kuoka 3d62e73f8c
Fix PercentageRunnersBusy scaling not working (#386)
PercentageRunnerBusy seems to have regressed since #355 due to that RunnerDeployment.Spec.Selector is empty by default and the HRA controller was using that empty selector to query runners, which somehow returned 0 runners. This fixes that by using the newly added automatic `runner-deployment-name` label for the default runner label and the selector, which avoids querying with empty selector.

Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-795200205
2021-03-11 20:16:36 +09:00
Yusuke Kuoka f5c639ae28
Make webhook-based autoscaler github event logs more operator-friendly (#384)
Adds fields like `pullRequest.base.ref` and `checkRun.status` that are useful for verifying the autoscaling behaviour without browsing GitHub.
Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-794175312
2021-03-10 09:40:44 +09:00
Yusuke Kuoka 728829be7b
Fix panic on scaling organizational runners (#381)
Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793287133
2021-03-09 15:03:47 +09:00
Yusuke Kuoka 1b8a656051 Use --watch-namespace flag to restrict the namespace to watch
Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793172995
2021-03-09 09:46:21 +09:00
Rob Whitby 1753fa3530
handle GET requests in webhook hra (#378) 2021-03-09 08:46:27 +09:00
Yusuke Kuoka 4fa5315311
Fix possible flapping autoscale on runner update (#371)
Addresses https://github.com/summerwind/actions-runner-controller/pull/355#discussion_r587199428
2021-03-05 10:21:20 +09:00
Hiroshi Muraoka 11e58fcc41
Manage runner with label (#355)
* Update RunnerDeploymentSpec to have Selector field

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Update RunnerReplicaSetSpec to have Selector field

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Add CloneSelectorAndAddLabel to add Selector field

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Fix tests

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Use label to find RunnerReplicaSet/Runner

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

* Update controller-gen versions in CRD

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Update autoscaler to list Pods with labels

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Add debug log

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Modify RunnerDeployment tests

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

* Modify RunnerReplicaset test

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

* Modify integration test

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Use RunnerDeployment Template Labels as the default selector for backward compatibility

* Fix labeling

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Update func in Eventually to return (int, error)

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Update RunnerDeployment controller not to use label selector

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Fix potential replicaset controller breakage on replicaset created before v0.17.0

* Fix errors on existing runner replica sets

* Ensure RunnerReplicaSet Spec Selector addition does not break controller

* Ensure RunnerDeployment Template.Spec.Labels change does result in template hash change

* Fix comment

Co-authored-by: binoue <banji-inoue@cybozu.co.jp>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-03-05 10:15:39 +09:00
Yusuke Kuoka 584590e97c
Use patch instead of update to alleviate HRA conflict on webhook (#358)
We sometimes see that integration test fails due to runner replicas not meeting the expected number in a timely manner. After investigating a bit, this turned out to be due to that HRA updates on webhook-based autoscaler and HRA controller are conflicting. This changes the controllers to use Patch instead of Update to make conflicts less likely to happen.

I have also updated the hra controller to use Patch when updating RunnerDeployment, too.

Overall, these changes should make the webhook-based autoscaling more reliable due to less conflicts.
2021-02-26 10:17:09 +09:00
Yusuke Kuoka d18884a0b9
Fix HRA expired cache entries not cleaned up (#357)
Fixes #356
2021-02-26 09:54:24 +09:00
Yusuke Kuoka e9eef04993
Fix old HRA capacity reservations not cleaned up (#354)
Similar to #348 for #346, but for HRA.Spec.CapacityReservations usually modified by the webhook-based autoscaler controller.
This patch tries to fix that by improving the webhook-based autoscaler controller to omit expired reservations on updating HRA spec.
2021-02-25 11:08:00 +09:00
Yusuke Kuoka 598dd1d9fe
Fix incorrect DESIRED on `kubectl get hra (#353)
`kubectl get horizontalrunnerautoscalers.actions.summerwind.dev` shows HRA.status.desiredReplicas as the DESIRED count. However the value had been not taking capacityReservations into account, which resulted in showing incorrect count when you used webhook-based autoscaler, or capacityReservations API directly. This fixes that.
2021-02-25 10:32:09 +09:00
Yusuke Kuoka 9890a90e69
Improve webhook-based autoscaler log (#352)
The controller had been writing confusing messages like the below on missing scale target:

```
Found too many scale targets: It must be exactly one to avoid ambiguity. Either set WatchNamespace for the webhook-based autoscaler to let it only find HRAs in the namespace, or update Repository or Organization fields in your RunnerDeployment resources to fix the ambiguity.{"scaleTargets": ""}
```

This fixes that, while improving many kinds of messages written while reconcilation, so that the error message is more actionable.
2021-02-25 10:07:41 +09:00
Yusuke Kuoka 9da123ae5e
Fix integration test flakiness (#351)
Ref https://github.com/summerwind/actions-runner-controller/pull/345#issuecomment-785015406
2021-02-25 09:30:32 +09:00
Yusuke Kuoka 022007078e
Compact excessive error message on runnerreplicaset status update conflict (#350)
We occasionally see logs like the below:

```
2021-02-24T02:48:26.769ZERRORFailed to update runner status{"runnerreplicaset": "testns-244ol/example-runnerdeploy-j5wzf", "error": "Operation cannot be fulfilled on runnerreplicasets.actions.summerwind.dev \"example-runnerdeploy-j5wzf\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
/home/runner/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
github.com/summerwind/actions-runner-controller/controllers.(*RunnerReplicaSetReconciler).Reconcile
/home/runner/work/actions-runner-controller/actions-runner-controller/controllers/runnerreplicaset_controller.go:207
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88
2021-02-24T02:48:26.769ZERRORcontroller-runtime.controllerReconciler error{"controller": "testns-244olrunnerreplicaset", "request": "testns-244ol/example-runnerdeploy-j5wzf", "error": "Operation cannot be fulfilled on runnerreplicasets.actions.summerwind.dev \"example-runnerdeploy-j5wzf\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
/home/runner/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88
```

which can be compacted into one-liner, without the useless stack trace, without double-logging the same error from the logger and the controller.
2021-02-25 09:01:02 +09:00
Johannes Nicolai 31e5e61155
Log correct runner that was deleted (#349) 2021-02-25 08:38:55 +09:00
Yusuke Kuoka e44e53b88e
Fix failure while saving HRA status after running controller for a while (#348)
Fixes #346
2021-02-24 11:20:21 +09:00
Yusuke Kuoka 991535e567
Fix panic on webhook for user-owned repository (#344)
* Fix panic on webhook for user-owned repository

Follow-up for #282 and #334
2021-02-23 08:05:25 +09:00
Johannes Nicolai 2d7fbbfb68
Handle offline runners gracefully (#341)
* if a runner pod starts up with an invalid token, it will go in an 
infinite retry loop, appearing as RUNNING from the outside
* normally, this error situation is detected because no corresponding 
runner objects exists in GitHub and the pod will get removed after 
registration timeout
* if the GitHub runner object already existed before - e.g. because a 
finalizer was not properly run as part of a partial Kubernetes crash, 
the runner will always stay in a running mode, even updating the 
registration token will not kill the problematic pod
* introducing RunnerOffline exception that can be handled in runner 
controller and replicaset controller
* as runners are offline when a pod is completed and marked for restart, 
only do additional restart checks if no restart was already decided, 
making code a bit cleaner and saving GitHub API calls after each job 
completion
2021-02-22 10:08:04 +09:00
Hidetake Iwata b0e74bebab Fix index key to find HRA in GitHub webhook handler 2021-02-20 21:25:23 +09:00
Hidetake Iwata dfbe53dcca Fix webhook payload in integration test 2021-02-20 21:08:23 +09:00
Yusuke Kuoka ebc3970b84 Add integration test for autoscaling on check_run webhook event 2021-02-19 10:33:04 +09:00
Hidetake Iwata 1ddcf6946a
Fix nil pointer error on received check_run event (#331)
* Reproduce nil pointer error on received check_run event

* Fix nil pointer error on received check_run event
2021-02-18 20:22:36 +09:00
Yusuke Kuoka 67f6de010b
feat: Common runner labels configurable per controller (#327)
* feat: Common runner labels configurable per controller

Ref #321
2021-02-18 20:19:08 +09:00
Yusuke Kuoka 2fdf35ac9d
Refactor integration test to use helpers (#320)
This should make the test code a bit more DRY and readable.
2021-02-17 10:23:35 +09:00
Yusuke Kuoka eb2eaf8130
Fix TotalNumberOfQueuedAndInProgressWorkflowRuns to work with a lot of remaining `completed` jobs (#316)
I have heard from some user that they have hundred thousands of `status=completed` workflow runs in their repository which effectively blocked TotalNumberOfQueuedAndInProgressWorkflowRuns from working because of GitHub API rate limit due to excessive paginated requests.

This fixes that by separating list-workflow-runs calls to two - one for `queued` and one for `in_progress`, which can make the minimum API call from 1 to 2, but allows it to work regardless of number of remaining `completed` workflow runs.
2021-02-16 18:55:55 +09:00
Yusuke Kuoka 7d024a6c05
Fix "duplicate metrics collector registration attempted" errors at startup (#317)
I have seen this error a lot in our integration test. It turned out due to https://github.com/kubernetes-sigs/controller-runtime/issues/484 and is being fixed with this change.
2021-02-16 18:51:33 +09:00
Yusuke Kuoka 434823bcb3
`scale{Up,Down}Adjustment` to add/remove constant number of replicas on scaling (#315)
* `scale{Up,Down}Adjustment` to add/remove constant number of replicas on scaling

Ref #305

* Bump chart version
2021-02-16 17:16:26 +09:00
Yusuke Kuoka f1db6af1c5
Add repository runners support for PercentageRunnersBusy-based autoscaling (#313)
Resolves #258
2021-02-16 12:44:51 +09:00
Johannes Nicolai 2623140c9a
Make log message less scary (#311)
* the reconciliation loop is often much faster than the runner startup, 
so changing runner not found related messages to debug and also add the 
possibility that the runner just needs more time
2021-02-16 09:55:55 +09:00
Johannes Nicolai bc8bc70f69
Fix rate limit and runner registration logic (#309)
* errors.Is compares all members of a struct to return true which never 
happened
* switched to type check instead of exact value check
* notRegistered was using double negation in if statement which lead to 
unregistering runners after the registration timeout
2021-02-15 09:36:49 +09:00
Johannes Nicolai 9c8d7305f1
Introduce pod deletion timeout and forcefully delete stuck pods (#307)
* if a k8s node becomes unresponsive, the kube controller will soft
delete all pods after the eviction time (default 5 mins)
* as long as the node stays unresponsive, the pod will never leave the
last status and hence the runner controller will assume that everything
is fine with the pod and will not try to create new pods
* this can result in a situation where a horizontal autoscaler thinks
that none of its runners are currently busy and will not schedule any
further runners / pods, resulting in a broken  runner deployment until the
runnerreplicaset is deleted or the node comes back online
* introducing a pod deletion timeout (1 minute) after which the runner
controller will try to reboot the runner and create a pod on a working
node
* use forceful deletion and requeue for pods that have been stuck for
more than one minute in terminating state
* gracefully handling race conditions if pod gets finally forcefully deleted within
2021-02-15 09:32:28 +09:00
Yusuke Kuoka addcbfa7ee
Fix runner registration timeout (#301)
Fixes #300
2021-02-12 10:00:20 +09:00
Yusuke Kuoka bbb036e732
feat: Prevent blocking on transient runner registration failure (#297)
This enhances the controller to recreate the runner pod if the corresponding runner has failed to register itself to GitHub within 10 minutes(currently hard-coded).

It should alleviate #288 in case the root cause is some kind of transient failures(network unreliability, GitHub down, temporarly compute resource shortage, etc).

Formerly you had to manually detect and delete such pods or even force-delete corresponding runners to unblock the controller.

Since this enhancement, the controller does the pod deletion automatically after 10 minutes after pod creation, which result in the controller create another pod that might work.

Ref #288
2021-02-09 10:17:52 +09:00
Yusuke Kuoka 9301409aec
fix: Paginate ListRepositoryWorkflowRuns (#295)
When we used `QueuedAndInProgressWorkflowRuns`-based autoscaling, it only fetched and considered only the first 30 workflow runs at the reconcilation time. This may have resulted in unreliable scaling behaviour, like scale-in/out not happening when it was expected.
2021-02-09 10:13:53 +09:00
Yusuke Kuoka ab1c39de57
feat: HorizontalRunnerAutoscaler Webhook server (#282)
* feat: HorizontalRunnerAutoscaler Webhook server

This introduces a Webhook server that responds GitHub `check_run`, `pull_request`, and `push` events by scaling up matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting next sync period to add insufficient runners.

This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, where actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs.

On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for the single target to scale up. The webhook server tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun|Push|PullRequest] against the event and if it finds only one HRA, it is the scale target. If none or two or more targets are found for repository-wide runners, it does the same on organizational runners.

Changes:

* Fix integration test
* Update manifests
* chart: Add support for github webhook server
* dockerfile: Include github-webhook-server binary
* Do not import unversioned go-github
* Update README
2021-02-07 17:37:27 +09:00
Jesse Haka 28e80a2d28
Add support for enterprise runners (#290)
* Add support for enterprise runners

* update docs
2021-02-05 09:31:06 +09:00
Jonas Lergell 6c64ae6a01
Actually use 'dockerdContainerResources' to set resources on the dind container (#273) 2021-01-29 09:18:28 +09:00
Yusuke Kuoka ace95d72ab
Fix self-update failuers due to /runner/externals mount (#253)
* Fix self-update failuers due to /runner/externals mount

Fixes #252

* Tested Self-update Fixes (#269)

Adding fixes to #253 as confirmed and tested in https://github.com/summerwind/actions-runner-controller/issues/264#issuecomment-764549833 by @jolestar, @achedeuzot and @hfuss 🙇 🍻

Co-authored-by: Hayden Fuss <wifu1234@gmail.com>
2021-01-24 10:58:35 +09:00
Johannes Nicolai 94e8c6ffbf
minReplicas <= desiredReplicas <= maxReplicas (#267)
* ensure that minReplicas <= desiredReplicas <= maxReplicas no matter what
* before this change, if the number of runners was much larger than the max number, the applied scale down factor might still result in a desired value > maxReplicas
* if for resource constraints in the cluster, runners would be permanently restarted, the number of runners could go up more than the reverse scale down factor until the next reconciliation round, resulting in a situation where the number of runners climbs up even though it should actually go down
* by checking whether the desiredReplicas is always <= maxReplicas, infinite scaling up loops can be prevented
2021-01-22 10:11:21 +09:00
ZacharyBenamram 48923fec56
Autoscaling: Percentage runners busy - remove magic number used for round up (#235)
* remove magic number for autoscaling

Co-authored-by: Zachary Benamram <zacharybenamram@blend.com>
2020-12-15 14:38:01 +09:00
ZacharyBenamram 466b30728d
Add "PercentageRunnersBusy" horizontal runner autoscaler metric type (#223)
* hpa scheme based off busy runners

* running make manifests

Co-authored-by: Zachary Benamram <zacharybenamram@blend.com>
2020-12-13 08:48:19 +09:00
Yusuke Kuoka dfffd3fb62
feat: EKS IAM Roles for Service Accounts for Runner Pods (#226)
One of the pod recreation conditions has been modified to use hash of runner spec, so that the controller does not keep restarting pods mutated by admission webhooks. This naturally allows us, for example, to use IRSA for EKS that requires its admission webhook to mutate the runner pod to have additional, IRSA-related volumes, volume mounts and env.

Resolves #200
2020-12-08 17:56:06 +09:00
Juho Saarinen f710a54110
Don't compare runner connetion token at restart need check (#227)
Fixes #143
2020-12-08 08:48:35 +09:00
Erik Nobel a2b335ad6a
Github pkg: Bump github package to version 33 (#222) 2020-12-06 10:01:47 +09:00
Shinnosuke Sawada be25715e1e
Use TLS for secure docker connection (#192) 2020-11-30 08:57:33 +09:00
Reinier Timmer ee8fb5a388
parametrized working directory (#185)
* parametrized working directory

* manifests v3.0
2020-11-25 08:55:26 +09:00
Erik Nobel 4e93879b8f
[BUG?]: Create mountpoint for /externals/ (#203)
* runner/controller: Add externals directory mount point

* Runner: Create hack for moving content of /runner/externals/ dir

* Externals dir Mount: mount examples for '__e/node12/bin/node' not found error
2020-11-25 08:53:47 +09:00
Shinnosuke Sawada 4371de9733
add dockerEnabled option (#191)
Add dockerEnabled option for users who does not need docker and want not to run privileged container.
if `dockerEnabled == false`, dind container not run, and there are no privileged container.

Do the same as closed #96
2020-11-16 09:41:12 +09:00
Shinnosuke Sawada a4061d0625
gofmt ed 2020-11-12 09:20:06 +09:00
Shinnosuke Sawada 83857ba7e0
use tcp DOCKER_HOST instead of sharing docker.sock 2020-11-12 08:07:52 +09:00
Yusuke Kuoka e613219a89
Fix token registration broken since v0.11.0 (#167)
Fixes #166
2020-11-11 09:38:05 +09:00
Dan Webb dcf8524b5c
Adds RUNNER_GROUP argument to the runner registration (#157)
* Adds RUNNER_GROUP argument to the runner registration

Adds the ability to register a runner to a predefined runner_group

Resolves #137

* Update README with runner group example

- Updates the README with instructions of how to add the runner to a
  group
- Fix code fencing for shell and yaml blocks in the README
- Use consistent bullet points (dash not asterisk)
2020-11-10 17:15:54 +09:00
Juho Saarinen f2a2ab7ede
Check token validity only when creating new pod (#159)
Fixes #143
2020-11-10 17:02:30 +09:00
Juho Saarinen 40c5050978
Added support for other than public GitHub URL (#146)
Refactoring a bit
2020-10-28 22:15:53 +09:00
Yusuke Kuoka faaca10fba
Rename Runner.Spec.dockerWithinRunnerContainer to docker"d"WithinRunnerContainer (#134)
* Rename Runner.Spec.dockerWithinRunnerContainer to dockerdWithinRunnerContainer

Ref https://github.com/summerwind/actions-runner-controller/pull/126#issuecomment-712501790
2020-10-21 21:32:40 +09:00
Juho Saarinen d16dfac0f8
Restart if pod ends up succeeded (#136)
Fixes #132
2020-10-21 21:32:26 +09:00
Juho Saarinen 92920926fe
Configurable "runner and DinD in a single container" (#126) 2020-10-20 08:48:28 +09:00
Brendan Galloway 7d0bfb77e3
Inject Env Vars into Runner defined Container Spec (#127)
The runner token is now injected into the `runner` container defined within Runner.Spec.Containers[]
2020-10-20 08:43:53 +09:00
Dominic LoBue a63860029a
Prefer autoscaling based on jobs rather than workflows if available (#114)
Adds the ability to autoscale on jobs in addition to workflows. We fall back to using workflow metrics if job details are not present.

Resolves #89
2020-10-08 09:00:44 +09:00
Yusuke Kuoka 1e466ad3df Ensure controller-gen is up-to-date and the code and the manifests are in-sync
Follow-up for #95 that added /finalizers subresource permission and #103 that upgraded controller-gen from 0.2.4 from 0.3.0
2020-10-06 09:23:03 +09:00
Helder Moreira 7a2fa7fbce
runner-controller: do not delete runner if it is busy (#103)
Currently, after refreshing the token, the controller re-creates the runner with the new token. This results in jobs being interrupted. This PR makes sure the pod is not restarted if it is busy.

Closes #74
2020-10-05 09:06:37 +09:00
Yusuke Kuoka 4733edc20d Add scaling-down scenario to integration test 2020-08-02 16:10:01 +09:00
Yusuke Kuoka 50487bbb54 Fix the HRA controller name 2020-08-02 10:38:15 +09:00
Yusuke Kuoka e2164f9946 Fix integration test bugs and do verify scaling out 2020-08-02 10:34:58 +09:00
Yusuke Kuoka 3c3077a11c Fix crash on startup after the HRDA addition
This is a follow-up for #66.

The reconciler for the new HorizontalRunnerDeploymentAutoscaler had a terrible flaw that prevented the controller to fail launching due to an error like:

```
indexer conflict: map[field:.metadata.controller:{}]
```

This fixes that, while adding `integration_test.go` to verify its actually fixed and prevent regression in the future.
2020-07-29 21:20:46 +09:00
Moto Ishizawa e10637ce35
Merge pull request #66 from summerwind/org-runner-autoscale
feat: Organizational RunnerDeployment Autoscaling
2020-07-28 19:17:18 +09:00
Yusuke Kuoka ae30648985 feat: Use HorizontalRunnerAutoscaler for autoscaling 2020-07-27 20:33:44 +09:00
David Liao c0914743b0 add config to respect image pull policy 2020-07-08 23:53:52 -07:00
Yusuke Kuoka eca6917c6a feat: Organizational RunnerDeployment Autoscaling
Enhances #57 to add support for organizational runners.

As GitHub Actions does not have an appropriate API for this, this is the spec you need:

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: myrunners
spec:
  minReplicas: 1
  maxReplicas: 3
  autoscaling:
    metrics:
    - type: TotalNumberOfQueuedAndProgressingWorkflowRuns
      repositories:
      # Assumes that you have `github.com/myorg/myrepo1` repo
      - myrepo1
      - myrepo2
  template:
    spec:
      organization: myorg
```

It works by collecting "in_progress" and "queued" workflow runs for the repositories `myrepo1` and `myrepo2` to autoscale the number of replicas, assuming you have this organizational runner deployment only for those two repositories.

For example, if `myrepo1` had 1 `in_progress` and 2 `queued` workflow runs, and `myrepo2` had 4 `in_progress` and 8 `queued` workflow runs at the time of running the reconcilation loop on the runner deployment, it will scale replicas to 1 + 2 + 4 + 8 = 15.

Perhaps we might be better add a kind of "ratio" setting so that you can configure the controller to create e.g. 2x runners than demanded. But that's another story.

Ref #10
2020-07-03 09:12:47 +09:00
KUOKA Yusuke 5bb2694349
feat: Repository-wide RunnerDeployment Autoscaling (#57)
* feat: Repository-wide RunnerDeployment Autoscaling

This adds `maxReplicas` and `minReplicas` to the RunnerDeploymentSpec. If and only if both fields are set, the controller computes and sets desired `replicas` automatically depending on the demand.

The number of demanded runner replicas is computed by `queued workflow runs + in_progress workflow runs` for the repository. The support for organizational runners is not included.

Ref https://github.com/summerwind/actions-runner-controller/issues/10
2020-06-27 17:26:46 +09:00
Moto Ishizawa 390f2a62d9
Merge pull request #50 from summerwind/runner-validation-webhook
Add validation webhooks
2020-05-08 22:39:13 +09:00
Moto Ishizawa f80c3c1928 Set volume to pod properly 2020-05-01 08:51:25 +09:00
Moto Ishizawa e889eaeb04 Add validation webhooks 2020-04-30 22:11:59 +09:00
Reinier Timmer b96979888c fix delete pod when runner failed to register 2020-04-29 14:23:58 +09:00
Reinier Timmer 9f57f52e36 organization and repository are now exclusive 2020-04-28 11:14:31 +02:00
Reinier Timmer 8c5b776807 support runner labels 2020-04-28 11:14:31 +02:00
Reinier Timmer eca3cc7941 add organization info to runner status 2020-04-28 11:14:31 +02:00
Reinier Timmer fb35dd4131 support for organization runners 2020-04-28 11:14:31 +02:00
Moto Ishizawa 3b8ea2991c Share runner's working directory with docker sidecar 2020-04-24 22:36:27 +09:00
Moto Ishizawa 3ccc51433f Use github package to access the GitHub API 2020-04-13 22:28:07 +09:00
Yusuke Kuoka b411d37f2b fix: RunnerDeployment should clean up old RunnerReplicaSets ASAP
Since the initial implementation of RunnerDeployment and until this change, any update to a runner deployment has been leaving old runner replicasets until the next resync interval. This fixes that, by continusouly retrying the reconcilation 10 seconds later to see if there are any old runner replicasets that can be removed.

In addition to that, the cleanup of old runner replicasets has been improved to be deferred until all the runners of the newest replica set to be available. This gives you hopefully zero or at less downtime updates of runner deployments.

Fixes #24
2020-04-04 07:55:12 +09:00
Moto Ishizawa 5efdc6efe6 Add permission to create/patch events resource 2020-03-27 23:25:37 +09:00
Aleksandr Stepanov d4c849ee09
Add variants of PodTemplate spec fields into the Runner spec (#7)
Resolves #5
Fixes #11
Fixes #12

Changes:

* Added podtemplate spec

* Rework pod creation logic

* Added most using podspecs

* Added copy of podspec

* Fixed Github List method

* Fixed containers

* Added ability to override runner's containers

* Added ability to override runner's containers

* Added ability to override runner's containers

* Update controllers/runner_controller.go

Co-Authored-By: Moto Ishizawa <summerwind.jp@gmail.com>

* Remove optional restartpolicy

* Changed naming convention

Co-authored-by: Moto Ishizawa <summerwind.jp@gmail.com>
2020-03-20 22:50:50 +09:00
Moto Ishizawa b1da3092fb Revert test comment 2020-03-15 21:55:38 +09:00
Moto Ishizawa 5aeae6a152 Fix generate name 2020-03-15 21:50:45 +09:00
Moto Ishizawa a897eee402 Fix RBAC role for RunnerDeployment and RunnerReplicaSet 2020-03-15 18:08:11 +09:00
Yusuke Kuoka c19a1b3ffe Rename RunnerSet to RunnerReplicaSet
To hand over the name `RunnerSet` to the new StatefulSet-based implementation of that being developed at #4
2020-03-10 09:14:34 +09:00
Yusuke Kuoka 4b6806fda3 fix: RunnerDeployment was not working at all after field indexing enhancement 2020-03-06 08:53:46 +09:00
Yusuke Kuoka 70a8c3db0d chore: Tidy up the deployment controller code
Removes an unnecessary condition from the deployment controller code. We assumed that the client would return a not-found error on an empty runnerset list  it is clearly not the case.
2020-03-03 10:50:52 +09:00
Yusuke Kuoka 31fb7cc113 feat: Efficient runner set looks up for the deployment controller
Enhances the deployment controller to use indexed lookups against runner sets for more scalability.
2020-03-03 10:45:39 +09:00
Yusuke Kuoka 9d634d88ff feat: RunnerDeployment
Adds the initial version of RunnerDeployment that is intended to manage RunnerSets(#1), like Deployment manages ReplicaSets.

This is the initial version and therefore is bare bone. The only update strategy it supports is `Recreate`, which recreates the underlying RunnerSet when the runner template changes. I'd like to add `RollingUpdate` strategy once this is merged.

This depends on #1 so the diff contains that of #1, too. Please see only the latest commit for review.

Also see https://github.com/mumoshu/actions-runner-controller-ci/runs/471329823?check_suite_focus=true to confirm that `make tests` is passing after changes made in this commit.
2020-02-27 10:46:23 +09:00
Yusuke Kuoka d8d829b734 feat: RunnerSets
RunnerSet is basically ReplicaSet for Runners.

It is responsible for maintaining number of runners to match the desired one. That is, it creates missing runners from `.Spec.Template` and deletes redundant runners.

Similar to ReplicaSet, this does not support rolling update of runners on its own. We might want to later add `RunnerDeployment` for that. But that's another story.
2020-02-24 10:32:44 +09:00
Moto Ishizawa 829a167303 Add 'env' field to runner resource 2020-02-06 22:09:07 +09:00
Moto Ishizawa a436216d5e Implement finalizer 2020-02-03 21:35:01 +09:00
Moto Ishizawa 497ddba82d Record event of runner resource 2020-02-03 18:40:59 +09:00
Moto Ishizawa 13ef78ce20 Sync runner status with pod status 2020-02-03 17:25:38 +09:00
Moto Ishizawa e6952f5ca1 Add '-runner-image' and '-docker-image' flags 2020-02-03 16:56:52 +09:00
Moto Ishizawa 2f69329fce Fix permission for pods 2020-02-02 19:49:10 +09:00
Moto Ishizawa 7db5340595 Update CRD validation and RBAC 2020-02-02 10:30:42 +09:00
Moto Ishizawa 960befeade Restart runner pod on completion 2020-02-01 00:06:30 +09:00
Moto Ishizawa 65f479d749 Always restart container 2020-01-31 22:50:08 +09:00
Moto Ishizawa 3f02153257 Ignore pods being deleted 2020-01-31 22:47:53 +09:00
Moto Ishizawa ec3e7de701 Add docker container to a runner pod 2020-01-30 23:52:40 +09:00
Moto Ishizawa 5b887add53 Reconcile when runner pod is updated 2020-01-29 23:12:07 +09:00
Moto Ishizawa aaf6b0bcae Implement runner controller 2020-01-28 21:58:01 +09:00
Moto Ishizawa 04a8e562c0 Initial commit 2020-01-28 15:03:23 +09:00