Commit Graph

19 Commits

Author SHA1 Message Date
Yusuke Kuoka 618276e3d3
Enhance support for multi-tenancy (#1371)
This enhances every ARC controller and the various K8s custom resources so that the user can now configure custom GitHub API credentials per resource, different from the default credentials configured for the ARC instance.

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/1067#issuecomment-1043716646
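
A minimal sketch of the per-resource credentials, assuming the field is named `githubAPICredentialsFrom` and references a pre-created secret (both names illustrative):

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: tenant-a-runners
spec:
  template:
    spec:
      repository: tenant-a/myrepo
      # Assumption: per-resource credentials come from a secret,
      # overriding the controller-wide defaults.
      githubAPICredentialsFrom:
        secretRef:
          name: github-app-secret
```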
2022-07-12 09:45:00 +09:00
Yusuke Kuoka 4053ab3e11
Fix label support for TotalNumberOfQueuedAndInProgressWorkflowRuns metric (#1390)
In #1373 we made two mistakes:

- We mistakenly checked that all the runner labels are included in the job labels, and only then marked the target as eligible for scaling. It should definitely be the opposite: every job label must be included in the runner labels!
- We mistakenly required the `self-hosted` label to be present in the job. [Although it should be a good practice to explicitly say `runs-on: ["self-hosted", "custom-label"]`](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-using-labels-for-runner-selection), that's not a requirement, so we should code accordingly.

The consequence of those two mistakes was that, for example, jobs with `self-hosted` + `custom` labels didn't result in scaling runners that have `self-hosted` + `custom` + `custom2`. This should fix that.
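
To illustrate the intended matching with hypothetical names: the job's `runs-on` labels should be a subset of the runner's labels for the runner to be scaled.

```
# Workflow job: requests two labels.
jobs:
  build:
    runs-on: ["self-hosted", "custom"]
---
# RunnerDeployment: the runner carries a superset of the job's labels
# (`self-hosted` is implicit for self-hosted runners), so this target
# should now be scaled for the job above.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example
spec:
  template:
    spec:
      repository: myorg/myrepo
      labels:
      - custom
      - custom2
```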

Ref #1056
Ref #1373
2022-04-27 16:24:21 +01:00
Yusuke Kuoka a622968ff2
feat: Add label support to TotalNumberOfQueuedAndInProgressWorkflowRuns metric (#1373)
This is an implementation of my interpretation of the "bronze" case proposed in #1056

Ref #1056
2022-04-24 14:41:34 +09:00
Yusuke Kuoka 3ba7179995
Do not enable TotalNumberOfQueuedAndInProgressWorkflowRuns by default (#1372)
Previously, omitting hra.spec.metrics entirely resulted in ARC enabling TotalNumberOfQueuedAndInProgressWorkflowRuns.
That turned out to be a bad idea, so as of this change it is no longer enabled by default.

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/728
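
With this change, the metric has to be opted into explicitly. A minimal sketch, assuming the `repositoryNames` field this metric uses:

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: myhra
spec:
  scaleTargetRef:
    name: myrunners
  metrics:
  # No longer implied when metrics is omitted; list it explicitly.
  - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
    repositoryNames:
    - myrepo
```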
2022-04-24 13:36:42 +09:00
Callum Tait 24aae58dbc
feat: default scale down flag (#963)
Resolves #899

Co-authored-by: Callum <callum@domain.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-04-20 11:09:09 +09:00
Yusuke Kuoka 55ff4de79a
Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192)
* Remove legacy GitHub API cache of HRA.Status.CachedEntries

We migrated to the transport-level cache introduced in #1127, so not only is the legacy cache useless, it also makes it harder to deduce which cache produced the desired replicas number calculated by HRA.
Just remove the legacy cache to keep it simple and easy to understand.

* Deprecate the githubAPICacheDuration helm chart value and the --github-api-cache-duration flag as well

* Fix integration test
2022-03-08 19:05:43 +09:00
Yusuke Kuoka 98da4c2adb
Add HRA support for RunnerSet (#647)
`HRA.Spec.ScaleTargetRef.Kind` is added to denote that the scale-target is a RunnerSet.

It defaults to `RunnerDeployment` for backward compatibility.

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: myhra
spec:
  scaleTargetRef:
    kind: RunnerSet
    name: myrunnerset
```

Ref #629
Ref #613
Ref #612
2021-06-23 20:25:03 +09:00
Yusuke Kuoka 8b90b0f0e3
Clean up import list (#645)
Resolves #644
2021-06-22 17:55:06 +09:00
Yusuke Kuoka 9e4dbf497c
feat: RunnerSet backed by StatefulSet (#629)
* feat: RunnerSet backed by StatefulSet

Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods to inject the required envvars and registration tokens. A minimal manifest is sketched after the change list below.

Resolves #613
Ref #612

* Upgrade controller-runtime to 0.9.0

* Bump Go to 1.16.x following controller-runtime 0.9.0

* Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup

* Fix startup failure due to missing LeaderElectionID

* Fix the issue where any pods became unable to start once actions-runner-controller failed after the mutating webhook had been registered

* Allow force-updating statefulset

* Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false

* Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec

* Enable running acceptance tests against arbitrary kind cluster

* RunnerSet supports non-ephemeral runners only today

* fix: docker-build from root Makefile on intel mac

* fix: arch check fixes for mac and ARM

* ci: aligning test data format and patching checks

* fix: removing namespace in test data

* chore: adding more ignores

* chore: removing leading space in shebang

* Re-add metrics to org hra testdata

* Bump cert-manager to v1.1.1 and fix deploy.sh
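
A minimal RunnerSet sketch; because a RunnerSet is backed by a StatefulSet, it presumably carries StatefulSet-style `selector` and `serviceName` fields (all names illustrative):

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example
spec:
  replicas: 2
  repository: myorg/myrepo
  # Assumed StatefulSet-style plumbing.
  selector:
    matchLabels:
      app: example
  serviceName: example
  template:
    metadata:
      labels:
        app: example
```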

Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Callum James Tait <callum.tait@photobox.com>
2021-06-22 17:10:09 +09:00
Yusuke Kuoka cb14d7530b
Add HRA printer column "SCHEDULE" (#561)
Adds a column to help the operator see whether they configured HRA.Spec.ScheduledOverrides correctly, in the form of the "next override schedule recognized by the controller":

```
$ k get horizontalrunnerautoscaler
NAME                            MIN   MAX   DESIRED   SCHEDULE
actions-runner-aos-autoscaler   0     5     0
org                             0     5     0         min=0 time=2021-05-21 15:00:00 +0000 UTC
```
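
For reference, a sketch of a ScheduledOverride that would produce a SCHEDULE like the `org` row above (field names and times assumed for illustration):

```
spec:
  scheduledOverrides:
  # Assumed shape: force minReplicas to 0 during the given window.
  - startTime: "2021-05-21T15:00:00Z"
    endTime: "2021-05-21T23:00:00Z"
    minReplicas: 0
```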

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/484
2021-05-22 08:29:53 +09:00
Yusuke Kuoka f6ab66c55b
Do not delay min/maxReplicas propagation from HRA to RD due to caching (#406)
As part of #282, I introduced a caching mechanism to avoid excessive GitHub API calls, because the autoscaling calculation that involves GitHub API calls is executed on each Webhook event.

Apparently, it was saving the wrong value in the cache: the cached value was the one computed after applying `HRA.Spec.{Max,Min}Replicas`, so manual changes to {Max,Min}Replicas didn't affect RunnerDeployment.Spec.Replicas until the cache expired. This isn't what I had wanted.

This patch fixes that by changing the cached value to the one computed before applying {Min,Max}Replicas.

Additionally, I've updated logging so that you can observe which number was fetched from the cache, which number was suggested by either TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy, and which number was finally used as the desired replicas (after applying {Min,Max}Replicas).

Follow-up for #282
2021-03-19 12:58:02 +09:00
Hiroshi Muraoka 11e58fcc41
Manage runners with labels (#355)
* Update RunnerDeploymentSpec to have Selector field

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Update RunnerReplicaSetSpec to have Selector field

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Add CloneSelectorAndAddLabel to add Selector field

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Fix tests

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Use label to find RunnerReplicaSet/Runner

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

* Update controller-gen versions in CRD

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Update autoscaler to list Pods with labels

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Add debug log

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Modify RunnerDeployment tests

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

* Modify RunnerReplicaset test

Signed-off-by: binoue <banji-inoue@cybozu.co.jp>

* Modify integration test

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Use RunnerDeployment Template Labels as the default selector for backward compatibility

* Fix labeling

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Update func in Eventually to return (int, error)

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Update RunnerDeployment controller not to use label selector

Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com>

* Fix potential replicaset controller breakage on replicaset created before v0.17.0

* Fix errors on existing runner replica sets

* Ensure RunnerReplicaSet Spec Selector addition does not break controller

* Ensure RunnerDeployment Template.Spec.Labels change does result in template hash change

* Fix comment
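
A sketch of the resulting API, assuming the new `selector` defaults to the template labels when omitted (per the backward-compatibility change above); names are illustrative:

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example
spec:
  replicas: 2
  # New optional field; when omitted it defaults to the template
  # labels for backward compatibility.
  selector:
    matchLabels:
      app: example-runner
  template:
    metadata:
      labels:
        app: example-runner
    spec:
      repository: myorg/myrepo
```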

Co-authored-by: binoue <banji-inoue@cybozu.co.jp>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-03-05 10:15:39 +09:00
Yusuke Kuoka eb2eaf8130
Fix TotalNumberOfQueuedAndInProgressWorkflowRuns to work with a lot of remaining `completed` jobs (#316)
I have heard from a user that they have hundreds of thousands of `status=completed` workflow runs in their repository, which effectively blocked TotalNumberOfQueuedAndInProgressWorkflowRuns from working, because the excessive paginated requests hit the GitHub API rate limit.

This fixes that by splitting the list-workflow-runs call into two: one for `queued` and one for `in_progress`. That raises the minimum number of API calls from 1 to 2, but allows it to work regardless of the number of remaining `completed` workflow runs.
2021-02-16 18:55:55 +09:00
Yusuke Kuoka ab1c39de57
feat: HorizontalRunnerAutoscaler Webhook server (#282)
* feat: HorizontalRunnerAutoscaler Webhook server

This introduces a Webhook server that responds to GitHub `check_run`, `pull_request`, and `push` events by scaling up the matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting for the next sync period.

This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, whereas actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs.

On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for a single target to scale up. The webhook server tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun|Push|PullRequest] against the event, and if exactly one HRA matches, that is the scale target. If zero, or two or more, targets are found among repository-wide runners, it does the same on organizational runners.
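
A sketch of a scale-up trigger matching `check_run` events; the exact shape (`amount`, `duration`, `checkRun.types`/`status`) is assumed from the ScaleUpTriggers API named above:

```
spec:
  scaleTargetRef:
    name: myrunners
  scaleUpTriggers:
  # Assumed shape: add 1 replica when a check_run is created and
  # queued, and let the added capacity expire after the duration.
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1
    duration: "5m"
```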

Changes:

* Fix integration test
* Update manifests
* chart: Add support for github webhook server
* dockerfile: Include github-webhook-server binary
* Do not import unversioned go-github
* Update README
2021-02-07 17:37:27 +09:00
Juho Saarinen 40c5050978
Added support for GitHub URLs other than the public one (#146)
Refactoring a bit
2020-10-28 22:15:53 +09:00
Dominic LoBue a63860029a
Prefer autoscaling based on jobs rather than workflows if available (#114)
Adds the ability to autoscale on jobs in addition to workflows. We fall back to using workflow metrics if job details are not present.

Resolves #89
2020-10-08 09:00:44 +09:00
Yusuke Kuoka ae30648985 feat: Use HorizontalRunnerAutoscaler for autoscaling 2020-07-27 20:33:44 +09:00
Yusuke Kuoka eca6917c6a feat: Organizational RunnerDeployment Autoscaling
Enhances #57 to add support for organizational runners.

As GitHub Actions does not have an appropriate API for this, this is the spec you need:

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: myrunners
spec:
  minReplicas: 1
  maxReplicas: 3
  autoscaling:
    metrics:
    - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
      repositories:
      # Assumes that you have `github.com/myorg/myrepo1` repo
      - myrepo1
      - myrepo2
  template:
    spec:
      organization: myorg
```

It works by collecting "in_progress" and "queued" workflow runs for the repositories `myrepo1` and `myrepo2` to autoscale the number of replicas, assuming you have this organizational runner deployment only for those two repositories.

For example, if `myrepo1` had 1 `in_progress` and 2 `queued` workflow runs, and `myrepo2` had 4 `in_progress` and 8 `queued` workflow runs at the time of running the reconciliation loop on the runner deployment, it will scale replicas to 1 + 2 + 4 + 8 = 15.

Perhaps we'd do better to add a kind of "ratio" setting so that you can configure the controller to create e.g. 2x as many runners as demanded. But that's another story.

Ref #10
2020-07-03 09:12:47 +09:00
KUOKA Yusuke 5bb2694349
feat: Repository-wide RunnerDeployment Autoscaling (#57)
* feat: Repository-wide RunnerDeployment Autoscaling

This adds `maxReplicas` and `minReplicas` to the RunnerDeploymentSpec. If and only if both fields are set, the controller computes and sets the desired `replicas` automatically depending on the demand.

The number of demanded runner replicas is computed as `queued workflow runs + in_progress workflow runs` for the repository. Support for organizational runners is not included.

Ref https://github.com/summerwind/actions-runner-controller/issues/10
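
A minimal sketch of the repository-wide form, reusing the fields from the organizational example above (repository name illustrative):

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: myrunners
spec:
  # Autoscaling is activated only when both bounds are set.
  minReplicas: 1
  maxReplicas: 3
  template:
    spec:
      repository: myorg/myrepo
```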
2020-06-27 17:26:46 +09:00