Commit Graph

22 Commits

Author SHA1 Message Date
Yusuke Kuoka b8e65aa857 Prevent unnecessary ephemeral runner recreations 2022-02-20 13:45:42 +00:00
renovate[bot] c64000e11c
fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.0 (#740)
* fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.0

* Fix dependencies and bump Go to 1.17 so that it builds after controller-runtime 0.11.0 upgrade

* Regenerate manifests with the latest K8s dependencies

Co-authored-by: Renovate Bot <bot@renovateapp.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-17 09:06:55 +09:00
Yusuke Kuoka 98da4c2adb
Add HRA support for RunnerSet (#647)
`HRA.Spec.ScaleTargetRef.Kind` is added to denote that the scale-target is a RunnerSet.

It defaults to `RunnerDeployment` for backward compatibility.

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: myhra
spec:
  scaleTargetRef:
    kind: RunnerSet
    name: myrunnerset
```

Ref #629
Ref #613
Ref #612
2021-06-23 20:25:03 +09:00
Yusuke Kuoka 8b90b0f0e3
Clean up import list (#645)
Resolves #644
2021-06-22 17:55:06 +09:00
Yusuke Kuoka 9e4dbf497c
feat: RunnerSet backed by StatefulSet (#629)
* feat: RunnerSet backed by StatefulSet

Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a StatefulSet with an admission webhook that mutates the StatefulSet-managed pods to inject the required envvars and registration tokens.

Resolves #613
Ref #612
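
For illustration, a minimal RunnerSet manifest might look like the sketch below; the `example-runnerset` name and `myorg/myrepo` repository are placeholders, and the fields beyond `kind` and `repository` are assumed to mirror the embedded StatefulSet spec:

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: example-runnerset
spec:
  replicas: 2
  repository: myorg/myrepo  # placeholder repository
  # The remaining fields mirror a plain StatefulSet spec; the admission
  # webhook mutates the pods it creates to inject the registration token
  # and the required envvars.
  selector:
    matchLabels:
      app: example-runnerset
  serviceName: example-runnerset
  template:
    metadata:
      labels:
        app: example-runnerset
```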

* Upgrade controller-runtime to 0.9.0

* Bump Go to 1.16.x following controller-runtime 0.9.0

* Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup

* Fix startup failure due to missing LeaderElectionID

* Fix the issue where any pod becomes unable to start once actions-runner-controller has failed after the mutating webhook has been registered

* Allow force-updating statefulset

* Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false

* Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec

* Enable running acceptance tests against arbitrary kind cluster

* RunnerSet supports non-ephemeral runners only today

* fix: docker-build from root Makefile on intel mac

* fix: arch check fixes for mac and ARM

* ci: aligning test data format and patching checks

* fix: removing namespace in test data

* chore: adding more ignores

* chore: removing leading space in shebang

* Re-add metrics to org hra testdata

* Bump cert-manager to v1.1.1 and fix deploy.sh

Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Callum James Tait <callum.tait@photobox.com>
2021-06-22 17:10:09 +09:00
Yusuke Kuoka cb14d7530b
Add HRA printer column "SCHEDULE" (#561)
Adds a column to help the operator see if they configured HRA.Spec.ScheduledOverrides correctly, in the form of the "next override schedule recognized by the controller":

```
$ k get horizontalrunnerautoscaler
NAME                            MIN   MAX   DESIRED   SCHEDULE
actions-runner-aos-autoscaler   0     5     0
org                             0     5     0         min=0 time=2021-05-21 15:00:00 +0000 UTC
```

Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/484
2021-05-22 08:29:53 +09:00
Yusuke Kuoka 3cd124dce3
chore: Add debug logs for scheduledOverrides (#540)
Follow-up for #515
Ref #484
2021-05-11 17:30:22 +09:00
Yusuke Kuoka 0e0f385f72
Experimental support for ScheduledOverrides (#515)
This adds the initial version of ScheduledOverrides to HorizontalRunnerAutoscaler.
`MinReplicas` overriding should just work.
When there are two or more ScheduledOverrides, the earliest one that matches is activated. Each ScheduledOverride can be recurring or one-time. If you have two or more ScheduledOverrides, only one of them should be one-time, and that one-time override should be the earliest item in the list for the configuration to make sense.
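
For illustration, a sketch of what such a configuration might look like; since this is the initial, experimental version, the exact field names (`scheduledOverrides`, `recurrenceRule`, etc.) and values here are assumptions:

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example
spec:
  scaleTargetRef:
    name: example-runnerdeploy  # placeholder RunnerDeployment name
  minReplicas: 1
  maxReplicas: 10
  scheduledOverrides:
  # One-time override: scale to zero during a holiday.
  # Listed first so that it takes precedence over the recurring one.
  - startTime: "2021-05-01T00:00:00+09:00"
    endTime: "2021-05-03T00:00:00+09:00"
    minReplicas: 0
  # Recurring override: scale to zero every weekend.
  - startTime: "2021-05-08T00:00:00+09:00"
    endTime: "2021-05-10T00:00:00+09:00"
    recurrenceRule:
      frequency: Weekly
    minReplicas: 0
```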

Tests will be added in another commit. Logging improvements and additional observability in HRA.Status will also be added in further commits.

Ref #484
2021-05-03 23:31:17 +09:00
Christoph Brand a18ac330bb
feature(controller): allow autoscaler to scale down to 0 (#447) 2021-05-02 16:46:51 +09:00
Hidetake Iwata 3a0332dfdc
Add metrics of RunnerDeployment and HRA (#408)
* Add metrics of RunnerDeployment and HRA

* Use kube-state-metrics-style label names
2021-03-19 16:14:02 +09:00
Yusuke Kuoka f6ab66c55b
Do not delay min/maxReplicas propagation from HRA to RD due to caching (#406)
As part of #282, I introduced a caching mechanism to avoid excessive GitHub API calls, because the autoscaling calculation, which involves GitHub API calls, is executed on each webhook event.

Apparently, it was saving the wrong value in the cache: the cached value was the one computed after applying `HRA.Spec.{Min,Max}Replicas`, so manual changes to {Min,Max}Replicas didn't affect RunnerDeployment.Spec.Replicas until the cache expired. This isn't what I had intended.

This patch fixes that by caching the value computed before applying {Min,Max}Replicas.

Additionally, I've updated the logging so that you can observe which number was fetched from the cache, which number was suggested by either TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy, and which number was ultimately used as the desired replicas (after applying {Min,Max}Replicas).

Follow-up for #282
2021-03-19 12:58:02 +09:00
Yusuke Kuoka 584590e97c
Use patch instead of update to alleviate HRA conflict on webhook (#358)
We sometimes see the integration test fail because the runner replicas do not reach the expected number in a timely manner. After investigating a bit, this turned out to be caused by conflicting HRA updates from the webhook-based autoscaler and the HRA controller. This changes the controllers to use Patch instead of Update to make such conflicts less likely.

I have also updated the HRA controller to use Patch when updating RunnerDeployment.

Overall, these changes should make webhook-based autoscaling more reliable thanks to fewer conflicts.
2021-02-26 10:17:09 +09:00
Yusuke Kuoka d18884a0b9
Fix HRA expired cache entries not cleaned up (#357)
Fixes #356
2021-02-26 09:54:24 +09:00
Yusuke Kuoka 598dd1d9fe
Fix incorrect DESIRED on `kubectl get hra` (#353)
`kubectl get horizontalrunnerautoscalers.actions.summerwind.dev` shows HRA.status.desiredReplicas as the DESIRED count. However, the value had not been taking capacityReservations into account, which resulted in an incorrect count when you used the webhook-based autoscaler or the capacityReservations API directly. This fixes that.
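
For context, a capacity reservation is declared on the HRA spec; a minimal sketch, with the target name as a placeholder and the field names assumed from the API of that era:

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example
spec:
  scaleTargetRef:
    name: example-runnerdeploy  # placeholder RunnerDeployment name
  minReplicas: 0
  maxReplicas: 5
  capacityReservations:
  # Each reservation adds its replicas to the desired count until it
  # expires; with this fix, DESIRED reflects these reservations too.
  - expirationTime: "2021-02-25T11:00:00+09:00"
    replicas: 1
```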
2021-02-25 10:32:09 +09:00
Yusuke Kuoka e44e53b88e
Fix failure while saving HRA status after running controller for a while (#348)
Fixes #346
2021-02-24 11:20:21 +09:00
Yusuke Kuoka ebc3970b84 Add integration test for autoscaling on check_run webhook event 2021-02-19 10:33:04 +09:00
Yusuke Kuoka 7d024a6c05
Fix "duplicate metrics collector registration attempted" errors at startup (#317)
I have seen this error a lot in our integration test. It turned out to be due to https://github.com/kubernetes-sigs/controller-runtime/issues/484 and is fixed by this change.
2021-02-16 18:51:33 +09:00
Yusuke Kuoka ab1c39de57
feat: HorizontalRunnerAutoscaler Webhook server (#282)
* feat: HorizontalRunnerAutoscaler Webhook server

This introduces a webhook server that responds to GitHub `check_run`, `pull_request`, and `push` events by scaling up the matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting for the next sync period to add the missing runners.

This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, whereas actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs.

On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for the single target to scale up. It tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun|Push|PullRequest] against the event, and if it finds exactly one matching HRA, that is the scale target. If none, or two or more, targets are found among repository-wide runners, it does the same search on organizational runners.
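
For illustration, an HRA with a scale-up trigger for `check_run` events might look like the following sketch; the target name is a placeholder and the exact trigger fields (`amount`, `duration`, etc.) are assumptions based on the docs of that era:

```
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example
spec:
  scaleTargetRef:
    name: example-runnerdeploy  # placeholder RunnerDeployment name
  minReplicas: 1
  maxReplicas: 10
  scaleUpTriggers:
  - githubEvent:
      checkRun:
        types: ["created"]
        status: "queued"
    amount: 1       # add one replica per matched event
    duration: "5m"  # let the added capacity expire after five minutes
```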

Changes:

* Fix integration test
* Update manifests
* chart: Add support for github webhook server
* dockerfile: Include github-webhook-server binary
* Do not import unversioned go-github
* Update README
2021-02-07 17:37:27 +09:00
Yusuke Kuoka 1e466ad3df Ensure controller-gen is up-to-date and the code and the manifests are in-sync
Follow-up for #95, which added the /finalizers subresource permission, and #103, which upgraded controller-gen from 0.2.4 to 0.3.0
2020-10-06 09:23:03 +09:00
Yusuke Kuoka 50487bbb54 Fix the HRA controller name 2020-08-02 10:38:15 +09:00
Yusuke Kuoka 3c3077a11c Fix crash on startup after the HRDA addition
This is a follow-up for #66.

The reconciler for the new HorizontalRunnerDeploymentAutoscaler had a terrible flaw that caused the controller to fail at launch with an error like:

```
indexer conflict: map[field:.metadata.controller:{}]
```

This fixes that, while adding `integration_test.go` to verify that it's actually fixed and to prevent regressions in the future.
2020-07-29 21:20:46 +09:00
Yusuke Kuoka ae30648985 feat: Use HorizontalRunnerAutoscaler for autoscaling 2020-07-27 20:33:44 +09:00