actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Yusuke Kuoka	55ff4de79a	Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192) * Remove legacy GitHub API cache of HRA.Status.CachedEntries We migrated to the transport-level cache introduced in #1127, so not only is this useless, it also makes it harder to deduce which cache resulted in the desired replicas number calculated by HRA. Just remove the legacy cache to keep it simple and easy to understand. * Deprecate githubAPICacheDuration helm chart value and the --github-api-cache-duration flag as well * Fix integration test	2022-03-08 19:05:43 +09:00
Yusuke Kuoka	5f2b5327f7	integration: Reduce error logs to ease debugging	2022-03-03 18:47:54 +09:00
Yusuke Kuoka	a6f0e0008f	Make unregistration timeout and retry delay configurable in integration tests	2022-02-20 12:05:34 +00:00
Felipe Galindo Sanchez	d0d316252e	Option to consider runner group visibility on scale based on webhook (#1062) This will work on GHES but not GitHub Enterprise Cloud due to excessive GitHub API calls required. More work is needed, like adding a cache layer to the GitHub client, to make it usable on GitHub Enterprise Cloud. Fixes additional cases from https://github.com/actions-runner-controller/actions-runner-controller/pull/1012 If GitHub auth is provided in the webhooks controller then runner groups with custom visibility are supported. Otherwise, all runner groups will be assumed to be visible to all repositories. `getScaleUpTargetWithFunction()` will check if there is an HRA available with the following flow: 1. Search for repository HRAs - if so it ends here 2. Get available HRAs in k8s 3. Compute visible runner groups a. If GitHub auth is provided - get all the runner groups that are visible to the repository of the incoming webhook using GitHub API calls. b. If GitHub auth is not provided - assume all runner groups are visible to all repositories 4. Search for default organization runners (a.k.a runners from organization's visible default runner group) with matching labels 5. Search for default enterprise runners (a.k.a runners from enterprise's visible default runner group) with matching labels 6. Search for custom organization runner groups with matching labels 7. Search for custom enterprise runner groups with matching labels Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-02-16 19:08:56 +09:00
Patrick Ellis	ea2dbc2807	Update go-github from v37 -> v39 (#925)	2021-12-11 21:43:40 +09:00
Tarasovych	7008b0c257	feat: Organization RunnerDeployment with webhook-based autoscaling only for certain repositories (#766) Resolves #765 Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-08-31 09:46:36 +09:00
Yusuke Kuoka	f858e2e432	Add POC of GitHub Webhook Delivery Forwarder (#682) * Add POC of GitHub Webhook Delivery Forwarder * multi-forwarder and ctrl-c exiting and fix for non-working http post * Rename source files * Extract signal handling into a dedicated source file * Faster ctrl-c handling * Enable automatic creation of repo hook on startup * Add support for forwarding org hook deliveries * Set hook secret on hook creation via envvar (HOOK_SECRET) * Fix org hook support * Fix HOOK_SECRET for consistency * Refactor to prepare for custom log position provider * Refactor to extract inmemory log position provider * Add configmap-based log position provider * Rename githubwebhookdeliveryforwarder to hookdeliveryforwarder * Refactor to rename LogPositionProvider to Checkpointer and extract ConfigMap checkpointer into a dedicated pkg * Refactor to extract logger initialization * Add hookdeliveryforwarder README and bump go-github to unreleased ver	2021-07-14 10:18:55 +09:00
Yusuke Kuoka	f19e7ea8a8	chore: Upgrade go-github to v36 (#681)	2021-07-04 17:43:52 +09:00
Yusuke Kuoka	8b90b0f0e3	Clean up import list (#645) Resolves #644	2021-06-22 17:55:06 +09:00
Yusuke Kuoka	9e4dbf497c	feat: RunnerSet backed by StatefulSet (#629) * feat: RunnerSet backed by StatefulSet Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods with required envvars and registration tokens. Resolves #613 Ref #612 * Upgrade controller-runtime to 0.9.0 * Bump Go to 1.16.x following controller-runtime 0.9.0 * Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup * Fix startup failure due to missing LeaderElectionID * Fix the issue that any pods become unable to start once actions-runner-controller has failed after the mutating webhook has been registered * Allow force-updating statefulset * Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false * Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec * Enable running acceptance tests against arbitrary kind cluster * RunnerSet supports non-ephemeral runners only today * fix: docker-build from root Makefile on intel mac * fix: arch check fixes for mac and ARM * ci: aligning test data format and patching checks * fix: removing namespace in test data * chore: adding more ignores * chore: removing leading space in shebang * Re-add metrics to org hra testdata * Bump cert-manager to v1.1.1 and fix deploy.sh Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com> Co-authored-by: Callum James Tait <callum.tait@photobox.com>	2021-06-22 17:10:09 +09:00
Yusuke Kuoka	dbd7b486d2	feat: Support for scaling from/to zero (#465) This is an attempt to support scaling from/to zero. The basic idea is that we create a one-off "registration-only" runner pod on RunnerReplicaSet being scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding those. GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline` would prevent GitHub Actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desire to scale to/from zero, hence we retain `offline` runners. In this change, I enhanced the runnerreplicaset controller to create a registration-only runner at the very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline`, until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler. Related to #447	2021-05-02 16:11:36 +09:00
Yusuke Kuoka	07f822bb08	Do include Runner controller in integration test (#409) So that we could catch bugs in runner controller like seen in #398, #404, and #407. Ref #400	2021-03-19 16:14:15 +09:00
Yusuke Kuoka	5530030c67	Disable metrics-based autoscaling by default when scaleUpTriggers are enabled (#391) Relates to https://github.com/summerwind/actions-runner-controller/pull/379#discussion_r592813661 Relates to https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793266609 When you define HRA.Spec.ScaleUpTriggers[] but not HRA.Spec.Metrics[], the HRA controller will now enable ScaleUpTriggers alone instead of automatically enabling TotalNumberOfQueuedAndInProgressWorkflowRuns. This allows you to use ScaleUpTriggers alone, so that the autoscaling is done without calling GitHub API at all, which should greatly decrease the chance of GitHub API calls getting rate-limited.	2021-03-14 11:03:00 +09:00
Yusuke Kuoka	728829be7b	Fix panic on scaling organizational runners (#381) Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793287133	2021-03-09 15:03:47 +09:00
Yusuke Kuoka	1b8a656051	Use --watch-namespace flag to restrict the namespace to watch Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793172995	2021-03-09 09:46:21 +09:00
Hiroshi Muraoka	11e58fcc41	Manage runner with label (#355) * Update RunnerDeploymentSpec to have Selector field Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Update RunnerReplicaSetSpec to have Selector field Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Add CloneSelectorAndAddLabel to add Selector field Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Fix tests Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Use label to find RunnerReplicaSet/Runner Signed-off-by: binoue <banji-inoue@cybozu.co.jp> * Update controller-gen versions in CRD Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Update autoscaler to list Pods with labels Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Add debug log Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Modify RunnerDeployment tests Signed-off-by: binoue <banji-inoue@cybozu.co.jp> * Modify RunnerReplicaset test Signed-off-by: binoue <banji-inoue@cybozu.co.jp> * Modify integration test Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Use RunnerDeployment Template Labels as the default selector for backward compatibility * Fix labeling Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Update func in Eventually to return (int, error) Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Update RunnerDeployment controller not to use label selector Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Fix potential replicaset controller breakage on replicaset created before v0.17.0 * Fix errors on existing runner replica sets * Ensure RunnerReplicaSet Spec Selector addition does not break controller * Ensure RunnerDeployment Template.Spec.Labels change does result in template hash change * Fix comment Co-authored-by: binoue <banji-inoue@cybozu.co.jp> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-03-05 10:15:39 +09:00
Yusuke Kuoka	584590e97c	Use patch instead of update to alleviate HRA conflict on webhook (#358) We sometimes see that the integration test fails due to runner replicas not meeting the expected number in a timely manner. After investigating a bit, this turned out to be due to conflicting HRA updates from the webhook-based autoscaler and the HRA controller. This changes the controllers to use Patch instead of Update to make conflicts less likely to happen. I have also updated the hra controller to use Patch when updating RunnerDeployment, too. Overall, these changes should make the webhook-based autoscaling more reliable due to fewer conflicts.	2021-02-26 10:17:09 +09:00
Yusuke Kuoka	598dd1d9fe	Fix incorrect DESIRED on `kubectl get hra` (#353) `kubectl get horizontalrunnerautoscalers.actions.summerwind.dev` shows HRA.status.desiredReplicas as the DESIRED count. However, the value had not been taking capacityReservations into account, which resulted in showing an incorrect count when you used the webhook-based autoscaler, or the capacityReservations API directly. This fixes that.	2021-02-25 10:32:09 +09:00
Yusuke Kuoka	9da123ae5e	Fix integration test flakiness (#351) Ref https://github.com/summerwind/actions-runner-controller/pull/345#issuecomment-785015406	2021-02-25 09:30:32 +09:00
Yusuke Kuoka	991535e567	Fix panic on webhook for user-owned repository (#344) * Fix panic on webhook for user-owned repository Follow-up for #282 and #334	2021-02-23 08:05:25 +09:00
Hidetake Iwata	dfbe53dcca	Fix webhook payload in integration test	2021-02-20 21:08:23 +09:00
Yusuke Kuoka	ebc3970b84	Add integration test for autoscaling on check_run webhook event	2021-02-19 10:33:04 +09:00
Yusuke Kuoka	2fdf35ac9d	Refactor integration test to use helpers (#320) This should make the test code a bit more DRY and readable.	2021-02-17 10:23:35 +09:00
Yusuke Kuoka	eb2eaf8130	Fix TotalNumberOfQueuedAndInProgressWorkflowRuns to work with a lot of remaining `completed` jobs (#316) I have heard from a user that they have hundreds of thousands of `status=completed` workflow runs in their repository, which effectively blocked TotalNumberOfQueuedAndInProgressWorkflowRuns from working because of the GitHub API rate limit due to excessive paginated requests. This fixes that by splitting the list-workflow-runs call into two, one for `queued` and one for `in_progress`, which raises the minimum number of API calls from 1 to 2, but allows it to work regardless of the number of remaining `completed` workflow runs.	2021-02-16 18:55:55 +09:00
Yusuke Kuoka	ab1c39de57	feat: HorizontalRunnerAutoscaler Webhook server (#282) * feat: HorizontalRunnerAutoscaler Webhook server This introduces a Webhook server that responds to GitHub `check_run`, `pull_request`, and `push` events by scaling up the matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting for the next sync period to add the missing runners. This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, whereas actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs. On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for the single target to scale up. The webhook server tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun|Push|PullRequest] against the event and if it finds only one HRA, it is the scale target. If none or two or more targets are found for repository-wide runners, it does the same on organizational runners. Changes: * Fix integration test * Update manifests * chart: Add support for github webhook server * dockerfile: Include github-webhook-server binary * Do not import unversioned go-github * Update README	2021-02-07 17:37:27 +09:00
Dan Webb	dcf8524b5c	Adds RUNNER_GROUP argument to the runner registration (#157) * Adds RUNNER_GROUP argument to the runner registration Adds the ability to register a runner to a predefined runner_group Resolves #137 * Update README with runner group example - Updates the README with instructions of how to add the runner to a group - Fix code fencing for shell and yaml blocks in the README - Use consistent bullet points (dash not asterisk)	2020-11-10 17:15:54 +09:00
Helder Moreira	7a2fa7fbce	runner-controller: do not delete runner if it is busy (#103) Currently, after refreshing the token, the controller re-creates the runner with the new token. This results in jobs being interrupted. This PR makes sure the pod is not restarted if it is busy. Closes #74	2020-10-05 09:06:37 +09:00
Yusuke Kuoka	4733edc20d	Add scaling-down scenario to integration test	2020-08-02 16:10:01 +09:00
Yusuke Kuoka	e2164f9946	Fix integration test bugs and do verify scaling out	2020-08-02 10:34:58 +09:00
Yusuke Kuoka	3c3077a11c	Fix crash on startup after the HRDA addition This is a follow-up for #66. The reconciler for the new HorizontalRunnerDeploymentAutoscaler had a terrible flaw that caused the controller to fail launching due to an error like: ``` indexer conflict: map[field:.metadata.controller:{}] ``` This fixes that, while adding `integration_test.go` to verify it's actually fixed and prevent regressions in the future.	2020-07-29 21:20:46 +09:00

30 Commits
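
Illustrative sketch (not taken from the repository): the scale-target lookup order described in commit d0d316252e could look roughly like the Go below. The HRA struct, its fields, findScaleTarget, and the visible callback are all hypothetical names made up for this sketch; the real getScaleUpTargetWithFunction() works on Kubernetes resources and GitHub API responses and may differ in detail. Matching repository HRAs by repository name alone in step 1 is also an assumption.

```go
package main

import "fmt"

// HRA is a hypothetical, simplified stand-in for a HorizontalRunnerAutoscaler.
type HRA struct {
	Name   string
	Scope  string   // "repository", "organization", or "enterprise"
	Repo   string   // set only when Scope is "repository"
	Group  string   // runner group name; "" means the default group
	Labels []string // labels of the runners managed by this HRA
}

// hasAllLabels reports whether have contains every label in want.
func hasAllLabels(have, want []string) bool {
	set := make(map[string]bool, len(have))
	for _, l := range have {
		set[l] = true
	}
	for _, l := range want {
		if !set[l] {
			return false
		}
	}
	return true
}

// findScaleTarget walks the lookup order from the commit message.
// visible may be nil, which models "no GitHub auth provided":
// every runner group is then assumed visible to every repository.
func findScaleTarget(hras []HRA, repo string, jobLabels []string,
	visible func(scope, group string) bool) (HRA, bool) {
	// Step 1: a repository-level HRA ends the search (assumed match by repo).
	for _, h := range hras {
		if h.Scope == "repository" && h.Repo == repo {
			return h, true
		}
	}
	// Step 3: compute visibility, falling back to "all visible".
	if visible == nil {
		visible = func(string, string) bool { return true }
	}
	// Steps 4-7: default org, default enterprise, custom org, custom enterprise.
	tiers := []struct {
		scope  string
		custom bool
	}{
		{"organization", false},
		{"enterprise", false},
		{"organization", true},
		{"enterprise", true},
	}
	for _, t := range tiers {
		for _, h := range hras {
			if h.Scope != t.scope || (h.Group != "") != t.custom {
				continue
			}
			if visible(h.Scope, h.Group) && hasAllLabels(h.Labels, jobLabels) {
				return h, true
			}
		}
	}
	return HRA{}, false
}

func main() {
	hras := []HRA{
		{Name: "ent-custom", Scope: "enterprise", Group: "gpu", Labels: []string{"linux", "gpu"}},
		{Name: "org-default", Scope: "organization", Labels: []string{"linux"}},
	}
	target, ok := findScaleTarget(hras, "example/repo", []string{"linux"}, nil)
	fmt.Println(target.Name, ok)
}
```

Encoding the four fallback tiers as data keeps the precedence order in one place, so reordering or adding a tier does not require touching the matching logic.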