actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Yusuke Kuoka	728829be7b	Fix panic on scaling organizational runners (#381 ) Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793287133	2021-03-09 15:03:47 +09:00
Yusuke Kuoka	1b8a656051	Use --watch-namespace flag to restrict the namespace to watch Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793172995	2021-03-09 09:46:21 +09:00
Rob Whitby	1753fa3530	handle GET requests in webhook hra (#378 )	2021-03-09 08:46:27 +09:00
Yusuke Kuoka	4fa5315311	Fix possible flapping autoscale on runner update (#371 ) Addresses https://github.com/summerwind/actions-runner-controller/pull/355#discussion_r587199428	2021-03-05 10:21:20 +09:00
Hiroshi Muraoka	11e58fcc41	Manage runner with label (#355 ) * Update RunnerDeploymentSpec to have Selector field Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Update RunnerReplicaSetSpec to have Selector field Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Add CloneSelectorAndAddLabel to add Selector field Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Fix tests Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Use label to find RunnerReplicaSet/Runner Signed-off-by: binoue <banji-inoue@cybozu.co.jp> * Update controller-gen versions in CRD Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Update autoscaler to list Pods with labels Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Add debug log Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Modify RunnerDeployment tests Signed-off-by: binoue <banji-inoue@cybozu.co.jp> * Modify RunnerReplicaset test Signed-off-by: binoue <banji-inoue@cybozu.co.jp> * Modify integration test Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Use RunnerDeployment Template Labels as the default selector for backward compatibility * Fix labeling Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Update func in Eventually to return (int, error) Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Update RunnerDeployment controller not to use label selector Signed-off-by: Hiroshi Muraoka <h.muraoka714@gmail.com> * Fix potential replicaset controller breakage on replicaset created before v0.17.0 * Fix errors on existing runner replica sets * Ensure RunnerReplicaSet Spec Selector addition does not break controller * Ensure RunnerDeployment Template.Spec.Labels change does result in template hash change * Fix comment Co-authored-by: binoue <banji-inoue@cybozu.co.jp> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-03-05 10:15:39 +09:00
Yusuke Kuoka	584590e97c	Use patch instead of update to alleviate HRA conflict on webhook (#358 ) We sometimes see that integration test fails due to runner replicas not meeting the expected number in a timely manner. After investigating a bit, this turned out to be due to that HRA updates on webhook-based autoscaler and HRA controller are conflicting. This changes the controllers to use Patch instead of Update to make conflicts less likely to happen. I have also updated the hra controller to use Patch when updating RunnerDeployment, too. Overall, these changes should make the webhook-based autoscaling more reliable due to less conflicts.	2021-02-26 10:17:09 +09:00
Yusuke Kuoka	d18884a0b9	Fix HRA expired cache entries not cleaned up (#357 ) Fixes #356	2021-02-26 09:54:24 +09:00
Yusuke Kuoka	e9eef04993	Fix old HRA capacity reservations not cleaned up (#354 ) Similar to #348 for #346, but for HRA.Spec.CapacityReservations usually modified by the webhook-based autoscaler controller. This patch tries to fix that by improving the webhook-based autoscaler controller to omit expired reservations on updating HRA spec.	2021-02-25 11:08:00 +09:00
Yusuke Kuoka	598dd1d9fe	Fix incorrect DESIRED on `kubectl get hra (#353 ) `kubectl get horizontalrunnerautoscalers.actions.summerwind.dev` shows HRA.status.desiredReplicas as the DESIRED count. However the value had been not taking capacityReservations into account, which resulted in showing incorrect count when you used webhook-based autoscaler, or capacityReservations API directly. This fixes that.	2021-02-25 10:32:09 +09:00
Yusuke Kuoka	9890a90e69	Improve webhook-based autoscaler log (#352 ) The controller had been writing confusing messages like the below on missing scale target: ``` Found too many scale targets: It must be exactly one to avoid ambiguity. Either set WatchNamespace for the webhook-based autoscaler to let it only find HRAs in the namespace, or update Repository or Organization fields in your RunnerDeployment resources to fix the ambiguity.{"scaleTargets": ""} ``` This fixes that, while improving many kinds of messages written while reconcilation, so that the error message is more actionable.	2021-02-25 10:07:41 +09:00
Yusuke Kuoka	9da123ae5e	Fix integration test flakiness (#351 ) Ref https://github.com/summerwind/actions-runner-controller/pull/345#issuecomment-785015406	2021-02-25 09:30:32 +09:00
Yusuke Kuoka	022007078e	Compact excessive error message on runnerreplicaset status update conflict (#350 ) We occasionally see logs like the below: ``` 2021-02-24T02:48:26.769ZERRORFailed to update runner status{"runnerreplicaset": "testns-244ol/example-runnerdeploy-j5wzf", "error": "Operation cannot be fulfilled on runnerreplicasets.actions.summerwind.dev \"example-runnerdeploy-j5wzf\": the object has been modified; please apply your changes to the latest version and try again"} github.com/go-logr/zapr.(zapLogger).Error /home/runner/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128 github.com/summerwind/actions-runner-controller/controllers.(RunnerReplicaSetReconciler).Reconcile /home/runner/work/actions-runner-controller/actions-runner-controller/controllers/runnerreplicaset_controller.go:207 sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:256 sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232 sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).worker /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211 k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1 /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152 k8s.io/apimachinery/pkg/util/wait.JitterUntil /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153 k8s.io/apimachinery/pkg/util/wait.Until /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88 2021-02-24T02:48:26.769ZERRORcontroller-runtime.controllerReconciler error{"controller": "testns-244olrunnerreplicaset", "request": "testns-244ol/example-runnerdeploy-j5wzf", "error": "Operation cannot be fulfilled on runnerreplicasets.actions.summerwind.dev \"example-runnerdeploy-j5wzf\": the object has been modified; please apply your changes to the latest version and try again"} github.com/go-logr/zapr.(zapLogger).Error /home/runner/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128 sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).reconcileHandler /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258 sigs.k8s.io/controller-runtime/pkg/internal/controller.(Controller).processNextWorkItem /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker /home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211 k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1 /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152 k8s.io/apimachinery/pkg/util/wait.JitterUntil /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153 k8s.io/apimachinery/pkg/util/wait.Until /home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88 ``` which can be compacted into one-liner, without the useless stack trace, without double-logging the same error from the logger and the controller.	2021-02-25 09:01:02 +09:00
Johannes Nicolai	31e5e61155	Log correct runner that was deleted (#349 )	2021-02-25 08:38:55 +09:00
Yusuke Kuoka	e44e53b88e	Fix failure while saving HRA status after running controller for a while (#348 ) Fixes #346	2021-02-24 11:20:21 +09:00
Yusuke Kuoka	991535e567	Fix panic on webhook for user-owned repository (#344 ) * Fix panic on webhook for user-owned repository Follow-up for #282 and #334	2021-02-23 08:05:25 +09:00
Johannes Nicolai	2d7fbbfb68	Handle offline runners gracefully (#341 ) * if a runner pod starts up with an invalid token, it will go in an infinite retry loop, appearing as RUNNING from the outside * normally, this error situation is detected because no corresponding runner objects exists in GitHub and the pod will get removed after registration timeout * if the GitHub runner object already existed before - e.g. because a finalizer was not properly run as part of a partial Kubernetes crash, the runner will always stay in a running mode, even updating the registration token will not kill the problematic pod * introducing RunnerOffline exception that can be handled in runner controller and replicaset controller * as runners are offline when a pod is completed and marked for restart, only do additional restart checks if no restart was already decided, making code a bit cleaner and saving GitHub API calls after each job completion	2021-02-22 10:08:04 +09:00
Hidetake Iwata	b0e74bebab	Fix index key to find HRA in GitHub webhook handler	2021-02-20 21:25:23 +09:00
Hidetake Iwata	dfbe53dcca	Fix webhook payload in integration test	2021-02-20 21:08:23 +09:00
Yusuke Kuoka	ebc3970b84	Add integration test for autoscaling on check_run webhook event	2021-02-19 10:33:04 +09:00
Hidetake Iwata	1ddcf6946a	Fix nil pointer error on received check_run event (#331 ) * Reproduce nil pointer error on received check_run event * Fix nil pointer error on received check_run event	2021-02-18 20:22:36 +09:00
Yusuke Kuoka	67f6de010b	feat: Common runner labels configurable per controller (#327 ) * feat: Common runner labels configurable per controller Ref #321	2021-02-18 20:19:08 +09:00
Yusuke Kuoka	2fdf35ac9d	Refactor integration test to use helpers (#320 ) This should make the test code a bit more DRY and readable.	2021-02-17 10:23:35 +09:00
Yusuke Kuoka	eb2eaf8130	Fix TotalNumberOfQueuedAndInProgressWorkflowRuns to work with a lot of remaining `completed` jobs (#316 ) I have heard from some user that they have hundred thousands of `status=completed` workflow runs in their repository which effectively blocked TotalNumberOfQueuedAndInProgressWorkflowRuns from working because of GitHub API rate limit due to excessive paginated requests. This fixes that by separating list-workflow-runs calls to two - one for `queued` and one for `in_progress`, which can make the minimum API call from 1 to 2, but allows it to work regardless of number of remaining `completed` workflow runs.	2021-02-16 18:55:55 +09:00
Yusuke Kuoka	7d024a6c05	Fix "duplicate metrics collector registration attempted" errors at startup (#317 ) I have seen this error a lot in our integration test. It turned out due to https://github.com/kubernetes-sigs/controller-runtime/issues/484 and is being fixed with this change.	2021-02-16 18:51:33 +09:00
Yusuke Kuoka	434823bcb3	`scale{Up,Down}Adjustment` to add/remove constant number of replicas on scaling (#315 ) * `scale{Up,Down}Adjustment` to add/remove constant number of replicas on scaling Ref #305 * Bump chart version	2021-02-16 17:16:26 +09:00
Yusuke Kuoka	f1db6af1c5	Add repository runners support for PercentageRunnersBusy-based autoscaling (#313 ) Resolves #258	2021-02-16 12:44:51 +09:00
Johannes Nicolai	2623140c9a	Make log message less scary (#311 ) * the reconciliation loop is often much faster than the runner startup, so changing runner not found related messages to debug and also add the possibility that the runner just needs more time	2021-02-16 09:55:55 +09:00
Johannes Nicolai	bc8bc70f69	Fix rate limit and runner registration logic (#309 ) * errors.Is compares all members of a struct to return true which never happened * switched to type check instead of exact value check * notRegistered was using double negation in if statement which lead to unregistering runners after the registration timeout	2021-02-15 09:36:49 +09:00
Johannes Nicolai	9c8d7305f1	Introduce pod deletion timeout and forcefully delete stuck pods (#307 ) * if a k8s node becomes unresponsive, the kube controller will soft delete all pods after the eviction time (default 5 mins) * as long as the node stays unresponsive, the pod will never leave the last status and hence the runner controller will assume that everything is fine with the pod and will not try to create new pods * this can result in a situation where a horizontal autoscaler thinks that none of its runners are currently busy and will not schedule any further runners / pods, resulting in a broken runner deployment until the runnerreplicaset is deleted or the node comes back online * introducing a pod deletion timeout (1 minute) after which the runner controller will try to reboot the runner and create a pod on a working node * use forceful deletion and requeue for pods that have been stuck for more than one minute in terminating state * gracefully handling race conditions if pod gets finally forcefully deleted within	2021-02-15 09:32:28 +09:00
Yusuke Kuoka	addcbfa7ee	Fix runner registration timeout (#301 ) Fixes #300	2021-02-12 10:00:20 +09:00
Yusuke Kuoka	bbb036e732	feat: Prevent blocking on transient runner registration failure (#297 ) This enhances the controller to recreate the runner pod if the corresponding runner has failed to register itself to GitHub within 10 minutes(currently hard-coded). It should alleviate #288 in case the root cause is some kind of transient failures(network unreliability, GitHub down, temporarly compute resource shortage, etc). Formerly you had to manually detect and delete such pods or even force-delete corresponding runners to unblock the controller. Since this enhancement, the controller does the pod deletion automatically after 10 minutes after pod creation, which result in the controller create another pod that might work. Ref #288	2021-02-09 10:17:52 +09:00
Yusuke Kuoka	9301409aec	fix: Paginate ListRepositoryWorkflowRuns (#295 ) When we used `QueuedAndInProgressWorkflowRuns`-based autoscaling, it only fetched and considered only the first 30 workflow runs at the reconcilation time. This may have resulted in unreliable scaling behaviour, like scale-in/out not happening when it was expected.	2021-02-09 10:13:53 +09:00
Yusuke Kuoka	ab1c39de57	feat: HorizontalRunnerAutoscaler Webhook server (#282 ) * feat: HorizontalRunnerAutoscaler Webhook server This introduces a Webhook server that responds GitHub `check_run`, `pull_request`, and `push` events by scaling up matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting next sync period to add insufficient runners. This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, where actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs. On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for the single target to scale up. The webhook server tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun\|Push\|PullRequest] against the event and if it finds only one HRA, it is the scale target. If none or two or more targets are found for repository-wide runners, it does the same on organizational runners. Changes: * Fix integration test * Update manifests * chart: Add support for github webhook server * dockerfile: Include github-webhook-server binary * Do not import unversioned go-github * Update README	2021-02-07 17:37:27 +09:00
Jesse Haka	28e80a2d28	Add support for enterprise runners (#290 ) * Add support for enterprise runners * update docs	2021-02-05 09:31:06 +09:00
Jonas Lergell	6c64ae6a01	Actually use 'dockerdContainerResources' to set resources on the dind container (#273 )	2021-01-29 09:18:28 +09:00
Yusuke Kuoka	ace95d72ab	Fix self-update failuers due to /runner/externals mount (#253 ) * Fix self-update failuers due to /runner/externals mount Fixes #252 * Tested Self-update Fixes (#269) Adding fixes to #253 as confirmed and tested in https://github.com/summerwind/actions-runner-controller/issues/264#issuecomment-764549833 by @jolestar, @achedeuzot and @hfuss 🙇 🍻 Co-authored-by: Hayden Fuss <wifu1234@gmail.com>	2021-01-24 10:58:35 +09:00
Johannes Nicolai	94e8c6ffbf	minReplicas <= desiredReplicas <= maxReplicas (#267 ) * ensure that minReplicas <= desiredReplicas <= maxReplicas no matter what * before this change, if the number of runners was much larger than the max number, the applied scale down factor might still result in a desired value > maxReplicas * if for resource constraints in the cluster, runners would be permanently restarted, the number of runners could go up more than the reverse scale down factor until the next reconciliation round, resulting in a situation where the number of runners climbs up even though it should actually go down * by checking whether the desiredReplicas is always <= maxReplicas, infinite scaling up loops can be prevented	2021-01-22 10:11:21 +09:00
ZacharyBenamram	48923fec56	Autoscaling: Percentage runners busy - remove magic number used for round up (#235 ) * remove magic number for autoscaling Co-authored-by: Zachary Benamram <zacharybenamram@blend.com>	2020-12-15 14:38:01 +09:00
ZacharyBenamram	466b30728d	Add "PercentageRunnersBusy" horizontal runner autoscaler metric type (#223 ) * hpa scheme based off busy runners * running make manifests Co-authored-by: Zachary Benamram <zacharybenamram@blend.com>	2020-12-13 08:48:19 +09:00
Yusuke Kuoka	dfffd3fb62	feat: EKS IAM Roles for Service Accounts for Runner Pods (#226 ) One of the pod recreation conditions has been modified to use hash of runner spec, so that the controller does not keep restarting pods mutated by admission webhooks. This naturally allows us, for example, to use IRSA for EKS that requires its admission webhook to mutate the runner pod to have additional, IRSA-related volumes, volume mounts and env. Resolves #200	2020-12-08 17:56:06 +09:00
Juho Saarinen	f710a54110	Don't compare runner connetion token at restart need check (#227 ) Fixes #143	2020-12-08 08:48:35 +09:00
Erik Nobel	a2b335ad6a	Github pkg: Bump github package to version 33 (#222 )	2020-12-06 10:01:47 +09:00
Shinnosuke Sawada	be25715e1e	Use TLS for secure docker connection (#192 )	2020-11-30 08:57:33 +09:00
Reinier Timmer	ee8fb5a388	parametrized working directory (#185 ) * parametrized working directory * manifests v3.0	2020-11-25 08:55:26 +09:00
Erik Nobel	4e93879b8f	[BUG?]: Create mountpoint for /externals/ (#203 ) * runner/controller: Add externals directory mount point * Runner: Create hack for moving content of /runner/externals/ dir * Externals dir Mount: mount examples for '__e/node12/bin/node' not found error	2020-11-25 08:53:47 +09:00
Shinnosuke Sawada	4371de9733	add dockerEnabled option (#191 ) Add dockerEnabled option for users who does not need docker and want not to run privileged container. if `dockerEnabled == false`, dind container not run, and there are no privileged container. Do the same as closed #96	2020-11-16 09:41:12 +09:00
Shinnosuke Sawada	a4061d0625	gofmt ed	2020-11-12 09:20:06 +09:00
Shinnosuke Sawada	83857ba7e0	use tcp DOCKER_HOST instead of sharing docker.sock	2020-11-12 08:07:52 +09:00
Yusuke Kuoka	e613219a89	Fix token registration broken since v0.11.0 (#167 ) Fixes #166	2020-11-11 09:38:05 +09:00
Dan Webb	dcf8524b5c	Adds RUNNER_GROUP argument to the runner registration (#157 ) * Adds RUNNER_GROUP argument to the runner registration Adds the ability to register a runner to a predefined runner_group Resolves #137 * Update README with runner group example - Updates the README with instructions of how to add the runner to a group - Fix code fencing for shell and yaml blocks in the README - Use consistent bullet points (dash not asterisk)	2020-11-10 17:15:54 +09:00
Juho Saarinen	f2a2ab7ede	Check token validity only when creating new pod (#159 ) Fixes #143	2020-11-10 17:02:30 +09:00
Juho Saarinen	40c5050978	Added support for other than public GitHub URL (#146 ) Refactoring a bit	2020-10-28 22:15:53 +09:00
Yusuke Kuoka	faaca10fba	Rename Runner.Spec.dockerWithinRunnerContainer to docker"d"WithinRunnerContainer (#134 ) * Rename Runner.Spec.dockerWithinRunnerContainer to dockerdWithinRunnerContainer Ref https://github.com/summerwind/actions-runner-controller/pull/126#issuecomment-712501790	2020-10-21 21:32:40 +09:00
Juho Saarinen	d16dfac0f8	Restart if pod ends up succeeded (#136 ) Fixes #132	2020-10-21 21:32:26 +09:00
Juho Saarinen	92920926fe	Configurable "runner and DinD in a single container" (#126 )	2020-10-20 08:48:28 +09:00
Brendan Galloway	7d0bfb77e3	Inject Env Vars into Runner defined Container Spec (#127 ) The runner token is now injected into the `runner` container defined within Runner.Spec.Containers[]	2020-10-20 08:43:53 +09:00
Dominic LoBue	a63860029a	Prefer autoscaling based on jobs rather than workflows if available (#114 ) Adds the ability to autoscale on jobs in addition to workflows. We fall back to using workflow metrics if job details are not present. Resolves #89	2020-10-08 09:00:44 +09:00
Yusuke Kuoka	1e466ad3df	Ensure controller-gen is up-to-date and the code and the manifests are in-sync Follow-up for #95 that added /finalizers subresource permission and #103 that upgraded controller-gen from 0.2.4 from 0.3.0	2020-10-06 09:23:03 +09:00
Helder Moreira	7a2fa7fbce	runner-controller: do not delete runner if it is busy (#103 ) Currently, after refreshing the token, the controller re-creates the runner with the new token. This results in jobs being interrupted. This PR makes sure the pod is not restarted if it is busy. Closes #74	2020-10-05 09:06:37 +09:00
Yusuke Kuoka	4733edc20d	Add scaling-down scenario to integration test	2020-08-02 16:10:01 +09:00
Yusuke Kuoka	50487bbb54	Fix the HRA controller name	2020-08-02 10:38:15 +09:00
Yusuke Kuoka	e2164f9946	Fix integration test bugs and do verify scaling out	2020-08-02 10:34:58 +09:00
Yusuke Kuoka	3c3077a11c	Fix crash on startup after the HRDA addition This is a follow-up for #66. The reconciler for the new HorizontalRunnerDeploymentAutoscaler had a terrible flaw that prevented the controller to fail launching due to an error like: ``` indexer conflict: map[field:.metadata.controller:{}] ``` This fixes that, while adding `integration_test.go` to verify its actually fixed and prevent regression in the future.	2020-07-29 21:20:46 +09:00
Moto Ishizawa	e10637ce35	Merge pull request #66 from summerwind/org-runner-autoscale feat: Organizational RunnerDeployment Autoscaling	2020-07-28 19:17:18 +09:00
Yusuke Kuoka	ae30648985	feat: Use HorizontalRunnerAutoscaler for autoscaling	2020-07-27 20:33:44 +09:00
David Liao	c0914743b0	add config to respect image pull policy	2020-07-08 23:53:52 -07:00
Yusuke Kuoka	eca6917c6a	feat: Organizational RunnerDeployment Autoscaling Enhances #57 to add support for organizational runners. As GitHub Actions does not have an appropriate API for this, this is the spec you need: ``` apiVersion: actions.summerwind.dev/v1alpha1 kind: RunnerDeployment metadata: name: myrunners spec: minReplicas: 1 maxReplicas: 3 autoscaling: metrics: - type: TotalNumberOfQueuedAndProgressingWorkflowRuns repositories: # Assumes that you have `github.com/myorg/myrepo1` repo - myrepo1 - myrepo2 template: spec: organization: myorg ``` It works by collecting "in_progress" and "queued" workflow runs for the repositories `myrepo1` and `myrepo2` to autoscale the number of replicas, assuming you have this organizational runner deployment only for those two repositories. For example, if `myrepo1` had 1 `in_progress` and 2 `queued` workflow runs, and `myrepo2` had 4 `in_progress` and 8 `queued` workflow runs at the time of running the reconcilation loop on the runner deployment, it will scale replicas to 1 + 2 + 4 + 8 = 15. Perhaps we might be better add a kind of "ratio" setting so that you can configure the controller to create e.g. 2x runners than demanded. But that's another story. Ref #10	2020-07-03 09:12:47 +09:00
KUOKA Yusuke	5bb2694349	feat: Repository-wide RunnerDeployment Autoscaling (#57 ) * feat: Repository-wide RunnerDeployment Autoscaling This adds `maxReplicas` and `minReplicas` to the RunnerDeploymentSpec. If and only if both fields are set, the controller computes and sets desired `replicas` automatically depending on the demand. The number of demanded runner replicas is computed by `queued workflow runs + in_progress workflow runs` for the repository. The support for organizational runners is not included. Ref https://github.com/summerwind/actions-runner-controller/issues/10	2020-06-27 17:26:46 +09:00
Moto Ishizawa	390f2a62d9	Merge pull request #50 from summerwind/runner-validation-webhook Add validation webhooks	2020-05-08 22:39:13 +09:00
Moto Ishizawa	f80c3c1928	Set volume to pod properly	2020-05-01 08:51:25 +09:00
Moto Ishizawa	e889eaeb04	Add validation webhooks	2020-04-30 22:11:59 +09:00
Reinier Timmer	b96979888c	fix delete pod when runner failed to register	2020-04-29 14:23:58 +09:00
Reinier Timmer	9f57f52e36	organization and repository are now exclusive	2020-04-28 11:14:31 +02:00
Reinier Timmer	8c5b776807	support runner labels	2020-04-28 11:14:31 +02:00
Reinier Timmer	eca3cc7941	add organization info to runner status	2020-04-28 11:14:31 +02:00
Reinier Timmer	fb35dd4131	support for organization runners	2020-04-28 11:14:31 +02:00
Moto Ishizawa	3b8ea2991c	Share runner's working directory with docker sidecar	2020-04-24 22:36:27 +09:00
Moto Ishizawa	3ccc51433f	Use github package to access the GitHub API	2020-04-13 22:28:07 +09:00
Yusuke Kuoka	b411d37f2b	fix: RunnerDeployment should clean up old RunnerReplicaSets ASAP Since the initial implementation of RunnerDeployment and until this change, any update to a runner deployment has been leaving old runner replicasets until the next resync interval. This fixes that, by continusouly retrying the reconcilation 10 seconds later to see if there are any old runner replicasets that can be removed. In addition to that, the cleanup of old runner replicasets has been improved to be deferred until all the runners of the newest replica set to be available. This gives you hopefully zero or at less downtime updates of runner deployments. Fixes #24	2020-04-04 07:55:12 +09:00
Moto Ishizawa	5efdc6efe6	Add permission to create/patch events resource	2020-03-27 23:25:37 +09:00
Aleksandr Stepanov	d4c849ee09	Add variants of PodTemplate spec fields into the Runner spec (#7 ) Resolves #5 Fixes #11 Fixes #12 Changes: * Added podtemplate spec * Rework pod creation logic * Added most using podspecs * Added copy of podspec * Fixed Github List method * Fixed containers * Added ability to override runner's containers * Added ability to override runner's containers * Added ability to override runner's containers * Update controllers/runner_controller.go Co-Authored-By: Moto Ishizawa <summerwind.jp@gmail.com> * Remove optional restartpolicy * Changed naming convention Co-authored-by: Moto Ishizawa <summerwind.jp@gmail.com>	2020-03-20 22:50:50 +09:00
Moto Ishizawa	b1da3092fb	Revert test comment	2020-03-15 21:55:38 +09:00
Moto Ishizawa	5aeae6a152	Fix generate name	2020-03-15 21:50:45 +09:00
Moto Ishizawa	a897eee402	Fix RBAC role for RunnerDeployment and RunnerReplicaSet	2020-03-15 18:08:11 +09:00
Yusuke Kuoka	c19a1b3ffe	Rename RunnerSet to RunnerReplicaSet To hand over the name `RunnerSet` to the new StatefulSet-based implementation of that being developed at #4	2020-03-10 09:14:34 +09:00
Yusuke Kuoka	4b6806fda3	fix: RunnerDeployment was not working at all after field indexing enhancement	2020-03-06 08:53:46 +09:00
Yusuke Kuoka	70a8c3db0d	chore: Tidy up the deployment controller code Removes an unnecessary condition from the deployment controller code. We assumed that the client would return a not-found error on an empty runnerset list it is clearly not the case.	2020-03-03 10:50:52 +09:00
Yusuke Kuoka	31fb7cc113	feat: Efficient runner set looks up for the deployment controller Enhances the deployment controller to use indexed lookups against runner sets for more scalability.	2020-03-03 10:45:39 +09:00
Yusuke Kuoka	9d634d88ff	feat: RunnerDeployment Adds the initial version of RunnerDeployment that is intended to manage RunnerSets(#1), like Deployment manages ReplicaSets. This is the initial version and therefore is bare bone. The only update strategy it supports is `Recreate`, which recreates the underlying RunnerSet when the runner template changes. I'd like to add `RollingUpdate` strategy once this is merged. This depends on #1 so the diff contains that of #1, too. Please see only the latest commit for review. Also see https://github.com/mumoshu/actions-runner-controller-ci/runs/471329823?check_suite_focus=true to confirm that `make tests` is passing after changes made in this commit.	2020-02-27 10:46:23 +09:00
Yusuke Kuoka	d8d829b734	feat: RunnerSets RunnerSet is basically ReplicaSet for Runners. It is responsible for maintaining number of runners to match the desired one. That is, it creates missing runners from `.Spec.Template` and deletes redundant runners. Similar to ReplicaSet, this does not support rolling update of runners on its own. We might want to later add `RunnerDeployment` for that. But that's another story.	2020-02-24 10:32:44 +09:00
Moto Ishizawa	829a167303	Add 'env' field to runner resource	2020-02-06 22:09:07 +09:00
Moto Ishizawa	a436216d5e	Implement finalizer	2020-02-03 21:35:01 +09:00
Moto Ishizawa	497ddba82d	Record event of runner resource	2020-02-03 18:40:59 +09:00
Moto Ishizawa	13ef78ce20	Sync runner status with pod status	2020-02-03 17:25:38 +09:00
Moto Ishizawa	e6952f5ca1	Add '-runner-image' and '-docker-image' flags	2020-02-03 16:56:52 +09:00
Moto Ishizawa	2f69329fce	Fix permission for pods	2020-02-02 19:49:10 +09:00
Moto Ishizawa	7db5340595	Update CRD validation and RBAC	2020-02-02 10:30:42 +09:00
Moto Ishizawa	960befeade	Restart runner pod on completion	2020-02-01 00:06:30 +09:00
Moto Ishizawa	65f479d749	Always restart container	2020-01-31 22:50:08 +09:00
Moto Ishizawa	3f02153257	Ignore pods being deleted	2020-01-31 22:47:53 +09:00

1 2 3 4

154 Commits