actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Thejas N	588872a316	feat: allow ephemeral runner to be optional (#498 ) - Adds `ephemeral` option to `runner.spec` ``` .... template: spec: ephemeral: false repository: mumoshu/actions-runner-controller-ci .... ``` - `ephemeral` defaults to `true` - `entrypoint.sh` in runner/Dockerfile modified to read `RUNNER_EPHEMERAL` flag - Runner images are backward-compatible. `--once` is omitted only when the new envvar `RUNNER_EPHEMERAL` is explicitly set to `false`. Resolves #457	2021-05-02 19:04:14 +09:00
Yusuke Kuoka	dbd7b486d2	feat: Support for scaling from/to zero (#465 ) This is an attempt to support scaling from/to zero. The basic idea is that we create a one-off "registration-only" runner pod on RunnerReplicaSet being scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding those. GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline` would prevent GitHub Actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desire to scale to/from zero, hence we retain `offline` runners. In this change, I enhanced the runnerreplicaset controller to create a registration-only runner at the very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline` until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler. Related to #447	2021-05-02 16:11:36 +09:00
Rolf Ahrenberg	6b77a2a5a8	feat: Docker registry mirror (#478 ) Changes: - Switched to use `jq` in startup.sh - Enable docker registry mirror configuration which is useful when e.g. avoiding the Docker Hub rate-limiting Check #478 for how this feature is tested and supposed to be used.	2021-04-25 14:04:01 +09:00
Manuel Jurado	37c2a62fa8	Allow to configure runner volume size limit (#436 ) Enable the user to set a limit size on the volume of the runner to avoid some runner pod affecting other resources of the same cluster Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-04-18 13:56:59 +09:00
Agoney Garcia-Deniz	2e551c9d0a	Add hostAliases to the runner spec (#456 )	2021-04-17 17:04:52 +09:00
asoldino	b42b8406a2	Add dockerVolumeMounts (#439 ) Resolves #435	2021-04-06 10:10:10 +09:00
Christoph Brand	9ed245c85e	feature(controller): remove dockerd executable (#432 )	2021-04-01 08:50:48 +09:00
Yusuke Kuoka	156e2c1987	Fix MTU configuration for dockerd (#421 ) Resolves #393	2021-03-31 09:29:21 +09:00
Yusuke Kuoka	374105c1f3	Fix dindWithinRunnerContainer not to crash-loop runner pods (#419 ) Apparently #253 broke dindWithinRunnerContainer completely due to the difference in how /runner volume is set up.	2021-03-25 10:23:36 +09:00
Yusuke Kuoka	bc6e499e4f	Make logging more concise (#410 ) This makes logging more concise by changing logger names from something like `controllers.Runner` to `actions-runner-controller.runner`, following the standard `controller-runtime.controller`, and reducing redundant logs by removing unnecessary requeues. I have also tweaked log messages so that their style is more consistent, which will also help readability. Also, runnerreplicaset-controller lacked useful logs so I have enhanced it.	2021-03-20 07:34:25 +09:00
Yusuke Kuoka	07f822bb08	Do include Runner controller in integration test (#409 ) So that we could catch bugs in runner controller like seen in #398, #404, and #407. Ref #400	2021-03-19 16:14:15 +09:00
Yusuke Kuoka	c424215044	Do recheck runner registration timely (#405 ) Since #392, the runner controller could have taken an unexpectedly long time to notice that the runner has been registered to GitHub. This patch fixes the issue, so that the controller will notice the successful registration in approximately 1 minute (hard-coded). More concretely, if you had configured a long sync-period such as 10m, the runner controller could have taken approx 10m to notice the successful registration. The original expectation was 1m, because it was intended to recheck every 1m as implemented in #392. It wasn't working as such due to my misunderstanding of how requeueing works.	2021-03-19 11:02:47 +09:00
Yusuke Kuoka	3cccca8d09	Do patch runner status instead of update to reduce conflicts and avoid future bugs Ref https://github.com/summerwind/actions-runner-controller/pull/398#issuecomment-801548375	2021-03-18 10:31:17 +09:00
Yusuke Kuoka	7a7086e7aa	Make error logs more helpful	2021-03-18 10:26:21 +09:00
Yusuke Kuoka	3f23501b8e	Reduce "No runner matching the specified labels was found" errors while runner replacement (#392 ) We occasionally encountered those errors while the underlying RunnerReplicaSet is being recreated/replaced on RunnerDeployment.Spec.Template update. It turned out that the RunnerDeployment controller was waiting for the runner pod to become `Running`, instead of waiting for the new replacement runner to have registered to GitHub. This fixes that by setting Runner.Status.Phase to `Running` only after the runner in the runner pod appears to be registered. A side-effect of this change is that the runner controller would call the "ListRunners" GitHub Actions API more often. I've reviewed and improved the runner controller code and Runner CRD to minimize the number of calls. In most cases, ListRunners should be called only twice for each runner creation.	2021-03-16 10:52:30 +09:00
Brandon Kimbrough	2273b198a1	Add ability to set the MTU size of the docker in docker container (#385 ) * adding ability to set docker in docker MTU size * safeguards to only set MTU env var if it is set	2021-03-12 08:44:49 +09:00
Johannes Nicolai	2d7fbbfb68	Handle offline runners gracefully (#341 ) * if a runner pod starts up with an invalid token, it will go in an infinite retry loop, appearing as RUNNING from the outside * normally, this error situation is detected because no corresponding runner objects exists in GitHub and the pod will get removed after registration timeout * if the GitHub runner object already existed before - e.g. because a finalizer was not properly run as part of a partial Kubernetes crash, the runner will always stay in a running mode, even updating the registration token will not kill the problematic pod * introducing RunnerOffline exception that can be handled in runner controller and replicaset controller * as runners are offline when a pod is completed and marked for restart, only do additional restart checks if no restart was already decided, making code a bit cleaner and saving GitHub API calls after each job completion	2021-02-22 10:08:04 +09:00
Yusuke Kuoka	7d024a6c05	Fix "duplicate metrics collector registration attempted" errors at startup (#317 ) I have seen this error a lot in our integration test. It turned out to be due to https://github.com/kubernetes-sigs/controller-runtime/issues/484 and is being fixed with this change.	2021-02-16 18:51:33 +09:00
Johannes Nicolai	2623140c9a	Make log message less scary (#311 ) * the reconciliation loop is often much faster than the runner startup, so changing runner not found related messages to debug and also add the possibility that the runner just needs more time	2021-02-16 09:55:55 +09:00
Johannes Nicolai	bc8bc70f69	Fix rate limit and runner registration logic (#309 ) * errors.Is compares all members of a struct to return true which never happened * switched to type check instead of exact value check * notRegistered was using double negation in if statement which led to unregistering runners after the registration timeout	2021-02-15 09:36:49 +09:00
Johannes Nicolai	9c8d7305f1	Introduce pod deletion timeout and forcefully delete stuck pods (#307 ) * if a k8s node becomes unresponsive, the kube controller will soft delete all pods after the eviction time (default 5 mins) * as long as the node stays unresponsive, the pod will never leave the last status and hence the runner controller will assume that everything is fine with the pod and will not try to create new pods * this can result in a situation where a horizontal autoscaler thinks that none of its runners are currently busy and will not schedule any further runners / pods, resulting in a broken runner deployment until the runnerreplicaset is deleted or the node comes back online * introducing a pod deletion timeout (1 minute) after which the runner controller will try to reboot the runner and create a pod on a working node * use forceful deletion and requeue for pods that have been stuck for more than one minute in terminating state * gracefully handling race conditions if pod gets finally forcefully deleted within	2021-02-15 09:32:28 +09:00
Yusuke Kuoka	addcbfa7ee	Fix runner registration timeout (#301 ) Fixes #300	2021-02-12 10:00:20 +09:00
Yusuke Kuoka	bbb036e732	feat: Prevent blocking on transient runner registration failure (#297 ) This enhances the controller to recreate the runner pod if the corresponding runner has failed to register itself to GitHub within 10 minutes (currently hard-coded). It should alleviate #288 in case the root cause is some kind of transient failure (network unreliability, GitHub down, temporary compute resource shortage, etc). Formerly you had to manually detect and delete such pods or even force-delete corresponding runners to unblock the controller. With this enhancement, the controller deletes the pod automatically 10 minutes after pod creation, which results in the controller creating another pod that might work. Ref #288	2021-02-09 10:17:52 +09:00
Jesse Haka	28e80a2d28	Add support for enterprise runners (#290 ) * Add support for enterprise runners * update docs	2021-02-05 09:31:06 +09:00
Jonas Lergell	6c64ae6a01	Actually use 'dockerdContainerResources' to set resources on the dind container (#273 )	2021-01-29 09:18:28 +09:00
Yusuke Kuoka	ace95d72ab	Fix self-update failures due to /runner/externals mount (#253 ) * Fix self-update failures due to /runner/externals mount Fixes #252 * Tested Self-update Fixes (#269) Adding fixes to #253 as confirmed and tested in https://github.com/summerwind/actions-runner-controller/issues/264#issuecomment-764549833 by @jolestar, @achedeuzot and @hfuss 🙇 🍻 Co-authored-by: Hayden Fuss <wifu1234@gmail.com>	2021-01-24 10:58:35 +09:00
Yusuke Kuoka	dfffd3fb62	feat: EKS IAM Roles for Service Accounts for Runner Pods (#226 ) One of the pod recreation conditions has been modified to use hash of runner spec, so that the controller does not keep restarting pods mutated by admission webhooks. This naturally allows us, for example, to use IRSA for EKS that requires its admission webhook to mutate the runner pod to have additional, IRSA-related volumes, volume mounts and env. Resolves #200	2020-12-08 17:56:06 +09:00
Juho Saarinen	f710a54110	Don't compare runner connection token at restart need check (#227 ) Fixes #143	2020-12-08 08:48:35 +09:00
Shinnosuke Sawada	be25715e1e	Use TLS for secure docker connection (#192 )	2020-11-30 08:57:33 +09:00
Reinier Timmer	ee8fb5a388	parametrized working directory (#185 ) * parametrized working directory * manifests v3.0	2020-11-25 08:55:26 +09:00
Erik Nobel	4e93879b8f	[BUG?]: Create mountpoint for /externals/ (#203 ) * runner/controller: Add externals directory mount point * Runner: Create hack for moving content of /runner/externals/ dir * Externals dir Mount: mount examples for '__e/node12/bin/node' not found error	2020-11-25 08:53:47 +09:00
Shinnosuke Sawada	4371de9733	add dockerEnabled option (#191 ) Add dockerEnabled option for users who do not need docker and do not want to run a privileged container. If `dockerEnabled == false`, the dind container does not run, and there is no privileged container. Does the same as the closed #96	2020-11-16 09:41:12 +09:00
Shinnosuke Sawada	a4061d0625	gofmt ed	2020-11-12 09:20:06 +09:00
Shinnosuke Sawada	83857ba7e0	use tcp DOCKER_HOST instead of sharing docker.sock	2020-11-12 08:07:52 +09:00
Yusuke Kuoka	e613219a89	Fix token registration broken since v0.11.0 (#167 ) Fixes #166	2020-11-11 09:38:05 +09:00
Dan Webb	dcf8524b5c	Adds RUNNER_GROUP argument to the runner registration (#157 ) * Adds RUNNER_GROUP argument to the runner registration Adds the ability to register a runner to a predefined runner_group Resolves #137 * Update README with runner group example - Updates the README with instructions of how to add the runner to a group - Fix code fencing for shell and yaml blocks in the README - Use consistent bullet points (dash not asterisk)	2020-11-10 17:15:54 +09:00
Juho Saarinen	f2a2ab7ede	Check token validity only when creating new pod (#159 ) Fixes #143	2020-11-10 17:02:30 +09:00
Juho Saarinen	40c5050978	Added support for other than public GitHub URL (#146 ) Refactoring a bit	2020-10-28 22:15:53 +09:00
Yusuke Kuoka	faaca10fba	Rename Runner.Spec.dockerWithinRunnerContainer to docker"d"WithinRunnerContainer (#134 ) * Rename Runner.Spec.dockerWithinRunnerContainer to dockerdWithinRunnerContainer Ref https://github.com/summerwind/actions-runner-controller/pull/126#issuecomment-712501790	2020-10-21 21:32:40 +09:00
Juho Saarinen	d16dfac0f8	Restart if pod ends up succeeded (#136 ) Fixes #132	2020-10-21 21:32:26 +09:00
Juho Saarinen	92920926fe	Configurable "runner and DinD in a single container" (#126 )	2020-10-20 08:48:28 +09:00
Brendan Galloway	7d0bfb77e3	Inject Env Vars into Runner defined Container Spec (#127 ) The runner token is now injected into the `runner` container defined within Runner.Spec.Containers[]	2020-10-20 08:43:53 +09:00
Yusuke Kuoka	1e466ad3df	Ensure controller-gen is up-to-date and the code and the manifests are in-sync Follow-up for #95 that added /finalizers subresource permission and #103 that upgraded controller-gen from 0.2.4 to 0.3.0	2020-10-06 09:23:03 +09:00
Helder Moreira	7a2fa7fbce	runner-controller: do not delete runner if it is busy (#103 ) Currently, after refreshing the token, the controller re-creates the runner with the new token. This results in jobs being interrupted. This PR makes sure the pod is not restarted if it is busy. Closes #74	2020-10-05 09:06:37 +09:00
David Liao	c0914743b0	add config to respect image pull policy	2020-07-08 23:53:52 -07:00
Moto Ishizawa	390f2a62d9	Merge pull request #50 from summerwind/runner-validation-webhook Add validation webhooks	2020-05-08 22:39:13 +09:00
Moto Ishizawa	f80c3c1928	Set volume to pod properly	2020-05-01 08:51:25 +09:00
Moto Ishizawa	e889eaeb04	Add validation webhooks	2020-04-30 22:11:59 +09:00
Reinier Timmer	b96979888c	fix delete pod when runner failed to register	2020-04-29 14:23:58 +09:00
Reinier Timmer	9f57f52e36	organization and repository are now exclusive	2020-04-28 11:14:31 +02:00

71 Commits