actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Daniel	8a73560dbc	if a Volume is defined by the operator don't add another "work" volume. (#1015 ) This allows providing a different `work` Volume. This should be a cloud agnostic way of allowing the operator to use (for example) NVME backed storage. This is a working example where the workDir will use the provided volume, additionally here docker is placed on the same NVME. ``` apiVersion: actions.summerwind.dev/v1alpha1 kind: RunnerDeployment metadata: name: runner-2 spec: template: spec: dockerdContainerResources: {} env: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name # this is to mount the docker in docker onto NVME disk dockerVolumeMounts: - mountPath: /var/lib/docker name: scratch subPathExpr: $(POD_NAME)-docker - mountPath: /runner/_work name: work subPathExpr: $(POD_NAME)-work volumeMounts: - mountPath: /runner/_work name: work subPathExpr: $(POD_NAME)-work dockerEnv: - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name volumes: - hostPath: path: /mnt/disks/ssd0 name: scratch - hostPath: path: /mnt/disks/ssd0 name: work nodeSelector: cloud.google.com/gke-nodepool: runner-16-with-nvme ephemeral: false image: "" imagePullPolicy: Always labels: - runner-2 - self-hosted organization: yourorganization ```	2022-01-07 10:01:40 +09:00
Yusuke Kuoka	01301d3ce8	Stop creating registration-only runners on scale-to-zero (#1028 ) Resolves #859	2022-01-07 09:56:21 +09:00
Hyeonmin Park	1a6e5719c3	test: Add tests with self-hosted label for #953 (#1030 )	2022-01-07 08:50:26 +09:00
Callum Tait	ad48851dc9	feat: expose if docker is enabled and wait for docker to be ready (#962 ) Resolves #897 Resolves #915	2021-12-29 10:23:35 +09:00
Lars Haugan	c5950d75fa	fix: pagination for ListWorkflowJobs in autoscaler (#990 ) (#992 ) Adding handling of paginated results when calling `ListWorkflowJobs`. By default the `per_page` is 30, which potentially would return 30 queued and 30 in_progress jobs. This change should enable the autoscaler to scale workflows with more than 60 jobs to the exact number of runners needed. Problem: I did not find any support for pagination in the Github fake client, and have not been able to test this (as I have not been able to push an image to an environment where I can verify this). If anyone is able to help out verifying this PR, i would really appreciate it. Resolves #990	2021-12-24 09:12:36 +09:00
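The pagination handling described in c5950d75fa can be sketched roughly as below. This is a minimal illustration and not the controller's actual code: `Job`, `PageFetcher`, `ListAllJobs`, and `CountByStatus` are hypothetical names, and the "next page is 0 on the last page" convention is assumed here because it mirrors how go-github reports `NextPage` on its responses.

```go
package main

// Job is a hypothetical, trimmed-down stand-in for a workflow job.
type Job struct {
	ID     int64
	Status string // "queued", "in_progress", or "completed"
}

// PageFetcher is a hypothetical callback that returns one page of jobs
// plus the number of the next page. A next page of 0 means "no more pages".
type PageFetcher func(page, perPage int) (jobs []Job, nextPage int, err error)

// ListAllJobs keeps requesting pages until the fetcher reports that
// there is no next page, and returns the jobs from every page.
func ListAllJobs(fetch PageFetcher, perPage int) ([]Job, error) {
	var all []Job
	page := 1
	for {
		jobs, next, err := fetch(page, perPage)
		if err != nil {
			return nil, err
		}
		all = append(all, jobs...)
		if next == 0 {
			break
		}
		page = next
	}
	return all, nil
}

// CountByStatus counts queued and in-progress jobs, the two numbers
// an autoscaler would need to size the runner pool.
func CountByStatus(jobs []Job) (queued, inProgress int) {
	for _, j := range jobs {
		switch j.Status {
		case "queued":
			queued++
		case "in_progress":
			inProgress++
		}
	}
	return queued, inProgress
}
```

Without the loop, only the first page (30 entries by default, per the commit message) would be counted, which is why workflows with many jobs were under-scaled.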
Felipe Galindo Sanchez	608c56936e	Remove duplicate self-hosted condition (#1016 ) Duplicate condition caused after merge of #953 and #1012	2021-12-21 09:08:21 +09:00
Felipe Galindo Sanchez	4ebec38208	Support runner groups with selected visibility in webhooks autoscaler (#1012 ) The current implementation doesn't support yet runner groups with custom visibility (e.g selected repositories only). If there are multiple runner groups with selected visibility - not all runner groups may be a potential target to be scaled up. Thus this PR introduces support to allow having runner groups with selected visibility. This requires to query GitHub API to find what are the potential runner groups that are linked to a specific repository (whether using visibility all or selected). This also improves resolving the `scaleTargetKey` that are used to match an HRA based on the inputs of the `RunnerSet`/`RunnerDeployment` spec to better support for runner groups. This requires to configure github auth in the webhook server, to keep backwards compatibility if github auth is not provided to the webhook server, this will assume all runner groups have no selected visibility and it will target any available runner group as before	2021-12-19 18:29:44 +09:00
clement-loiselet-talend	0c34196d87	fix(#951 ): add exception for self-hosted label in webhook search (#953 ) The "workflowJob" webhook passes the labels the job needs to the controller, which in turn searches for them in its RunnerDeployment / RunnerSet. The current implementation ignores the search for `self-hosted` if this is the only label; however, if multiple labels are found, the `self-hosted` label must be declared explicitly or the RD / RS will not be selected for autoscaling. This PR fixes the behavior by ignoring this label, and adds documentation on this webhook for the other labels that will still require an explicit declaration (OS and architecture). The exception should be temporary; ideally the implicitly created labels (self-hosted, OS, architecture) should be searchable alongside the explicitly declared labels. Code tested; works with `["self-hosted"]` and `["self-hosted","anotherLabel"]` Fixes #951	2021-12-19 10:55:23 +09:00
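The label-matching rule from 0c34196d87 can be illustrated with a short sketch. It assumes a simplified model in which a runner target matches when every label requested by the job is declared on it, except `self-hosted`, which is treated as implicit. `MatchesLabels` is a hypothetical helper, not the function used in the controller.

```go
package main

// MatchesLabels reports whether a RunnerDeployment/RunnerSet that declares
// runnerLabels can serve a job requesting jobLabels. The "self-hosted" label
// is skipped because runners carry it implicitly; every other requested
// label must be declared explicitly.
func MatchesLabels(jobLabels, runnerLabels []string) bool {
	declared := make(map[string]struct{}, len(runnerLabels))
	for _, l := range runnerLabels {
		declared[l] = struct{}{}
	}
	for _, l := range jobLabels {
		if l == "self-hosted" {
			continue // implicit label, no explicit declaration needed
		}
		if _, ok := declared[l]; !ok {
			return false
		}
	}
	return true
}
```

Under this model, both cases named in the commit message match a target that declares only `anotherLabel`, while a job that also asks for an OS or architecture label does not match unless that label is declared.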
renovate[bot]	c64000e11c	fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.0 (#740 ) * fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.0 * Fix dependencies and bump Go to 1.17 so that it builds after controller-runtime 0.11.0 upgrade * Regenerate manifests with the latest K8s dependencies Co-authored-by: Renovate Bot <bot@renovateapp.com> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-12-17 09:06:55 +09:00
Felipe Galindo Sanchez	9bb21aef1f	Add support for default image pull secret name (#921 ) Resolves #896 Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-12-15 09:29:31 +09:00
Pavel Smalenski	91102c8088	Add dockerEnv variable for RunnerDeployment (#912 ) Resolves #878 Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-12-14 17:13:24 +09:00
Felipe Galindo Sanchez	f0fccc020b	refactor: split Reconciler from Reconcile in a few methods (#926 ) Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-12-12 14:22:55 +09:00
Patrick Ellis	ea2dbc2807	Update go-github from v37 -> v39 (#925 )	2021-12-11 21:43:40 +09:00
Yusuke Kuoka	898ad3c355	Work-around for offline+busy runners (#993 ) Ref #911	2021-12-09 09:31:06 +09:00
Max N. Boyarov	88b8871830	Reduce number of http superfluous messages (#894 ) Writing to http.ResponseWriter creates an HTTP OK response, so set ok to disable the error code in the deferred function	2021-11-09 09:07:07 +09:00
Yusuke Kuoka	2191617eb5	Remove unnecessary scale-target-not-found error on in_progress workflow_job event (#927 ) Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/877#issuecomment-955614456	2021-11-09 09:05:50 +09:00
Yusuke Kuoka	b305e38b17	Add webhook-based autoscale for Enterprise runners (#906 ) Fixes #892	2021-11-09 09:04:19 +09:00
apr-1985	0d3de9ee2a	chore: correct logging typo (#904 )	2021-10-22 09:03:23 +09:00
Maxim Pogozhiy	fce7d6d2a7	Add topologySpreadConstraints (#814 )	2021-10-17 21:49:44 +01:00
Callum Tait	5805e39e1f	Revert "feat: adding workflow_dispatch webhook event" (#879 ) This reverts commit `d36d47fe66`.	2021-10-09 18:36:02 +01:00
Callum	d36d47fe66	feat: adding workflow_dispatch webhook event	2021-10-09 10:07:07 +01:00
Aidan	fccf29970b	Fix bug related to label matching. (#852 ) * Fix bug related to label matching. Add start of test framework for Workflow Job Events Signed-off-by: Aidan Jensen <aidan@artificial.com> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-09-30 11:02:59 +09:00
Alex Kulikovskikh	ea06001819	fix: scaling issue based on `workflow_job` event (#850 ) This PR fix scaling issue based on `workflow_job` event discussed in #819	2021-09-30 10:36:59 +09:00
Rob Bos	3f331e9a39	Fixing capitalization and a typo (#838 ) * Fixing capitalization and a typo * typo * Typo * Update controllers/autoscaling.go * Update controllers/autoscaling.go Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-09-26 14:34:55 +09:00
Tristan Keen	5e3f89bdc5	Correct test to append docker container (#837 ) Fixes #835	2021-09-24 09:18:20 +09:00
Tristan Keen	1eb135cace	Correct default image logic	2021-09-14 17:00:57 +09:00
Tarasovych	7008b0c257	feat: Organization RunnerDeployment with webhook-based autoscaling only for certain repositories (#766 ) Resolves #765 Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-08-31 09:46:36 +09:00
Yusuke Kuoka	fabead8c8e	feat: Workflow job based ephemeral runner scaling (#721 ) This adds support for two upcoming enhancements on the GitHub side of self-hosted runners: ephemeral runners and `workflow_job` events. You can't use these yet. These features are not yet generally available to all GitHub users. Please take this pull request as a preparation to make it available to actions-runner-controller users as soon as possible after GitHub released the necessary features on their end. Ephemeral runners: The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (default in actions-runner-controller). `--once` has been suffering from a race issue #466. `--ephemeral` fixes that. To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`. Please read the section `Ephemeral Runners` in the updated version of our README for more information. Note that ephemeral runners are not released on GitHub yet. And `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub! `workflow_job` events: `workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` a solid foundation to improve our webhook-based autoscale. Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events has included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial. In contrast, a `workflow_job` event payload contains `labels` of runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by `labels` included in the webhook payload. So all you need to use webhook-based autoscale will be to enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet. Note that the current implementation of `workflow_job` support works in two ways, increment and decrement. An increment happens when the webhook server receives a `workflow_job` event of `queued` status. A decrement happens when it receives a `workflow_job` event of `completed` status. The latter is used to make scaling-down faster so that you waste less money than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut`. Please read the section "Example 3: Scale on each `workflow_job` event" in the updated version of our README for more information on its usage.	2021-08-11 09:52:04 +09:00
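The increment/decrement behaviour described in fabead8c8e can be sketched as a small function over event statuses. This is an illustration under stated assumptions, not the webhook server's code: `WorkflowJobDelta` and `DesiredAfter` are hypothetical names, clamping at zero is an assumption, and the scale-down delay mentioned in the commit message is deliberately not modelled.

```go
package main

// WorkflowJobDelta maps a workflow_job status to a change in the desired
// number of runners: +1 when a job is queued, -1 when it has completed,
// and 0 for any other status (for example "in_progress").
func WorkflowJobDelta(status string) int {
	switch status {
	case "queued":
		return 1
	case "completed":
		return -1
	default:
		return 0
	}
}

// DesiredAfter applies a sequence of workflow_job statuses to a starting
// replica count. Clamping at zero is an assumption made for this sketch.
func DesiredAfter(start int, statuses []string) int {
	desired := start
	for _, s := range statuses {
		desired += WorkflowJobDelta(s)
		if desired < 0 {
			desired = 0
		}
	}
	return desired
}
```

For example, starting from zero, the sequence queued, queued, in_progress, completed leaves one runner desired.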
Rolf Ahrenberg	14564c7b8e	Allow disabling /runner emptydir mounts and setting storage volume (#674 ) * Allow disabling /runner emptydir mounts * Support defining storage medium for emptydirs * Fix typos	2021-07-15 06:29:58 +09:00
Sebastien Le Digabel	7f2795b5d6	Adding a default docker registry mirror (#689 ) * Adding a default docker registry mirror This change allows the controller to start with a specified default docker registry mirror and avoid having to specify it in all the runner* objects. The change is backward compatible, if a runner has a docker registry mirror specified, it will supersede the default one.	2021-07-15 06:20:08 +09:00
Yusuke Kuoka	f858e2e432	Add POC of GitHub Webhook Delivery Forwarder (#682 ) * Add POC of GitHub Webhook Delivery Forwarder * multi-forwarder and ctrl-c exiting and fix for non-working http post * Rename source files * Extract signal handling into a dedicated source file * Faster ctrl-c handling * Enable automatic creation of repo hook on startup * Add support for forwarding org hook deliveries * Set hook secret on hook creation via envvar (HOOK_SECRET) * Fix org hook support * Fix HOOK_SECRET for consistency * Refactor to prepare for custom log position provider * Refactor to extract inmemory log position provider * Add configmap-based log position provider * Rename githubwebhookdeliveryforwarder to hookdeliveryforwarder * Refactor to rename LogPositionProvider to Checkpointer and extract ConfigMap checkpointer into a dedicated pkg * Refactor to extract logger initialization * Add hookdeliveryforwarder README and bump go-github to unreleased ver	2021-07-14 10:18:55 +09:00
Yusuke Kuoka	6f130c2db5	Fix dockerdWithinRunnerContainer for Runner(Deployment) not working in the main branch (#696 ) Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/674#issuecomment-878600993	2021-07-13 18:14:15 +09:00
Yusuke Kuoka	f19e7ea8a8	chore: Upgrade go-github to v36 (#681 )	2021-07-04 17:43:52 +09:00
Yusuke Kuoka	acb906164b	RunnerSet: Automatic-recovery from registration timeout and deregistration on pod termination (#652 ) Ref #629 Ref #613 Ref #612	2021-06-24 20:39:37 +09:00
Yusuke Kuoka	98da4c2adb	Add HRA support for RunnerSet (#647 ) `HRA.Spec.ScaleTargetRef.Kind` is added to denote that the scale-target is a RunnerSet. It defaults to `RunnerDeployment` for backward compatibility. ``` apiVersion: actions.summerwind.dev/v1alpha1 kind: HorizontalRunnerAutoscaler metadata: name: myhra spec: scaleTargetRef: kind: RunnerSet name: myrunnerset ``` Ref #629 Ref #613 Ref #612	2021-06-23 20:25:03 +09:00
Yusuke Kuoka	8b90b0f0e3	Clean up import list (#645 ) Resolves #644	2021-06-22 17:55:06 +09:00
Jonathan Gonzalez V	a277489003	Added support to enable and disable enableServiceLinks. (#628 ) This option internally exposes some `KUBERNETES_*` environment variables that prevent the runner from using KinD (Kubernetes in Docker), since it will try to connect to the Kubernetes cluster where the runner is running. This option is set to `true` by default in any Kubernetes deployment. Signed-off-by: Jonathan Gonzalez V <jonathan.gonzalez@enterprisedb.com>	2021-06-22 17:27:26 +09:00
Yusuke Kuoka	9e4dbf497c	feat: RunnerSet backed by StatefulSet (#629 ) * feat: RunnerSet backed by StatefulSet Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods with required envvars and registration tokens. Resolves #613 Ref #612 * Upgrade controller-runtime to 0.9.0 * Bump Go to 1.16.x following controller-runtime 0.9.0 * Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup * Fix startup failure due to missing LeaderElectionID * Fix the issue that any pods become unable to start once actions-runner-controller got failed after the mutating webhook has been registered * Allow force-updating statefulset * Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false * Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec * Enable running acceptance tests against arbitrary kind cluster * RunnerSet supports non-ephemeral runners only today * fix: docker-build from root Makefile on intel mac * fix: arch check fixes for mac and ARM * ci: aligning test data format and patching checks * fix: removing namespace in test data * chore: adding more ignores * chore: removing leading space in shebang * Re-add metrics to org hra testdata * Bump cert-manager to v1.1.1 and fix deploy.sh Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com> Co-authored-by: Callum James Tait <callum.tait@photobox.com>	2021-06-22 17:10:09 +09:00
Jonah Back	8c42f99d0b	feat: avoid setting privileged flag if seLinuxOptions is not null (#599 ) Sets the privileged flag to false if SELinuxOptions are present/defined. This is needed because containerd treats SELinux and Privileged controls as mutually exclusive. Also see https://github.com/containerd/cri/blob/aa2d5a97c/pkg/server/container_create.go#L164. This allows users who use SELinux for managing privileged processes to use GH Actions - otherwise, based on the SELinux policy, the Docker in Docker container might not be privileged enough. Signed-off-by: Jonah Back <jonah@jonahback.com> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-06-04 08:59:11 +09:00
Ameer Ghani	7523ea44f1	feat: allow specifying runtime class in runner spec (#580 ) This allows using the `runtimeClassName` directive in the runner's spec. One of the use-cases for this is Kata Containers, which use `runtimeClassName` in a pod spec as an indicator that the pod should run inside a Kata container. This allows us a greater degree of pod isolation.	2021-06-04 08:56:43 +09:00
Yusuke Kuoka	cb14d7530b	Add HRA printer column "SCHEDULE" (#561 ) Adds a column to help the operator see if they configured HRA.Spec.ScheduledOverrides correctly, in a form of "next override schedule recognized by the controller": ``` $ k get horizontalrunnerautoscaler NAME MIN MAX DESIRED SCHEDULE actions-runner-aos-autoscaler 0 5 0 org 0 5 0 min=0 time=2021-05-21 15:00:00 +0000 UTC ``` Ref https://github.com/actions-runner-controller/actions-runner-controller/issues/484	2021-05-22 08:29:53 +09:00
Yusuke Kuoka	0b88b246d3	Fix additionalPrinterColumns (#556 ) This fixes human-readable output of `kubectl get` on `runnerdeployment`, `runnerreplicaset`, and `runner`. Most notably, CURRENT and READY of runner replicasets are now computed and printed correctly. Runner deployments now have UP-TO-DATE and AVAILABLE instead of READY so that it is consistent with columns of K8s deployments. A few fixes have also been made to runner deployment and runner replicaset controllers so that those numbers stored in Status objects are reliably updated and in-sync with actual values. Finally, `AGE` columns are added to runnerdeployment, runnerreplicaset, runner to make that more visible to users. `kubectl get` outputs should now look like the below examples: ``` # Immediately after runnerdeployment updated/created $ k get runnerdeployment NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE example-runnerdeploy 0 0 0 0 8d org-runnerdeploy 5 5 5 0 8d # A few dozens of seconds after update/create all the runners are registered that "available" numbers increase $ k get runnerdeployment NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE example-runnerdeploy 0 0 0 0 8d org-runnerdeploy 5 5 5 5 8d ``` ``` $ k get runnerreplicaset NAME DESIRED CURRENT READY AGE example-runnerdeploy-wnpf6 0 0 0 61m org-runnerdeploy-fsnmr 2 2 0 8m41s ``` ``` $ k get runner NAME ENTERPRISE ORGANIZATION REPOSITORY LABELS STATUS AGE example-runnerdeploy-wnpf6-registration-only actions-runner-controller/mumoshu-actions-test Running 61m org-runnerdeploy-fsnmr-n8kkx actions-runner-controller ["mylabel 1","mylabel 2"] 21s org-runnerdeploy-fsnmr-sq6m8 actions-runner-controller ["mylabel 1","mylabel 2"] 21s ``` Fixes #490	2021-05-21 09:10:47 +09:00
Yusuke Kuoka	3cd124dce3	chore: Add debug logs for scheduledOverrides (#540 ) Follow-up for #515 Ref #484	2021-05-11 17:30:22 +09:00
Yusuke Kuoka	25f5817a5e	Improve debug log in webhook-based autoscaling Adds some helpful debug log messages I have used while verifying #534	2021-05-11 15:49:03 +09:00
Yusuke Kuoka	4e7b8b57c0	edge: Enable scaling from zero with PercentageRunnersBusy (#524 ) `PercentageRunnersBusy`, in combination with a secondary `TotalInProgressAndQueuedWorkflowRuns` metric, enables scale-from-zero for PercentageRunnersBusy. Please see the new `Autoscaling to/from 0` section in the updated documentation about how it works. Resolves #522	2021-05-05 14:27:17 +09:00
Yusuke Kuoka	e7020c7c0f	Fix scale-from-zero to retain the reg-only runner until other pods come up (#523 ) Fixes #516	2021-05-05 12:13:51 +09:00
Yusuke Kuoka	0e0f385f72	Experimental support for ScheduledOverrides (#515 ) This adds the initial version of ScheduledOverrides to HorizontalRunnerAutoscaler. `MinReplicas` overriding should just work. When there are two or more ScheduledOverrides, the earliest one that matched is activated. Each ScheduledOverride can be recurring or one-time. If you have two or more ScheduledOverrides, only one of them should be one-time. And the one-time override should be the earliest item in the list to make sense. Tests will be added in another commit. Logging improvements and additional observability in HRA.Status will also be added in yet another commits. Ref #484	2021-05-03 23:31:17 +09:00
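The selection rule stated in 0e0f385f72 ("the earliest one that matched is activated") can be sketched as a first-match scan. This sketch makes several assumptions: "earliest" is read as list order, each override is reduced to a single fixed time window (recurrence is omitted entirely), and `Override` and `FirstMatchingOverride` are hypothetical names rather than the controller's types.

```go
package main

import "time"

// Override is a hypothetical, simplified scheduled override: while "now"
// falls in [Start, End), MinReplicas replaces the autoscaler's minimum.
type Override struct {
	Start       time.Time
	End         time.Time
	MinReplicas int
}

// FirstMatchingOverride scans the overrides in list order and returns the
// first one whose window contains now. The boolean is false when no
// override is active.
func FirstMatchingOverride(overrides []Override, now time.Time) (Override, bool) {
	for _, o := range overrides {
		if !now.Before(o.Start) && now.Before(o.End) {
			return o, true
		}
	}
	return Override{}, false
}
```

With overlapping windows, the override listed first wins, which is consistent with the commit's advice that a one-time override should be the earliest item in the list.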
Yusuke Kuoka	469b117a09	Foundation for ScheduledOverrides (#513 ) Adds two types `RecurrenceRule` and `Period` and one function `MatchSchedule` as the foundation for building the upcoming ScheduledOverrides feature. Ref #484	2021-05-03 22:03:49 +09:00
Thejas N	588872a316	feat: allow ephemeral runner to be optional (#498 ) - Adds `ephemeral` option to `runner.spec` ``` .... template: spec: ephemeral: false repository: mumoshu/actions-runner-controller-ci .... ``` - `ephemeral` defaults to `true` - `entrypoint.sh` in runner/Dockerfile modified to read `RUNNER_EPHEMERAL` flag - Runner images are backward-compatible. `--once` is omitted only when the new envvar `RUNNER_EPHEMERAL` is explicitly set to `false`. Resolves #457	2021-05-02 19:04:14 +09:00
Christoph Brand	a18ac330bb	feature(controller): allow autoscaler to scale down to 0 (#447 )	2021-05-02 16:46:51 +09:00


175 Commits