actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Yusuke Kuoka	3115d71471	acceptance,e2e: Enhance deploy.sh to support more types of runnersets	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	1463d4927f	acceptance,e2e: Let capacity reservations expire later	2022-02-21 00:07:49 +00:00
Yusuke Kuoka	d4a9750e20	acceptance,e2e: Enhance E2E test and deploy.sh to support scaleDownDelaySeconds~ and minReplicas for HRA	2022-02-20 13:45:42 +00:00
Yusuke Kuoka	4e6bfd8114	e2e: Add ability to toggle dockerdWithinRunnerContainer	2022-02-20 04:37:15 +00:00
Yusuke Kuoka	f3ceccd904	acceptance: Improve deploy.sh to recreate ARC (not runner) pods on new test ID, so that one does not need to manually recreate ARC pods frequently.	2022-02-19 12:22:53 +00:00
Yusuke Kuoka	4b557dc54c	Add logging transport to log HTTP requests at log level -3. Log level -3 is the lowest log level supported today, lower than debug (-1) and -2 (used for some HRA-related logs). This commit adds a logging HTTP transport that logs HTTP requests and responses at that level. It implements http.RoundTripper so that it can log each HTTP request with useful metadata like `from_cache` and `ratelimit_remaining`. The former is set to `true` only when the logged request's response was served from ARC's in-memory cache. The latter is set to the X-RateLimit-Remaining response header value if and only if the response was served by GitHub, not by ARC's cache.	2022-02-19 12:22:53 +00:00
Yusuke Kuoka	ba4bd7c0db	e2e,acceptance: Cover enterprise runners (#1124) Adds various code and changes I used while testing #1062	2022-02-17 09:16:28 +09:00
Yusuke Kuoka	5b92c412a4	chart: Allow using different secrets for controller-manager and gh-webhook-server (#1122) This is entirely possible because they are two different K8s deployments. It may provide better scalability because each component then gets its own GitHub API quota.	2022-02-17 09:16:16 +09:00
Yusuke Kuoka	a7b39cc247	acceptance: Avoid "metadata.annotations too long" errors on applying CRDs	2022-02-17 09:01:44 +09:00
Yusuke Kuoka	1e452358b4	acceptance: Do recreate the controller-manager secret on every deployment We had to manually remove the secret first to update the GitHub credentials used by the controller, which was cumbersome. Note that you still need to recreate the controller pods and the gh webhook server pods to let them remount the recreated secret.	2022-02-17 09:01:44 +09:00
Yusuke Kuoka	fabead8c8e	feat: Workflow job based ephemeral runner scaling (#721) This adds support for two upcoming enhancements on the GitHub side of self-hosted runners: ephemeral runners and `workflow_job` events. You can't use these yet. These features are not yet generally available to all GitHub users. Please take this pull request as preparation to make them available to actions-runner-controller users as soon as possible after GitHub releases the necessary features on their end. Ephemeral runners: The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (default in actions-runner-controller). `--once` has been suffering from a race issue #466. `--ephemeral` fixes that. To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`. Please read the section `Ephemeral Runners` in the updated version of our README for more information. Note that ephemeral runners are not released on GitHub yet, and `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub! `workflow_job` events: `workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` a solid foundation to improve our webhook-based autoscale. Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events has included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial. In contrast, a `workflow_job` event payload contains `labels` of runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by `labels` included in the webhook payload. So all you need to use webhook-based autoscale will be to enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet. Note that the current implementation of `workflow_job` support works in two ways, increment and decrement. An increment happens when the webhook server receives `workflow_job` of `queued` status. A decrement happens when it receives `workflow_job` of `completed` status. The latter is used to make scaling-down faster so that you waste less money than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut`. Please read the section `Example 3: Scale on each `workflow_job` event` in the updated version of our README for more information on its usage.	2021-08-11 09:52:04 +09:00
Yusuke Kuoka	7a305d2892	e2e: Install and run workflow and verify the result (#661) This enhances the E2E test suite introduced in #658 to also include the following steps: - Install GitHub Actions workflow - Trigger a workflow run via a git commit - Verify the workflow run result. In the workflow, we use `kubectl create cm --from-literal` to create a configmap that contains a unique test ID. In the last step we obtain the configmap from within the E2E test and check that the test ID matches the expected one. To install a GitHub Actions workflow, we clone a GitHub repository denoted by the TEST_REPO envvar, programmatically generate a few files with some Go code, run `git-add`, `git-commit`, and then `git-push` to actually push the files to the repository. A single commit containing an updated workflow definition and an updated file seems to run a workflow derived from the definition introduced in the commit, which was a somewhat surprising and useful behaviour. At this point, the E2E test fully covers all the steps for a GitHub token based installation. We need to add scenarios for more deployment options, like GitHub App, RunnerDeployment, HRA, and so on. But each of them would be worth another pull request.	2021-06-28 08:30:32 +09:00
Yusuke Kuoka	acb906164b	RunnerSet: Automatic-recovery from registration timeout and deregistration on pod termination (#652) Ref #629 Ref #613 Ref #612	2021-06-24 20:39:37 +09:00
Yusuke Kuoka	98da4c2adb	Add HRA support for RunnerSet (#647) `HRA.Spec.ScaleTargetRef.Kind` is added to denote that the scale-target is a RunnerSet. It defaults to `RunnerDeployment` for backward compatibility. ``` apiVersion: actions.summerwind.dev/v1alpha1 kind: HorizontalRunnerAutoscaler metadata: name: myhra spec: scaleTargetRef: kind: RunnerSet name: myrunnerset ``` Ref #629 Ref #613 Ref #612	2021-06-23 20:25:03 +09:00
Yusuke Kuoka	9e4dbf497c	feat: RunnerSet backed by StatefulSet (#629) * feat: RunnerSet backed by StatefulSet Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods with required envvars and registration tokens. Resolves #613 Ref #612 * Upgrade controller-runtime to 0.9.0 * Bump Go to 1.16.x following controller-runtime 0.9.0 * Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup * Fix startup failure due to missing LeaderElectionID * Fix the issue that any pods become unable to start once actions-runner-controller has failed after the mutating webhook has been registered * Allow force-updating statefulset * Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false * Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec * Enable running acceptance tests against arbitrary kind cluster * RunnerSet supports non-ephemeral runners only today * fix: docker-build from root Makefile on intel mac * fix: arch check fixes for mac and ARM * ci: aligning test data format and patching checks * fix: removing namespace in test data * chore: adding more ignores * chore: removing leading space in shebang * Re-add metrics to org hra testdata * Bump cert-manager to v1.1.1 and fix deploy.sh Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com> Co-authored-by: Callum James Tait <callum.tait@photobox.com>	2021-06-22 17:10:09 +09:00
Yusuke Kuoka	3cd124dce3	chore: Add debug logs for scheduledOverrides (#540) Follow-up for #515 Ref #484	2021-05-11 17:30:22 +09:00
Yusuke Kuoka	0510f19607	chore: Enhance acceptance test to cover webhook-based autoscaling for repo and org runners Adds what I used while verifying #534	2021-05-11 15:36:02 +09:00
Yusuke Kuoka	e00b3b9714	Make development cycle faster (#508) Improves Makefile, acceptance/deploy.sh, acceptance/testdata/runnerdeploy.yaml, and the documentation to help developers and contributors.	2021-05-03 13:03:17 +09:00
Yusuke Kuoka	0901456320	Update README with more detailed test instructions (#503) - You can now use `make acceptance/run` to run only a specific acceptance test case - Add note about Ubuntu 20.04 users / snap-provided docker - Add instruction to run Ginkgo tests - Extract acceptance/load from acceptance/kind - Make `acceptance/pull` not depend on `docker-build`, so that you can do `make docker-build acceptance/load` for faster image reload	2021-05-02 16:31:07 +09:00
Yusuke Kuoka	dbd7b486d2	feat: Support for scaling from/to zero (#465) This is an attempt to support scaling from/to zero. The basic idea is that we create a one-off "registration-only" runner pod on RunnerReplicaSet being scaled to zero, so that there is one "offline" runner, which enables GitHub Actions to queue jobs instead of discarding those. GitHub Actions seems to immediately throw away the new job when there are no runners at all. Generally, having runners of any status, `busy`, `idle`, or `offline` would prevent GitHub Actions from failing jobs. But retaining `busy` or `idle` runners means that we need to keep runner pods running, which conflicts with our desire to scale to/from zero, hence we retain `offline` runners. In this change, I enhanced the runnerreplicaset controller to create a registration-only runner at the very beginning of its reconciliation logic, only when a runnerreplicaset is scaled to zero. The runner controller creates the registration-only runner pod, waits for it to become "offline", and then removes the runner pod. The runner on GitHub stays `offline`, until the runner resource on K8s is deleted. As we remove the registration-only runner pod as soon as it registers, this doesn't block cluster-autoscaler. Related to #447	2021-05-02 16:11:36 +09:00
ToMe25	ba175148c8	Locally build runner image instead of pulling it (#473) * Fix acceptance helm test not using newly built controller image * Locally build runner image instead of pulling it * Revert runner controller image pull policy to always and add a line to the test deployment to use IfNotPresent * Change runner repository from summerwind/action-runner to the owner of actions-runner-controller. Also fix some Makefile formatting. * Undo renaming acceptance/pull to docker-pull * Some env var cleanup Rename USERNAME to DOCKER_USER (is still used for github too tho) Add RUNNER_NAME var (defaults to $DOCKER_USER/actions-runner) Add TEST_REPO (defaults to $DOCKER_USER/actions-runner-controller)	2021-05-01 15:10:57 +09:00
ToMe25	a612b38f9b	Cache docker images in acceptance test (#463) * Cache docker images locally Cache dind, runner, and kube-rbac-proxy docker image on the host and copy onto the kind node instead of downloading it to the node directly. * Also cache certmanager docker images	2021-04-18 09:44:59 +09:00
ToMe25	c26fb5ad5f	Make acceptance use local docker image (#448) Load the local docker image to the kind cluster instead of pushing it to dockerhub and pulling it from there	2021-04-17 17:13:47 +09:00
callum-tait-pbx	f2680b2f2d	Bumping runner to Ubuntu 20.04 (#438) Images for `actions-runner:v${VERSION}` and `actions-runner:latest` tags are upgraded to Ubuntu 20.04. If you would like not to upgrade Ubuntu in the runner image in the future, migrate to new tags suffixed with `-ubuntu-20.04` like `actions-runner:v${VERSION}-ubuntu-20.04`. We also keep publishing the existing Ubuntu 18.04 images with new `actions-runner:v${VERSION}-ubuntu-18.04` tags. Please use them if it turns out that you have workflows dependent on Ubuntu 18.04. Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2021-04-17 17:02:03 +09:00
Yusuke Kuoka	156e2c1987	Fix MTU configuration for dockerd (#421) Resolves #393	2021-03-31 09:29:21 +09:00
Yusuke Kuoka	dfffd3fb62	feat: EKS IAM Roles for Service Accounts for Runner Pods (#226) One of the pod recreation conditions has been modified to use a hash of the runner spec, so that the controller does not keep restarting pods mutated by admission webhooks. This naturally allows us, for example, to use IRSA for EKS, which requires its admission webhook to mutate the runner pod to have additional, IRSA-related volumes, volume mounts and env. Resolves #200	2020-12-08 17:56:06 +09:00
Yusuke Kuoka	1658f51fcb	Make Runner{Deployment,ReplicaSet} replicas actually optional (#186) If omitted, it properly defaults to 1. Fixes #64	2020-11-14 22:06:33 +09:00
Yusuke Kuoka	b63879f59f	Ensure the chart is passing acceptance tests	2020-11-14 21:58:16 +09:00
Yusuke Kuoka	6a4c29d30e	Set `ACCEPTANCE_TEST_DEPLOYMENT_TOOL=helm` to run acceptance tests with chart	2020-11-14 20:31:37 +09:00
Yusuke Kuoka	bbfe03f02b	Add acceptance test (#168) To make it easier to verify that the controller works before submitting or merging PRs and releasing a new version of the controller.	2020-11-14 20:07:14 +09:00

30 Commits