actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Yusuke Kuoka	94c089c407	Revert docker.sock path to /var/run/docker.sock (#2536 ) Starting with ARC v0.27.2, we changed the `docker.sock` path from `/var/run/docker.sock` to `/var/run/docker/docker.sock`. That resulted in breaking some container-based actions due to the hard-coded `docker.sock` path in various places. Even `actions/runner` seems to use `/var/run/docker.sock` for building container-based actions and for service containers? Anyway, this fixes that by moving the sock file back to the previous location. Once this gets merged, users stuck at ARC v0.27.1, previously upgraded to 0.27.2 or 0.27.3 and reverted back to v0.27.1 due to #2519, should be able to upgrade to the upcoming v0.27.4. Resolves #2519 Resolves #2538	2023-04-27 13:06:35 +09:00
Jonathan Wiemers	4536707af6	chart: Allow webhook server env to be set individually (#2377 ) Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2023-03-28 11:18:07 +09:00
Waldek Herka	13802c5a6d	chart: Restricting the RBAC rules on secrets (#2265 ) Co-authored-by: Waldek Herka <wherka-ama@users.noreply.github.com> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2023-03-28 08:43:33 +09:00
Zane Hala	65184f1ed8	chart: Allow customization of admission webhook timeout (#2398 ) Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2023-03-28 08:42:20 +09:00
dhawalseth	73e35b1dc6	chart: Create actionsmetrics.secrets.yaml (#2208 ) Co-authored-by: Dhawal Seth <dseth@linkedin.com> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2023-03-01 08:19:58 +09:00
Nikola Jokic	aa6dab5a9a	Changes to folder structure to allow multigroups and changed go mod name (#2105 ) * Changed folder structure to allow multi group registration * included actions.github.com directory for resources and controllers * updated go module to actions/actions-runner-controller * publish arc packages under actions-runner-controller * Update charts/actions-runner-controller/docs/UPGRADING.md Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-12-28 09:38:34 +09:00
Yusuke Kuoka	300e93c59d	Expose workflow job metrics via new actions-metrics-server (#2057 ) * Add workflow job metrics to Github webhook server * Fix handling of workflow_job.Conclusion * Make the prometheus metrics exporter for the workflow jobs a dedicated application * chart: Add support for deploying actions-metrics-server * A few improvements to make it easy to cover in E2E * chart: Add missing actionsmetrics.service.yaml * chart: Do not modify actionsMetricsServer.replicaCount * chart: Add documentation for actionsMetrics and actionsMetricsServer Co-authored-by: Colin Heathman <cheathman@benchsci.com>	2022-12-10 08:24:28 +09:00
Yusuke Kuoka	3ae9f09532	e2e: Do honor the runner graceful stop timeout also in the dockerd sidecar prestop hook (#2044 ) The runner graceful stop timeout has never been propagated to the dind sidecar due to configuration error in E2E. This fixes it, so that we can verify that the dind sidecar prestop can respect the graceful stop timeout. Related to #1759	2022-11-27 11:13:56 +09:00
Yusuke Kuoka	154fcde7d0	runner: Make WAIT_FOR_DOCKER_SECONDS configurable and working (#1999 ) * runner: Make WAIT_FOR_DOCKER_SECONDS configurable and working Ref #1830 Ref #1804 * Update acceptance/testdata/runnerdeploy.envsubst.yaml Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com> * Update docs/detailed-docs.md Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com> Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>	2022-11-22 12:08:54 +09:00
Claudio Vellage	3b36a81db6	Allow to set docker default address pool (#1971 ) * Allow to set docker default address pool * fixup! Allow to set docker default address pool Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> * Revert unnecessary chart ver bump * Update docs for DOCKER_DEFAULT_ADDRESS_POOL_* * Fix the dockerd default address pool scripts to actually work as probably intended * Update the E2E testdata runnerdeployment to accommodate the new docker default addr pool options * Correct default dockerd addr pool doc Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> Co-authored-by: Claudio Vellage <claudio.vellage@pm.me> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-11-05 14:46:32 +09:00
malachiobadeyi	fbdfe0df8c	1770 update log format and add additional fields to webhook server logs (#1771 ) * 1770 update log format and add runID and Id to workflow logs update tests, change log format for controllers.HorizontalRunnerAutoscalerGitHubWebhook use logging package remove unused modules add setup name to setuplog add flag to change log format change flag name to enableProdLogConfig move log opts to logger package remove empty else and reset timeEncoder update flag description use get function to handle nil rename flag and update logger function Update main.go Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com> Update controllers/horizontal_runner_autoscaler_webhook.go Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com> Update logging/logger.go Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com> copy log opt per each NewLogger call revert to use autoscaler.log update flag descript and remove unused imports add logFormat to readme rename setupLog to logger make fmt * Fix E2E along the way Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-11-04 10:46:58 +09:00
Yusuke Kuoka	c74ad6195f	Fix runners to do their best to gracefully stop on pod eviction (#1759 ) Ref #1535 Ref #1581 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-11-01 20:30:10 +09:00
Yusuke Kuoka	e4879e7ae4	Tweak E2E and documentation about MTU configuration	2022-09-25 07:50:12 +09:00
Yusuke Kuoka	f8e07c7fe4	e2e: Update RunnerSet template for rootless-dind test	2022-08-27 07:12:55 +00:00
Yusuke Kuoka	ebcd838501	e2e: Continuous rolling-update of runners while workflow jobs are running This should help reveal issues like https://github.com/actions-runner-controller/actions-runner-controller/issues/1535 if any.	2022-08-26 01:28:08 +00:00
Yusuke Kuoka	6ef276b239	e2e: Custom RBAC resources for make test success reporting work when k8s container mode or runner update hook is enabled	2022-08-26 01:28:08 +00:00
Yusuke Kuoka	4bf1c12a98	e2e: Fix inability to install the stable version of ARC before the edge / Validate GH token on start (#1748 ) Let me improve two things I had found while I was E2E-testing ARC for the upcoming 0.26.0 release. Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-08-25 10:25:06 +09:00
Yusuke Kuoka	ea94b3cc5b	e2e: Add new option to test rootless docker (#1742 ) Related to #1644 Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com> Signed-off-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-08-24 10:42:45 +09:00
Yusuke Kuoka	b77489d098	Fix E2E to not fail due to missing storageclass for RunnerDeployment w/ kubernetes container mode (#1649 )	2022-07-17 19:43:13 +09:00
Yusuke Kuoka	4152afbd30	Fix E2E against local cluster to not fail on helm-upgrade (#1648 )	2022-07-17 19:43:01 +09:00
Yusuke Kuoka	544d620bc3	e2e: Ensure ARC is roll-updated on deployment even if the container image tag name does not change	2022-07-10 16:16:32 +09:00
Yusuke Kuoka	7e4b6ebd6d	chart: Add rbac.allowGrantingKubernetesContainerModePermissions	2022-07-10 16:16:32 +09:00
Yusuke Kuoka	473295e3fc	Enhance the E2E test to be runnable against remote clusters on e.g. AWS EKS (#1610 ) This contains apparently enough changes to the current E2E test code to make it runnable against remote Kubernetes clusters. I was actually able to make the test pass against my AWS EKS based test clusters with these changes. You still need to trigger it manually from a local checkout of the ARC repo today. But this might be the foundation for automated E2E tests against major cloud providers.	2022-07-07 20:48:07 +09:00
Yusuke Kuoka	2a475f25c7	Use Argo Tunnel for exposing the autoscaler's webhook server (#1595 ) I've been manually setting up Argo Tunnel to expose the webhook server while running E2E tests so that I can cover the webhook-based autoscaling. This automates the setup process so that we can automatically bring up and down cloudflared before/after the test run, so that it can be a part of our upcoming automated E2E test.	2022-07-07 11:27:27 +09:00
Yusuke Kuoka	4446ba57e1	Cover ARC upgrade in E2E test (#1592 ) * Cover ARC upgrade in E2E test so that we can make it extra sure that you can upgrade the existing installation of ARC to the next and also (hopefully) it is backward-compatible, or at least it does not break immediately after upgrading. * Consolidate E2E tests for RS and RD * Fix E2E for RD to pass * Add some comment in E2E for how to release disk consumed after dozens of test runs	2022-07-01 21:32:05 +09:00
Yusuke Kuoka	b5d1a63bdf	Enhance the acceptance runnerset yaml template for manual E2E (#1587 ) The primary goal of this change is to let the tester know about the config difference between the explicitly configured ephemeral work volume vs the automatically configured work volume with workVolumeClaimTemplate+containerMode=kubernetes.	2022-06-29 22:15:50 +09:00
Yusuke Kuoka	63be0223ad	fix: Avoid duplicate volume and mount name error for generic ephemeral volume as "work" (#1471 ) * fix: Avoid duplicate volume and mount name error for generic ephemeral volume as "work" While manually testing configurations being documented in #1464, I discovered that the use of dynamic ephemeral volume for "work" directory was not working correctly due to the validation error. This fixes the runner pod generation logic to not add the default volume and volume mount for "work" dir, so that the error disappears. Ref #1464 * e2e: Ensure work generic ephemeral volume to work as expected	2022-05-22 10:25:50 +09:00
Yusuke Kuoka	b5194fd75a	Enhance RunnerSet to optionally retain PVs across restarts (#1340 ) * Enhance RunnerSet to optionally retain PVs across restarts This is our initial attempt to bring back the ability to retain PVs across runner pod restarts when using RunnerSet. The implementation is composed of two new controllers, `runnerpersistentvolumeclaim-controller` and `runnerpersistentvolume-controller`. It all starts from our existing `runnerset-controller`. The controller now tries to mark any PVCs created by StatefulSets created for the RunnerSet. Once the controller has terminated statefulsets, their corresponding PVCs are cleaned up by `runnerpersistentvolumeclaim-controller`, then PVs are unbound from their corresponding PVCs by `runnerpersistentvolume-controller` so that they can be reused by future PVCs created for future StatefulSets that share the same StorageClass. Ref #1286 * Update E2E test suite to cover runner, docker, and go caching with RunnerSet + PVs Ref #1286	2022-05-16 09:26:48 +09:00
Yusuke Kuoka	3b67ee727f	e2e: Fix wrong scale trigger configuration used in test (#1434 )	2022-05-12 09:19:58 +09:00
Yusuke Kuoka	e6bddcd238	Fix certain runnerset name in E2E and acceptance (#1435 )	2022-05-12 09:19:47 +09:00
Yusuke Kuoka	dabbc99c78	refactor(controller): stop auto-setting RUNNER_FEATURE_FLAG_EPHEMERAL (#1385 ) This feature flag was provided from ARC to runner container automatically to let it use `--ephemeral` instead of `--once` by default. As the support for `--once` is being dropped from the runner image via #1384, we no longer need that. Ref #1196	2022-05-11 11:42:55 +01:00
Yusuke Kuoka	c4b24f8366	Prevent static runners from terminating due to unregister timeout The unregister timeout of 1 minute (no matter how long it is) can negatively impact availability of static runner constantly running workflow jobs, and ephemeral runner that runs a long-running job. We deal with that by completely removing the unregistration timeout, so that regardless of the type of runner (static or ephemeral) it waits forever until it successfully gets unregistered before being terminated.	2022-03-13 07:26:36 +00:00
Yusuke Kuoka	83e550cde5	Experimental log level "-4" for logging every HTTP round-trip for GitHub API calls	2022-03-12 12:11:16 +00:00
Yusuke Kuoka	22ef7b3a71	acceptance,e2e: Fix deploy.sh and e2e_test.go for testing with GitHub App	2022-03-12 12:10:04 +00:00
Yusuke Kuoka	55ff4de79a	Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192 ) * Remove legacy GitHub API cache of HRA.Status.CachedEntries We migrated to the transport-level cache introduced in #1127 so not only this is useless, it is harder to deduce which cache resulted in the desired replicas number calculated by HRA. Just remove the legacy cache to keep it simple and easy to understand. * Deprecate githubAPICacheDuration helm chart value and the --github-api-cache-duration as well * Fix integration test	2022-03-08 19:05:43 +09:00
Yusuke Kuoka	14a878bfae	refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet	2022-03-06 05:53:26 +00:00
Yusuke Kuoka	59c3288e87	acceptance,e2e: Automate restarts of ARC pods in case image tag is not changed	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	3115d71471	acceptance,e2e: Enhance deploy.sh to support more types of runnersets	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	1463d4927f	acceptance,e2e: Let capacity reservation expire later	2022-02-21 00:07:49 +00:00
Yusuke Kuoka	d4a9750e20	acceptance,e2e: Enhance E2E test and deploy.sh to support scaleDownDelaySeconds~ and minReplicas for HRA	2022-02-20 13:45:42 +00:00
Yusuke Kuoka	4e6bfd8114	e2e: Add ability to toggle dockerdWithinRunnerContainer	2022-02-20 04:37:15 +00:00
Yusuke Kuoka	f3ceccd904	acceptance: Improve deploy.sh to recreate ARC (not runner) pods on new test id So that one does not need to manually recreate ARC pods frequently.	2022-02-19 12:22:53 +00:00
Yusuke Kuoka	4b557dc54c	Add logging transport to log HTTP requests in log level -3 The log level -3 is the minimum log level that is supported today, smaller than debug(-1) and -2(used to log some HRA related logs). This commit adds a logging HTTP transport to log HTTP requests and responses to that log level. It implements http.RoundTripper so that it can log each HTTP request with useful metadata like `from_cache` and `ratelimit_remaining`. The former is set to `true` only when the logged request's response was served from ARC's in-memory cache. The latter is set to X-RateLimit-Remaining response header value if and only if the response was served by GitHub, not by ARC's cache.	2022-02-19 12:22:53 +00:00
Yusuke Kuoka	ba4bd7c0db	e2e,acceptance: Cover enterprise runners (#1124 ) Adds various code and changes I have used while testing #1062	2022-02-17 09:16:28 +09:00
Yusuke Kuoka	5b92c412a4	chart: Allow using different secrets for controller-manager and gh-webhook-server (#1122 ) * chart: Allow using different secrets for controller-manager and gh-webhook-server As it is entirely possible to do so because they are two different K8s deployments. It may provide better scalability because then each component gets its own GitHub API quota.	2022-02-17 09:16:16 +09:00
Yusuke Kuoka	a7b39cc247	acceptance: Avoid "metadata.annotations too long" errors on applying CRDs	2022-02-17 09:01:44 +09:00
Yusuke Kuoka	1e452358b4	acceptance: Do recreate the controller-manager secret on every deployment We had to manually remove the secret first to update the GitHub credentials used by the controller, which was cumbersome. Note that you still need to recreate the controller pods and the gh webhook server pods to let them remount the recreated secret.	2022-02-17 09:01:44 +09:00
Yusuke Kuoka	fabead8c8e	feat: Workflow job based ephemeral runner scaling (#721 ) This adds support for two upcoming enhancements on the GitHub side of self-hosted runners: ephemeral runners and `workflow_job` events. You can't use these yet. These features are not yet generally available to all GitHub users. Please take this pull request as a preparation to make it available to actions-runner-controller users as soon as possible after GitHub releases the necessary features on their end. Ephemeral runners: The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (default in actions-runner-controller). `--once` has been suffering from a race issue #466. `--ephemeral` fixes that. To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`. Please read the section `Ephemeral Runners` in the updated version of our README for more information. Note that ephemeral runners are not released on GitHub yet. And `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub! `workflow_job` events: `workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` a solid foundation to improve our webhook-based autoscale. Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events has included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial. In contrast, a `workflow_job` event payload contains `labels` of runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by `labels` included in the webhook payload. So all you need to use webhook-based autoscale will be to enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet. Note that the current implementation of `workflow_job` support works in two ways, increment, and decrement. An increment happens when the webhook server receives `workflow_job` of `queued` status. A decrement happens when it receives `workflow_job` of `completed` status. The latter is used to make scaling-down faster so that you waste less money than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut`. Please read the section `Example 3: Scale on each `workflow_job` event` in the updated version of our README for more information on its usage.	2021-08-11 09:52:04 +09:00
Yusuke Kuoka	7a305d2892	e2e: Install and run workflow and verify the result (#661 ) This enhances the E2E test suite introduced in #658 to also include the following steps: - Install GitHub Actions workflow - Trigger a workflow run via a git commit - Verify the workflow run result In the workflow, we use `kubectl create cm --from-literal` to create a configmap that contains a unique test ID. In the last step we obtain the configmap from within the E2E test and check the test ID to match the expected one. To install a GitHub Actions workflow, we clone a GitHub repository denoted by the TEST_REPO envvar, programmatically generate a few files with some Go code, run `git-add`, `git-commit`, and then `git-push` to actually push the files to the repository. A single commit containing an updated workflow definition and an updated file seems to run a workflow derived from the definition introduced in the commit, which was a bit surprising and useful behaviour. At this point, the E2E test fully covers all the steps for a GitHub token based installation. We need to add scenarios for more deployment options, like GitHub App, RunnerDeployment, HRA, and so on. But each of them would be worth another pull request.	2021-06-28 08:30:32 +09:00
Yusuke Kuoka	acb906164b	RunnerSet: Automatic-recovery from registration timeout and deregistration on pod termination (#652 ) Ref #629 Ref #613 Ref #612	2021-06-24 20:39:37 +09:00

67 Commits