actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Yusuke Kuoka	473295e3fc	Enhance the E2E test to be runnable against remote clusters on e.g. AWS EKS (#1610) This apparently contains enough changes to the current E2E test code to make it runnable against remote Kubernetes clusters. I was able to make the test pass against my AWS EKS-based test clusters with these changes. You still need to trigger it manually from a local checkout of the ARC repo today, but this might be the foundation for automated E2E tests against major cloud providers.	2022-07-07 20:48:07 +09:00
Yusuke Kuoka	9f6f962fc7	Add troubleshooting for cert-manager CA error (#1598) I encountered this once while E2E testing ARC with K8s 1.22 and cert-manager 1.1.1. Either the K8s version is too high or the cert-manager version is too low, so you generally need to fix one of them. In a standard scenario, it should be more feasible and meaningful to upgrade cert-manager to a version recent enough to support the new Kubernetes version.	2022-07-07 11:27:49 +09:00
Yusuke Kuoka	2a475f25c7	Use Argo Tunnel for exposing the autoscaler's webhook server (#1595) I've been manually setting up Argo Tunnel to expose the webhook server while running E2E tests so that I can cover webhook-based autoscaling. This automates the setup process so that we can automatically bring cloudflared up before the test run and down after it, which lets it become part of our upcoming automated E2E test.	2022-07-07 11:27:27 +09:00
Viktor Lindgren	dd9f25ea78	Update README.md (#1606)	2022-07-06 08:57:54 +09:00
Yusuke Kuoka	b8e4eee904	Make it easier to E2E test on various K8s versions (#1599)	2022-07-06 08:57:21 +09:00
Yusuke Kuoka	edbdef8d20	Bump chart version to 0.20.0 for ARC 0.25.0 (#1600) We'll be merging this immediately after ARC 0.25.0 gets released.	2022-07-05 11:19:24 +09:00
Nguyễn Đức Chiến	a190fa97bb	Fix helm charts (#1603)	2022-07-05 10:35:57 +09:00
Yusuke Kuoka	bfc5ea4727	Fix a regression in webhook-based autoscaler (#1596) The regression left the webhook-based autoscaler unable to find visible runner groups, and therefore unable to scale the target RunnerDeployment/RunnerSet up or down at all, when the webhook-based autoscaler was provided GitHub API credentials to enable runner groups support. This fixes that. The regression was introduced via #1578, which is not released yet, so users of existing ARC releases are not affected.	2022-07-04 20:17:09 +09:00
renovate[bot]	5a9e8545aa	fix(deps): update golang.org/x/oauth2 digest to 2104d58 (#1593) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>	2022-07-02 14:06:21 +09:00
Yusuke Kuoka	4446ba57e1	Cover ARC upgrade in E2E test (#1592) * Cover ARC upgrade in E2E test so that we can be extra sure that you can upgrade an existing ARC installation to the next version and that it is (hopefully) backward-compatible, or at least does not break immediately after upgrading. * Consolidate E2E tests for RS and RD * Fix E2E for RD to pass * Add some comments in E2E on how to release disk space consumed after dozens of test runs	2022-07-01 21:32:05 +09:00
Martin Moon (문성주)	d62c8a4697	fix typo. (#1594)	2022-07-01 10:24:41 +09:00
Yusuke Kuoka	946d5b1fa7	Add release note for v0.25.0 (#1591)	2022-06-30 22:11:22 +09:00
renovate[bot]	da6b07660e	fix(deps): update golang.org/x/oauth2 digest to 02e64fa (#1480) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>	2022-06-30 11:32:26 +09:00
Callum Tait	e3deb0d752	chore: move runner docker check (#1548)	2022-06-30 11:31:50 +09:00
Callum Tait	82641e5036	chore: move HOME to more logical place (#1460) * chore: move HOME to more logical place * chore: don't break the PATH * chore: don't break the PATH Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>	2022-06-30 11:21:05 +09:00
Vladyslav Miletskyi	2fe6adf5b7	Runner Entrypoint: fix daemon.json (#1409) * Runner Entrypoint: fix daemon.json Do not overwrite daemon.json if it already exists. Use case: custom images that use the public image as their base. * Update runner/startup.sh Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com> Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>	2022-06-30 11:03:12 +09:00
renovate[bot]	736126b793	chore(deps): update helm values quay.io/brancz/kube-rbac-proxy to v0.13.0 (#1589) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>	2022-06-30 09:51:38 +09:00
renovate[bot]	6abf5bbac8	fix(deps): update module github.com/stretchr/testify to v1.8.0 (#1584) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>	2022-06-30 09:50:55 +09:00
Yusuke Kuoka	dc4f116bda	Reflect manual test scenario for containerMode=kubernetes to E2E (#1588) With this, my semi-automatic manual E2E testing becomes even easier :)	2022-06-30 09:09:58 +09:00
Callum Tait	cda10fd243	docs: remove once feature flag env var (#1590)	2022-06-30 09:09:37 +09:00
Yusuke Kuoka	b5d1a63bdf	Enhance the acceptance runnerset yaml template for manual E2E (#1587) The primary goal of this change is to let the tester know about the config difference between the explicitly configured ephemeral work volume and the automatically configured work volume with workVolumeClaimTemplate+containerMode=kubernetes.	2022-06-29 22:15:50 +09:00
Yusuke Kuoka	6f3e23973d	Bump E2E runner version to 2.294.0 (#1586) so that runners do not auto-update themselves on startup in E2E, which makes E2E take longer to complete.	2022-06-29 22:05:50 +09:00
Yusuke Kuoka	a517c1ff66	Fix old runner pods stuck in Terminating since #1579 (#1585) Ref #1579	2022-06-29 22:02:42 +09:00
Yusuke Kuoka	9b28e633c1	Drop support for --once (#1580) Ref #1196	2022-06-29 21:49:52 +09:00
Yusuke Kuoka	8161136cbd	Fix PercentageRunnersBusy scaling delay (#1579) * Use a dedicated pod label to indicate that it is a runner pod Follow-up for #1546 * Fix PercentageRunnersBusy scaling delay Ref #1374	2022-06-29 20:49:21 +09:00
Nikola Jokic	a9ac5a1cbf	extracted validations to a single point (#1582)	2022-06-29 20:32:00 +09:00
Callum Tait	d4f35cff4f	ci: add paths to push trigger (#1583)	2022-06-29 20:30:07 +09:00
Yusuke Kuoka	f661249f07	Use the go-github impl of ListRunnerGroups with visible_to_repository (#1578) Ref #1402	2022-06-29 09:53:03 +01:00
Mike	73e430ce54	Add a solution to InternalError webhook context timedout (#1558) * added troubleshooting solution * added error example * added entry to the pages index * sorted Co-authored-by: Mike Joseph <mike@Mikes-MacBook-Pro-5618.local> Co-authored-by: Mike Joseph <mike@efrontier.com>	2022-06-29 09:40:23 +09:00
renovate[bot]	858ef8979d	chore(deps): update helm/kind-action action to v1.3.0 (#1532) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>	2022-06-29 09:05:26 +09:00
renovate[bot]	1ce0a183a6	chore(deps): update azure/setup-helm action to v3 (#1571) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>	2022-06-29 09:04:33 +09:00
renovate[bot]	63935d2053	fix(deps): update module github.com/stretchr/testify to v1.7.5 (#1510) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>	2022-06-29 08:39:09 +09:00
John Delivuk	fc63d6d26e	Fix: Match Ingress API Version correctly. (#1541) * Updating conditional to match the api version and kind mend * Updating conditional to match the api version and kind mend	2022-06-29 08:30:11 +09:00
renovate[bot]	5ea08411e6	chore(deps): update dependency golang to v1.18.3 (#1509) Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>	2022-06-29 08:29:14 +09:00
Giuseppe Crinò	067ed2e5ec	docs: fix logic explanation for scale down delay (#1562) Signed-off: Giuseppe Crinò <giuscri@gmail.com>	2022-06-29 08:26:28 +09:00
renovate[bot]	d86bd2bcd7	fix(deps): update module sigs.k8s.io/controller-runtime to v0.12.2 (#1449) * fix(deps): update module sigs.k8s.io/controller-runtime to v0.12.2 * Regenerate manifests with the updated k8s and controller-runtime deps Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-06-29 06:42:17 +09:00
Yusuke Kuoka	ddd417f756	Bump go-github to v45 (#1573) * Bump go-github to v45 Ref #1402 * fixup! Bump go-github to v45	2022-06-29 06:34:58 +09:00
Thomas Boop	0386c0734c	`containerMode` option to allow running jobs in K8s instead of docker (#1546) * added containerMode=kubernetes env variables to the runner * removed unused logging * restored configs and charts * restored makefile cert version and acceptance/run * added workVolumeClaimTemplate in pod definition, including logic * added claim template name based on the runner * Apply suggestions from code review update errors * added concurrent cleanup before runner pod is deleted * update manifests * added retry after 30s if pod cleanup contains err * added admission webhook check, made workVolumeClaimTemplate mandatory for k8s * style changes and added comments * added IsZero timestamp check for deleting runner-linked pods * changed order of local variable to avoid copy if p is deleted * removed docker from container mode k8s * restored charts, config, makefile * restored forked files back and not the ARC ones * created PersistentVolume on containerMode k8s * create pv only if storage class name is local-storage * removed actions if storage class name is local-storage * added service account validation if container mode is kubernetes * changed the coding style to match the rest of ARC * added validation to the runnerdeployment webhook * specified fields more precisely, added webhook validation to the replicaset as well * remake manifests * wrapped delete runner-linked-pods in kube mode * fixed empty line * fixed import * makefile changes for hooks * added cleanup secrets * create manifests * docs * update access modes * update dockerfile * nit changes * fixed dockerfile * rewrite allowing reuse for runners and runnersets * deepcopy forgot to stage * changed privileged * make manifests * partly moved to finalizer, still need to apply finalizer first * finalizer added if env variable used in container mode exists * bump runner version * error message moved from Error to Info on cleanup pods/secrets * removed useless dereferencing, added transformation tests of workVolumeClaimTemplate * Apply suggestions from code review * Update controllers/utils_test.go Co-authored-by: Thomas Boop <52323235+thboop@users.noreply.github.com> * Update controllers/utils_test.go Co-authored-by: Thomas Boop <52323235+thboop@users.noreply.github.com> * add hook version to cli, update to 0.1.2 * Apply suggestions from code review * Update controllers/utils_test.go * Update runner/Makefile * Fix missing secret permission and the error handling * Fix a runnerpod reconciler finalizer to not trigger unnecessary retry Co-authored-by: Nikola Jokic <nikola-jokic@github.com> Co-authored-by: Nikola Jokic <97525037+nikola-jokic@users.noreply.github.com> Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-06-28 14:12:40 +09:00
Yusuke Kuoka	af96de6184	Fix completed runner pod recreation not to be blocked after max out (#1568) Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1477#issuecomment-1164154496	2022-06-28 13:50:07 +09:00
Arnaud	abb8615796	Webhook server configuration with kustomize (#1312) * webhook server configuration with kustomize * Update README.md * Update README.md * Update README.md Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-06-28 09:08:25 +09:00
Sam Weston	bc7a3cab1b	Add priorityClassName to CRDs (#1513) * Add pod priorityClassName to controller and crds * Add missing bits in bases directory * Regenerate crds	2022-06-28 08:45:19 +09:00
Yusuke Kuoka	e2c8163b8c	Make webhook-based scale race-free (#1477) * Make webhook-based scale operation asynchronous This prevents a race condition in the webhook-based autoscaler when it receives a webhook event while processing another one and both end up scaling up the same horizontal runner autoscaler. Ref #1321 * Fix typos * Update rather than Patch HRA to avoid race among webhook-based autoscaler servers * Batch capacity reservation updates for efficient use of apiserver * Fix potential never-ending HRA update conflicts in batch update * Extract batchScaler out of webhook-based autoscaler for testability * Fix log levels and batch scaler hang on start * Correlate webhook event with scale trigger amount in logs * Fix log message	2022-06-27 18:31:48 +09:00
Callum Tait	84d16c1c12	revert: "Overhauled `startup.sh` Script (#1454)" (#1561) This reverts commit `071898c96b`.	2022-06-23 12:39:32 +01:00
Richard Fussenegger	071898c96b	Overhauled `startup.sh` Script (#1454) This overhaul turns it into a shellcheck-valid script with explicit error handling for all possible situations I could think of. This change takes https://github.com/actions-runner-controller/actions-runner-controller/pull/1409 into account, and the two can be merged in any order. There are a few important changes to the logic: - The wait logic for checking whether Docker comes up was fundamentally flawed because it checked for the PID. Docker will always come up and thus become visible in the process list, only to die immediately when it encounters an issue, after which supervisor starts it again. This means our check has been flaky: due to the `sleep 1`, it might encounter a PID or it might not, and the existence of the PID does not mean anything. The `docker ps` check we have in the `entrypoint.sh` script does not suffer from this, as it checks for a feature of Docker and not a PID. I thus entirely removed the PID check, and instead hand things over to our `entrypoint.sh` script by setting the environment variables correctly. - This change affects the `docker0` interface MTU configuration, because the interface might or might not exist after we start Docker. Hence, I changed this to a time-boxed loop that tries for one minute to set up the interface's MTU. If the command fails, we log an error and continue with the run. - I changed the entire MTU handling to validate its value before configuring it, logging an error and continuing without it if it is set incorrectly. This ensures that we are not going to send our users on a bug hunt. - The way we started supervisord did not make much sense to me. It sends itself into the background automatically, so there is no need for us to do so with Bash. The decision to not fail on errors but continue is a deliberate choice, because I believe that running a build is more important than having a perfectly configured system. However, this strategy might also hide issues for all users who are not properly checking their logs. It also makes testing harder. Hence, we could change all error conditions from graceful to panicking. We should then align the exit codes across `startup.sh` and `entrypoint.sh` to ensure that every possible error condition has its own unique error code for easy debugging.	2022-06-23 09:37:01 +09:00
renovate[bot]	f24e2fa44e	chore(deps): update dependency actions/runner to v2.294.0	2022-06-22 21:45:32 +00:00
Callum Tait	3c7d3d6b57	ci: hardcode dockerhub username (#1555)	2022-06-22 16:15:50 +01:00
Callum Tait	23f091d7fa	ci: don't login on a pr (#1554) * ci: don't login on a pr Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>	2022-06-22 16:03:36 +01:00
Callum Tait	667764e027	chore: suggest gist first (#1539)	2022-06-18 17:38:37 +09:00
Callum Tait	de693c4191	ci: runners trigger on push (#1549) * ci: runners trigger on push * ci: comments * ci: comments	2022-06-18 17:34:40 +09:00
Callum Tait	510fc9c834	ci: add GitHub packages to arc release (#1525) * ci: add GitHub packages to arc release * ci: use restrictive permissions	2022-06-15 11:37:19 +09:00

978 Commits