Commit Graph

1088 Commits

Author SHA1 Message Date
toast-gear 409dc4c114 docs: remove ephemeral and simplify 2022-07-11 10:43:09 +09:00
toast-gear 4b9a6c6700 docs: remove runner kind 2022-07-11 10:43:09 +09:00
Yusuke Kuoka 86e1a4a8f3 Fix helm lint error and the unability to install the chart with the default values 2022-07-10 16:16:32 +09:00
Yusuke Kuoka 544d620bc3 e2e: Ensure ARC is roll-updated on deployment even if the container image tag name does not change 2022-07-10 16:16:32 +09:00
Yusuke Kuoka 1cfe1974c4 Add missing job-related permissions to runner pods with k8s container mode 2022-07-10 16:16:32 +09:00
Yusuke Kuoka 7e4b6ebd6d chart: Add rbac.allowGrantingKubernetesContainerModePermissions 2022-07-10 16:16:32 +09:00
Felipe Galindo Sanchez 11cb9b7882
feat: allow to discover runner statuses (#1268)
* feat: allow to discover runner statuses

* fix manifests

* Bump runner version to 2.289.1 which includes the hooks support

* Add feedback from review

* Update reference to newRunnerPod

* Fix TestNewRunnerPodFromRunnerController and make hooks file names job specific

* Fix additional TestNewRunnerPod test

* Cover additional feedback from review

* fix rbac manager role

* Add permissions to service account for container mode if not provided

* Rename flag to runner.statusUpdateHook.enabled and fix needsServiceAccount

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-07-10 15:11:29 +09:00
Tamás Kádár 10b88bf070
Fix typos in README (#1613) 2022-07-10 08:49:35 +09:00
Callum Tait 8b619e7c6f
chore: bump helm chart (#1619) 2022-07-10 08:25:55 +09:00
everpcpc fea1457f12
fix: annotate pod instead of label on UnregistrationFailureMessage (#1615)
The error message should go to annotation instead of pod label, since we only check annotations on autoscale:

https://github.com/actions-runner-controller/actions-runner-controller/blob/master/controllers/autoscaling.go#L338-L342

Then there is no need to truncate or normalize the error message.
2022-07-09 11:45:05 +09:00
Yusuke Kuoka 473295e3fc
Enhance the E2E test to be runnable against remote clusters on e.g. AWS EKS (#1610)
This contains apparently enough changes to the current E2E test code to make it runnable against remote Kubernetes clusters. I was actually able to make the test passing against my AWS EKS based test clusters with these changes. You still need to trigger it manually from a local checkout of the ARC repo today. But this might be the foundation for automated E2E tests against major cloud providers.
2022-07-07 20:48:07 +09:00
Yusuke Kuoka 9f6f962fc7
Add toubleshooting for cert-manager ca error (#1598)
I encountered this once while E2E testing ARC with K8s 1.22 and cert-manager 1.1.1. The K8s version is too high / The cert-manager is too low so you generally need to fix either. In a standard scenario, it should be more feasible and meaningful to upgrade cert-manager to a recent enough version that supports the new Kubernetes version.
2022-07-07 11:27:49 +09:00
Yusuke Kuoka 2a475f25c7
Use Argo Tunnel for exposing the autoscaler's webhook server (#1595)
I've been manually setting up Argo Tunnel to expose the webhook server while running E2E tests so that I can cover the webhook-based autoscaling. This automates the setup process so that we can automatiaclly bring up and down cloudflared before/after the test run, so that it can be a part of our upcoming automated E2E test.
2022-07-07 11:27:27 +09:00
Viktor Lindgren dd9f25ea78
Update README.md (#1606) 2022-07-06 08:57:54 +09:00
Yusuke Kuoka b8e4eee904
Make it easier to E2E test on various K8s versions (#1599) 2022-07-06 08:57:21 +09:00
Yusuke Kuoka edbdef8d20
Bump chart version to 0.20.0 for ARC 0.25.0 (#1600)
We'll be merging this immediately after ARC 0.25.0 gets released.
2022-07-05 11:19:24 +09:00
Nguyễn Đức Chiến a190fa97bb
Fix helm charts (#1603) 2022-07-05 10:35:57 +09:00
Yusuke Kuoka bfc5ea4727
Fix a regression in webhook-based autoscaler (#1596)
The regression resulted in the webhook-based autoscaler be unable to find visible runner groups and therefore unable to scale up and down the target RunnerDeployment/RunnerSet at all when the webhook-based autoscaler was provided GitHub API credentials to enable the runner groups support. This fixes that.

The regression was introduced via #1578 which is not released yet. Users of existing ARC releases are therefore not affected.
2022-07-04 20:17:09 +09:00
renovate[bot] 5a9e8545aa
fix(deps): update golang.org/x/oauth2 digest to 2104d58 (#1593)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-07-02 14:06:21 +09:00
Yusuke Kuoka 4446ba57e1
Cover ARC upgrade in E2E test (#1592)
* Cover ARC upgrade in E2E test

so that we can make it extra sure that you can upgrade the existing installation of ARC to the next and also (hopefully) it is backward-compatible, or at least it does not break immediately after upgrading.

* Consolidate E2E tests for RS and RD

* Fix E2E for RD to pass

* Add some comment in E2E for how to release disk consumed after dozens of test runs
2022-07-01 21:32:05 +09:00
Martin Moon (문성주) d62c8a4697
fix typo. (#1594) 2022-07-01 10:24:41 +09:00
Yusuke Kuoka 946d5b1fa7
Add release note for v0.25.0 (#1591) 2022-06-30 22:11:22 +09:00
renovate[bot] da6b07660e
fix(deps): update golang.org/x/oauth2 digest to 02e64fa (#1480)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-30 11:32:26 +09:00
Callum Tait e3deb0d752
chore: move runner docker check (#1548) 2022-06-30 11:31:50 +09:00
Callum Tait 82641e5036
chore: move HOME to more logical place (#1460)
* chore: move HOME to more logical place

* chore: don't break the PATH

* chore: don't break the PATH

Co-authored-by: toast-gear <toast-gear@users.noreply.github.com>
2022-06-30 11:21:05 +09:00
Vladyslav Miletskyi 2fe6adf5b7
Runner Entrypoint: fix daemon.json (#1409)
* Runner Entrypoint: fix daemon.json

Do not owerwrite daemon.json if it already exists.
Usage: custom images, which are using public image as source.

* Update runner/startup.sh

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>

Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-06-30 11:03:12 +09:00
renovate[bot] 736126b793
chore(deps): update helm values quay.io/brancz/kube-rbac-proxy to v0.13.0 (#1589)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-30 09:51:38 +09:00
renovate[bot] 6abf5bbac8
fix(deps): update module github.com/stretchr/testify to v1.8.0 (#1584)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-30 09:50:55 +09:00
Yusuke Kuoka dc4f116bda
Reflect manual test scenario for containerMode=kubernetes to E2E (#1588)
With this my semi-automatic E2E manual testing becomes even easier :)
2022-06-30 09:09:58 +09:00
Callum Tait cda10fd243
docs: remove once feature flag env var (#1590) 2022-06-30 09:09:37 +09:00
Yusuke Kuoka b5d1a63bdf
Enhance the acceptance runnerset yaml template for manual E2E (#1587)
The primary goal of this change is to let the tester know about the config difference between the explicitly configured ephemeral work volume vs the automatically configured work volume with workVolumeClaimTemplate+containerMode=kubernetes.
2022-06-29 22:15:50 +09:00
Yusuke Kuoka 6f3e23973d
Bump E2E runner version to 2.294.0 (#1586)
so that every runner does not result in auto-updating itself on startup in E2E, which makes E2E take longer to complete.
2022-06-29 22:05:50 +09:00
Yusuke Kuoka a517c1ff66
Fix old runner pods stuck in Terminating since #1579 (#1585)
Ref #1579
2022-06-29 22:02:42 +09:00
Yusuke Kuoka 9b28e633c1
Drop support for --once (#1580)
Ref #1196
2022-06-29 21:49:52 +09:00
Yusuke Kuoka 8161136cbd
Fix PercentageRunnersBusy scaling delay (#1579)
* Use a dedicated pod label to say it is a runner pod

Follow-up for #1546

* Fix PercentageRunnersBusy scaling delay

Ref #1374
2022-06-29 20:49:21 +09:00
Nikola Jokic a9ac5a1cbf
extracted validations to a single point (#1582) 2022-06-29 20:32:00 +09:00
Callum Tait d4f35cff4f
ci: add paths to push trigger (#1583) 2022-06-29 20:30:07 +09:00
Yusuke Kuoka f661249f07
Use the go-github impl of ListRunnerGroups with visible_to_repository (#1578)
Ref #1402
2022-06-29 09:53:03 +01:00
Mike 73e430ce54
Add a solution to InternalError webhook context timedout (#1558)
* added troubleshooting solution

* added error example

* added entry to the pages index

* sorted

Co-authored-by: Mike Joseph <mike@Mikes-MacBook-Pro-5618.local>
Co-authored-by: Mike Joseph <mike@efrontier.com>
2022-06-29 09:40:23 +09:00
renovate[bot] 858ef8979d
chore(deps): update helm/kind-action action to v1.3.0 (#1532)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-29 09:05:26 +09:00
renovate[bot] 1ce0a183a6
chore(deps): update azure/setup-helm action to v3 (#1571)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-29 09:04:33 +09:00
renovate[bot] 63935d2053
fix(deps): update module github.com/stretchr/testify to v1.7.5 (#1510)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-29 08:39:09 +09:00
John Delivuk fc63d6d26e
Fix: Match Ingress API Version correctly. (#1541)
* Updating conditional to match the api version and kind

mend

* Updating conditional to match the api version and kind

mend
2022-06-29 08:30:11 +09:00
renovate[bot] 5ea08411e6
chore(deps): update dependency golang to v1.18.3 (#1509)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2022-06-29 08:29:14 +09:00
Giuseppe Crinò 067ed2e5ec
docs: fix logic explanation for scale down delay (#1562)
Signed-off: Giuseppe Crinò <giuscri@gmail.com>
2022-06-29 08:26:28 +09:00
renovate[bot] d86bd2bcd7
fix(deps): update module sigs.k8s.io/controller-runtime to v0.12.2 (#1449)
* fix(deps): update module sigs.k8s.io/controller-runtime to v0.12.2

* Regenerate manfiests with the updated k8s and controller-runtime deps

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-06-29 06:42:17 +09:00
Yusuke Kuoka ddd417f756
Bump go-github to v45 (#1573)
* Bump go-github to v45

Ref #1402

* fixup! Bump go-github to v45
2022-06-29 06:34:58 +09:00
Thomas Boop 0386c0734c
`containerMode` option to allow running jobs in k8's instead of docker (#1546)
* added containerMode=kubernetes env variables to the runner

* removed unused logging

* restored configs and charts

* restored makefile cert version and acceptance/run

* added workVolumeClaimTemplate in pod definition, including logic

* added claim template name based on the runner

* Apply suggestions from code review

update errors

* added concurrent cleanup before runner pod is deleted

* update manifests

* added retry after 30s if pod cleanup contains err

* added admission webhook check, made workVolumeClaimTemplate mandatory for k8s

* style changes and added comments

* added izZero timestamp check for deleting runner-linked pods

* changed order of local variable to avoid copy if p is deleted

* removed docker from container mode k8s

* restored charts, config, makefile

* restored forked files back and not the ARC ones

* created PersistentVolume on containerMode k8s

* create pv only if storage class name is local-storage

* removed actions if storage class name is local-storage

* added service account validation if container mode kubernetes

* changed the coding style to match rest of the ARC

* added validation to the runnerdeployment webhook

* specified fields more precisely, added webhook validation to the replicaset as well

* remake manifests

* wraped delete runner-linked-pods in kube mode

* fixed empty line

* fixed import

* makefile changes for hooks

* added cleanup secrets

* create manifests

* docs

* update access modes

* update dockerfile

* nit changes

* fixed dockerfile

* rewrite allowing reuse for runners and runnersets

* deepcopy forgot to stage

* changed privileged

* make manifests

* partly moved to finalizer, still need to apply finalizer first

* finalizer added if env variable used in container mode exists

* bump runner version

* error message moved from Error to Info on cleanup pods/secrets

* removed useless dereferencing, added transformation tests of workVolumeClaimTemplate

* Apply suggestions from code review

* Update controllers/utils_test.go

Co-authored-by: Thomas Boop <52323235+thboop@users.noreply.github.com>

* Update controllers/utils_test.go

Co-authored-by: Thomas Boop <52323235+thboop@users.noreply.github.com>

* add hook version to cli, update to 0.1.2

* Apply suggestions from code review

* Update controllers/utils_test.go

* Update runner/Makefile

* Fix missing secret permission and the error handling

* Fix a runnerpod reconciler finalizer to not trigger unnecessary retry

Co-authored-by: Nikola Jokic <nikola-jokic@github.com>
Co-authored-by: Nikola Jokic <97525037+nikola-jokic@users.noreply.github.com>
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-06-28 14:12:40 +09:00
Yusuke Kuoka af96de6184
Fix completed runner pod recreation not to be blocked after max out (#1568)
Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1477#issuecomment-1164154496
2022-06-28 13:50:07 +09:00
Arnaud abb8615796
Webhook server configuration with kustomize (#1312)
* webhook server configuration with kustomize

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-06-28 09:08:25 +09:00