actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Yusuke Kuoka	83e550cde5	Experimetanl log level "-4" for logging every HTTP round-trip for GitHub API calls	2022-03-12 12:11:16 +00:00
Yusuke Kuoka	22ef7b3a71	acceptance,e2e: Fix deploy.sh and e2e_test.go for testing with GitHub App	2022-03-12 12:10:04 +00:00
Yusuke Kuoka	28fccbcecd	Fix GitHub API cache to work with GitHub App authentication The version of `bradleyfalzon/ghinstallation` which is used to enable GitHub App authentication turned out to add an extra header `application/vnd.github.machine-man-preview+json` to every HTTP request. That revealed an edge-case in our HTTP cache layer `gregjones/httpcache` that results it to not serve responses from cache when it should. There were two problems. One was that it does not support multi-valued header and it only looked for the first value for each header, and another is that it does not support any http.RoundTripper implementation that modifies HTTP request headers in a RoundTrip function call. I fixed it in my fork of httpcache, which is hosted at https://github.com/actions-runner-controller/httpcache. The relevant commits are: - `70d975e77d` - `197a8a3546` This can be considered as a follow-up for #1127, which turned out to have enabled the cache only for the case that ARC uses PAT for authentication. Since this fix, the cache is also enabled when ARC authenticates as a GitHub App.	2022-03-12 11:14:16 +00:00
Yusuke Kuoka	9628bb2937	Prevent RemoveRunner spam on busy ephemeral runner scale down (#1204 ) Since #1127 and #1167, we had been retrying `RemoveRunner` API call on each graceful runner stop attempt when the runner was still busy. There was no reliable way to throttle the retry attempts. The combination of these resulted in ARC spamming RemoveRunner calls(one call per reconciliation loop but the loop runs quite often due to how the controller works) when it failed once due to that the runner is in the middle of running a workflow job. This fixes that, by adding a few short-circuit conditions that would work for ephemeral runners. An ephemeral runner can unregister itself on completion so in most of cases ARC can just wait for the runner to stop if it's already running a job. As a RemoveRunner response of status 422 implies that the runner is running a job, we can use that as a trigger to start the runner stop waiter. The end result is that 422 errors will be observed at most once per the whole graceful termination process of an ephemeral runner pod. RemoveRunner API calls are never retried for ephemeral runners. ARC consumes less GitHub API rate limit budget and logs are much cleaner than before. Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1167#issuecomment-1064213271	2022-03-11 19:03:17 +09:00
Renovate Bot	736a53fed6	fix(deps): update golang.org/x/oauth2 commit hash to 6242fa9	2022-03-10 08:39:51 +09:00
yourmoonlight	132faa13a1	docs: fix the helm command for webhook installation (#1188 ) * fix doc for install the webhook server * modify cmd with single set && add double quote for zsh users	2022-03-08 17:59:01 +00:00
Callum Tait	66e070f798	docs: remove githubAPICacheDuration from docs (#1194 )	2022-03-08 13:27:30 +00:00
Yusuke Kuoka	55ff4de79a	Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192 ) * Remove legacy GitHub API cache of HRA.Status.CachedEntries We migrated to the transport-level cache introduced in #1127 so not only this is useless, it is harder to deduce which cache resulted in the desired replicas number calculated by HRA. Just remove the legacy cache to keep it simple and easy to understand. * Deprecate githubAPICacheDuration helm chart value and the --github-api-cache-duration as well * Fix integration test	2022-03-08 19:05:43 +09:00
Yusuke Kuoka	301439b06a	chore: Change log ts format to RFC3339 (#1191 ) The TimeEncoder for zap seems to have been set to EpochTimeEncoder which is the default and it was not very readable. Changing it to a TimeEncoderOfLayout(time.RFC3339) for readability. Another benefit of doing this is the ts format is now consistent with various timestamps ARC put into pod and other custom resource annotations.	2022-03-08 10:34:52 +09:00
Yusuke Kuoka	15ee6d6360	chore: Reorganize "Calculated desired replicas log fields (#1190 ) So that `max` is emitted immediately after `min`, which is the counterpart of it.	2022-03-08 10:29:53 +09:00
Felipe Galindo Sanchez	5b899f578b	fix(chart): allow to use basic auth when authSecret.create is false (#1149 ) * fix(chart): allow to use basic auth when authSecret.create is false When secret is created outside of the ARC chart using authSecret.create=false and basicAuth, the controller fails as we're not including the basic password as environment variable as the password value won't be inside the helm values. This PR includes both environment variables for consistent regardless if those are set or not similar as the rest of the other auth options (e.g app_id, private key, etc) * chart: Add back the conditional block for .Values.authSecret.github_basicauth_username Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-03-07 10:07:24 +09:00
Yusuke Kuoka	d8c9eb7ba7	Fix arm64 image (#1185 ) Fixes #1184	2022-03-07 10:00:20 +09:00
Yusuke Kuoka	cbbc383a80	Auto-correct replicas number on missing webhook_job completion event (#1180 ) While testing #1179, I discovered that ARC sometimes stop resyncing RunnerReplicaSet when the desired replicas is greater than the actual number of runner pods. This seems to happen when ARC missed receiving a workflow_job completion event but it has no way to decide if it is either (1) something went wrong on ARC or (2) a loadbalancer in the middle or GitHub or anything not ARC went wrong. It needs a standard to decide it, or if it's not impossible, how to deal with it. In this change, I added a hard-coded 10 minutes timeout(can be made customizable later) to prevent runner pod recreation. Now, a RunnerReplicaSet/RunnerSet to restart runner pod recreation 10 minutes after the last scale-up. If the workflow completion event arrived after the timeout, it will decrease the desired replicas number that results in the removal of a runner pod. The removed runner pod might be deleted without ever being used, but I think that's better than leaving the desired replicas and the actual number of replicas diverged forever.	2022-03-07 09:35:13 +09:00
seplak	b57e885a73	Fix service account typo in Helm README (#1183 ) Just fixing a typo I discovered while reading through the README.	2022-03-07 08:39:01 +09:00
Yusuke Kuoka	bed927052d	Merge pull request #1179 from actions-runner-controller/refactor-runner-and-runnerset Refactor Runner and RunnerSet so that they use the same library code that powers RunnerSet. RunnerSet is StatefulSet-based and RunnerSet/Runner is Pod-based so it had been hard to unify the implementation although they look very similar in many aspects. This change finally resolves that issue, by first introducing a library that implements the generic logic that is used to reconcile RunnerSet, then adding an adapter that can be used to let the generic logic manage runner pods via Runner, instead of via StatefulSet. Follow-up for #1127, #1167, and 1178	2022-03-06 15:56:51 +09:00
Yusuke Kuoka	14a878bfae	refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet	2022-03-06 05:53:26 +00:00
Yusuke Kuoka	c95e84a528	refactor: Extract runner pod owner management out of runnerset controller so that it can potentially be reusable from runnerreplicaset controller	2022-03-05 12:18:02 +00:00
Yusuke Kuoka	95a5770d55	Fix regression that registration-timeout check was not working for runnerset (#1178 ) Follow-up for #1167	2022-03-05 19:31:05 +09:00
Yusuke Kuoka	9cc9f8c182	chore: Add a few comments to runnerset and runnerpod controllers to help potential contributors	2022-03-05 05:41:56 +00:00
Yusuke Kuoka	b7c5611516	dockerfile: Fix unintended removal of CGO_ENABLED=0	2022-03-05 05:41:56 +00:00
Yusuke Kuoka	138e326705	chore: Add comment on lastSyncTime in runnerset controller	2022-03-05 05:41:56 +00:00
Renovate Bot	c21fa75afa	fix(deps): update kubernetes packages to v0.23.4	2022-03-04 08:39:18 +09:00
Yusuke Kuoka	34483e268f	ci: Enable actions/cache for Go modules	2022-03-03 18:47:54 +09:00
Yusuke Kuoka	5f2b5327f7	integration: Reduce error logs to ease debugging	2022-03-03 18:47:54 +09:00
renovate[bot]	a93b2fdad4	fix(deps): update golang.org/x/oauth2 commit hash to ee48083 (#1150 ) fix(deps): update golang.org/x/oauth2 commit hash to ee48083 Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com> Co-authored-by: Renovate Bot <bot@renovateapp.com>	2022-03-03 18:00:43 +09:00
Yusuke Kuoka	25570a0c6d	Fix docker build	2022-03-03 02:05:38 +00:00
Felipe Galindo Sanchez	d20ad71071	Fix minor log in runner controller (#1175 ) Log is mentioning registration only but this is about the standard runner pod	2022-03-03 09:51:30 +09:00
Daniel	8a379ac94b	Add custom volume mount documentation (#1045 ) one example for in-memory and one example for NVME backed storage, also pointing out all the current flaws/risks for that configuration	2022-03-03 09:13:42 +09:00
Felipe Galindo Sanchez	27563c4378	Remove unused function (#1173 )	2022-03-03 09:02:47 +09:00
Felipe Galindo Sanchez	4a0f68bfe3	Cleanup extra block in runner controller (#1174 )	2022-03-03 09:01:34 +09:00
Yusuke Kuoka	1917cf90c4	chore: Tweak runner-id annotation name and the annotation prefix to be more consistent	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	0ba3cad6c2	fix: Prefix runner pod related annotation keys with `actions/` to make them distinguishable from other annotations	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	7f0e65cb73	refactor: Extract definitions of various annotation keys and other defaults to their own source	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	12a04b7f38	Fix typo in comment	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	a3072c110d	Prevent runnerset pod unregistration until it gets runner ID This eliminates the race condition that results in the runner terminated prematurely when RunnerSet triggered unregistration of StatefulSet that added just a few seconds ago.	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	15b402bb32	Make RunnerSet much more reliable with or without webhook	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	11be6c1fb6	Prevent runner pod deletion delay when pod disappeared before unregistration	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	59c3288e87	acceptance,e2e: Automate restarts of ARC pods in case image tag is not changed	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	5030e075a9	dockerfile,e2e: Use buildx and cache mounts for faster rebuilds in E2E	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	3115d71471	acceptance,e2e: Enhance deploy.sh to support more types of runnersets	2022-03-02 19:03:20 +09:00
Renovate Bot	c221b6e278	chore(deps): update actions/checkout action to v3	2022-03-02 11:05:16 +09:00
Renovate Bot	a8dbc8a501	fix(deps): update module github.com/prometheus/client_golang to v1.12.1	2022-03-02 10:56:53 +09:00
Renovate Bot	b1ac63683f	fix(deps): update module go.uber.org/zap to v1.21.0	2022-03-02 10:54:35 +09:00
Renovate Bot	10bc28af75	fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.1	2022-03-02 10:52:43 +09:00
Renovate Bot	e23692b3bc	chore(deps): update actions/setup-python action to v3	2022-03-02 10:51:22 +09:00
renovate[bot]	e7f4a0e200	chore(deps): update actions/setup-go action to v3 (#1163 ) Co-authored-by: Renovate Bot <bot@renovateapp.com>	2022-03-02 10:51:01 +09:00
Yusuke Kuoka	828ddcd44e	Merge pull request #1151 from fgalind1/improve-logs logging: improve logs for scaling	2022-03-02 10:46:53 +09:00
Yusuke Kuoka	fc821fd473	Merge pull request #1168 from actions-runner-controller/docs/better-runner-group-description docs: better runner group description	2022-03-02 10:31:22 +09:00
Callum Tait	4b0aa92286	docs: better wording	2022-03-01 08:56:30 +00:00
Callum Tait	c69c8dd84d	docs: better runner group description	2022-03-01 08:54:24 +00:00

1 2 3 4 5 ...

731 Commits All Branches Search

731 Commits

All Branches