actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Yusuke Kuoka	15ee6d6360	chore: Reorganize "Calculated desired replicas" log fields (#1190) So that `max` is emitted immediately after its counterpart, `min`.	2022-03-08 10:29:53 +09:00
Felipe Galindo Sanchez	5b899f578b	fix(chart): allow using basic auth when authSecret.create is false (#1149) * fix(chart): allow using basic auth when authSecret.create is false. When the secret is created outside of the ARC chart using authSecret.create=false and basicAuth, the controller fails because the basic auth password is not included as an environment variable, since the password value is not in the Helm values. This PR includes both environment variables for consistency, regardless of whether they are set, similar to the other auth options (e.g. app_id, private key, etc.) * chart: Add back the conditional block for .Values.authSecret.github_basicauth_username Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>	2022-03-07 10:07:24 +09:00
Yusuke Kuoka	d8c9eb7ba7	Fix arm64 image (#1185) Fixes #1184	2022-03-07 10:00:20 +09:00
Yusuke Kuoka	cbbc383a80	Auto-correct replica count on missing workflow_job completion event (#1180) While testing #1179, I discovered that ARC sometimes stops resyncing a RunnerReplicaSet when the desired replicas are greater than the actual number of runner pods. This seems to happen when ARC misses a workflow_job completion event, but it has no way to decide whether (1) something went wrong in ARC or (2) a load balancer in the middle, GitHub, or anything other than ARC went wrong. It needs a standard to decide this or, if that is impossible, a way to deal with it. In this change, I added a hard-coded 10-minute timeout (can be made customizable later) to prevent runner pod recreation. Now, a RunnerReplicaSet/RunnerSet restarts runner pod recreation 10 minutes after the last scale-up. If the workflow_job completion event arrives after the timeout, it decreases the desired replicas, which results in the removal of a runner pod. The removed runner pod might be deleted without ever being used, but I think that's better than leaving the desired and actual numbers of replicas diverged forever.	2022-03-07 09:35:13 +09:00
seplak	b57e885a73	Fix service account typo in Helm README (#1183) Just fixing a typo I discovered while reading through the README.	2022-03-07 08:39:01 +09:00
Yusuke Kuoka	bed927052d	Merge pull request #1179 from actions-runner-controller/refactor-runner-and-runnerset Refactor RunnerReplicaSet and Runner so that they use the same library code that powers RunnerSet. RunnerSet is StatefulSet-based and RunnerReplicaSet/Runner is Pod-based, so it had been hard to unify the implementation although they look very similar in many aspects. This change finally resolves that issue by first introducing a library that implements the generic logic used to reconcile RunnerSet, then adding an adapter that lets the generic logic manage runner pods via Runner instead of via StatefulSet. Follow-up for #1127, #1167, and #1178	2022-03-06 15:56:51 +09:00
Yusuke Kuoka	14a878bfae	refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet	2022-03-06 05:53:26 +00:00
Yusuke Kuoka	c95e84a528	refactor: Extract runner pod owner management out of runnerset controller so that it can potentially be reusable from runnerreplicaset controller	2022-03-05 12:18:02 +00:00
Yusuke Kuoka	95a5770d55	Fix regression that registration-timeout check was not working for runnerset (#1178) Follow-up for #1167	2022-03-05 19:31:05 +09:00
Yusuke Kuoka	9cc9f8c182	chore: Add a few comments to runnerset and runnerpod controllers to help potential contributors	2022-03-05 05:41:56 +00:00
Yusuke Kuoka	b7c5611516	dockerfile: Fix unintended removal of CGO_ENABLED=0	2022-03-05 05:41:56 +00:00
Yusuke Kuoka	138e326705	chore: Add comment on lastSyncTime in runnerset controller	2022-03-05 05:41:56 +00:00
Renovate Bot	c21fa75afa	fix(deps): update kubernetes packages to v0.23.4	2022-03-04 08:39:18 +09:00
Yusuke Kuoka	34483e268f	ci: Enable actions/cache for Go modules	2022-03-03 18:47:54 +09:00
Yusuke Kuoka	5f2b5327f7	integration: Reduce error logs to ease debugging	2022-03-03 18:47:54 +09:00
renovate[bot]	a93b2fdad4	fix(deps): update golang.org/x/oauth2 commit hash to ee48083 (#1150) fix(deps): update golang.org/x/oauth2 commit hash to ee48083 Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com> Co-authored-by: Renovate Bot <bot@renovateapp.com>	2022-03-03 18:00:43 +09:00
Yusuke Kuoka	25570a0c6d	Fix docker build	2022-03-03 02:05:38 +00:00
Felipe Galindo Sanchez	d20ad71071	Fix minor log in runner controller (#1175) The log mentions registration only, but this is about the standard runner pod	2022-03-03 09:51:30 +09:00
Daniel	8a379ac94b	Add custom volume mount documentation (#1045) One example for in-memory and one for NVMe-backed storage, also pointing out all the current flaws/risks of that configuration	2022-03-03 09:13:42 +09:00
Felipe Galindo Sanchez	27563c4378	Remove unused function (#1173)	2022-03-03 09:02:47 +09:00
Felipe Galindo Sanchez	4a0f68bfe3	Cleanup extra block in runner controller (#1174)	2022-03-03 09:01:34 +09:00
Yusuke Kuoka	1917cf90c4	chore: Tweak runner-id annotation name and the annotation prefix to be more consistent	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	0ba3cad6c2	fix: Prefix runner pod related annotation keys with `actions/` to make them distinguishable from other annotations	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	7f0e65cb73	refactor: Extract definitions of various annotation keys and other defaults to their own source	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	12a04b7f38	Fix typo in comment	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	a3072c110d	Prevent runnerset pod unregistration until it gets runner ID. This eliminates the race condition that resulted in the runner being terminated prematurely when RunnerSet triggered unregistration of a StatefulSet that was added just a few seconds earlier.	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	15b402bb32	Make RunnerSet much more reliable with or without webhook	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	11be6c1fb6	Prevent runner pod deletion delay when pod disappeared before unregistration	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	59c3288e87	acceptance,e2e: Automate restarts of ARC pods in case image tag is not changed	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	5030e075a9	dockerfile,e2e: Use buildx and cache mounts for faster rebuilds in E2E	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	3115d71471	acceptance,e2e: Enhance deploy.sh to support more types of runnersets	2022-03-02 19:03:20 +09:00
Renovate Bot	c221b6e278	chore(deps): update actions/checkout action to v3	2022-03-02 11:05:16 +09:00
Renovate Bot	a8dbc8a501	fix(deps): update module github.com/prometheus/client_golang to v1.12.1	2022-03-02 10:56:53 +09:00
Renovate Bot	b1ac63683f	fix(deps): update module go.uber.org/zap to v1.21.0	2022-03-02 10:54:35 +09:00
Renovate Bot	10bc28af75	fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.1	2022-03-02 10:52:43 +09:00
Renovate Bot	e23692b3bc	chore(deps): update actions/setup-python action to v3	2022-03-02 10:51:22 +09:00
renovate[bot]	e7f4a0e200	chore(deps): update actions/setup-go action to v3 (#1163) Co-authored-by: Renovate Bot <bot@renovateapp.com>	2022-03-02 10:51:01 +09:00
Yusuke Kuoka	828ddcd44e	Merge pull request #1151 from fgalind1/improve-logs logging: improve logs for scaling	2022-03-02 10:46:53 +09:00
Yusuke Kuoka	fc821fd473	Merge pull request #1168 from actions-runner-controller/docs/better-runner-group-description docs: better runner group description	2022-03-02 10:31:22 +09:00
Callum Tait	4b0aa92286	docs: better wording	2022-03-01 08:56:30 +00:00
Callum Tait	c69c8dd84d	docs: better runner group description	2022-03-01 08:54:24 +00:00
Renovate Bot	e42db00006	chore(deps): update dependency actions/runner to v2.288.1	2022-02-28 22:30:10 +00:00
Felipe Galindo Sanchez	eff0c7364f	Merge branch 'master' into improve-logs	2022-02-28 09:25:30 -08:00
Tingluo Huang	516695b275	Set UserAgent to 'actions-runner-controller' for all HTTP clients (#1140) I couldn't find any requests made with the user agent `actions-runner-controller` in GitHub.com's telemetry over the past 7 days. It turns out we only set the user agent `actions-runner-controller` when configured to use BasicAuth, which I think is not the case for most customers. I updated the code a little to make sure it always sets `actions-runner-controller` as the UserAgent for the GitHub HTTP client in ARC. A further step would be to bake the ARC release version into the UserAgent as well.	2022-02-28 09:17:58 +09:00
Yusuke Kuoka	686d40c20d	Merge pull request #1127 from actions-runner-controller/github-api-cache Enhances ARC (both the controller-manager and github-webhook-server) to cache any GitHub API response to an HTTP GET request when the response has an appropriate Cache-Control header. Ref #920 ## Cache Implementation `gregjones/httpcache` has been chosen as the library to implement this feature, as recommended in `go-github`'s documentation: https://github.com/google/go-github#conditional-requests `gregjones/httpcache` supports a number of cache backends like `diskcache`, `s3cache`, and so on: https://github.com/gregjones/httpcache#cache-backends We stick to the built-in in-memory cache as a starter. This will probably never become an issue as long as the HTTP responses for all the GitHub API calls that ARC makes (list-runners, list-workflow-jobs, list-runner-groups, etc.) don't overflow the in-memory cache. `httpcache` has a known unfixed issue: it doesn't update the cache on chunked responses. But we assume that the APIs we call don't use chunked responses. See #1503 for more information on that. ## Ephemeral runner pods are no longer recreated The addition of the cache layer slowed down the scale-down process and forced a trade-off between making the runner pod termination process fragile to various race conditions (shorter grace period before runner deletion) or delaying runner pod deletion depending on how long the grace period is (longer grace period). A grace period needs to be longer than 60s (the cache duration of the ListRunners API) to avoid prematurely deleting a runner pod that was just created. But once I disabled automatic recreation of ephemeral runner pods, it turned out to be no longer an issue when scaling via the workflow_job webhook. Ephemeral runner resources are still automatically added on demand by RunnerDeployment via RunnerReplicaSet (I've added `EffectiveTime` fields to our CRDs, but that's an implementation detail, so let's omit it). A good side effect of disabling ephemeral runner pod recreation is that ARC will no longer create redundant ephemeral runners when used with the webhook-based autoscaler. Basically, autoscaling still works as everyone might expect; it's just better than before overall.	2022-02-28 08:37:26 +09:00
Renovate Bot	f0fa99fc53	chore(deps): update dependency actions/runner to v2.288.0	2022-02-26 01:34:49 +00:00
Javier Sotelo	6b12413fdd	Add optional hostNetwork (#1035) Co-authored-by: jsotelo <javier.sotelo@viasat.com>	2022-02-23 20:11:40 +00:00
Felipe Galindo Sanchez	3abecd0f19	logging: improve logs for scaling	2022-02-23 08:29:13 -08:00
Callum Tait	7156ce040e	chore: bump chart (#1138)	2022-02-21 09:24:14 +00:00
Yusuke Kuoka	1463d4927f	acceptance,e2e: Let capacity reservations expire later	2022-02-21 00:07:49 +00:00
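
The refactoring merged in #1179 follows an adapter pattern: one generic reconcile loop plus one adapter per kind of pod owner. A minimal Go sketch of that shape, assuming a much simplified interface; `podOwner`, `statefulSetOwner`, `runnerOwner`, and `reconcile` are hypothetical names, and real adapters would call the Kubernetes API instead of formatting pod names.

```go
package main

import "fmt"

// podOwner is a hypothetical abstraction over "something that creates runner
// pods": a StatefulSet for RunnerSet, a Runner resource for RunnerReplicaSet.
type podOwner interface {
	Kind() string
	CreatePod(index int) (string, error)
}

// statefulSetOwner stands in for the StatefulSet-backed path.
type statefulSetOwner struct{ name string }

func (s statefulSetOwner) Kind() string { return "StatefulSet" }
func (s statefulSetOwner) CreatePod(index int) (string, error) {
	return fmt.Sprintf("%s-%d", s.name, index), nil
}

// runnerOwner stands in for the adapter that manages pods via Runner.
type runnerOwner struct{ name string }

func (r runnerOwner) Kind() string { return "Runner" }
func (r runnerOwner) CreatePod(index int) (string, error) {
	return fmt.Sprintf("%s-runner-%d", r.name, index), nil
}

// reconcile is the shared logic: it only knows how many pods exist and how
// many are wanted, and delegates the actual creation to the adapter.
func reconcile(owner podOwner, current, desired int) ([]string, error) {
	var created []string
	for i := current; i < desired; i++ {
		name, err := owner.CreatePod(i)
		if err != nil {
			return created, err
		}
		created = append(created, name)
	}
	return created, nil
}

func main() {
	for _, o := range []podOwner{statefulSetOwner{"example"}, runnerOwner{"example"}} {
		pods, _ := reconcile(o, 1, 3)
		fmt.Println(o.Kind(), pods)
	}
}
```

The point of the design is that scaling decisions live in one place, so fixes such as the registration-timeout check only need to be made once.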
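
The race fix in a3072c110d amounts to refusing unregistration until the pod has a runner ID. A hypothetical Go sketch, assuming the ID is stored as a numeric annotation under a made-up key (`example.com/runner-id`); it is not ARC's actual annotation key or function.

```go
package main

import (
	"fmt"
	"strconv"
)

// runnerIDAnnotation is a hypothetical annotation key used only in this sketch.
const runnerIDAnnotation = "example.com/runner-id"

// canUnregister reports whether a runner pod may be unregistered. It refuses
// until the pod carries a numeric runner ID, so a pod created seconds ago is
// not unregistered before its runner has registered with GitHub.
func canUnregister(annotations map[string]string) bool {
	id, ok := annotations[runnerIDAnnotation]
	if !ok {
		return false
	}
	_, err := strconv.ParseInt(id, 10, 64)
	return err == nil
}

func main() {
	fmt.Println(canUnregister(map[string]string{}))                         // false
	fmt.Println(canUnregister(map[string]string{runnerIDAnnotation: "42"})) // true
}
```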

1472 Commits