actions-runner-controller/controllers
Yusuke Kuoka 9628bb2937
Prevent RemoveRunner spam on busy ephemeral runner scale down (#1204)
Since #1127 and #1167, we had been retrying the `RemoveRunner` API call on each graceful runner stop attempt while the runner was still busy.
There was no reliable way to throttle the retry attempts. As a result, once a call failed because the runner was in the middle of running a workflow job, ARC spammed RemoveRunner calls (one call per reconciliation loop, but the loop runs quite often due to how the controller works).

This fixes that by adding a few short-circuit conditions that work for ephemeral runners. An ephemeral runner can unregister itself on completion, so in most cases ARC can just wait for the runner to stop if it's already running a job. As a RemoveRunner response of status 422 implies that the runner is running a job, we can use that as a trigger to start the runner stop waiter.

The end result is that 422 errors will be observed at most once over the whole graceful termination process of an ephemeral runner pod. RemoveRunner API calls are never retried for ephemeral runners. ARC consumes less of the GitHub API rate limit budget, and logs are much cleaner than before.
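
The short-circuit described above can be sketched roughly as follows. This is an illustration only, not the code in `runner_graceful_stop.go`: the names `removeRunnerFunc`, `gracefulStopState`, `stopAction` and `decideStop` are hypothetical, and how ARC actually persists the "waiting" state between reconciliations is not shown here.

```go
package main

import (
	"fmt"
	"net/http"
)

// removeRunnerFunc is a hypothetical stand-in for the GitHub RemoveRunner call.
// It returns the HTTP status code of the response.
type removeRunnerFunc func(runnerID int64) (status int, err error)

// stopAction is what the reconciler should do next (hypothetical type).
type stopAction int

const (
	actionDeletePod   stopAction = iota // runner is unregistered; the pod can be deleted
	actionWaitForStop                   // busy ephemeral runner; wait for it to exit on its own
	actionRetryLater                    // try RemoveRunner again on a later reconciliation
)

// gracefulStopState is the state this sketch carries across reconciliations.
// Assumption: the real controller records something equivalent somewhere durable.
type gracefulStopState struct {
	Ephemeral      bool
	WaitingForStop bool
}

// decideStop calls RemoveRunner at most once for a busy ephemeral runner.
func decideStop(st *gracefulStopState, runnerID int64, remove removeRunnerFunc) (stopAction, error) {
	// Short-circuit: a previous attempt already saw 422, so do not call the API again.
	if st.Ephemeral && st.WaitingForStop {
		return actionWaitForStop, nil
	}

	status, err := remove(runnerID)
	if err != nil {
		return actionRetryLater, fmt.Errorf("removing runner %d: %w", runnerID, err)
	}

	switch {
	case status >= 200 && status < 300:
		return actionDeletePod, nil
	case status == http.StatusUnprocessableEntity && st.Ephemeral:
		// 422 implies the runner is running a job. An ephemeral runner
		// unregisters itself on completion, so start waiting instead of retrying.
		st.WaitingForStop = true
		return actionWaitForStop, nil
	default:
		return actionRetryLater, nil
	}
}
```

Calling `decideStop` repeatedly with the same state for a busy ephemeral runner invokes `remove` only on the first call; every later call returns `actionWaitForStop` without touching the API. A non-ephemeral runner in this sketch still falls through to `actionRetryLater`.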

Ref https://github.com/actions-runner-controller/actions-runner-controller/pull/1167#issuecomment-1064213271
2022-03-11 19:03:17 +09:00
metrics Clean up import list (#645) 2021-06-22 17:55:06 +09:00
testdata test: Add tests with self-hosted label for #953 (#1030) 2022-01-07 08:50:26 +09:00
autoscaling.go Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192) 2022-03-08 19:05:43 +09:00
autoscaling_test.go Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192) 2022-03-08 19:05:43 +09:00
constants.go Prevent RemoveRunner spam on busy ephemeral runner scale down (#1204) 2022-03-11 19:03:17 +09:00
horizontal_runner_autoscaler_webhook.go Merge branch 'master' into improve-logs 2022-02-28 09:25:30 -08:00
horizontal_runner_autoscaler_webhook_on_check_run.go Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
horizontal_runner_autoscaler_webhook_on_pull_request.go Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
horizontal_runner_autoscaler_webhook_on_push.go Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
horizontal_runner_autoscaler_webhook_test.go test: Add tests with self-hosted label for #953 (#1030) 2022-01-07 08:50:26 +09:00
horizontalrunnerautoscaler_controller.go Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192) 2022-03-08 19:05:43 +09:00
integration_test.go Remove legacy GitHub API cache of HRA.Status.CachedEntries (#1192) 2022-03-08 19:05:43 +09:00
pod_runner_token_injector.go RunnerSet: Automatic-recovery from registration timeout and deregistration on pod termination (#652) 2021-06-24 20:39:37 +09:00
runner_controller.go Prevent RemoveRunner spam on busy ephemeral runner scale down (#1204) 2022-03-11 19:03:17 +09:00
runner_graceful_stop.go Prevent RemoveRunner spam on busy ephemeral runner scale down (#1204) 2022-03-11 19:03:17 +09:00
runner_pod_controller.go refactor: Extract runner pod owner management out of runnerset controller 2022-03-05 12:18:02 +00:00
runner_pod_owner.go Auto-correct replicas number on missing webhook_job completion event (#1180) 2022-03-07 09:35:13 +09:00
runnerdeployment_controller.go Prevent unnecessary ephemeral runner recreations 2022-02-20 13:45:42 +00:00
runnerdeployment_controller_test.go Clean up import list (#645) 2021-06-22 17:55:06 +09:00
runnerreplicaset_controller.go refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
runnerreplicaset_controller_test.go refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
runnerset_controller.go refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
schedule.go Experimental support for ScheduledOverrides (#515) 2021-05-03 23:31:17 +09:00
schedule_test.go Experimental support for ScheduledOverrides (#515) 2021-05-03 23:31:17 +09:00
suite_test.go Clean up import list (#645) 2021-06-22 17:55:06 +09:00
utils.go feat: EKS IAM Roles for Service Accounts for Runner Pods (#226) 2020-12-08 17:56:06 +09:00
utils_test.go feat: EKS IAM Roles for Service Accounts for Runner Pods (#226) 2020-12-08 17:56:06 +09:00