actions-runner-controller/controllers
Yusuke Kuoka cbbc383a80
Auto-correct replicas number on missing webhook_job completion event (#1180)
While testing #1179, I discovered that ARC sometimes stops resyncing a RunnerReplicaSet when the desired number of replicas is greater than the actual number of runner pods.
This seems to happen when ARC misses a workflow_job completion event. ARC has no way to tell whether (1) something went wrong in ARC itself or (2) something outside ARC went wrong, such as a load balancer in the middle or GitHub. It needs a criterion for deciding which, or, if deciding is impossible, a way to deal with the situation.

In this change, I added a hard-coded 10-minute timeout (it can be made customizable later) to the period during which runner pod recreation is prevented.
Now, a RunnerReplicaSet/RunnerSet restarts runner pod recreation 10 minutes after the last scale-up. If the workflow completion event arrives after the timeout, it decreases the desired replicas number, which results in the removal of a runner pod. The removed runner pod might be deleted without ever being used, but I think that's better than leaving the desired replicas and the actual number of replicas diverged forever.
2022-03-07 09:35:13 +09:00
..
metrics Clean up import list (#645) 2021-06-22 17:55:06 +09:00
testdata test: Add tests with self-hosted label for #953 (#1030) 2022-01-07 08:50:26 +09:00
autoscaling.go fix: pagination for ListWorkflowJobs in autoscaler (#990) (#992) 2021-12-24 09:12:36 +09:00
autoscaling_test.go Add HRA support for RunnerSet (#647) 2021-06-23 20:25:03 +09:00
constants.go Auto-correct replicas number on missing webhook_job completion event (#1180) 2022-03-07 09:35:13 +09:00
horizontal_runner_autoscaler_webhook.go Merge branch 'master' into improve-logs 2022-02-28 09:25:30 -08:00
horizontal_runner_autoscaler_webhook_on_check_run.go Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
horizontal_runner_autoscaler_webhook_on_pull_request.go Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
horizontal_runner_autoscaler_webhook_on_push.go Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
horizontal_runner_autoscaler_webhook_test.go test: Add tests with self-hosted label for #953 (#1030) 2022-01-07 08:50:26 +09:00
horizontalrunnerautoscaler_controller.go Make RunnerSet much more reliable with or without webhook 2022-03-02 19:03:20 +09:00
horizontalrunnerautoscaler_controller_test.go Clean up import list (#645) 2021-06-22 17:55:06 +09:00
integration_test.go integration: Reduce error logs to ease debugging 2022-03-03 18:47:54 +09:00
pod_runner_token_injector.go RunnerSet: Automatic-recovery from registration timeout and deregistration on pod termination (#652) 2021-06-24 20:39:37 +09:00
runner_controller.go refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
runner_graceful_stop.go refactor: Extract runner pod owner management out of runnerset controller 2022-03-05 12:18:02 +00:00
runner_pod_controller.go refactor: Extract runner pod owner management out of runnerset controller 2022-03-05 12:18:02 +00:00
runner_pod_owner.go Auto-correct replicas number on missing webhook_job completion event (#1180) 2022-03-07 09:35:13 +09:00
runnerdeployment_controller.go Prevent unnecessary ephemeral runner recreations 2022-02-20 13:45:42 +00:00
runnerdeployment_controller_test.go Clean up import list (#645) 2021-06-22 17:55:06 +09:00
runnerreplicaset_controller.go refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
runnerreplicaset_controller_test.go refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
runnerset_controller.go refactor: Make RunnerReplicaSet and Runner backed by the same logic that backs RunnerSet 2022-03-06 05:53:26 +00:00
schedule.go Experimental support for ScheduledOverrides (#515) 2021-05-03 23:31:17 +09:00
schedule_test.go Experimental support for ScheduledOverrides (#515) 2021-05-03 23:31:17 +09:00
suite_test.go Clean up import list (#645) 2021-06-22 17:55:06 +09:00
utils.go feat: EKS IAM Roles for Service Accounts for Runner Pods (#226) 2020-12-08 17:56:06 +09:00
utils_test.go feat: EKS IAM Roles for Service Accounts for Runner Pods (#226) 2020-12-08 17:56:06 +09:00