actions-runner-controller

Commit Graph

Author	SHA1	Message	Date
Yusuke Kuoka	cbbc383a80	Auto-correct replicas number on missing webhook_job completion event (#1180) While testing #1179, I discovered that ARC sometimes stops resyncing a RunnerReplicaSet when the desired number of replicas is greater than the actual number of runner pods. This seems to happen when ARC misses a workflow_job completion event, but it has no way to decide whether (1) something went wrong in ARC or (2) a load balancer in the middle, GitHub, or anything else outside ARC went wrong. It needs a criterion for deciding that or, if that is not possible, a way to deal with it. In this change, I added a hard-coded 10-minute timeout (can be made customizable later) on preventing runner pod recreation. Now, a RunnerReplicaSet/RunnerSet restarts runner pod recreation 10 minutes after the last scale-up. If the workflow completion event arrives after the timeout, it decreases the desired replicas number, which results in the removal of a runner pod. The removed runner pod might be deleted without ever being used, but I think that's better than leaving the desired and actual numbers of replicas diverged forever.	2022-03-07 09:35:13 +09:00
Yusuke Kuoka	95a5770d55	Fix regression that registration-timeout check was not working for runnerset (#1178) Follow-up for #1167	2022-03-05 19:31:05 +09:00
Yusuke Kuoka	1917cf90c4	chore: Tweak runner-id annotation name and the annotation prefix to be more consistent	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	0ba3cad6c2	fix: Prefix runner pod related annotation keys with `actions/` to make them distinguishable from other annotations	2022-03-02 19:03:20 +09:00
Yusuke Kuoka	7f0e65cb73	refactor: Extract definitions of various annotation keys and other defaults to their own source	2022-03-02 19:03:20 +09:00

5 Commits