actions-runner-controller/controllers
Tingluo Huang 0b9bef2c08
Try to unconfig runner before deleting the pod to recreate (#1125)
There is a race condition between ARC and the GitHub service when deleting a runner pod.

- ARC uses the REST API to find a runner in a pod that is not running any jobs, and decides to delete the pod.
- A job is queued on the GitHub service side, and the service sends the job to this idle runner right before ARC deletes the pod.
- ARC deletes the runner pod, which causes the in-progress job to end up canceled.

To avoid this race condition, I am calling `r.unregisterRunner()` before deleting the pod.
- If `r.unregisterRunner()` returns 204, the runner has been deleted from the GitHub service, so it should be safe to delete the pod.
- If `r.unregisterRunner()` returns 400, the runner is still running a job, so we leave the runner pod as it is.
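
The decision described above can be sketched roughly as follows. This is only an illustration, not the code in `runner_controller.go`: the `unregisterFunc` type and the `safeToDeletePod` helper are hypothetical names, and the assumption that the unregister call surfaces a raw HTTP status code is made for the sake of the example. The real signature of `r.unregisterRunner()` is not shown here and may differ.

```go
package main

import (
	"fmt"
	"net/http"
)

// unregisterFunc is a hypothetical stand-in for r.unregisterRunner().
// It is assumed to return the HTTP status code of the runner-removal request.
type unregisterFunc func(runnerName string) (int, error)

// safeToDeletePod (hypothetical helper) tries to unregister the runner first
// and reports whether the pod can be deleted without canceling a job.
func safeToDeletePod(unregister unregisterFunc, runnerName string) (bool, error) {
	status, err := unregister(runnerName)
	if err != nil {
		return false, err
	}

	switch status {
	case http.StatusNoContent:
		// 204: the runner is gone from the GitHub service, so no new job
		// can be assigned to it and the pod can be deleted.
		return true, nil
	case http.StatusBadRequest:
		// 400: the runner is still running a job, so leave the pod as it is.
		return false, nil
	default:
		// Any other status is treated as an error in this sketch.
		return false, fmt.Errorf("unexpected status %d while unregistering runner %q", status, runnerName)
	}
}
```

A caller would delete the pod only when `safeToDeletePod` returns `true`, and otherwise retry on a later reconciliation. Treating any status other than 204 and 400 as an error is a choice made for this sketch, not something the commit message specifies.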

TODO: I need to do some E2E tests to force the race condition to happen.

Ref #911
2022-02-19 21:22:31 +09:00
..
metrics Clean up import list (#645) 2021-06-22 17:55:06 +09:00
testdata test: Add tests with self-hosted label for #953 (#1030) 2022-01-07 08:50:26 +09:00
autoscaling.go fix: pagination for ListWorkflowJobs in autoscaler (#990) (#992) 2021-12-24 09:12:36 +09:00
autoscaling_test.go Add HRA support for RunnerSet (#647) 2021-06-23 20:25:03 +09:00
horizontal_runner_autoscaler_webhook.go Fix regression that prevented default organizational runner group from being scale target 2022-02-19 14:43:41 +09:00
horizontal_runner_autoscaler_webhook_on_check_run.go Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
horizontal_runner_autoscaler_webhook_on_pull_request.go Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
horizontal_runner_autoscaler_webhook_on_push.go Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
horizontal_runner_autoscaler_webhook_test.go test: Add tests with self-hosted label for #953 (#1030) 2022-01-07 08:50:26 +09:00
horizontalrunnerautoscaler_controller.go fix(deps): update module sigs.k8s.io/controller-runtime to v0.11.0 (#740) 2021-12-17 09:06:55 +09:00
horizontalrunnerautoscaler_controller_test.go Clean up import list (#645) 2021-06-22 17:55:06 +09:00
integration_test.go Option to consider runner group visibility on scale based on webhook (#1062) 2022-02-16 19:08:56 +09:00
pod_runner_token_injector.go RunnerSet: Automatic-recovery from registration timeout and deregistration on pod termination (#652) 2021-06-24 20:39:37 +09:00
runner_controller.go Try to unconfig runner before deleting the pod to recreate (#1125) 2022-02-19 21:22:31 +09:00
runner_pod_controller.go Fix RunerSet managed runner pods to terminate more gracefully (#1126) 2022-02-19 21:19:37 +09:00
runnerdeployment_controller.go Clean up import list (#645) 2021-06-22 17:55:06 +09:00
runnerdeployment_controller_test.go Clean up import list (#645) 2021-06-22 17:55:06 +09:00
runnerreplicaset_controller.go Stop creating registration-only runners on scale-to-zero (#1028) 2022-01-07 09:56:21 +09:00
runnerreplicaset_controller_test.go Stop creating registration-only runners on scale-to-zero (#1028) 2022-01-07 09:56:21 +09:00
runnerset_controller.go Stop creating registration-only runners on scale-to-zero (#1028) 2022-01-07 09:56:21 +09:00
schedule.go Experimental support for ScheduledOverrides (#515) 2021-05-03 23:31:17 +09:00
schedule_test.go Experimental support for ScheduledOverrides (#515) 2021-05-03 23:31:17 +09:00
suite_test.go Clean up import list (#645) 2021-06-22 17:55:06 +09:00
unregister.go Fix RunerSet managed runner pods to terminate more gracefully (#1126) 2022-02-19 21:19:37 +09:00
utils.go feat: EKS IAM Roles for Service Accounts for Runner Pods (#226) 2020-12-08 17:56:06 +09:00
utils_test.go feat: EKS IAM Roles for Service Accounts for Runner Pods (#226) 2020-12-08 17:56:06 +09:00