# ADR 2022-10-27: Lifetime of RunnerScaleSet on Service

**Date**: 2022-10-27

**Status**: Done

## Context

We have created the `RunnerScaleSet` object and the APIs around it on the GitHub Actions service to better support any self-hosted runner auto-scaling solution, like [actions-runner-controller](https://github.com/actions-runner-controller/actions-runner-controller).

The `RunnerScaleSet` object will represent a set of homogeneous self-hosted runners to the Actions service job routing system.

A `RunnerScaleSet` client (ARC) needs to communicate with the Actions service over HTTP long-poll, following a specific protocol, to get a workflow job successfully landed on one of its homogeneous self-hosted runners.

In this ADR, we discuss the following within the context of actions-runner-controller's new scaling mode:

- Who creates a `RunnerScaleSet` on the service, and how?
- Who deletes a `RunnerScaleSet` on the service, and how?
- What will happen to all the runners and jobs when the deletion happens?

## RunnerScaleSet creation

- The `AutoScalingRunnerSet` custom resource controller will create the `RunnerScaleSet` object in the Actions service on any `AutoScalingRunnerSet` resource deployment.
- The creation is done via a REST API call on the Actions service: `POST _apis/runtime/runnerscalesets`
- The creation needs to use the runner registration token (admin).
- `RunnerScaleSet.Name` == `AutoScalingRunnerSet.metadata.Name`
- The created `RunnerScaleSet` will have only one label, which is the `RunnerScaleSet`'s name.
- The `AutoScalingRunnerSet` controller will store the `RunnerScaleSet.Id` as an annotation on the k8s resource for future lookup.
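
The creation rules above can be sketched as two small helpers. This is only an illustration under stated assumptions: the payload field names (`name`, `labels`) and the annotation key are hypothetical, since this ADR does not specify the request body or the annotation name, and no HTTP call is made here.

```python
import copy

# Hypothetical annotation key; the ADR does not name the real one.
SCALE_SET_ID_ANNOTATION = "example.com/runner-scale-set-id"


def build_scale_set_payload(resource: dict) -> dict:
    """Build an illustrative creation payload from an AutoScalingRunnerSet manifest.

    The field names "name" and "labels" are assumptions, not the service's schema.
    """
    name = resource["metadata"]["name"]
    # RunnerScaleSet.Name == AutoScalingRunnerSet.metadata.Name, and the
    # scale set carries exactly one label: its own name.
    return {"name": name, "labels": [name]}


def record_scale_set_id(resource: dict, scale_set_id: int) -> dict:
    """Return a copy of the manifest with RunnerScaleSet.Id stored as an annotation."""
    updated = copy.deepcopy(resource)
    annotations = updated["metadata"].get("annotations") or {}
    # Kubernetes annotation values are strings, so the ID is stringified.
    annotations[SCALE_SET_ID_ANNOTATION] = str(scale_set_id)
    updated["metadata"]["annotations"] = annotations
    return updated
```

In this sketch the controller would send the payload with the `POST` request and then persist the returned ID through `record_scale_set_id`, so later PATCH and DELETE calls can look the scale set up without querying by name.
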

## RunnerScaleSet modification

- When the user patches an existing `AutoScalingRunnerSet`'s `RunnerScaleSet`-related properties, e.g. `runnerGroupName` or `runnerWorkDir`, the controller needs to make an HTTP PATCH call to the `_apis/runtime/runnerscalesets/2` endpoint in order to update the object on the service.
- We will put the deployed `AutoScalingRunnerSet` resource in an error state when the user tries to patch the resource with a different `githubConfigUrl`.
> Basically, you can't move a deployed `AutoScalingRunnerSet` across GitHub entities: repoA->repoB, repoA->OrgC, etc.
> We evaluated blocking the change up front instead of erroring at runtime, but decided against it because it would force us to re-introduce admission webhooks (which require cert-manager).
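
The modification rules can be modelled as a pure decision function. The sketch below is an assumption-laden illustration: `PatchPlan`, `plan_patch`, and the `SCALE_SET_PROPERTIES` tuple are hypothetical names, and the tuple lists only the two properties this ADR mentions as examples.

```python
from dataclasses import dataclass
from typing import Optional

# Only the properties named as examples in this ADR; the real set may be larger.
SCALE_SET_PROPERTIES = ("runnerGroupName", "runnerWorkDir")


@dataclass
class PatchPlan:
    error: Optional[str] = None        # set when the resource must enter an error state
    patch_body: Optional[dict] = None  # set when an HTTP PATCH to the service is needed


def plan_patch(old_spec: dict, new_spec: dict) -> PatchPlan:
    """Decide how the controller reacts to a patched AutoScalingRunnerSet spec."""
    # A different githubConfigUrl would move the resource to another GitHub
    # entity, which is not allowed: report an error instead of patching.
    if old_spec.get("githubConfigUrl") != new_spec.get("githubConfigUrl"):
        return PatchPlan(error="githubConfigUrl cannot change on a deployed AutoScalingRunnerSet")
    # Collect only the RunnerScaleSet-related properties that actually changed.
    changed = {
        key: new_spec.get(key)
        for key in SCALE_SET_PROPERTIES
        if old_spec.get(key) != new_spec.get(key)
    }
    return PatchPlan(patch_body=changed or None)
```

Because the check happens at reconcile time rather than at admission time, no webhook is involved, which matches the trade-off described above.
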

## RunnerScaleSet deletion

- The `AutoScalingRunnerSet` custom resource controller will delete the `RunnerScaleSet` object in the Actions service on any `AutoScalingRunnerSet` resource deletion.
> `AutoScalingRunnerSet` deletion will contain several steps:
>
> - Stop the listener app so that no new jobs come in and no further scaling up or down occurs.
> - Request scale down to 0
> - Force stop all runners
> - Wait for the scale down to 0
> - Delete the `RunnerScaleSet` object from the service via REST API
- The deletion is done via a REST API call on the Actions service: `DELETE _apis/runtime/runnerscalesets/1`
- The deletion needs to use the runner registration token (admin).

The user's `RunnerScaleSet` will be deleted from the service by `DormantRunnerScaleSetCleanupJob` if the particular `AutoScalingRunnerSet` has not connected to the service for the past 7 days. We have a similar rule for self-hosted runners.
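
The deletion steps and the dormant-cleanup rule can be sketched as follows. Everything here is illustrative: the `ScaleSetOps` method names are hypothetical stand-ins for whatever the controller actually calls, the polling loop stands in for a real controller's requeue logic, and treating exactly 7 days as dormant is an assumption about the boundary.

```python
from datetime import datetime, timedelta
from typing import Protocol

DORMANT_AFTER = timedelta(days=7)


class ScaleSetOps(Protocol):
    """Hypothetical operations; the names are illustrative, not ARC's API."""

    def stop_listener(self) -> None: ...
    def request_scale(self, replicas: int) -> None: ...
    def force_stop_runners(self) -> None: ...
    def runner_count(self) -> int: ...
    def delete_scale_set(self) -> None: ...


def delete_auto_scaling_runner_set(ops: ScaleSetOps, max_polls: int = 10) -> bool:
    """Run the deletion steps in order; return True once the scale set is deleted."""
    ops.stop_listener()       # no new jobs, no more scaling up/down
    ops.request_scale(0)      # request scale down to 0
    ops.force_stop_runners()  # force stop all runners
    for _ in range(max_polls):
        # Wait for the scale down to 0 before touching the service object.
        if ops.runner_count() == 0:
            ops.delete_scale_set()
            return True
    return False  # still scaling down; the caller should retry later


def is_dormant(last_connected: datetime, now: datetime) -> bool:
    """True if the scale set has not connected for at least 7 days."""
    return now - last_connected >= DORMANT_AFTER
```

Passing the operations in as an object keeps the ordering logic testable without a cluster or the Actions service.
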

## Jobs and Runners on deletion

- `RunnerScaleSet` deletion will be blocked if there is any job assigned to a runner within the `RunnerScaleSet`, which has to scale down to 0 before deletion.
- Any job that has been assigned to the `RunnerScaleSet` but hasn't been assigned to a runner within the `RunnerScaleSet` will be returned to the queue to wait for assignment again.
- Any offline runners within the `RunnerScaleSet` will be deleted from the service side.
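
The three rules above can be expressed as a small model. This is a sketch of the described behaviour only, not the service's implementation; the `Job`, `Runner`, and `DeletionOutcome` types and the `plan_deletion` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    id: int
    runner: Optional[str] = None  # None: assigned to the scale set, not yet to a runner


@dataclass
class Runner:
    name: str
    online: bool


@dataclass
class DeletionOutcome:
    blocked: bool
    requeued_jobs: List[int] = field(default_factory=list)
    deleted_runners: List[str] = field(default_factory=list)


def plan_deletion(jobs: List[Job], runners: List[Runner]) -> DeletionOutcome:
    """Model what happens to jobs and runners when a RunnerScaleSet is deleted."""
    # Deletion is blocked while any job is assigned to a runner in the set.
    if any(job.runner is not None for job in jobs):
        return DeletionOutcome(blocked=True)
    return DeletionOutcome(
        blocked=False,
        # Jobs assigned only to the scale set go back to the queue.
        requeued_jobs=[job.id for job in jobs],
        # Offline runners are removed on the service side.
        deleted_runners=[r.name for r in runners if not r.online],
    )
```
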