	ADR 2022-10-27: Lifetime of RunnerScaleSet on Service
Date: 2022-10-27
Status: Done
Context
We have created the RunnerScaleSet object and APIs around it on the GitHub Actions service for better support of any self-hosted runner auto-scale solution, like actions-runner-controller.
The RunnerScaleSet object will represent a set of homogeneous self-hosted runners to the Actions service job routing system.
A RunnerScaleSet client (ARC) needs to communicate with the Actions service over HTTP long-polling, following a specific protocol, for a workflow job to land successfully on one of its homogeneous self-hosted runners.
In this ADR, we discuss the following within the context of actions-runner-controller's new scaling mode:
- Who creates a RunnerScaleSet on the service, and how?
- Who deletes a RunnerScaleSet on the service, and how?
- What will happen to all the runners and jobs when the deletion happens?
RunnerScaleSet creation
- The AutoScalingRunnerSet custom resource controller will create the RunnerScaleSet object in the Actions service on any AutoScalingRunnerSet resource deployment.
- The creation is done via the Actions service REST API: POST _apis/runtime/runnerscalesets
- The creation needs to use the runner registration token (admin).
- RunnerScaleSet.Name == AutoScalingRunnerSet.metadata.Name
- The created RunnerScaleSet will have only 1 label, and it's the RunnerScaleSet's name.
- The AutoScalingRunnerSet controller will store the RunnerScaleSet.Id as an annotation on the k8s resource for future lookup.
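The creation rules above can be sketched in a few lines of Python. This is an illustration only, not the controller's actual code: the request body field names (`name`, `labels`) and the annotation key are assumptions made for the example. Only the endpoint path and the name, label, and annotation rules come from this ADR.

```python
# Endpoint path taken from this ADR.
CREATE_PATH = "_apis/runtime/runnerscalesets"

# Hypothetical annotation key; the ADR does not name the real one.
SCALE_SET_ID_ANNOTATION = "example.invalid/runner-scale-set-id"


def build_create_request(resource: dict) -> dict:
    """Describe the POST that would create a RunnerScaleSet for a resource."""
    name = resource["metadata"]["name"]
    return {
        "method": "POST",
        "path": CREATE_PATH,
        # The body shape is an assumption. The rules it encodes are from the
        # ADR: the scale set name equals metadata.name, and the single label
        # is that same name.
        "body": {"name": name, "labels": [name]},
    }


def store_scale_set_id(resource: dict, scale_set_id: int) -> None:
    """Record the service-side ID as an annotation for future lookup."""
    annotations = resource["metadata"].setdefault("annotations", {})
    annotations[SCALE_SET_ID_ANNOTATION] = str(scale_set_id)
```

Storing the ID as a string follows the Kubernetes convention that annotation values are strings.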
RunnerScaleSet modification
- When the user patches a RunnerScaleSet-related property of an existing AutoScalingRunnerSet, e.g. runnerGroupName or runnerWorkDir, the controller needs to make an HTTP PATCH call to the _apis/runtime/runnerscalesets/2 endpoint in order to update the object on the service.
- We will put the deployed AutoScalingRunnerSet resource in an error state when the user tries to patch the resource with a different githubConfigUrl. Basically, you can't move a deployed AutoScalingRunnerSet across GitHub entities (repoA -> repoB, repoA -> OrgC, etc.). We evaluated blocking the change up front instead of erroring at runtime, and decided not to go down this route because it would force us to re-introduce admission webhooks (which require cert-manager).
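The two modification rules can be illustrated with a small decision function. This is a hedged sketch: `plan_update`, the shape of its return value, and the PATCH body are hypothetical. It also assumes the trailing number in the endpoint path is the RunnerScaleSet ID. The property names and the githubConfigUrl rule are taken from this ADR.

```python
# Properties the ADR names as RunnerScaleSet-related; the real list may be longer.
SCALE_SET_PROPERTIES = ("runnerGroupName", "runnerWorkDir")


def plan_update(old_spec: dict, new_spec: dict, scale_set_id: int) -> dict:
    """Decide how to react to a patched AutoScalingRunnerSet spec."""
    # Moving a deployed resource to another GitHub entity is not supported:
    # the resource goes into an error state instead of being updated.
    if old_spec.get("githubConfigUrl") != new_spec.get("githubConfigUrl"):
        return {"action": "error", "reason": "githubConfigUrl cannot be changed"}

    # Collect only the scale-set-related properties that actually changed.
    changed = {
        key: new_spec.get(key)
        for key in SCALE_SET_PROPERTIES
        if old_spec.get(key) != new_spec.get(key)
    }
    if not changed:
        return {"action": "none"}

    return {
        "action": "patch",
        "method": "PATCH",
        # Assumption: the path suffix is the RunnerScaleSet ID.
        "path": f"_apis/runtime/runnerscalesets/{scale_set_id}",
        "body": changed,  # body shape is an assumption
    }
```

Checking githubConfigUrl first mirrors the ADR's choice to surface the error at runtime rather than reject the change through an admission webhook.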
RunnerScaleSet deletion
- The AutoScalingRunnerSet custom resource controller will delete the RunnerScaleSet object in the Actions service on any AutoScalingRunnerSet resource deletion.
- AutoScalingRunnerSet deletion will contain several steps:
  - Stop the listener app so that no new jobs come in and no more scaling up/down happens.
  - Request scale down to 0
  - Force stop all runners
  - Wait for the scale down to 0
  - Delete the RunnerScaleSet object from the service via REST API
- The deletion is done via the Actions service REST API: DELETE _apis/runtime/runnerscalesets/1
- The deletion needs to use the runner registration token (admin).
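The ordering of the deletion steps above can be sketched as follows. Every callable passed in is a hypothetical stand-in for the corresponding controller operation, and the polling interval and timeout are assumptions; the ADR specifies only the sequence of steps.

```python
import time
from typing import Callable


def delete_auto_scaling_runner_set(
    stop_listener: Callable[[], None],
    request_scale: Callable[[int], None],
    force_stop_runners: Callable[[], None],
    runner_count: Callable[[], int],
    delete_scale_set: Callable[[], None],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
) -> None:
    """Run the deletion steps in the order listed in the ADR."""
    stop_listener()        # 1. no new jobs, no more scaling up/down
    request_scale(0)       # 2. request scale down to 0
    force_stop_runners()   # 3. force stop all runners

    # 4. wait until no runners remain (timeout is an assumption)
    deadline = time.monotonic() + timeout
    while runner_count() > 0:
        if time.monotonic() >= deadline:
            raise TimeoutError("runners did not scale down to 0")
        time.sleep(poll_interval)

    delete_scale_set()     # 5. DELETE the RunnerScaleSet via REST API
```

Deleting the service-side object last matters because, as described in the next section, the service blocks deletion while a job is still assigned to a runner in the scale set.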
The user's RunnerScaleSet will be deleted from the service by DormantRunnerScaleSetCleanupJob if the particular AutoScalingRunnerSet has not connected to the service for the past 7 days. We have a similar rule for self-hosted runners.
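As an illustration of the 7-day rule, a dormancy check could look like the sketch below. The function name, its arguments, and the treatment of the exact 7-day boundary are assumptions; the ADR does not describe how DormantRunnerScaleSetCleanupJob is implemented.

```python
from datetime import datetime, timedelta

DORMANCY_WINDOW = timedelta(days=7)  # threshold stated in this ADR


def is_dormant(last_connected_at: datetime, now: datetime) -> bool:
    """Return True if the last connection is more than 7 days old."""
    # Assumption: a scale set seen exactly 7 days ago is not yet dormant.
    return now - last_connected_at > DORMANCY_WINDOW
```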
Jobs and Runners on deletion
- RunnerScaleSet deletion will be blocked if there is any job assigned to a runner within the RunnerScaleSet, which has to scale down to 0 before deletion.
- Any job that has been assigned to the RunnerScaleSet but hasn't been assigned to a runner within the RunnerScaleSet will get thrown back to the queue and wait for assignment again.
- Any offline runners within the RunnerScaleSet will be deleted from the service side.
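The three rules above can be modelled as a small planning function. This is a simplified sketch for illustration: the `Job` and `Runner` types and the returned dictionary are hypothetical and do not reflect the service's real data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    id: int
    # None means the job is assigned to the scale set but not yet to a runner.
    runner: Optional[str] = None


@dataclass
class Runner:
    name: str
    online: bool


def plan_scale_set_deletion(jobs: List[Job], runners: List[Runner]) -> dict:
    """Work out what a RunnerScaleSet deletion would do to jobs and runners."""
    # Rule 1: deletion is blocked while any job is assigned to a runner.
    if any(job.runner is not None for job in jobs):
        return {"blocked": True, "requeue_jobs": [], "delete_runners": []}

    return {
        "blocked": False,
        # Rule 2: jobs not yet on a runner go back to the queue.
        "requeue_jobs": [job.id for job in jobs],
        # Rule 3: offline runners are removed on the service side.
        "delete_runners": [r.name for r in runners if not r.online],
    }
```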