Introduce ADR change for adding labels to our resources (#2407)

Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-03-16 16:02:42 +01:00 · e5d8d65396
parent c465ace8fb
commit e5d8d65396
12 changed files with 208 additions and 83 deletions
--- a/adrs/2022-10-17-runner-image.md
+++ b/adrs/2022-10-17-runner-image.md
@@ -1,4 +1,5 @@
-# ADR 0001: Produce the runner image for the scaleset client
+# ADR 2022-10-17: Produce the runner image for the scaleset client
+
 **Date**: 2022-10-17

 **Status**: Done
@@ -7,6 +8,7 @@

 We aim to provide a similar experience (as close as possible) between self-hosted and GitHub-hosted runners. To achieve this, we are making the following changes to align our self-hosted runner container image with the Ubuntu runners managed by GitHub.
 Here are the changes:
+
 - We created a USER `runner(1001)` and a GROUP `docker(123)`
 - `sudo` has been installed on the image and the `runner` user will be a passwordless sudoer.
 - The runner binary was placed under `/home/runner/` and launched using `/home/runner/run.sh`
@@ -18,31 +20,33 @@ The latest Dockerfile can be found at: https://github.com/actions/runner/blob/ma

 # Context

-user can bring their own runner images, the contract we have are:
- It must have a runner binary under /actions-runner (/actions-runner/run.sh exists)
- The WORKDIR is set to /actions-runner
- If the user inside the container is root, the ENV RUNNER_ALLOW_RUNASROOT should be set to 1
+Users can bring their own runner images. The contract we require is:

-The existing ARC runner images will not work with the new ARC mode out-of-box for the following reason:
+- It must have a runner binary under `/actions-runner`, i.e. `/actions-runner/run.sh` exists
+- The `WORKDIR` is set to `/actions-runner`
+- If the user inside the container is root, the environment variable `RUNNER_ALLOW_RUNASROOT` should be set to `1`

- The current runner image requires caller to pass runner configure info, ex: URL and Config Token
- The current runner image has the runner binary under /runner
+The existing [ARC runner images](https://github.com/orgs/actions-runner-controller/packages?tab=packages&q=actions-runner) will not work with the new ARC mode out of the box for the following reasons:
+
+- The current runner image requires the caller to pass runner configuration info, ex: URL and Config Token
+- The current runner image has the runner binary under `/runner` which violates the contract described above
 - The current runner image requires a special entrypoint script in order to work around some volume mount limitations for setting up DinD.

-However, since we expose the raw runner Pod spec to our user, advanced user can modify the helm values.yaml to make everything lines up properly.
+Since we expose the raw runner PodSpec to our end users, they can modify the helm `values.yaml` to adjust the runner container to their needs.

 # Guiding Principles

 - The image build is separated into two stages.

 ## The first stage (build)
+
 - Reuses the same base image, so it is faster to build.
- Installs utilities needed to download assets (runner and runner-container-hooks).
+- Installs utilities needed to download assets (`runner` and `runner-container-hooks`).
 - Downloads the runner and stores it into `/actions-runner` directory.
 - Downloads the runner-container-hooks and stores it into `/actions-runner/k8s` directory.
 - You can use build arguments to control the runner version, the target platform and runner container hooks version.

-Preview:
+Preview (the published runner image might vary):

 ```Dockerfile
 FROM mcr.microsoft.com/dotnet/runtime-deps:6.0 as build
@@ -64,6 +68,7 @@ RUN curl -f -L -o runner-container-hooks.zip https://github.com/actions/runner-c
 ```

 ## The main image:
+
 - Copies assets from the build stage to `/actions-runner`
 - Does not provide an entrypoint. The entrypoint should be set within the container definition.

@@ -77,6 +82,7 @@ COPY --from=build /actions-runner .
 ```

 ## Example of pod spec with the init container copying assets
+
 ```yaml
 apiVersion: v1
 kind: Pod
@@ -84,20 +90,20 @@ metadata:
  name: <name>
 spec:
  containers:
-  - name: runner
-    image: <image>
-    command: ["/runner/run.sh"]
-    volumeMounts:
    - name: runner
-      mountPath: /runner
+      image: <image>
+      command: ["/runner/run.sh"]
+      volumeMounts:
+        - name: runner
+          mountPath: /runner
  initContainers:
-  - name: setup
-    image: <image> 
-    command: ["sh", "-c", "cp -r /actions-runner/* /runner/"]
-    volumeMounts:
-    - name: runner
-      mountPath: /runner
+    - name: setup
+      image: <image>
+      command: ["sh", "-c", "cp -r /actions-runner/* /runner/"]
+      volumeMounts:
+        - name: runner
+          mountPath: /runner
  volumes:
-  - name: runner
-    emptyDir: {}
+    - name: runner
+      emptyDir: {}
 ```
--- a/adrs/2022-10-27-runnerscaleset-lifetime.md
+++ b/adrs/2022-10-27-runnerscaleset-lifetime.md
@@ -1,4 +1,4 @@
-# ADR 0003: Lifetime of RunnerScaleSet on Service
+# ADR 2022-10-27: Lifetime of RunnerScaleSet on Service

 **Date**: 2022-10-27

@@ -12,8 +12,9 @@ The `RunnerScaleSet` object will represent a set of homogeneous self-hosted runn

 A `RunnerScaleSet` client (ARC) needs to communicate with the Actions service via HTTP long-poll in a certain protocol to get a workflow job successfully landed on one of its homogeneous self-hosted runners.

-In this ADR, I want to discuss the following within the context of actions-runner-controller's new scaling mode:
- Who and how to create a RunnerScaleSet on the service? 
+In this ADR, we discuss the following within the context of actions-runner-controller's new scaling mode:
+
+- Who and how to create a RunnerScaleSet on the service?
 - Who and how to delete a RunnerScaleSet on the service?
 - What will happen to all the runners and jobs when the deletion happens?

@@ -30,18 +31,19 @@ In this ADR, I want to discuss the following within the context of actions-runne

 - When the user patches an existing `AutoScalingRunnerSet`'s RunnerScaleSet-related properties, ex: `runnerGroupName`, `runnerWorkDir`, the controller needs to make an HTTP PATCH call to the `_apis/runtime/runnerscalesets/2` endpoint in order to update the object on the service.
 - We will put the deployed `AutoScalingRunnerSet` resource in an error state when the user tries to patch the resource with a different `githubConfigUrl`
-> Basically, you can't move a deployed `AutoScalingRunnerSet` across GitHub entity, repoA->repoB, repoA->OrgC, etc.
-> We evaluated blocking the change before instead of erroring at runtime and that we decided not to go down this route because it forces us to re-introduce admission webhooks (require cert-manager).
+  > Basically, you can't move a deployed `AutoScalingRunnerSet` across GitHub entities, repoA->repoB, repoA->OrgC, etc.
+  > We evaluated blocking the change up front instead of erroring at runtime, and decided not to go down this route because it would force us to re-introduce admission webhooks (which require cert-manager).

 ## RunnerScaleSet deletion

 - `AutoScalingRunnerSet` custom resource controller will delete the `RunnerScaleSet` object in the Actions service on any `AutoScalingRunnerSet` resource deletion.
-> `AutoScalingRunnerSet` deletion will contain several steps:
-> - Stop the listener app so no more new jobs coming and no more scaling up/down.
-> - Request scale down to 0
-> - Force stop all runners
-> - Wait for the scale down to 0
-> - Delete the `RunnerScaleSet` object from service via REST API
+  > `AutoScalingRunnerSet` deletion will contain several steps:
+  >
+  > - Stop the listener app so that no new jobs come in and no more scaling up/down happens.
+  > - Request scale down to 0
+  > - Force stop all runners
+  > - Wait for the scale down to 0
+  > - Delete the `RunnerScaleSet` object from service via REST API
 - The deletion is done via the REST API on the Actions service: `DELETE _apis/runtime/runnerscalesets/1`
 - The deletion needs to use the runner registration token (admin).

--- a/adrs/2022-11-04-crd-api-group-name.md
+++ b/adrs/2022-11-04-crd-api-group-name.md
@@ -1,4 +1,5 @@
-# ADR 0004: Technical detail about actions-runner-controller repository transfer
+# ADR 2022-11-04: Technical detail about actions-runner-controller repository transfer
+
 **Date**: 2022-11-04

 **Status**: Done
@@ -8,16 +9,17 @@
 As part of ARC Private Beta: Repository Migration & Open Sourcing Process, we have decided to transfer the current [actions-runner-controller repository](https://github.com/actions-runner-controller/actions-runner-controller) into the [Actions org](https://github.com/actions).

 **Goals:**
+
 - A clear signal that GitHub will start taking over ARC and provide support.
 - Since we are going to deprecate the existing auto-scale mode in ARC at some point, we want to have a clear separation between the legacy mode (not supported) and the new mode (supported).
- Avoid disrupting users as much as we can, existing ARC users will not notice any difference after the repository transfer, they can keep upgrading to the newer version of ARC and keep using the legacy mode. 
+- Avoid disrupting users as much as we can: existing ARC users will not notice any difference after the repository transfer and can keep upgrading to newer versions of ARC while using the legacy mode.

 **Challenges**
- The original creator's name (`summerwind`) is all over the place, including some critical parts of ARC:
-    - The k8s user resource API's full name is `actions.summerwind.dev/v1alpha1/RunnerDeployment`, renaming it to `actions.github.com` is a breaking change and will force the user to rebuild their entire k8s cluster. 
-    - All docker images around ARC (controller + default runner) is published to [dockerhub/summerwind](https://hub.docker.com/u/summerwind)
- The helm chart for ARC is currently hosted on [GitHub pages](https://actions-runner-controller.github.io/actions-runner-controller) for https://github.com/actions-runner-controller/actions-runner-controller, moving the repository means we will break users who install ARC via the helm chart

+- The original creator's name (`summerwind`) is all over the place, including some critical parts of ARC:
+  - The k8s user resource API's full name is `actions.summerwind.dev/v1alpha1/RunnerDeployment`; renaming it to `actions.github.com` is a breaking change and will force the user to rebuild their entire k8s cluster.
+  - All docker images around ARC (controller + default runner) are published to [dockerhub/summerwind](https://hub.docker.com/u/summerwind)
+- The helm chart for ARC is currently hosted on [GitHub pages](https://actions-runner-controller.github.io/actions-runner-controller) for https://github.com/actions-runner-controller/actions-runner-controller; moving the repository means we will break users who install ARC via the helm chart

 # Decisions

@@ -27,8 +29,9 @@ As part of ARC Private Beta: Repository Migration & Open Sourcing Process, we ha
 - For any new resource API we are going to add, those will be named properly under GitHub, ex: `actions.github.com/v1alpha1/AutoScalingRunnerSet`

 Benefits:
+
 - A clear separation from existing ARC:
-    - Easy for the support engineer to triage income tickets and figure out whether we need to support the use case from the user
+  - Easy for the support engineer to triage incoming tickets and figure out whether we need to support the use case from the user
 - We won't break existing users when they upgrade to a newer version of ARC after the repository transfer

 Based on the spike done by `@nikola-jokic`, we have confidence that we can host multiple resources with different API names under the same repository, and the published ARC controller can handle both resources properly.
--- a/adrs/2022-12-05-adding-labels-k8s-resources.md
+++ b/adrs/2022-12-05-adding-labels-k8s-resources.md
@@ -1,8 +1,8 @@
-# ADR 0007: Adding labels to our resources
+# ADR 2022-12-05: Adding labels to our resources

 **Date**: 2022-12-05

-**Status**: Done
+**Status**: Deprecated [^1]

 ## Context

@@ -20,12 +20,15 @@ Assuming standard logging that would allow us to get all ARC logs by running
 ```bash
 kubectl logs -l 'app.kubernetes.io/part-of=actions-runner-controller'
 ```
+
 which would be very useful for development to begin with.

 The proposal is to add these sets of labels to the pods ARC creates:

 #### controller-manager
+
 Labels to be set by the Helm chart:
+
 ```yaml
 metadata:
  labels:
@@ -35,7 +38,9 @@ metadata:
 ```

 #### Listener
+
 Labels to be set by controller at creation:
+
 ```yaml
 metadata:
  labels:
@@ -43,7 +48,7 @@ metadata:
    app.kubernetes.io/component: runner-scale-set-listener
    app.kubernetes.io/version: "x.x.x"
    actions.github.com/scale-set-name: scale-set-name # this corresponds to metadata.name as set for AutoscalingRunnerSet
-    
+
    # the following labels are to be extracted by the config URL
    actions.github.com/enterprise: enterprise
    actions.github.com/organization: organization
@@ -51,7 +56,9 @@ metadata:
 ```

 #### Runner
+
 Labels to be set by controller at creation:
+
 ```yaml
 metadata:
  labels:
@@ -78,3 +85,5 @@ Or for example if they're having problems specifically with runners:

 This way users don't have to understand ARC moving parts but we still have a
 way to target them specifically if we need to.
+
+[^1]: Superseded by [ADR 2023-04-14](2023-04-14-adding-labels-k8s-resources.md)
--- a/adrs/2022-12-27-pick-the-right-runner-to-scale-down.md
+++ b/adrs/2022-12-27-pick-the-right-runner-to-scale-down.md
@@ -1,4 +1,5 @@
-# ADR 0008: Pick the right runner to scale down
+# ADR 2022-12-27: Pick the right runner to scale down
+
 **Date**: 2022-12-27

 **Status**: Done
@@ -7,35 +8,37 @@

 - A custom resource `EphemeralRunnerSet` manages a set of `EphemeralRunner` custom resources
 - The `EphemeralRunnerSet` has `Replicas` in its `Spec`, and the responsibility of the `EphemeralRunnerSet_controller` is to reconcile a given `EphemeralRunnerSet` to have
- the same amount of `EphemeralRunners` as the `Spec.Replicas` defined.
- - This means the `EphemeralRunnerSet_controller` will scale up the `EphemeralRunnerSet` by creating more `EphemeralRunner` in the case of the `Spec.Replicas` is higher than
- the current amount of `EphemeralRunners`.
- - This also means the `EphemeralRunnerSet_controller` will scale down the `EphemeralRunnerSet` by finding some existing `EphemeralRunner` to delete in the case of
+  the same amount of `EphemeralRunners` as the `Spec.Replicas` defined.
+- This means the `EphemeralRunnerSet_controller` will scale up the `EphemeralRunnerSet` by creating more `EphemeralRunner` resources when `Spec.Replicas` is higher than
+  the current number of `EphemeralRunners`.
+- This also means the `EphemeralRunnerSet_controller` will scale down the `EphemeralRunnerSet` by finding existing `EphemeralRunner` resources to delete when
  `Spec.Replicas` is lower than the current number of `EphemeralRunners`.
- 
- This ADR is about how can we find the right existing `EphemeralRunner` to delete when we need to scale down.
- 
- 
- ## Current approach
- 
+
+This ADR is about how we can find the right existing `EphemeralRunner` to delete when we need to scale down.
+
+## Current approach
+
 1. `EphemeralRunnerSet_controller` figures out how many `EphemeralRunner` resources it needs to delete, ex: scaling down from 10 to 2 means we need to delete 8 `EphemeralRunner` resources.

 2. `EphemeralRunnerSet_controller` finds all `EphemeralRunner` resources that are in the `Running` or `Pending` phase.
-    > `Pending` means the `EphemeralRunner` is still probably creating and a runner has not yet configured with the Actions service.
-    > `Running` means the `EphemeralRunner` is created and a runner has probably configured with Actions service, the runner may sit there idle,
-    > or maybe actively running a workflow job. We don't have a clear answer for it from the ARC side. (Actions service knows it for sure)
+
+   > `Pending` means the `EphemeralRunner` is probably still being created and a runner has not yet been configured with the Actions service.
+   > `Running` means the `EphemeralRunner` is created and a runner has probably been configured with the Actions service; the runner may sit there idle,
+   > or may be actively running a workflow job. We don't have a clear answer for it from the ARC side (the Actions service knows for sure).

 3. `EphemeralRunnerSet_controller` makes an HTTP DELETE request to the Actions service for each `EphemeralRunner` from the previous step and asks the Actions service to delete the runner via `RunnerId`.
-(The `RunnerId` is generated after the runner registered with the Actions service, and stored on the `EphemeralRunner.Status.RunnerId`)
-  > - The HTTP DELETE request looks like the following:  
-  > `DELETE https://pipelines.actions.githubusercontent.com/WoxlUxJHrKEzIp4Nz3YmrmLlZBonrmj9xCJ1lrzcJ9ZsD1Tnw7/_apis/distributedtask/pools/0/agents/1024`
-  > The Actions service will return 2 types of responses:
-  >  1. 204 (No Content): The runner with Id 1024 has been successfully removed from the service or the runner with Id 1024 doesn't exist.
-  >  2. 400 (Bad Request) with JSON body that contains an error message like `JobStillRunningException`: The service can't remove this runner at this point since it has been
-  >  assigned to a job request, the client won't be able to remove the runner until the runner finishes its current assigned job request.
+   (The `RunnerId` is generated after the runner registers with the Actions service, and is stored in `EphemeralRunner.Status.RunnerId`)

-4. `EphemeralRunnerSet_controller` will ignore any deletion error from runners that are still running a job, and keep trying deletion until the amount of `204` equals the amount of 
-`EphemeralRunner` needs to delete.
+   > - The HTTP DELETE request looks like the following:
+   >   `DELETE https://pipelines.actions.githubusercontent.com/WoxlUxJHrKEzIp4Nz3YmrmLlZBonrmj9xCJ1lrzcJ9ZsD1Tnw7/_apis/distributedtask/pools/0/agents/1024`
+   >   The Actions service will return 2 types of responses:
+   >
+   > 1. 204 (No Content): The runner with Id 1024 has been successfully removed from the service or the runner with Id 1024 doesn't exist.
+   > 2. 400 (Bad Request) with JSON body that contains an error message like `JobStillRunningException`: The service can't remove this runner at this point since it has been
+   >    assigned to a job request, the client won't be able to remove the runner until the runner finishes its current assigned job request.
+
+4. `EphemeralRunnerSet_controller` will ignore any deletion error from runners that are still running a job, and keep retrying deletion until the number of `204` responses equals the number of
+   `EphemeralRunner` resources it needs to delete.

 ## The problem with the current approach

@@ -68,6 +71,7 @@ this would be a big `NO` from a security point of view since we may not trust th
 The nature of the k8s controller-runtime means we might reconcile the resource based on stale cache data.

 I think our goal for the solution should be:
+
 - Reduce wasteful HTTP requests on a scale-down as much as we can.
 - We can accept that we might make 1 or 2 wasteful requests to Actions service, but we can't accept making 5/10+ of them.
 - See if we can meet feature parity with what the RunnerJobHook supports without compromising on any security concerns.
@@ -77,9 +81,11 @@ a simple thought is how about we somehow attach some info to the `EphemeralRunne

 How about we send this info from the service to the auto-scaling-listener via the existing HTTP long-poll
 and let the listener patch the `EphemeralRunner.Status` to indicate it's running a job?
+
 > The listener is normally in a separate namespace with elevated permission and it's something we can trust.

 Changes:
+
 - Introduce a new message type `JobStarted` (in addition to the existing `JobAvailable/JobAssigned/JobCompleted`) on the service side. The message is sent when a runner of the `RunnerScaleSet` gets assigned to a job;
  `RequestId`, `RunnerId`, and `RunnerName` will be included in the message.
 - Add `RequestId (int)` to `EphemeralRunner.Status`, this will indicate which job the runner is running.
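
The `RequestId` field proposed above lets the controller skip busy runners before it sends any DELETE request. A minimal Go sketch of that selection follows. The `ephemeralRunner` type and `pickIdle` function are hypothetical names for illustration, not ARC's actual code, and the sketch assumes a `RequestID` of `0` means no job is assigned.

```go
package main

import "fmt"

// ephemeralRunner is a hypothetical, trimmed-down stand-in for the
// EphemeralRunner resource; only the fields needed for selection are kept.
type ephemeralRunner struct {
	Name      string
	RequestID int // assumption: 0 means the runner has no job assigned
}

// pickIdle returns up to n runners that are not running a job,
// preserving the input order. Busy runners are never selected.
func pickIdle(runners []ephemeralRunner, n int) []ephemeralRunner {
	if n <= 0 {
		return nil
	}
	picked := make([]ephemeralRunner, 0, n)
	for _, r := range runners {
		if len(picked) == n {
			break
		}
		if r.RequestID == 0 {
			picked = append(picked, r)
		}
	}
	return picked
}

func main() {
	runners := []ephemeralRunner{
		{Name: "runner-a", RequestID: 7}, // busy
		{Name: "runner-b"},               // idle
		{Name: "runner-c"},               // idle
	}
	for _, r := range pickIdle(runners, 1) {
		fmt.Println("delete candidate:", r.Name)
	}
}
```

If fewer than `n` idle runners exist, the function returns what it found, so the remaining deletions wait for a later reconcile instead of producing `400` responses.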
--- a/adrs/2023-02-02-automate-runner-updates.md
+++ b/adrs/2023-02-02-automate-runner-updates.md
@@ -1,4 +1,6 @@
-# Automate updating runner version
+# ADR 2023-02-02: Automate updating runner version
+
+**Date**: 2023-02-02

 **Status**: Proposed

@@ -16,6 +18,7 @@ version is updated (and this is currently done manually).

 We can have another workflow running on a cadence (hourly seems sensible) and checking for new runner
 releases, creating a PR updating `RUNNER_VERSION` in:
+
 - `.github/workflows/release-runners.yaml`
 - `Makefile`
 - `runner/Makefile`
--- a/adrs/2023-02-10-limit-manager-role-permission.md
+++ b/adrs/2023-02-10-limit-manager-role-permission.md
@@ -1,4 +1,5 @@
-# ADR 0007: Limit Permissions for Service Accounts in Actions-Runner-Controller
+# ADR 2023-02-10: Limit Permissions for Service Accounts in Actions-Runner-Controller
+
 **Date**: 2023-02-10

 **Status**: Pending
@@ -7,7 +8,7 @@

 - `actions-runner-controller` is a Kubernetes CRD (with controller) built using https://github.com/kubernetes-sigs/controller-runtime

- [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) has a default cache based k8s API client.Reader to make query k8s API server more efficiency. 
+- [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) has a default cache-based k8s API client.Reader to make querying the k8s API server more efficient.

 - The cache-based API client requires cluster scope `list` and `watch` permission for any resource the controller may query.

@@ -22,6 +23,7 @@ There are 3 service accounts involved for a working `AutoscalingRunnerSet` based
 This should have the lowest privilege (not any `RoleBinding` nor `ClusterRoleBinding`) by default, in the case of `containerMode=kubernetes`, it will get certain write permission with `RoleBinding` to limit the permission to a single namespace.

 > References:
+>
 > - ./charts/gha-runner-scale-set/templates/no_permission_serviceaccount.yaml
 > - ./charts/gha-runner-scale-set/templates/kube_mode_role.yaml
 > - ./charts/gha-runner-scale-set/templates/kube_mode_role_binding.yaml
@@ -52,7 +54,7 @@ The current `ClusterRole` has the following permissions:

 ## Limit cluster role permission on Secrets

-The cluster scope `List` `Secrets` permission might be a blocker for adopting `actions-runner-controller` for certain customers as they may have certain restriction in their cluster that simply doesn't allow any service account to have cluster scope `List Secrets` permission. 
+The cluster scope `List` `Secrets` permission might be a blocker for adopting `actions-runner-controller` for certain customers, as they may have restrictions in their cluster that simply don't allow any service account to have cluster scope `List Secrets` permission.

 To help these customers and improve security for `actions-runner-controller` in general, we will try to limit the `ClusterRole` permission of the controller manager's service account down to the following:

@@ -79,9 +81,10 @@ The `Role` and `RoleBinding` creation will happen during the `helm install demo

 During `helm install demo oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller`, we will store the controller's service account info as labels on the controller `Deployment`.
 Ex:
+
 ```yaml
-    actions.github.com/controller-service-account-namespace: {{ .Release.Namespace }}
-    actions.github.com/controller-service-account-name: {{ include "gha-runner-scale-set-controller.serviceAccountName" . }}
+actions.github.com/controller-service-account-namespace: {{ .Release.Namespace }}
+actions.github.com/controller-service-account-name: {{ include "gha-runner-scale-set-controller.serviceAccountName" . }}
 ```

 Introduce a new `Role` per `AutoScalingRunnerSet` installation and `RoleBinding` the `Role` with the controller's `ServiceAccount` in the namespace that each `AutoScalingRunnerSet` deployed with the following permission.
@@ -102,8 +105,9 @@ The `gha-runner-scale-set` helm chart will use this service account to properly
 The `gha-runner-scale-set` helm chart will also allow customers to explicitly provide the controller service account info, in case the `helm lookup` couldn't locate the right controller `Deployment`.

 New sections in `values.yaml` of `gha-runner-scale-set`:
+
 ```yaml
-## Optional controller service account that needs to have required Role and RoleBinding 
+## Optional controller service account that needs to have required Role and RoleBinding
 ## to operate this gha-runner-scale-set installation.
 ## The helm chart will try to find the controller deployment and its service account at installation time.
 ## In case the helm chart can't find the right service account, you can explicitly pass in the following value
@@ -129,5 +133,6 @@ You will deploy the `AutoScalingRunnerSet` with something like `helm install dem
 In this mode, you will end up with a manager `Role` that has all Get/List/Create/Delete/Update/Patch/Watch permissions on resources we need, and a `RoleBinding` to bind the `Role` with the controller `ServiceAccount` in the watched single namespace and the controller namespace, ex: `test-namespace` and `arc-system` in the above example.

 The downside of this mode:
+
 - When you have multiple controllers deployed, they will still use the same version of the CRD, so you need to make sure every controller you deploy is the same version.
- You can't mismatch install both `actions-runner-controller` in this mode (watchSingleNamespace) with the regular installation mode (watchAllClusterNamespaces) in your cluster.
+- You can't mix installations of `actions-runner-controller` in this mode (watchSingleNamespace) with the regular installation mode (watchAllClusterNamespaces) in the same cluster.
--- a/adrs/2023-04-14-adding-labels-k8s-resources.md
+++ b/adrs/2023-04-14-adding-labels-k8s-resources.md
@@ -0,0 +1,89 @@
+# ADR 2023-04-14: Adding labels to our resources
+
+**Date**: 2023-04-14
+
+**Status**: Done [^1]
+
+## Context
+
+Users need to provide us with logs so that we can help support and troubleshoot their issues. We need a way for our users to filter and retrieve the logs we need.
+
+## Proposal
+
+A good start would be a catch-all label to get all logs that are
+ARC-related: one of the [recommended labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/)
+is `app.kubernetes.io/part-of` and we can set that for all ARC components
+to be `gha-runner-scale-set-controller`.
+
+Assuming standard logging, that would allow us to get all ARC logs by running
+
+```bash
+kubectl logs -l 'app.kubernetes.io/part-of=gha-runner-scale-set-controller'
+```
+
+which would be very useful for development to begin with.
+
+The proposal is to add these sets of labels to the pods ARC creates:
+
+#### controller-manager
+
+Labels to be set by the Helm chart:
+
+```yaml
+metadata:
+  labels:
+    app.kubernetes.io/part-of: gha-runner-scale-set-controller
+    app.kubernetes.io/component: controller-manager
+    app.kubernetes.io/version: "x.x.x"
+```
+
+#### Listener
+
+Labels to be set by controller at creation:
+
+```yaml
+metadata:
+  labels:
+    app.kubernetes.io/part-of: gha-runner-scale-set-controller
+    app.kubernetes.io/component: runner-scale-set-listener
+    app.kubernetes.io/version: "x.x.x"
+    actions.github.com/scale-set-name: scale-set-name # this corresponds to metadata.name as set for AutoscalingRunnerSet
+
+    # the following labels are to be extracted by the config URL
+    actions.github.com/enterprise: enterprise
+    actions.github.com/organization: organization
+    actions.github.com/repository: repository
+```
+
+#### Runner
+
+Labels to be set by controller at creation:
+
+```yaml
+metadata:
+  labels:
+    app.kubernetes.io/part-of: gha-runner-scale-set-controller
+    app.kubernetes.io/component: runner
+    app.kubernetes.io/version: "x.x.x"
+    actions.github.com/scale-set-name: scale-set-name # this corresponds to metadata.name as set for AutoscalingRunnerSet
+    actions.github.com/runner-name: runner-name
+    actions.github.com/runner-group-name: runner-group-name
+
+    # the following labels are to be extracted by the config URL
+    actions.github.com/enterprise: enterprise
+    actions.github.com/organization: organization
+    actions.github.com/repository: repository
+```
+
+This would allow us to ask users:
+
+> Can you please send us the logs coming from pods labelled 'app.kubernetes.io/part-of=gha-runner-scale-set-controller'?
+
+Or for example if they're having problems specifically with runners:
+
+> Can you please send us the logs coming from pods labelled 'app.kubernetes.io/component=runner'?
+
+This way users don't have to understand ARC moving parts but we still have a
+way to target them specifically if we need to.
+
+[^1]: Supersedes [ADR 2022-12-05](2022-12-05-adding-labels-k8s-resources.md)
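
The listener and runner label sets in this ADR take their enterprise, organization, and repository values from the config URL. A minimal Go sketch of that extraction follows. `scopeLabels` is a hypothetical helper, and the three URL shapes it accepts (`/enterprises/<name>`, `/<org>`, `/<org>/<repo>`) are assumptions made for illustration, not ARC's actual parsing code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// scopeLabels derives the scope labels from a GitHub config URL.
// Hypothetical helper: the accepted path shapes are assumptions.
func scopeLabels(configURL string) (map[string]string, error) {
	u, err := url.Parse(configURL)
	if err != nil {
		return nil, err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	labels := map[string]string{}
	switch {
	case len(parts) == 2 && parts[0] == "enterprises":
		// https://<host>/enterprises/<enterprise>
		labels["actions.github.com/enterprise"] = parts[1]
	case len(parts) == 1 && parts[0] != "":
		// https://<host>/<organization>
		labels["actions.github.com/organization"] = parts[0]
	case len(parts) == 2:
		// https://<host>/<organization>/<repository>
		labels["actions.github.com/organization"] = parts[0]
		labels["actions.github.com/repository"] = parts[1]
	default:
		return nil, fmt.Errorf("unsupported config URL path %q", u.Path)
	}
	return labels, nil
}

func main() {
	labels, err := scopeLabels("https://github.com/actions/actions-runner-controller")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(labels)
}
```

Only the labels that apply to the URL's scope are set, so an organization-level scale set would carry no `actions.github.com/repository` label in this sketch.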
--- a/adrs/yyyy-mm-dd-TEMPLATE.md
+++ b/adrs/yyyy-mm-dd-TEMPLATE.md
@@ -6,13 +6,13 @@

 ## Context

-*What is the issue or background knowledge necessary for future readers
-to understand why this ADR was written?*
+_What is the issue or background knowledge necessary for future readers
+to understand why this ADR was written?_

 ## Decision

-**What** is the change being proposed? / **How** will it be implemented?*
+_**What** is the change being proposed? **How** will it be implemented?_

 ## Consequences

-*What becomes easier or more difficult to do because of this change?*
+_What becomes easier or more difficult to do because of this change?_
--- a/charts/gha-runner-scale-set-controller/templates/_helpers.tpl
+++ b/charts/gha-runner-scale-set-controller/templates/_helpers.tpl
@@ -39,7 +39,7 @@ helm.sh/chart: {{ include "gha-runner-scale-set-controller.chart" . }}
 {{- if .Chart.AppVersion }}
 app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
 {{- end }}
-app.kubernetes.io/part-of: {{ .Chart.Name }}
+app.kubernetes.io/part-of: gha-runner-scale-set-controller
 app.kubernetes.io/managed-by: {{ .Release.Service }}
 {{- range $k, $v := .Values.labels }}
 {{ $k }}: {{ $v }}
@@ -110,4 +110,4 @@ Create the name of the service account to use
  {{- $names = append $names $v.name }}
 {{- end }}
 {{- $names | join ","}}
-{{- end }}
+{{- end }}
--- a/charts/gha-runner-scale-set-controller/templates/deployment.yaml
+++ b/charts/gha-runner-scale-set-controller/templates/deployment.yaml
@@ -23,7 +23,7 @@ spec:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      labels:
-        app.kubernetes.io/part-of: actions-runner-controller
+        app.kubernetes.io/part-of: gha-runner-scale-set-controller
        app.kubernetes.io/component: controller-manager
        app.kubernetes.io/version: {{ .Chart.Version }}
        {{- include "gha-runner-scale-set-controller.selectorLabels" . | nindent 8 }}
--- a/charts/gha-runner-scale-set-controller/tests/template_test.go
+++ b/charts/gha-runner-scale-set-controller/tests/template_test.go
@@ -310,6 +310,7 @@ func TestTemplate_ControllerDeployment_Defaults(t *testing.T) {
 	assert.Equal(t, namespaceName, deployment.Labels["actions.github.com/controller-service-account-namespace"])
 	assert.Equal(t, "test-arc-gha-runner-scale-set-controller", deployment.Labels["actions.github.com/controller-service-account-name"])
 	assert.NotContains(t, deployment.Labels, "actions.github.com/controller-watch-single-namespace")
+	assert.Equal(t, "gha-runner-scale-set-controller", deployment.Labels["app.kubernetes.io/part-of"])

 	assert.Equal(t, int32(1), *deployment.Spec.Replicas)

@@ -416,6 +417,7 @@ func TestTemplate_ControllerDeployment_Customize(t *testing.T) {
 	assert.Equal(t, "test-arc", deployment.Labels["app.kubernetes.io/instance"])
 	assert.Equal(t, chart.AppVersion, deployment.Labels["app.kubernetes.io/version"])
 	assert.Equal(t, "Helm", deployment.Labels["app.kubernetes.io/managed-by"])
+	assert.Equal(t, "gha-runner-scale-set-controller", deployment.Labels["app.kubernetes.io/part-of"])
 	assert.Equal(t, "bar", deployment.Labels["foo"])
 	assert.Equal(t, "actions", deployment.Labels["github"])