doc: Use RunnerSet to retain various cache by leveraging PV (#1464)

* doc: Use RunnerSet to retain various cache In relation to #1286 and as a follow-up for #1340 * docs: clarify client vs daemon * docs: better wording * Separate RunnerSet examples for docker image layer caching * Revert changes on testdata as it is going to be added via #1471 instead * Update README.md Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com> * fixup! Update README.md * Remove the outdated RunnerSet limitation Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
2022-05-25 11:09:36 +09:00 · 2022-05-25 11:09:36 +09:00 · ef3313d147
parent c7eea169ad
commit ef3313d147
1 changed files with 123 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -473,7 +473,6 @@ Under the hood, `RunnerSet` relies on Kubernetes's `StatefulSet` and Mutating We
 **Limitations**

 * For autoscaling the `RunnerSet` kind only supports pull driven scaling or the `workflow_job` event for webhook driven scaling.
-* Whilst `RunnerSets` support all runner modes as well as autoscaling, currently PVs are **NOT** automatically cleaned up as they are still bound to their respective PVCs when a runner is deleted by the controller. This has **major** implications when using `RunnerSets` in the standard runner mode, `ephemeral: true`, see [persistent runners](#persistent-runners) for more details. As a result of this, using the default ephemeral configuration or implementing autoscaling for your `RunnerSets`, you will get a build-up of PVCs and PVs without some sort of custom solution for cleaning up.

 ### Persistent Runners

@@ -1168,7 +1167,8 @@ spec:
 You can configure your own custom volume mounts. For example to have the work/docker data in memory or on NVME SSD, for
 i/o intensive builds. Other custom volume mounts should be possible as well, see [kubernetes documentation](https://kubernetes.io/docs/concepts/storage/volumes/)

-**RAM Disk Runner**<br />
+#### RAM Disk
+
 Example how to place the runner work dir, docker sidecar and /tmp within the runner onto a ramdisk.
 ```yaml
 kind: RunnerDeployment
@@ -1194,7 +1194,8 @@ spec:
      emphemeral: true # recommended to not leak data between builds.
 ```

-**NVME SSD Runner**<br />
+#### NVME SSD
+
 In this example we provide NVME backed storage for the workdir, docker sidecar and /tmp within the runner.
 Here we use a working example on GKE, which will provide the NVME disk at /mnt/disks/ssd0.  We will be placing the respective volumes in subdirs here and in order to be able to run multiple runners we will use the pod name as a prefix for subdirectories. Also the disk will fill up over time and disk space will not be freed until the node is removed.

@@ -1242,6 +1243,125 @@ spec:
    emphemeral: true # VERY important. otherwise data inside the workdir and /tmp is not cleared between builds
 ```

+#### Docker image layers caching
+
+> **Note**: Ensure that the volume mount is added to the container that is running the Docker daemon.
+
+Docker stores pulled and built image layers in the [daemon's (not the client's)](https://docs.docker.com/get-started/overview/#docker-architecture) [local storage area](https://docs.docker.com/storage/storagedriver/#sharing-promotes-smaller-images), which is usually `/var/lib/docker`.
+
+By leveraging RunnerSet's dynamic PV provisioning feature and your CSI driver, you can let ARC maintain a pool of PVs that are
+reused across runner pods to retain `/var/lib/docker`.
+
+By default, ARC creates a sidecar container named `docker` within the runner pod to run the Docker daemon. In that case,
+the volume mount belongs on the `docker` container, so the manifest looks like:
+
+```yaml
+kind: RunnerSet
+metadata:
+  name: example
+spec:
+  template:
+    spec:
+      containers:
+      - name: docker
+        volumeMounts:
+        - name: var-lib-docker
+          mountPath: /var/lib/docker
+  volumeClaimTemplates:
+  - metadata:
+      name: var-lib-docker
+    spec:
+      accessModes:
+      - ReadWriteOnce
+      resources:
+        requests:
+          storage: 10Mi
+      storageClassName: var-lib-docker
+```
+
+With `dockerdWithinRunnerContainer: true`, the Docker daemon runs inside the `runner` container, so add the volume mount to the `runner` container instead.
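+
+A minimal sketch of that variant, reusing the `var-lib-docker` claim template from the example above. It assumes your runner image ships `dockerd`, which this mode needs; only the container that receives the mount changes:
+
+```yaml
+kind: RunnerSet
+metadata:
+  name: example
+spec:
+  # Run dockerd inside the runner container instead of a `docker` sidecar.
+  # Assumption: the configured runner image includes dockerd.
+  dockerdWithinRunnerContainer: true
+  template:
+    spec:
+      containers:
+      # The mount now goes on `runner`, where the daemon lives.
+      - name: runner
+        volumeMounts:
+        - name: var-lib-docker
+          mountPath: /var/lib/docker
+  volumeClaimTemplates:
+  - metadata:
+      name: var-lib-docker
+    spec:
+      accessModes:
+      - ReadWriteOnce
+      resources:
+        requests:
+          storage: 10Mi
+      storageClassName: var-lib-docker
+```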
+
+#### Go module and build caching
+
+On Linux, Go caches builds under `$HOME/.cache/go-build` and downloaded modules under `$GOPATH/pkg/mod` (`$HOME/go/pkg/mod` by default).
+The module cache directory can be customized by setting `GOMODCACHE`, so by pointing it somewhere under `$HOME/.cache`,
+a single PV can host both the build and module caches, which might reduce Go module download and build times.
+
+```yaml
+kind: RunnerSet
+metadata:
+  name: example
+spec:
+  template:
+    spec:
+      containers:
+      - name: runner
+        env:
+        - name: GOMODCACHE
+          value: "/home/runner/.cache/go-mod"
+        volumeMounts:
+        - name: cache
+          mountPath: "/home/runner/.cache"
+  volumeClaimTemplates:
+  - metadata:
+      name: cache
+    spec:
+      accessModes:
+      - ReadWriteOnce
+      resources:
+        requests:
+          storage: 10Mi
+      storageClassName: cache
+```
+
+#### PV-backed runner work directory
+
+ARC works by automatically creating runner pods that run [`actions/runner`](https://github.com/actions/runner) and [run `config.sh`](https://docs.github.com/en/actions/hosting-your-own-runners/adding-self-hosted-runners#adding-a-self-hosted-runner-to-a-repository), steps you would otherwise have to perform manually without ARC.
+
+`config.sh` is the script provided by `actions/runner` to configure the runner before the runner process is started. One of its options is `--work`,
+which specifies the working directory in which the runner runs your workflow jobs.
+
+The volume and partition hosting the work directory should have enough free space for your workflow jobs, typically several to dozens of GBs.
+
+By default, ARC uses `/runner/_work` as the work directory, which is backed by a Kubernetes `emptyDir`. [`emptyDir` is usually backed by a directory created within a host's volume](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir), somewhere under `/var/lib/kubelet/pods` (the kubelet's default root directory). Therefore,
+the host volume backing `/var/lib/kubelet/pods` must have enough free space to serve all the concurrent runner pods that might be scheduled onto your host at the same time.
+
+So if a job fails with what looks like a "disk full" error, you most likely need to reconfigure your host to have more free space.
+
+If you can't rely on the host's volume, consider using `RunnerSet` and backing the work directory with an ephemeral PV.
+
+Kubernetes 1.23 or greater provides support for [generic ephemeral volumes](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes), which are designed for this exact use case. They are defined in the Pod spec API, so they aren't currently available for `RunnerDeployment`. `RunnerSet` is based on Kubernetes' `StatefulSet`, which largely embeds the Pod spec under `spec.template.spec`, so you can declare a generic ephemeral volume there.
+
+```yaml
+kind: RunnerSet
+metadata:
+  name: example
+spec:
+  template:
+    spec:
+      containers:
+      - name: runner
+        volumeMounts:
+        - mountPath: /runner/_work
+          name: work
+      - name: docker
+        volumeMounts:
+        - mountPath: /runner/_work
+          name: work
+      volumes:
+      - name: work
+        ephemeral:
+          volumeClaimTemplate:
+            spec:
+              accessModes: [ "ReadWriteOnce" ]
+              storageClassName: "runner-work-dir"
+              resources:
+                requests:
+                  storage: 10Gi
+```
+
 ### Runner Labels

 To run a workflow job on a self-hosted runner, you can use the following syntax in your workflow: