Expand ArgoCD health check docs for runner resources

Kuangyu Jing 2025-07-17 02:51:32 +09:00
parent 1d2ef5d7a8
commit 1f5407e60c
1 changed files with 168 additions and 53 deletions


@ -1,83 +1,198 @@
# ArgoCD Health Check Configuration for Actions Runner Controller
This document explains how to configure ArgoCD to properly recognize the health status of Runner resources.
This document explains how to configure ArgoCD to properly monitor the health status of GitHub Actions Runner resources.
## Problem
By default, ArgoCD doesn't understand the health status of custom resources like `Runner`. Even when a Runner Pod is up and running, ArgoCD may show the status as "Progressing" instead of "Healthy".
## Solution
## Overview
Add a custom health check configuration to ArgoCD's ConfigMap to interpret the Runner resource's status fields.
ArgoCD needs custom health check configurations to understand the status of Actions Runner Controller resources. This guide provides ready-to-use configurations that enable ArgoCD to correctly display the health status of your runners.
### 1. Apply the Custom Health Check
## Quick Start
Apply the following configuration to your ArgoCD installation:
Apply one of the following configurations based on your runner deployment type:
For the new Runner API:
```bash
kubectl apply -f argocd-runner-health.yaml
kubectl apply -f config/argocd/ephemeralrunner-health.yaml
```
Or, if you already have an `argocd-cm` ConfigMap, add the following to the `data` section:
```yaml
data:
  resource.customizations.health.actions.summerwind.dev_Runner: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.ready == true and obj.status.phase == "Running" then
        hs.status = "Healthy"
        hs.message = "Runner is ready and running"
      elseif obj.status.phase == "Pending" or obj.status.phase == "Created" then
        hs.status = "Progressing"
        hs.message = "Runner is starting up"
      elseif obj.status.phase == "Failed" then
        hs.status = "Degraded"
        hs.message = obj.status.message or "Runner has failed"
      else
        hs.status = "Progressing"
        hs.message = "Runner status: " .. (obj.status.phase or "Unknown")
      end
    else
      hs.status = "Progressing"
      hs.message = "Waiting for runner status"
    end
    return hs
For the legacy Runner API:
```bash
kubectl apply -f config/argocd/runner-health.yaml
```
### 2. Restart ArgoCD Server
After applying the configuration, restart the ArgoCD server to load the new health check:
After applying, restart the ArgoCD server:
```bash
kubectl rollout restart deployment argocd-server -n argocd
```
## Health Status Mapping
## What These Configurations Do
The custom health check maps Runner statuses to ArgoCD health statuses as follows:
### Runner Health Status in ArgoCD
| Runner Status | ArgoCD Health Status | Description |
| -- | -- | -- |
| `ready: true` and `phase: Running` | Healthy | Runner is fully operational |
| `phase: Pending` or `Created` | Progressing | Runner is starting up |
| `phase: Failed` | Degraded | Runner has encountered an error |
| Other states | Progressing | Runner is in transition |
Once configured, ArgoCD will display runner health as follows:
## Verification
| Runner State | ArgoCD Display | Description |
|-------------|----------------|-------------|
| Running and Ready | **Healthy** (Green) | Runner is online and processing jobs |
| Starting up | **Progressing** (Yellow) | Runner pod is initializing |
| Failed | **Degraded** (Red) | Runner encountered an error |
| Scaling | **Progressing** (Yellow) | AutoScaler is adjusting runner count |
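The mapping for the legacy `Runner` resource can be mirrored in a few lines of shell, which may help when predicting what ArgoCD should display for a given runner. This is only a sketch: `runner_health` is a hypothetical helper, not part of ArgoCD or Actions Runner Controller, and it assumes the `status.phase` and `status.ready` fields that the legacy Runner health check reads.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the legacy Runner health rules.
# Usage: runner_health <phase> <ready>
runner_health() {
  local phase="${1:-}" ready="${2:-}"
  if [ "$phase" = "Running" ] && [ "$ready" = "true" ]; then
    echo "Healthy"
  elif [ "$phase" = "Failed" ]; then
    echo "Degraded"
  else
    # Pending, Created, a missing status, and any other phase
    echo "Progressing"
  fi
}
```

For example, `runner_health Running true` prints `Healthy`, while `runner_health Running false` prints `Progressing`, because both conditions must hold for a runner to count as healthy.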
After configuration, you can verify the health status in ArgoCD:
### Supported Resources
1. Check the ArgoCD UI - Runner resources should show as "Healthy" when ready
2. Use the ArgoCD CLI:
```bash
argocd app get <your-app-name> --refresh
```
The configurations support three resource types:
## Alternative Approach: Patching the ConfigMap
1. **Runner** (actions.summerwind.dev/v1alpha1)
   - Legacy runner type
   - Shows as healthy when pod is running and runner is registered
If you need to patch an existing ConfigMap:
2. **EphemeralRunner** (actions.github.com/v1alpha1)
   - New ephemeral runner type
   - Supports job-specific runners that terminate after use
   - Shows as healthy during job execution and after completion
3. **AutoscalingRunnerSet** (actions.github.com/v1alpha1)
   - Manages groups of ephemeral runners
   - Shows current vs desired runner count
   - Healthy when scaled to target size
## Installation Methods
### Method 1: Apply YAML Files
Use the provided configuration files:
```bash
kubectl patch configmap argocd-cm -n argocd --type merge -p '{"data":{"resource.customizations.health.actions.summerwind.dev_Runner":"hs = {}\nif obj.status ~= nil then\n if obj.status.ready == true and obj.status.phase == \"Running\" then\n hs.status = \"Healthy\"\n hs.message = \"Runner is ready and running\"\n elseif obj.status.phase == \"Pending\" or obj.status.phase == \"Created\" then\n hs.status = \"Progressing\"\n hs.message = \"Runner is starting up\"\n elseif obj.status.phase == \"Failed\" then\n hs.status = \"Degraded\"\n hs.message = obj.status.message or \"Runner has failed\"\n else\n hs.status = \"Progressing\"\n hs.message = \"Runner status: \" .. (obj.status.phase or \"Unknown\")\n end\nelse\n hs.status = \"Progressing\"\n hs.message = \"Waiting for runner status\"\nend\nreturn hs"}}'
# For ephemeral runners
kubectl apply -f config/argocd/ephemeralrunner-health.yaml
# For legacy runners
kubectl apply -f config/argocd/runner-health.yaml
```
### Method 2: Edit ConfigMap Directly
Add the health check configurations directly to the existing ArgoCD ConfigMap:
```bash
kubectl edit configmap argocd-cm -n argocd
```
Then add the health check configurations under the `data` section. You can copy the content from the provided YAML files, ensuring proper indentation.
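Indentation matters here because the Lua script sits inside a YAML block scalar (`|`), so every script line must be indented deeper than its key. A minimal sketch of a helper that indents a script before pasting it; `indent_block` and the file name `runner-health.lua` are hypothetical and assume the Lua script has been saved on its own:

```bash
#!/usr/bin/env bash
# Hypothetical helper: indent every non-empty line of stdin by N spaces (default 4).
indent_block() {
  local n="${1:-4}" pad
  pad="$(printf '%*s' "$n" '')"   # build a string of N spaces
  sed "s/^./${pad}&/"             # prefix non-empty lines, leave blank lines untouched
}

# Example (assumes the Lua script was saved separately):
# indent_block 4 < runner-health.lua
```

Blank lines are left empty on purpose, since YAML block scalars accept them without indentation.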
### Method 3: Patch Existing ConfigMap
If you already have an ArgoCD ConfigMap:
```bash
kubectl patch configmap argocd-cm -n argocd --type merge --patch-file config/argocd/ephemeralrunner-health.yaml
```
### Method 4: Using Kustomize
Create a kustomization.yaml file:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
configMapGenerator:
  - name: argocd-cm
    behavior: merge
    files:
      # The whole content of each file becomes the value of its key,
      # so each file should contain only the Lua health script.
      - resource.customizations.health.actions.summerwind.dev_Runner=config/argocd/runner-health.yaml
      - resource.customizations.health.actions.github.com_EphemeralRunner=config/argocd/ephemeralrunner-health.yaml
```
Then apply with:
```bash
kubectl apply -k .
```
### Method 5: Helm Values
When installing ArgoCD via Helm, add to your values.yaml:
```yaml
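# Note: newer versions of the argo-cd chart may expect these keys under
# configs.cm instead of server.config; check your chart's values reference.
server:
  config:
    # Copy the health check configurations from the YAML files
    resource.customizations.health.actions.summerwind.dev_Runner: |
      # ... (content from YAML file)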
```
## Verifying the Configuration
### Check ArgoCD UI
1. Navigate to your application in the ArgoCD UI
2. Look for Runner resources
3. Verify health status indicators show correct colors
### Using ArgoCD CLI
```bash
# Refresh and check application status
argocd app get <your-app-name> --refresh
# Check specific resource health
argocd app resources <your-app-name> --kind Runner
```
### Using kubectl
Verify the runner status fields that ArgoCD reads:
```bash
# Check runner status
kubectl get runners -o jsonpath='{.items[*].status.phase}'
# Check ephemeral runner status
kubectl get ephemeralrunners -o jsonpath='{.items[*].status.phase}'
# Check autoscaling runner set
kubectl get autoscalingrunnersets -o jsonpath='{.items[*].status.currentRunners}'
```
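To see at a glance which legacy runners ArgoCD would not report as Healthy, the phase and ready fields can be printed per runner and then filtered. A minimal sketch, assuming the `status.phase` and `status.ready` fields of the legacy Runner resource; `list_unhealthy` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "name<TAB>phase<TAB>ready" lines on stdin and
# prints every runner that would not map to Healthy.
list_unhealthy() {
  awk -F '\t' '!($2 == "Running" && $3 == "true") {
    print $1 ": phase=" $2 ", ready=" $3
  }'
}

# Feed it from the cluster (requires kubectl access):
# kubectl get runners \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.ready}{"\n"}{end}' \
#   | list_unhealthy
```

An empty result means every listed runner is both `Running` and ready, which is the condition the health check treats as Healthy.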
## Troubleshooting
### Health Status Not Updating
1. **Verify ConfigMap is applied**:
   ```bash
   kubectl get configmap argocd-cm -n argocd -o yaml | grep actions
   ```
2. **Ensure ArgoCD server was restarted**:
   ```bash
   kubectl rollout status deployment argocd-server -n argocd
   ```
3. **Check ArgoCD logs**:
   ```bash
   kubectl logs -n argocd deployment/argocd-server | grep health
   ```
### Incorrect Health Status
If runners show as "Progressing" when they should be "Healthy":
1. Check runner pod status:
   ```bash
   kubectl get pods -l app.kubernetes.io/name=runner
   ```
2. Verify runner registration:
   ```bash
   kubectl describe runner <runner-name>
   ```
3. Look for status fields:
   - `status.phase` should be `Running`
   - `status.ready` should be `true` (a boolean, not the string "true")