ArgoCD Health Checks for Actions Runner Controller

This document explains how to configure ArgoCD to properly monitor the health status of GitHub Actions Runner resources.

Problem

By default, ArgoCD has no health check for custom resources such as Runner, so it cannot interpret their status fields. Even when a Runner Pod is up and running, ArgoCD may report the resource as "Progressing" instead of "Healthy".

Overview

ArgoCD needs custom health check configurations to understand the status of Actions Runner Controller resources. This guide provides ready-to-use configurations that enable ArgoCD to correctly display the health status of your runners.

File Structure

config/argocd/
├── argocd-cm.yaml                          # Complete health check configuration
├── health-check-runner.yaml                # Legacy Runner API health check
├── health-check-ephemeralrunner.yaml       # New Runner API health check
├── health-check-autoscalingrunnerset.yaml  # AutoscalingRunnerSet health check
├── health-check-pod.yaml                   # Pod health check for runners
└── kustomization.yaml                      # Main kustomization file with usage examples

Quick Start

Method 1: Apply All Health Checks

kubectl apply -f config/argocd/argocd-cm.yaml

Method 2: Use Kustomize

kubectl apply -k config/argocd/
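
Before applying, it can help to preview what would change in the cluster. The following is a minimal sketch built on kubectl diff; the function name preview_health_checks is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: preview the effect of the Kustomize overlay before applying it.
# ASSUMPTION: preview_health_checks is an illustrative helper name.

preview_health_checks() {
  # kubectl diff exits 0 when nothing would change, 1 when differences
  # were found, and a higher code when kubectl or diff itself failed.
  rc=0
  kubectl diff -k config/argocd/ || rc=$?
  case "$rc" in
    0) echo "No changes to apply" ;;
    1) echo "Changes shown above would be applied" ;;
    *) echo "kubectl diff failed (exit $rc)" >&2; return "$rc" ;;
  esac
}

# Usage:
#   preview_health_checks && kubectl apply -k config/argocd/
```

Treating exit code 1 as a normal outcome keeps the helper usable in scripts that stop on the first failing command.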

Method 3: Apply Specific Health Checks

# For legacy runners only
kubectl apply -f config/argocd/health-check-runner.yaml

# For new API runners
kubectl apply -f config/argocd/health-check-ephemeralrunner.yaml
kubectl apply -f config/argocd/health-check-autoscalingrunnerset.yaml

# For pod monitoring
kubectl apply -f config/argocd/health-check-pod.yaml

Method 4: Edit ConfigMap Directly

Add the health check configurations directly to the existing ArgoCD ConfigMap:

kubectl edit configmap argocd-cm -n argocd

Then add the health check configurations under the data section. You can copy the content from the provided YAML files, ensuring proper indentation.
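
As a non-interactive alternative to kubectl edit, a single entry can be merged into the ConfigMap with kubectl patch. This is only a sketch: the Lua body below is an assumption that reads status.phase and is not the content of the provided YAML files, and the function names and patch file path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: merge one health check into argocd-cm without opening an editor.
# ASSUMPTION: the Lua logic is illustrative only; for real use, replace it
# with the content of config/argocd/health-check-runner.yaml.

write_runner_health_patch() {
  # $1: path of the patch file to create (hypothetical location)
  cat > "$1" <<'EOF'
data:
  resource.customizations.health.actions.summerwind.dev_Runner: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for runner status"
    if obj.status ~= nil and obj.status.phase ~= nil then
      if obj.status.phase == "Running" then
        hs.status = "Healthy"
        hs.message = "Runner is running"
      elseif obj.status.phase == "Failed" then
        hs.status = "Degraded"
        hs.message = "Runner failed"
      end
    end
    return hs
EOF
}

apply_runner_health_patch() {
  # A merge patch adds or updates this one key and leaves other keys alone.
  kubectl patch configmap argocd-cm -n argocd --type merge --patch-file "$1"
}

# Usage:
#   write_runner_health_patch /tmp/runner-health-patch.yaml
#   apply_runner_health_patch /tmp/runner-health-patch.yaml
```

Keeping the patch in a file avoids quoting problems with the multi-line Lua script and makes the indentation easy to review.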

Method 5: Helm Values

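When installing ArgoCD via Helm, add the health checks to your values.yaml. The snippet below uses the server.config key; newer releases of the argo-cd chart read the same entries from configs.cm instead, so check which key your chart version expects: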

server:
  config:
    # Copy the health check configurations from the YAML files
    resource.customizations.health.actions.summerwind.dev_Runner: |
      # ... (content from YAML file)

Kustomize Usage

The provided kustomization.yaml file includes three different usage patterns:

Option 1: Apply All Health Checks

The default configuration applies all health checks at once using the complete argocd-cm.yaml.

Option 2: Selective Health Checks

Uncomment specific patches in kustomization.yaml to apply only the health checks you need.

Option 3: ConfigMapGenerator

Use the configMapGenerator approach when the ArgoCD ConfigMap is managed by another system. This method merges the health checks into the existing ConfigMap instead of replacing it.

See kustomization.yaml for detailed examples and comments for each option.

Verifying the Configuration

Check ArgoCD UI

1. Navigate to your application in the ArgoCD UI.
2. Look for the Runner resources in the resource tree.
3. Verify that each health status indicator shows the expected state and color.

Using ArgoCD CLI

# Refresh and check application status
argocd app get <your-app-name> --refresh

# Check specific resource health
argocd app resources <your-app-name> --kind Runner

Using kubectl

Verify the runner status fields that ArgoCD reads:

# Check runner status
kubectl get runners -o jsonpath='{.items[*].status.phase}'

# Check ephemeral runner status
kubectl get ephemeralrunners -o jsonpath='{.items[*].status.phase}'

# Check autoscaling runner set
kubectl get autoscalingrunnersets -o jsonpath='{.items[*].status.currentRunners}'

What These Configurations Do

Runner Health Status in ArgoCD

Once configured, ArgoCD will display runner health as follows:

Runner State	ArgoCD Status	Description
Running and Ready	Healthy	Runner is online and can accept jobs
Starting up	Progressing	Runner pod is initializing
Failed	Degraded	Runner encountered an error
Scaling	Progressing	AutoScaler is adjusting runner count
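
The mapping in the table above can be expressed as a small shell helper for spot-checking what ArgoCD should report. This is a sketch only: the function names are hypothetical, and the phase strings (Running, Failed) are assumptions about the values found in status.phase.

```shell
#!/usr/bin/env bash
# Sketch: map a runner phase to the ArgoCD status listed in the table.
# ASSUMPTION: "Running" and "Failed" are illustrative phase values; any
# other value, including an empty one, is treated as still progressing.

runner_health_for_phase() {
  case "$1" in
    Running) echo "Healthy" ;;
    Failed)  echo "Degraded" ;;
    *)       echo "Progressing" ;;
  esac
}

report_runner_health() {
  # Prints "<name> <phase> <expected ArgoCD status>" for each Runner.
  kubectl get runners \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' |
    while read -r name phase; do
      echo "$name ${phase:-<none>} $(runner_health_for_phase "$phase")"
    done
}

# Usage:
#   report_runner_health
```

Comparing this output with the ArgoCD UI is a quick way to tell whether a mismatch comes from the runner status itself or from the health check configuration.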

Pod Health Status in ArgoCD

The enhanced configuration also monitors the health of Runner pods:

Pod State	ArgoCD Status	Description
Running with all containers ready	Healthy	Pod is fully operational
Succeeded	Healthy	Pod completed its task successfully
Pending	Progressing	Pod is being scheduled or starting
Running but containers not ready	Progressing	Pod is running but not fully ready
Failed	Degraded	Pod or containers have failed
CrashLoopBackOff	Degraded	Container is repeatedly crashing
ImagePullBackOff	Degraded	Cannot pull container image
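
The pod rules above combine three inputs: the pod phase, container readiness, and the waiting reason of a container. A minimal sketch of that decision order follows; the function name pod_health is hypothetical, and the actual Lua health check in the provided YAML files may evaluate the conditions differently.

```shell
#!/usr/bin/env bash
# Sketch: derive the ArgoCD status for a runner pod from the table.
# ASSUMPTION: pod_health and its argument order are illustrative.
#   $1: pod phase            (.status.phase)
#   $2: "true" if all containers are ready
#                            (.status.containerStatuses[*].ready)
#   $3: waiting reason, may be empty
#                            (.status.containerStatuses[*].state.waiting.reason)

pod_health() {
  phase="$1"; ready="$2"; reason="$3"

  # Back-off reasons win over the phase: a pod stuck in ImagePullBackOff
  # is still reported as Pending by Kubernetes.
  case "$reason" in
    CrashLoopBackOff|ImagePullBackOff) echo "Degraded"; return ;;
  esac

  case "$phase" in
    Succeeded) echo "Healthy" ;;
    Failed)    echo "Degraded" ;;
    Running)
      if [ "$ready" = "true" ]; then echo "Healthy"; else echo "Progressing"; fi ;;
    *)         echo "Progressing" ;;
  esac
}

# Usage:
#   pod_health Running true ""
#   pod_health Pending false ImagePullBackOff
```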

Supported Resources

The configurations support four resource types:

Runner (actions.summerwind.dev/v1alpha1)
- Legacy runner type
- Shows as healthy when pod is running and runner is registered
EphemeralRunner (actions.github.com/v1alpha1)
- New ephemeral runner type
- Supports job-specific runners that terminate after use
- Shows as healthy during job execution and after completion
- Enhanced status tracking including job IDs and runner IDs
AutoscalingRunnerSet (actions.github.com/v1alpha1)
- Manages groups of ephemeral runners
- Shows current vs desired runner count
- Healthy when scaled to target size
- Displays pending, running, and terminating runner counts
Pod (v1, core API group)
- Health checks for runner pods specifically
- Monitors container readiness and status
- Detects common issues like CrashLoopBackOff and ImagePullBackOff
- Only applies to pods with runner-specific labels

Important Notes

Restart ArgoCD: After applying the health checks, restart the ArgoCD server so that it picks up the new configuration:

kubectl rollout restart deployment argocd-server -n argocd

Label Detection: Pod health checks apply only to pods that carry runner-specific labels.
Namespace: All configurations assume ArgoCD is installed in the argocd namespace. Adjust the -n argument in the commands if your installation uses a different one.
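
The restart can be combined with a wait so that later checks only run once the new server pod is ready. A minimal sketch, assuming the deployment is named argocd-server as in the command above; the function name and the 120 second timeout are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: restart the ArgoCD server and wait for the rollout to finish.
# ASSUMPTION: restart_argocd_server and the 120s timeout are illustrative.

restart_argocd_server() {
  # $1: namespace (defaults to argocd, matching the assumption in this guide)
  ns="${1:-argocd}"
  kubectl rollout restart deployment argocd-server -n "$ns" &&
    kubectl rollout status deployment argocd-server -n "$ns" --timeout=120s
}

# Usage:
#   restart_argocd_server            # uses the argocd namespace
#   restart_argocd_server my-argocd  # hypothetical custom namespace
```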

Troubleshooting

Health Status Not Updating

Verify ConfigMap is applied:

kubectl get configmap argocd-cm -n argocd -o yaml | grep customizations

Check ArgoCD logs:

kubectl logs -n argocd deployment/argocd-server | grep health

Refresh application in ArgoCD:

argocd app get <app-name> --refresh
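
To see exactly which health checks are present in the ConfigMap, the data keys can be listed and filtered by prefix. This is a sketch; the function name list_health_check_keys is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the health check keys currently stored in argocd-cm.
# ASSUMPTION: list_health_check_keys is an illustrative helper name.

list_health_check_keys() {
  # The go-template prints one ConfigMap data key per line; grep keeps only
  # the health customizations. The function returns non-zero if none exist.
  kubectl get configmap argocd-cm -n argocd \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' |
    grep '^resource\.customizations\.health\.'
}

# Usage:
#   list_health_check_keys || echo "No health checks configured"
```

A missing key for a resource type usually means the corresponding file was never applied or was overwritten by another system that manages the ConfigMap.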

Incorrect Health Status

Check runner status:

kubectl get runners -o yaml
kubectl get pods -l app.kubernetes.io/component=runner

Verify labels on pods:

kubectl get pods -o jsonpath='{.items[*].metadata.labels}'
