ArgoCD Health Check Configuration for Actions Runner Controller

This document explains how to configure ArgoCD to properly monitor the health status of GitHub Actions Runner resources.

Problem

By default, ArgoCD doesn't understand the health status of custom resources like Runner. Even when a Runner Pod is up and running, ArgoCD may show the status as "Progressing" instead of "Healthy".

Overview

ArgoCD needs custom health check configurations to understand the status of Actions Runner Controller resources. This guide provides ready-to-use configurations that enable ArgoCD to correctly display the health status of your runners.

Quick Start

Apply one of the following configurations based on your runner deployment type:

For New Runner API

kubectl apply -f config/argocd/ephemeralrunner-health.yaml

For Legacy Runner API

kubectl apply -f config/argocd/runner-health.yaml

After applying, restart the ArgoCD server so it reloads the configuration:

kubectl rollout restart deployment argocd-server -n argocd
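
Health for custom resources is assessed by the ArgoCD application controller, so if the status does not change after restarting the server, restarting the controller may also help. The following is a minimal sketch, assuming the default install manifests where the controller is a StatefulSet named argocd-application-controller (some Helm setups and older releases use a different name or a Deployment). The function name restart_argocd_controller is a hypothetical helper, not part of any tool.

```shell
# Hypothetical helper: restart the ArgoCD application controller and wait for it.
# ASSUMPTION: the controller runs as statefulset/argocd-application-controller.
# Usage: restart_argocd_controller [namespace]   (namespace defaults to "argocd")
restart_argocd_controller() {
  local ns="${1:-argocd}"
  kubectl rollout restart statefulset/argocd-application-controller -n "$ns" &&
    kubectl rollout status statefulset/argocd-application-controller -n "$ns"
}
```

Call it as restart_argocd_controller, or pass a namespace if ArgoCD is not installed in argocd.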

What These Configurations Do

Runner Health Status in ArgoCD

Once configured, ArgoCD will display runner health as follows:

| Runner State      | ArgoCD Display       | Description                          |
|-------------------|----------------------|--------------------------------------|
| Running and Ready | Healthy (Green)      | Runner is online and processing jobs |
| Starting up       | Progressing (Yellow) | Runner pod is initializing           |
| Failed            | Degraded (Red)       | Runner encountered an error          |
| Scaling           | Progressing (Yellow) | AutoScaler is adjusting runner count |

Supported Resources

The configurations support three resource types:

  1. Runner (actions.summerwind.dev/v1alpha1)

    • Legacy runner type
    • Shows as healthy when pod is running and runner is registered
  2. EphemeralRunner (actions.github.com/v1alpha1)

    • New ephemeral runner type
    • Supports job-specific runners that terminate after use
    • Shows as healthy during job execution and after completion
  3. AutoscalingRunnerSet (actions.github.com/v1alpha1)

    • Manages groups of ephemeral runners
    • Shows current vs desired runner count
    • Healthy when scaled to target size

Installation Methods

Method 1: Apply YAML Files

Use the provided configuration files:

# For ephemeral runners
kubectl apply -f config/argocd/ephemeralrunner-health.yaml

# For legacy runners
kubectl apply -f config/argocd/runner-health.yaml

Method 2: Edit ConfigMap Directly

Add the health check configurations directly to the existing ArgoCD ConfigMap:

kubectl edit configmap argocd-cm -n argocd

Then add the health check configurations under the data section. You can copy the content from the provided YAML files, ensuring proper indentation.

Method 3: Patch Existing ConfigMap

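If you already have an ArgoCD ConfigMap, merge the health checks into it. The -p flag of kubectl patch takes the patch as an inline string and does not read @file paths, so pass the file with --patch-file instead. The file must contain a data section to work as a merge patch:

kubectl patch configmap argocd-cm -n argocd --type merge --patch-file config/argocd/ephemeralrunner-health.yaml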

Method 4: Using Kustomize

Option A: Merge with existing ConfigMap

If the ArgoCD ConfigMap is managed by ArgoCD itself:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: argocd

generatorOptions:
  disableNameSuffixHash: true

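configMapGenerator:
- name: argocd-cm
  # merge only works if argocd-cm is already defined by this kustomization's resources or bases
  behavior: merge
  files:
  # key=path: the whole file content becomes the value, so each file must hold only the Lua script
  - resource.customizations.health.actions.summerwind.dev_Runner=config/argocd/runner-health.yaml
  - resource.customizations.health.actions.github.com_EphemeralRunner=config/argocd/ephemeralrunner-health.yaml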

Option B: Direct ConfigMap management

If you manage argocd-cm as a file, add the health checks directly to your ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Add health check for legacy Runner
  resource.customizations.health.actions.summerwind.dev_Runner: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.ready == true and obj.status.phase == "Running" then
        hs.status = "Healthy"
        hs.message = "Runner is ready and running"
      elseif obj.status.phase == "Pending" or obj.status.phase == "Created" then
        hs.status = "Progressing"
        hs.message = "Runner is starting up"
      elseif obj.status.phase == "Failed" then
        hs.status = "Degraded"
        hs.message = obj.status.message or "Runner has failed"
      else
        hs.status = "Progressing"
        hs.message = "Runner status: " .. (obj.status.phase or "Unknown")
      end
    else
      hs.status = "Progressing"
      hs.message = "Waiting for runner status"
    end
    return hs

  # Add health check for EphemeralRunner
  resource.customizations.health.actions.github.com_EphemeralRunner: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.phase == "Running" then
        hs.status = "Healthy"
        hs.message = "EphemeralRunner is running"
      elseif obj.status.phase == "Pending" then
        hs.status = "Progressing"
        hs.message = "EphemeralRunner is pending"
      elseif obj.status.phase == "Failed" then
        hs.status = "Degraded"
        hs.message = obj.status.message or "EphemeralRunner has failed"
      elseif obj.status.phase == "Finished" then
        hs.status = "Healthy"
        hs.message = "EphemeralRunner has finished"
      else
        hs.status = "Progressing"
        hs.message = "EphemeralRunner status: " .. (obj.status.phase or "Unknown")
      end
    else
      hs.status = "Progressing"
      hs.message = "Waiting for EphemeralRunner status"
    end
    return hs
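
Before rolling the ConfigMap out, the Lua scripts above can be checked offline with the argocd admin settings resource-overrides health command, which evaluates a resource manifest against the health checks in a local copy of argocd-cm. This is a minimal sketch: check_health_offline is a hypothetical helper name, and runner.yaml and argocd-cm.yaml are assumed local files (a Runner manifest that includes a status section, and the ConfigMap shown above).

```shell
# Hypothetical helper: evaluate a resource's health against a local argocd-cm file.
# ASSUMPTION: both arguments are local file paths; neither file ships with this guide.
# Usage: check_health_offline RESOURCE_YAML [ARGOCD_CM_YAML]
check_health_offline() {
  local resource_yaml="$1"
  local cm_yaml="${2:-argocd-cm.yaml}"
  argocd admin settings resource-overrides health "$resource_yaml" \
    --argocd-cm-path "$cm_yaml"
}
```

For example, check_health_offline runner.yaml argocd-cm.yaml reports the status and message the script would return for that manifest.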

Method 5: Helm Values

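When installing ArgoCD via Helm, add the health checks to your values.yaml. The example below uses the server.config key; newer versions of the argo-cd chart use configs.cm in its place, so check which key your chart version expects: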

server:
  config:
    # Copy the health check configurations from the YAML files
    resource.customizations.health.actions.summerwind.dev_Runner: |
      # ... (content from YAML file)

Verifying the Configuration

Check ArgoCD UI

  1. Navigate to your application in ArgoCD UI
  2. Look for Runner resources
  3. Verify health status indicators show correct colors

Using ArgoCD CLI

# Refresh and check application status
argocd app get <your-app-name> --refresh

# Check specific resource health
argocd app resources <your-app-name> --kind Runner

Using kubectl

Verify runner status that ArgoCD reads:

# Check runner status
kubectl get runners -o jsonpath='{.items[*].status.phase}'

# Check ephemeral runner status
kubectl get ephemeralrunners -o jsonpath='{.items[*].status.phase}'

# Check autoscaling runner set
kubectl get autoscalingrunnersets -o jsonpath='{.items[*].status.currentReplicas}'

Troubleshooting

Health Status Not Updating

  1. Verify ConfigMap is applied:

    kubectl get configmap argocd-cm -n argocd -o yaml | grep actions
    
  2. Ensure ArgoCD server was restarted:

    kubectl rollout status deployment argocd-server -n argocd
    
  3. Check ArgoCD logs:

    kubectl logs -n argocd deployment/argocd-server | grep health
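
Because health is computed by the application controller, its logs are also worth checking when the server logs show nothing relevant. A minimal sketch, assuming the controller is a StatefulSet named argocd-application-controller as in the default install manifests; controller_health_logs is a hypothetical helper name and the 500-line tail is an arbitrary choice.

```shell
# Hypothetical helper: show recent application controller log lines mentioning health.
# ASSUMPTION: the controller runs as statefulset/argocd-application-controller.
# Usage: controller_health_logs [namespace]   (namespace defaults to "argocd")
controller_health_logs() {
  local ns="${1:-argocd}"
  kubectl logs -n "$ns" statefulset/argocd-application-controller --tail=500 |
    grep -i health
}
```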
    

Incorrect Health Status

If runners show as "Progressing" when they should be "Healthy":

  1. Check runner pod status:

    kubectl get pods -l app.kubernetes.io/name=runner
    
  2. Verify runner registration:

    kubectl describe runner <runner-name>
    
  3. Look for status fields:

    • status.phase should be "Running"
    • status.ready should be true (a boolean; the health check compares it against true, not the string "true")
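
Both fields can be listed side by side for every runner in a namespace. This is a minimal sketch using kubectl custom columns; show_runner_status is a hypothetical helper name, and the default namespace is an assumption to replace with wherever your runners live.

```shell
# Hypothetical helper: print name, phase and ready flag for each legacy Runner.
# Usage: show_runner_status [namespace]   (namespace defaults to "default")
show_runner_status() {
  local ns="${1:-default}"
  kubectl get runners -n "$ns" \
    -o custom-columns='NAME:.metadata.name,PHASE:.status.phase,READY:.status.ready'
}
```

A runner that the health check above reports as Healthy should show Running in the PHASE column and true in the READY column.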