Fix listener startup race advertising phantom capacity to GitHub

**Impact:** All ARC scale sets with the capacity monitor enabled
**Risk:** low

## What
Seed the listener with MaxRunners=0 when the capacity monitor is enabled, preventing the startup race where the listener advertises unbounded capacity before the monitor's first reconcile cycle.

## Why
When `max_runners` is omitted from the runner definition, the controller defaults it to `math.MaxInt32` (~2.1B). The listener and the capacity monitor start as concurrent goroutines with no synchronization between them, so the listener's first `GetMessage` poll fires before the monitor completes its initial reconcile and advertises `MaxInt32` capacity to GitHub's brokerage. GitHub then dispatches jobs that cannot be fulfilled, causing phantom capacity failures. This was identified during the ARC phantom capacity investigation.
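
The ordering described above can be reproduced in isolation. The following is a minimal sketch, not the real listener: `fakeListener`, `poll`, and `simulateStartup` are hypothetical names, and the channels exist only to force the "poll before reconcile" ordering deterministically so the effect of the seed value is visible.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// fakeListener is a hypothetical stand-in for the real listener. It only
// tracks the MaxRunners value that a poll would advertise.
type fakeListener struct {
	maxRunners atomic.Int64
}

// SetMaxRunners models the call the capacity monitor makes on reconcile.
func (l *fakeListener) SetMaxRunners(n int) { l.maxRunners.Store(int64(n)) }

// poll returns the capacity one GetMessage-style call would advertise.
func (l *fakeListener) poll() int { return int(l.maxRunners.Load()) }

// simulateStartup forces the problematic ordering: the first poll happens
// before the monitor's first reconcile, the second poll happens after it.
func simulateStartup(seed, realCapacity int) (first, second int) {
	l := &fakeListener{}
	l.SetMaxRunners(seed)

	firstPollDone := make(chan struct{})
	reconciled := make(chan struct{})

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // listener goroutine
		defer wg.Done()
		first = l.poll()
		close(firstPollDone)
		<-reconciled
		second = l.poll()
	}()
	go func() { // capacity monitor goroutine
		defer wg.Done()
		<-firstPollDone // models the monitor losing the startup race
		l.SetMaxRunners(realCapacity)
		close(reconciled)
	}()
	wg.Wait()
	return first, second
}

func main() {
	first, second := simulateStartup(math.MaxInt32, 3)
	fmt.Println(first, second) // 2147483647 3
}
```

With the unset default as the seed, the first advertised value is `MaxInt32`, and only the second poll reflects the monitor's value.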

## How
- Moved `capacity.ConfigFromEnv()` earlier so the capacity-enabled flag is available before listener construction
- Introduced `listenerInitialMaxRunners()` to gate the seed value: returns 0 when the capacity monitor owns advertised capacity, passes `configMax` through otherwise
- Zero is safe because the capacity monitor calls `SetMaxRunners` with the real value on its first reconcile, before any meaningful dispatch window
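
A minimal sketch of the resulting lifecycle, assuming a listener that stores its advertised capacity atomically. `sketchListener`, `newSketchListener`, `advertised`, and `advertisedSequence` are hypothetical and exist only for illustration; `listenerInitialMaxRunners` mirrors the helper added in this commit.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// listenerInitialMaxRunners mirrors the helper added in this commit.
func listenerInitialMaxRunners(configMax int, capacityEnabled bool) int {
	if capacityEnabled {
		return 0
	}
	return configMax
}

// sketchListener is a hypothetical stand-in for the real listener.
type sketchListener struct {
	maxRunners atomic.Int64
}

func newSketchListener(seed int) *sketchListener {
	l := &sketchListener{}
	l.maxRunners.Store(int64(seed))
	return l
}

// SetMaxRunners models the update the capacity monitor performs.
func (l *sketchListener) SetMaxRunners(n int) { l.maxRunners.Store(int64(n)) }

// advertised returns the capacity a poll would report right now.
func (l *sketchListener) advertised() int { return int(l.maxRunners.Load()) }

// advertisedSequence returns the capacity seen by a poll before the first
// reconcile and by a poll after it. realCapacity is what the monitor would
// report; it is ignored when the monitor is disabled.
func advertisedSequence(configMax int, capacityEnabled bool, realCapacity int) []int {
	l := newSketchListener(listenerInitialMaxRunners(configMax, capacityEnabled))
	seq := []int{l.advertised()}
	if capacityEnabled {
		l.SetMaxRunners(realCapacity) // first reconcile
	}
	return append(seq, l.advertised())
}

func main() {
	fmt.Println(advertisedSequence(math.MaxInt32, true, 3)) // [0 3]
	fmt.Println(advertisedSequence(5, false, 3))            // [5 5]
}
```

With the monitor enabled, the first poll advertises 0 regardless of `configMax`; with it disabled, the configured value passes through unchanged.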

## Changes
- `cmd/ghalistener/main.go`: Hoisted `capConfig := capacity.ConfigFromEnv()` above listener construction; replaced raw `config.MaxRunners` with `listenerInitialMaxRunners(config.MaxRunners, capConfig.Enabled)` in the listener config
- `cmd/ghalistener/main.go`: Added `listenerInitialMaxRunners()` helper function
- `cmd/ghalistener/main_test.go`: New table-driven tests covering all combinations: monitor disabled (passthrough for 5, 0, MaxInt32) and monitor enabled (always seeds 0)

## Testing
- Unit tests: `go test ./cmd/ghalistener/... -run TestListenerInitialMaxRunners`
- Verify no regression when capacity monitor is disabled: listener should still advertise the configured MaxRunners value
- Integration: deploy to a scale set with the capacity monitor enabled and confirm the first GetMessage poll shows MaxRunners=0 in logs, then the monitor's reconcile updates it to the real capacity

Signed-off-by: Jean Schmidt <contato@jschmidt.me>
Jean Schmidt 2026-05-20 10:14:20 -07:00
parent d5d94fba48
commit 12dd52ce1b
2 changed files with 78 additions and 2 deletions

cmd/ghalistener/main.go

@@ -104,11 +104,12 @@ func run(ctx context.Context, config *config.Config) error {
 		metricsExporter.RecordStatic(config.MinRunners, config.MaxRunners)
 	}
+	capConfig := capacity.ConfigFromEnv()
 	listener, err := listener.New(
 		sessionClient,
 		listener.Config{
 			ScaleSetID: config.RunnerScaleSetID,
-			MaxRunners: config.MaxRunners,
+			MaxRunners: listenerInitialMaxRunners(config.MaxRunners, capConfig.Enabled),
 			Logger:     logger.With("component", "listener"),
 		},
 		listenerOptions...,
@@ -131,7 +132,6 @@ func run(ctx context.Context, config *config.Config) error {
 	}
 	// Capacity monitor (optional).
-	capConfig := capacity.ConfigFromEnv()
 	if capConfig.Enabled {
 		scaleSet, err := scalesetClient.GetRunnerScaleSetByID(ctx, config.RunnerScaleSetID)
 		if err != nil {
@@ -221,3 +221,15 @@ func run(ctx context.Context, config *config.Config) error {
 	return g.Wait()
 }
+// listenerInitialMaxRunners returns the MaxRunners value to seed the listener
+// with. When the capacity monitor is enabled it owns advertised capacity via
+// SetMaxRunners; returning 0 ensures the listener's first GetMessage poll
+// advertises 0 instead of configMax (which defaults to math.MaxInt32 upstream
+// when unset) before the monitor's first reconcileReporting cycle has run.
+func listenerInitialMaxRunners(configMax int, capacityEnabled bool) int {
+	if capacityEnabled {
+		return 0
+	}
+	return configMax
+}

cmd/ghalistener/main_test.go

@@ -0,0 +1,64 @@
+package main
+import (
+	"math"
+	"testing"
+)
+func TestListenerInitialMaxRunners(t *testing.T) {
+	tests := []struct {
+		name            string
+		configMax       int
+		capacityEnabled bool
+		want            int
+	}{
+		{
+			name:            "monitor disabled passes configMax through",
+			configMax:       5,
+			capacityEnabled: false,
+			want:            5,
+		},
+		{
+			name:            "monitor disabled with 0 configMax stays 0",
+			configMax:       0,
+			capacityEnabled: false,
+			want:            0,
+		},
+		{
+			name:            "monitor disabled passes MaxInt32 through",
+			configMax:       math.MaxInt32,
+			capacityEnabled: false,
+			want:            math.MaxInt32,
+		},
+		{
+			name:            "monitor enabled seeds at 0 regardless of configMax",
+			configMax:       5,
+			capacityEnabled: true,
+			want:            0,
+		},
+		{
+			name:            "monitor enabled seeds at 0 even when configMax is MaxInt32",
+			configMax:       math.MaxInt32,
+			capacityEnabled: true,
+			want:            0,
+		},
+		{
+			name:            "monitor enabled seeds at 0 even when configMax is already 0",
+			configMax:       0,
+			capacityEnabled: true,
+			want:            0,
+		},
+	}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := listenerInitialMaxRunners(tt.configMax, tt.capacityEnabled)
+			if got != tt.want {
+				t.Errorf(
+					"listenerInitialMaxRunners(configMax=%d, capacityEnabled=%v) = %d, want %d",
+					tt.configMax, tt.capacityEnabled, got, tt.want,
+				)
+			}
+		})
+	}
+}