fix: resolve issues #2295, #2296, and #2297 (#2298)

* fix: resolve issues #2295, #2296, #2297 and OCI registry login

This PR fixes four related bugs affecting chart preparation, caching,
and OCI registry authentication.

Issue #2295: OCI chart cache conflicts with parallel helmfile processes
- Added filesystem-level locking using flock for cross-process synchronization
- Implemented a double-check locking pattern for efficiency
- Added retry logic with a 5-minute timeout and 3 retries
- Refactored into a reusable acquireChartLock() helper function
- Added refresh marker coordination for cross-process cache management

Issue #2296: helmDefaults.skipDeps and helmDefaults.skipRefresh ignored
- Check both CLI options AND helmDefaults when deciding to skip repo sync

Issue #2297: Local chart + transformers causes panic
- Normalize local chart paths to absolute before calling chartify
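The normalization amounts to joining a relative chart path onto the helmfile's basePath before chartify sees it. A minimal sketch (helmfile's real normalizeChart handles more cases):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// normalizeChart resolves a chart path relative to the helmfile's basePath so
// chartify never receives something like "../chart", which helm would try to
// resolve as a repo reference ("helm pull ../chart") and fail on.
func normalizeChart(basePath, chart string) string {
	if filepath.IsAbs(chart) {
		return chart
	}
	return filepath.Clean(filepath.Join(basePath, chart))
}

func main() {
	// A helmfile in helmfile.d/ referencing a sibling chart directory:
	fmt.Println(normalizeChart("/work/helmfile.d", "../chart"))
}
```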

OCI Registry Login URL Fix:
- Added extractRegistryHost() to extract just the registry host from URLs
- Fixed SyncRepos to use extracted host for OCI registry login
- e.g., "account.dkr.ecr.region.amazonaws.com/charts" ->
        "account.dkr.ecr.region.amazonaws.com"
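A plausible shape for that helper — hedged: the actual signature and edge-case handling in the PR may differ — is to strip an optional oci:// scheme and drop everything after the first slash:

```go
package main

import (
	"fmt"
	"strings"
)

// extractRegistryHost returns just the registry host from an OCI chart URL,
// dropping an optional oci:// scheme and any repository path. Query parameters
// and fragments are not handled (uncommon for OCI registries).
func extractRegistryHost(url string) string {
	host := strings.TrimPrefix(url, "oci://")
	if i := strings.Index(host, "/"); i >= 0 {
		host = host[:i]
	}
	return host
}

func main() {
	fmt.Println(extractRegistryHost("account.dkr.ecr.region.amazonaws.com/charts"))
	fmt.Println(extractRegistryHost("oci://ghcr.io/org/chart"))
}
```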

Test Plan:
- Unit tests for issues #2295, #2296, #2297
- Unit tests for OCI registry login (extractRegistryHost, SyncRepos_OCI)
- Integration tests for issues #2295 and #2297
- All existing unit tests pass (including TestLint)

Fixes #2295
Fixes #2296
Fixes #2297

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: replace 60s timeout with reader-writer locks for OCI chart caching

Address PR review feedback from @champtar about the OCI chart caching
mechanism. The previous implementation used a 60-second timeout which
was arbitrary and caused race conditions when helm deployments took
longer (e.g., deployments triggering scaling up/down).

Changes:
- Replace 60s refresh marker timeout with proper reader-writer locks
- Use shared locks (RLock) when using cached charts (allows concurrent reads)
- Use exclusive locks (Lock) when refreshing/downloading charts
- Hold locks during entire helm operation lifecycle (not just during download)
- Add getNamedRWMutex() for in-process RW coordination
- Update PrepareCharts() to return locks map for lifecycle management
- Add chartLockReleaser in run.go to release locks after helm callback
- Remove unused mutexMap and getNamedMutex (replaced by RW versions)
- Add comprehensive tests for shared/exclusive lock behavior

This eliminates the race condition where one process could delete a
cached chart while another process's helm command was still using it.
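The in-process half of this coordination can be sketched as a lazily created per-name RWMutex. This is a sketch of the idea behind getNamedRWMutex, not the actual helmfile code:

```go
package main

import (
	"fmt"
	"sync"
)

var (
	rwMutexesMu sync.Mutex
	rwMutexes   = map[string]*sync.RWMutex{}
)

// getNamedRWMutex returns the RWMutex for a given chart cache path, creating
// it on first use. Goroutines using a cached chart take RLock (shared, so
// reads run concurrently); a refresh/download takes Lock (exclusive).
func getNamedRWMutex(name string) *sync.RWMutex {
	rwMutexesMu.Lock()
	defer rwMutexesMu.Unlock()
	m, ok := rwMutexes[name]
	if !ok {
		m = &sync.RWMutex{}
		rwMutexes[name] = m
	}
	return m
}

func main() {
	a := getNamedRWMutex("charts/myapp")
	b := getNamedRWMutex("charts/myapp")
	fmt.Println(a == b) // same name yields the same mutex
	a.RLock()
	b.RLock() // shared locks may be held concurrently
	b.RUnlock()
	a.RUnlock()
}
```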

Fixes review comment on PR #2298

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: prevent deadlock when multiple releases share the same chart

When multiple releases use the same OCI chart (e.g., the same chart with
different values), workers in PrepareCharts would deadlock:

1. Worker 1 acquires lock for chart/path, downloads, adds to cache
2. Worker 2 finds chart in cache, tries to acquire lock on same path
3. Worker 2 blocks waiting for Worker 1's lock
4. Collector waits for Worker 2's result
5. Worker 1's lock held until PrepareCharts finishes -> deadlock

The fix: when using the in-memory chart cache (which means another worker
in the same process already downloaded the chart), don't acquire another
lock. This is safe because:
- The in-memory cache is only used within a single helmfile process
- The tempDir cleanup is deferred until after helm callback completes
- Cross-process coordination is still handled by file locks during downloads

This fixes the "signal: killed" test failures in CI for:
- oci_chart_pull_direct
- oci_chart_pull_once
- oci_chart_pull_once2

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: resolve deadlock by releasing OCI chart locks immediately after download

This commit simplifies the OCI chart locking mechanism to fix deadlock
issues that occurred when multiple releases shared the same chart.

Problem:
When multiple releases used the same OCI chart, workers in PrepareCharts
would deadlock because:
1. Worker 1 acquires lock for chart/path, downloads chart
2. Worker 2 tries to acquire lock on same path, blocks waiting
3. PrepareCharts waits for all workers to complete
4. Worker 1's lock held until PrepareCharts finishes -> deadlock

Solution:
Release locks immediately after chart download completes. This is safe
because:
- The tempDir cleanup is deferred until after helm operations complete
  in withPreparedCharts(), so charts won't be deleted mid-use
- The in-memory chart cache prevents redundant downloads within a process
- Cross-process coordination via file locks still works during download
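The resulting flow — exclusive lock held only for the download itself, with tempDir cleanup deferred past the helm callback — can be sketched like this. Names follow the commit message, but the bodies are illustrative stand-ins:

```go
package main

import (
	"fmt"
	"sync"
)

var (
	cacheMu sync.Mutex
	cache   = map[string]string{} // in-memory chart cache: chart ref -> path
)

// getOCIChart downloads a chart under a lock and releases the lock
// immediately after the download completes, rather than holding it until
// PrepareCharts finishes. tempDir removal happens later, after the helm
// callback, so the chart cannot be deleted mid-use.
func getOCIChart(ref string, download func(string) string) (string, error) {
	cacheMu.Lock()
	path, ok := cache[ref]
	if !ok {
		path = download(ref)
		cache[ref] = path
	}
	cacheMu.Unlock() // released right away: no worker blocks for the whole run
	return path, nil
}

func main() {
	downloads := 0
	dl := func(ref string) string {
		downloads++
		return "/tmp/charts/" + ref
	}
	p1, _ := getOCIChart("myapp:1.0.0", dl)
	p2, _ := getOCIChart("myapp:1.0.0", dl)
	fmt.Println(p1 == p2, downloads) // second call hits the in-memory cache
}
```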

Changes:
- Remove chartLock field from chartPrepareResult struct
- Release locks immediately in getOCIChart() and forcedDownloadChart()
- Simplify PrepareCharts() by removing lock collection and release logic
- Update function signatures to return only (path, error)

This also fixes the "signal: killed" test failures in CI for:
- oci_chart_pull_direct
- oci_chart_pull_once
- oci_chart_pull_once2

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: add double-check locking for in-memory chart cache

When multiple workers concurrently process releases using the same chart,
they all check the in-memory cache before acquiring locks. If none have
populated the cache yet, all workers miss and try to download.

Previously, even after acquiring the exclusive lock, the code would
re-download the chart when needsRefresh=true (the default). This caused
multiple "Pulling" messages in tests like oci_chart_pull_once.

The fix adds a second in-memory cache check AFTER acquiring the lock.
This implements proper double-check locking:

1. Check cache (outside lock) → miss
2. Acquire lock
3. Check cache again (inside lock) → hit if another worker populated it
4. If still miss, download and add to cache

This ensures only one worker downloads the chart, while others use
the cached version populated by the first worker.

Changes:
- Add in-memory cache double-check in getOCIChart() after acquiring lock
- Add in-memory cache double-check in forcedDownloadChart() after acquiring lock

This fixes the oci_chart_pull_once and oci_chart_pull_direct test failures
where charts were being pulled multiple times instead of once.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: use callback to prevent redundant chart downloads within a process

When multiple workers concurrently process releases using the same chart,
they need to coordinate to avoid redundant downloads. The previous fix
set SkipRefresh=true for OCI charts, which prevented legitimate refresh
scenarios (e.g., floating tags).

This commit implements a better solution using a callback mechanism:

1. acquireChartLock() now accepts an optional skipRefreshCheck callback
2. Before deleting a cached chart for refresh, the callback is invoked
3. If the callback returns true (in-memory cache has the chart), skip refresh
4. This allows deduplication within a process while respecting cross-run refresh

The flow is now:
- Worker 1 downloads chart, adds to in-memory cache, releases lock
- Worker 2 acquires lock, sees needsRefresh=true, but callback sees
  in-memory cache is populated → uses cached instead of deleting

This correctly handles:
- Within-process deduplication: only one download per chart
- Cross-run refresh: respects --skip-refresh flag for floating tags
- Immutable versions: cached and reused as expected
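The callback hook can be sketched as below. The parameter name comes from the commit message; the surrounding lock acquisition is elided and the decision logic simplified for illustration:

```go
package main

import "fmt"

// shouldRefresh decides, while holding the exclusive chart lock, whether a
// cached chart must be deleted and re-downloaded. The optional
// skipRefreshCheck callback lets the caller report that the in-memory cache
// already holds the chart (another worker in this process fetched it during
// this run), in which case the refresh is skipped. The callback should be
// fast and non-blocking since it runs under the exclusive lock.
func shouldRefresh(needsRefresh bool, skipRefreshCheck func() bool) bool {
	if needsRefresh && skipRefreshCheck != nil && skipRefreshCheck() {
		return false // chart was fetched this run; reuse it
	}
	return needsRefresh
}

func main() {
	inMemoryCacheHasChart := true
	refresh := shouldRefresh(true, func() bool { return inMemoryCacheHasChart })
	fmt.Println("refresh:", refresh) // worker 2 reuses worker 1's download
}
```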

Changes:
- Add skipRefreshCheck callback parameter to acquireChartLock()
- Update getOCIChart() to pass in-memory cache check callback
- Update forcedDownloadChart() to pass in-memory cache check callback
- Remove SkipRefresh=true workaround for OCI charts

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: address Copilot review comments on PR #2298

This commit addresses the automated review comments from GitHub Copilot:

1. pkg/state/state.go: Add nil check for logger in Release() method
   to prevent potential nil pointer dereference when logger is nil.

2. pkg/state/state.go: Fix misleading comment about "external callers"
   to accurately reflect that Logger() is used by the app package.

3. pkg/state/issue_2296_test.go: Add comment noting that boolPtr helper
   is already defined in skip_test.go (shared across test files).

4. test/integration/test-cases/oci-parallel-pull.sh: Replace hardcoded
   /tmp paths with a dedicated temp directory for test outputs. Add
   cleanup for the output directory in the cleanup function.

5. test/integration/test-cases/issue-2297-local-chart-transformers.sh:
   Add cleanup trap to remove temp directory on exit, preventing
   leftover files from accumulating.

6. Remove dead code: The chartLocks map in PrepareCharts was always
   empty since locks are released immediately after download. Removed
   the unused return value and corresponding handling in run.go to
   improve code clarity and maintainability.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: make oci-parallel-pull test resilient to registry issues

The integration test was intermittently failing in CI due to Docker Hub
rate limiting or network issues. These failures are not helmfile bugs.

Changes:
- Add is_registry_error() function to detect external registry issues
  (rate limits, network timeouts, connection refused, etc.)
- Check for the race condition bug (issue #2295) first and fail fast
- If other failures occur, check if they're registry-related
- Skip test gracefully when registry issues are detected instead of
  failing CI on external infrastructure problems

This ensures the test still catches the actual race condition bug while
not causing false failures due to Docker Hub rate limits in CI.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: make oci-parallel-pull test resilient to registry issues

The integration test was failing in CI for two reasons:

1. Docker Hub rate limiting or network issues causing helmfile to fail
2. The test script exits early due to `set -e` when `wait` returns non-zero

Changes:
- Use `wait $pid || exit=$?` pattern to capture exit codes without triggering
  set -e. When wait returns non-zero, the || branch captures the exit code
  into the variable, preventing script termination.
- Add is_registry_error() function to detect external registry issues
  (rate limits, network timeouts, connection refused, etc.)
- Check for the race condition bug (issue #2295) first and fail fast
- Skip test gracefully when registry issues are detected instead of
  failing CI on external infrastructure problems

This ensures the test still catches the actual race condition bug while
not causing false failures due to Docker Hub rate limits in CI.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: address PR #2298 review - reinitialize fileLock after release

Address Copilot review comments:

1. pkg/state/state.go: Reinitialize fileLock after releasing shared lock
   When upgrading from shared to exclusive lock, the fileLock needs to be
   reinitialized with flock.New() after calling Release(). This ensures
   a fresh flock object is used for the exclusive lock acquisition.

2. test/integration/test-cases/oci-parallel-pull.sh: Add lock file
   verification warning if no lock files are found, to ensure the
   locking mechanism is actually being tested.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: address PR #2298 Copilot review comments (round 4)

Address 8 Copilot review comments:

1. pkg/state/state.go: Release in-process mutex during retry backoff
   to avoid blocking other goroutines for up to 90 seconds.

2. pkg/state/state.go: Include chartPath in shared lock error message
   for better debugging.

3. pkg/state/state.go: Document that extractRegistryHost does not handle
   URLs with query parameters or fragments (uncommon for OCI registries).

4. pkg/state/state.go: Document that skipRefreshCheck callback should be
   fast and non-blocking since it runs while holding exclusive lock.

5. oci-parallel-pull.sh: Use case-insensitive grep (-i flag) to catch
   error variations like "I/O timeout".

6. helmfile.yaml: Expand comment explaining why library charts can't be
   used for this test (they can't be templated by Helm).

Skipped (with justification):
- PrepareChartKey helper: Only 2 usages with different source structs
- Context reuse in retry: Per-attempt contexts provide clearer semantics

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: address PR #2298 Copilot review comments (round 5)

1. Make race condition detection grep more robust (oci-parallel-pull.sh)
   - Use case-insensitive extended regex (-iqE)
   - Add multiple pattern variations to catch different tar/helm versions

2. Remove unused Logger() method from HelmState (state.go)
   - Method was never called; all lock releases use st.logger directly

3. Add clarifying comments for lock retry behavior (state.go)
   - Document why file system errors are retried but timeouts are not
   - Explain flock returns (false, nil) on context deadline exceeded

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: clarify lock file check is informational only

Lock files are ephemeral and may be cleaned up immediately after
helmfile processes complete. Update comments and warning message
to make clear their absence doesn't indicate locking wasn't used.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: add HELM_BIN env var to Dockerfiles

The helm-git plugin requires HELM_BIN environment variable to be set.
Without it, the plugin fails with "HELM_BIN: parameter not set".

Add HELM_BIN=/usr/local/bin/helm to all Dockerfile variants.

Fixes #2303

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

---------

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
Aditya Menon 2025-11-27 15:13:03 +01:00, committed by GitHub
commit 9c70adc038
parent 6c93fc7e4d
GPG Key ID: B5690EEEBB952194
19 changed files with 1570 additions and 78 deletions


@@ -32,6 +32,7 @@ ENV HELM_DATA_HOME="${HELM_DATA_HOME}"
ARG HELM_VERSION="v4.0.0"
ENV HELM_VERSION="${HELM_VERSION}"
ENV HELM_BIN="/usr/local/bin/helm"
ARG HELM_LOCATION="https://get.helm.sh"
ARG HELM_FILENAME="helm-${HELM_VERSION}-${TARGETOS}-${TARGETARCH}.tar.gz"
RUN set -x && \


@@ -40,6 +40,7 @@ ENV HELM_DATA_HOME="${HELM_DATA_HOME}"
ARG HELM_VERSION="v4.0.0"
ENV HELM_VERSION="${HELM_VERSION}"
ENV HELM_BIN="/usr/local/bin/helm"
ARG HELM_LOCATION="https://get.helm.sh"
ARG HELM_FILENAME="helm-${HELM_VERSION}-${TARGETOS}-${TARGETARCH}.tar.gz"
RUN set -x && \


@@ -40,6 +40,7 @@ ENV HELM_DATA_HOME="${HELM_DATA_HOME}"
ARG HELM_VERSION="v4.0.0"
ENV HELM_VERSION="${HELM_VERSION}"
ENV HELM_BIN="/usr/local/bin/helm"
ARG HELM_LOCATION="https://get.helm.sh"
ARG HELM_FILENAME="helm-${HELM_VERSION}-${TARGETOS}-${TARGETARCH}.tar.gz"
RUN set -x && \

go.mod

@@ -10,6 +10,7 @@ require (
github.com/aws/aws-sdk-go-v2/service/s3 v1.92.1
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
github.com/go-test/deep v1.1.1
github.com/gofrs/flock v0.13.0
github.com/golang/mock v1.6.0
github.com/google/go-cmp v0.7.0
github.com/gosuri/uitable v0.0.4
@@ -228,7 +229,6 @@ require (
github.com/gobwas/glob v0.2.3 // indirect
github.com/goccy/go-yaml v1.17.1 // indirect
github.com/godbus/dbus/v5 v5.1.0 // indirect
github.com/gofrs/flock v0.13.0 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang-jwt/jwt/v5 v5.3.0 // indirect
github.com/google/btree v1.1.3 // indirect


@@ -62,7 +62,13 @@ func (r *Run) withPreparedCharts(helmfileCommand string, opts state.ChartPrepare
panic("Run.PrepareCharts can be called only once")
}
if !opts.SkipRepos {
// Check both CLI options and helmDefaults for skipping repos (issue #2296)
// Both skipDeps and skipRefresh cause repo sync to be skipped because:
// - skipRefresh explicitly means "don't update repos"
// - skipDeps implies "I have all dependencies locally" which means repo data isn't needed
// This matches the CLI behavior where --skip-deps and --skip-refresh both skip repo operations.
skipRepos := opts.SkipRepos || r.state.HelmDefaults.SkipDeps || r.state.HelmDefaults.SkipRefresh
if !skipRepos {
ctx := r.ctx
if err := ctx.SyncReposOnce(r.state, r.helm); err != nil {
return err
@@ -120,7 +126,10 @@ func (r *Run) withPreparedCharts(helmfileCommand string, opts state.ChartPrepare
}
func (r *Run) Deps(c DepsConfigProvider) []error {
if !c.SkipRepos() {
// Check both CLI options and helmDefaults for skipping repos (issue #2296)
// Both skipDeps and skipRefresh cause repo sync to be skipped (see withPreparedCharts for rationale)
skipRepos := c.SkipRepos() || r.state.HelmDefaults.SkipDeps || r.state.HelmDefaults.SkipRefresh
if !skipRepos {
if err := r.ctx.SyncReposOnce(r.state, r.helm); err != nil {
return []error{err}
}


@@ -33,6 +33,7 @@ type DiffKey struct {
type Helm struct {
Charts []string
Repo []string
RegistryLoginHost string // Captures the host passed to RegistryLogin
Releases []Release
Deleted []Release
Linted []Release
@@ -109,6 +110,7 @@ func (helm *Helm) UpdateRepo() error {
return nil
}
func (helm *Helm) RegistryLogin(name, username, password, caFile, certFile, keyFile string, skipTLSVerify bool) error {
helm.RegistryLoginHost = name
return nil
}
func (helm *Helm) SyncRelease(context helmexec.HelmContext, name, chart, namespace string, flags ...string) error {


@@ -0,0 +1,196 @@
package state
import (
"os"
"path/filepath"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// Note: boolPtr helper is already defined in skip_test.go and shared across test files in this package.
// TestHelmDefaultsSkipDepsAndSkipRefreshIntegration tests that helmDefaults.skipDeps
// and helmDefaults.skipRefresh are properly respected when preparing charts.
// This is a regression test for issue #2296.
//
// This test verifies the skipDeps/skipRefresh calculation in state.go that affects:
// 1. buildDeps - whether to run `helm dep build` for local charts
// 2. skipRefresh - whether to skip chart refresh operations
//
// Note: The skipRepos logic in pkg/app/run.go uses both helmDefaults.skipDeps and
// helmDefaults.skipRefresh to determine whether to sync repos. Both flags cause
// repo sync to be skipped because:
// - skipRefresh explicitly means "don't update repos"
// - skipDeps implies "I have all dependencies locally" which means repo data isn't needed
// The skipRepos behavior is documented in run.go comments and tested via integration tests.
func TestHelmDefaultsSkipDepsAndSkipRefreshIntegration(t *testing.T) {
// Create a temporary directory with a local chart
tempDir := t.TempDir()
chartDir := filepath.Join(tempDir, "chart")
require.NoError(t, os.MkdirAll(chartDir, 0755))
require.NoError(t, os.WriteFile(filepath.Join(chartDir, "Chart.yaml"), []byte(`
apiVersion: v2
name: test-chart
version: 0.1.0
`), 0644))
tests := []struct {
name string
helmDefaultsSkipDeps bool
helmDefaultsSkipRefresh bool
releaseSkipDeps *bool
releaseSkipRefresh *bool
optsSkipDeps bool
optsSkipRefresh bool
chartPath string
isLocal bool
expectedBuildDeps bool // buildDeps = !skipDeps for local charts
expectedSkipRefresh bool
}{
{
name: "helmDefaults.skipDeps=true should result in buildDeps=false",
helmDefaultsSkipDeps: true,
helmDefaultsSkipRefresh: false,
chartPath: "./chart",
isLocal: true,
expectedBuildDeps: false, // skipDeps=true means buildDeps=false
expectedSkipRefresh: false,
},
{
name: "helmDefaults.skipRefresh=true should result in skipRefresh=true",
helmDefaultsSkipDeps: false,
helmDefaultsSkipRefresh: true,
chartPath: "./chart",
isLocal: true,
expectedBuildDeps: true, // skipDeps=false means buildDeps=true
expectedSkipRefresh: true,
},
{
name: "both helmDefaults set should affect both flags",
helmDefaultsSkipDeps: true,
helmDefaultsSkipRefresh: true,
chartPath: "./chart",
isLocal: true,
expectedBuildDeps: false,
expectedSkipRefresh: true,
},
{
name: "release-level skipDeps overrides helmDefaults",
helmDefaultsSkipDeps: false,
helmDefaultsSkipRefresh: false,
releaseSkipDeps: boolPtr(true),
chartPath: "./chart",
isLocal: true,
expectedBuildDeps: false, // release-level skipDeps=true
expectedSkipRefresh: false,
},
{
name: "CLI opts skipDeps has priority",
helmDefaultsSkipDeps: false,
helmDefaultsSkipRefresh: false,
optsSkipDeps: true,
chartPath: "./chart",
isLocal: true,
expectedBuildDeps: false, // opts skipDeps=true
expectedSkipRefresh: false,
},
{
name: "non-local chart always has buildDeps=false",
helmDefaultsSkipDeps: false,
helmDefaultsSkipRefresh: false,
chartPath: "stable/nginx",
isLocal: false,
expectedBuildDeps: false, // non-local charts don't build deps
expectedSkipRefresh: true, // non-local charts skip refresh
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Calculate skipDeps using the actual logic from state.go
skipDepsGlobal := tt.optsSkipDeps
skipDepsRelease := tt.releaseSkipDeps != nil && *tt.releaseSkipDeps
skipDepsDefault := tt.releaseSkipDeps == nil && tt.helmDefaultsSkipDeps
chartFetchedByGoGetter := false
skipDeps := (!tt.isLocal && !chartFetchedByGoGetter) || skipDepsGlobal || skipDepsRelease || skipDepsDefault
// Calculate skipRefresh using the actual logic from state.go
skipRefreshGlobal := tt.optsSkipRefresh
skipRefreshRelease := tt.releaseSkipRefresh != nil && *tt.releaseSkipRefresh
skipRefreshDefault := tt.releaseSkipRefresh == nil && tt.helmDefaultsSkipRefresh
skipRefresh := !tt.isLocal || skipRefreshGlobal || skipRefreshRelease || skipRefreshDefault
// buildDeps = !skipDeps (for local charts processed normally)
buildDeps := !skipDeps
assert.Equal(t, tt.expectedBuildDeps, buildDeps,
"buildDeps mismatch: expected %v, got %v (skipDeps=%v)", tt.expectedBuildDeps, buildDeps, skipDeps)
assert.Equal(t, tt.expectedSkipRefresh, skipRefresh,
"skipRefresh mismatch: expected %v, got %v", tt.expectedSkipRefresh, skipRefresh)
})
}
}
// TestSkipReposLogic tests the skipRepos calculation used in pkg/app/run.go.
// This documents and verifies the expected behavior: both helmDefaults.skipDeps
// and helmDefaults.skipRefresh should cause repo sync to be skipped.
//
// The actual skipRepos logic is in run.go:
//
// skipRepos := opts.SkipRepos || r.state.HelmDefaults.SkipDeps || r.state.HelmDefaults.SkipRefresh
func TestSkipReposLogic(t *testing.T) {
tests := []struct {
name string
optsSkipRepos bool
skipDeps bool
skipRefresh bool
expectedSkipRepo bool
}{
{
name: "all false - repos should sync",
optsSkipRepos: false,
skipDeps: false,
skipRefresh: false,
expectedSkipRepo: false,
},
{
name: "opts.SkipRepos=true - repos should be skipped",
optsSkipRepos: true,
skipDeps: false,
skipRefresh: false,
expectedSkipRepo: true,
},
{
name: "helmDefaults.skipDeps=true - repos should be skipped",
optsSkipRepos: false,
skipDeps: true,
skipRefresh: false,
expectedSkipRepo: true,
},
{
name: "helmDefaults.skipRefresh=true - repos should be skipped",
optsSkipRepos: false,
skipDeps: false,
skipRefresh: true,
expectedSkipRepo: true,
},
{
name: "both helmDefaults set - repos should be skipped",
optsSkipRepos: false,
skipDeps: true,
skipRefresh: true,
expectedSkipRepo: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// This mirrors the logic in pkg/app/run.go withPreparedCharts and Deps
skipRepos := tt.optsSkipRepos || tt.skipDeps || tt.skipRefresh
assert.Equal(t, tt.expectedSkipRepo, skipRepos,
"skipRepos mismatch: expected %v, got %v", tt.expectedSkipRepo, skipRepos)
})
}
}


@@ -0,0 +1,164 @@
package state
import (
"os"
"path/filepath"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/helmfile/helmfile/pkg/filesystem"
)
// TestLocalChartWithTransformersPathNormalization tests that relative chart paths
// like "../chart" are normalized to absolute paths when transformers are present.
// This is a regression test for issue #2297.
//
// Background: When using local charts with transformers in helmfile.d/, the chartify
// process would receive the unnormalized relative path and try to run "helm pull ../chart"
// which fails because "../chart" is not a valid repo reference.
func TestLocalChartWithTransformersPathNormalization(t *testing.T) {
// Create a temporary directory structure that mimics the issue:
// tempDir/
// helmfile.d/
// with-transformers.yaml (chart: ../chart)
// chart/
// Chart.yaml
tempDir := t.TempDir()
// Create the chart directory
chartDir := filepath.Join(tempDir, "chart")
require.NoError(t, os.MkdirAll(chartDir, 0755))
require.NoError(t, os.WriteFile(filepath.Join(chartDir, "Chart.yaml"), []byte(`
apiVersion: v2
name: test-chart
version: 0.1.0
`), 0644))
// Create the helmfile.d directory (this is the basePath)
helmfileDir := filepath.Join(tempDir, "helmfile.d")
require.NoError(t, os.MkdirAll(helmfileDir, 0755))
tests := []struct {
name string
chartPath string // Relative chart path as specified in helmfile.yaml
basePath string // basePath is helmfile.d/
hasTransformer bool
expectAbsPath bool // Should the chart path be normalized to absolute?
}{
{
name: "local chart with transformers should be normalized",
chartPath: "../chart",
basePath: helmfileDir,
hasTransformer: true,
expectAbsPath: true,
},
{
name: "local chart without transformers should also work",
chartPath: "../chart",
basePath: helmfileDir,
hasTransformer: false,
expectAbsPath: true,
},
{
name: "absolute local chart path should remain unchanged",
chartPath: chartDir, // Already absolute
basePath: helmfileDir,
hasTransformer: true,
expectAbsPath: true, // Already absolute
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Create a HelmState with the appropriate settings
state := &HelmState{
basePath: tt.basePath,
fs: &filesystem.FileSystem{
DirectoryExistsAt: func(path string) bool {
// Normalize the path and check if it matches our chart directory
absPath := path
if !filepath.IsAbs(path) {
absPath = filepath.Join(tt.basePath, path)
}
return absPath == chartDir || path == chartDir
},
FileExistsAt: func(path string) bool {
// Check for Chart.yaml
if filepath.Base(path) == "Chart.yaml" {
dir := filepath.Dir(path)
if !filepath.IsAbs(dir) {
dir = filepath.Join(tt.basePath, dir)
}
return dir == chartDir
}
return false
},
},
logger: logger,
}
// Test the normalizeChart function directly
normalizedPath := normalizeChart(tt.basePath, tt.chartPath)
if tt.expectAbsPath {
// The normalized path should be absolute and point to the chart directory
assert.True(t, filepath.IsAbs(normalizedPath) || normalizedPath == chartDir,
"Expected absolute path, got: %s", normalizedPath)
// Verify the normalized path actually exists
if filepath.IsAbs(normalizedPath) {
assert.True(t, state.fs.DirectoryExistsAt(normalizedPath),
"Normalized path should exist: %s", normalizedPath)
}
}
// Additional test: verify isLocalChart detection
isLocal := state.fs.DirectoryExistsAt(normalizeChart(tt.basePath, tt.chartPath))
assert.True(t, isLocal, "Chart should be detected as local")
})
}
}
// TestChartPathNormalizationBeforeChartification specifically tests that
// when chartification is needed (transformers present), local chart paths
// are normalized to absolute paths before being passed to processChartification.
func TestChartPathNormalizationBeforeChartification(t *testing.T) {
tempDir := t.TempDir()
// Create directory structure:
// tempDir/
// helmfile.d/
// chart/
// Chart.yaml
helmfileDir := filepath.Join(tempDir, "helmfile.d")
chartDir := filepath.Join(tempDir, "chart")
require.NoError(t, os.MkdirAll(helmfileDir, 0755))
require.NoError(t, os.MkdirAll(chartDir, 0755))
require.NoError(t, os.WriteFile(filepath.Join(chartDir, "Chart.yaml"), []byte(`
apiVersion: v2
name: test-chart
version: 0.1.0
`), 0644))
basePath := helmfileDir
relativeChartPath := "../chart"
// Normalize the path as the fix should do
normalizedChartPath := normalizeChart(basePath, relativeChartPath)
// The normalized path should be absolute
assert.True(t, filepath.IsAbs(normalizedChartPath),
"Normalized chart path should be absolute, got: %s", normalizedChartPath)
// The normalized path should point to the actual chart directory
expectedPath := chartDir
assert.Equal(t, expectedPath, normalizedChartPath,
"Normalized path should equal chart directory")
// Verify the chart exists at the normalized path
_, err := os.Stat(filepath.Join(normalizedChartPath, "Chart.yaml"))
assert.NoError(t, err, "Chart.yaml should exist at normalized path")
}


@@ -0,0 +1,487 @@
package state
import (
"context"
"os"
"path/filepath"
"sync"
"sync/atomic"
"testing"
"time"
"github.com/gofrs/flock"
"github.com/stretchr/testify/require"
)
// TestOCIChartFileLock tests that the file locking mechanism works correctly
// to prevent race conditions when multiple processes/goroutines try to access
// the same chart cache path concurrently.
func TestOCIChartFileLock(t *testing.T) {
t.Run("concurrent lock acquisition is serialized", func(t *testing.T) {
// Create a temporary directory for the test
tempDir, err := os.MkdirTemp("", "helmfile-lock-test-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
lockFilePath := filepath.Join(tempDir, "test-chart.lock")
// Track the order of lock acquisition
var lockOrder []int
var mu sync.Mutex
var wg sync.WaitGroup
// Number of concurrent goroutines trying to acquire the lock
numGoroutines := 5
for i := 0; i < numGoroutines; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
fileLock := flock.New(lockFilePath)
err := fileLock.Lock()
require.NoError(t, err)
// Record the order this goroutine acquired the lock
mu.Lock()
lockOrder = append(lockOrder, id)
mu.Unlock()
// Simulate some work while holding the lock
time.Sleep(10 * time.Millisecond)
err = fileLock.Unlock()
require.NoError(t, err)
}(i)
}
wg.Wait()
// Verify all goroutines acquired the lock exactly once
require.Len(t, lockOrder, numGoroutines, "all goroutines should have acquired the lock")
})
t.Run("lock prevents concurrent writes", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-lock-test-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
lockFilePath := filepath.Join(tempDir, "test-chart.lock")
dataFilePath := filepath.Join(tempDir, "data.txt")
var wg sync.WaitGroup
var writeCount atomic.Int32
// Multiple goroutines try to write to the same file
numGoroutines := 10
for i := 0; i < numGoroutines; i++ {
wg.Add(1)
go func() {
defer wg.Done()
fileLock := flock.New(lockFilePath)
err := fileLock.Lock()
require.NoError(t, err)
defer fileLock.Unlock()
// Check if file exists (like double-check locking pattern)
if _, err := os.Stat(dataFilePath); os.IsNotExist(err) {
// Only first goroutine to acquire lock should write
err = os.WriteFile(dataFilePath, []byte("written"), 0644)
require.NoError(t, err)
writeCount.Add(1)
}
}()
}
wg.Wait()
// Only one goroutine should have written the file
require.Equal(t, int32(1), writeCount.Load(), "only one goroutine should write when using double-check locking")
// Verify file exists with correct content
content, err := os.ReadFile(dataFilePath)
require.NoError(t, err)
require.Equal(t, "written", string(content))
})
t.Run("lock file is created in correct directory", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-lock-test-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
// Simulate nested chart path structure
chartPath := filepath.Join(tempDir, "registry", "charts", "myapp", "1.0.0")
lockFilePath := chartPath + ".lock"
// Create parent directories (like in getOCIChart)
lockFileDir := filepath.Dir(lockFilePath)
err = os.MkdirAll(lockFileDir, 0755)
require.NoError(t, err)
// Acquire and release lock
fileLock := flock.New(lockFilePath)
err = fileLock.Lock()
require.NoError(t, err)
// Verify lock file was created
_, err = os.Stat(lockFilePath)
require.NoError(t, err, "lock file should be created")
err = fileLock.Unlock()
require.NoError(t, err)
})
t.Run("TryLockContext respects timeout", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-lock-test-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
lockFilePath := filepath.Join(tempDir, "test-chart.lock")
// First goroutine acquires lock and holds it
fileLock1 := flock.New(lockFilePath)
err = fileLock1.Lock()
require.NoError(t, err)
// Second goroutine tries to acquire with short timeout
done := make(chan bool)
go func() {
fileLock2 := flock.New(lockFilePath)
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
locked, err := fileLock2.TryLockContext(ctx, 10*time.Millisecond)
// On timeout, TryLockContext may surface either a false result or a
// context.DeadlineExceeded error depending on timing; accept both outcomes.
if err != nil {
require.ErrorIs(t, err, context.DeadlineExceeded, "should timeout with deadline exceeded")
} else {
require.False(t, locked, "should not acquire lock within timeout")
}
done <- true
}()
select {
case <-done:
// Success - timeout worked
case <-time.After(2 * time.Second):
t.Fatal("timeout test took too long")
}
// Release the lock
err = fileLock1.Unlock()
require.NoError(t, err)
})
}
// TestOCIChartSharedExclusiveLocks tests the reader-writer lock pattern
// where shared locks allow concurrent reads and exclusive locks are used for writes.
func TestOCIChartSharedExclusiveLocks(t *testing.T) {
t.Run("multiple shared locks can be acquired simultaneously", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-shared-lock-test-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
lockFilePath := filepath.Join(tempDir, "test-chart.lock")
// Number of concurrent readers
numReaders := 5
var wg sync.WaitGroup
var activeReaders atomic.Int32
var maxConcurrentReaders atomic.Int32
for i := 0; i < numReaders; i++ {
wg.Add(1)
go func() {
defer wg.Done()
fileLock := flock.New(lockFilePath)
// Acquire shared (read) lock
err := fileLock.RLock()
require.NoError(t, err)
// Track how many readers are active simultaneously
current := activeReaders.Add(1)
// Update max concurrent readers
for {
prevMax := maxConcurrentReaders.Load()
if current <= prevMax {
break
}
if maxConcurrentReaders.CompareAndSwap(prevMax, current) {
break
}
}
// Simulate reading while holding shared lock
time.Sleep(50 * time.Millisecond)
activeReaders.Add(-1)
err = fileLock.Unlock()
require.NoError(t, err)
}()
}
wg.Wait()
// Multiple readers should have been active at the same time
require.Greater(t, maxConcurrentReaders.Load(), int32(1),
"multiple shared locks should be held concurrently")
})
t.Run("exclusive lock blocks shared locks", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-excl-block-test-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
lockFilePath := filepath.Join(tempDir, "test-chart.lock")
// First, acquire exclusive lock
writerLock := flock.New(lockFilePath)
err = writerLock.Lock()
require.NoError(t, err)
// Try to acquire shared lock with timeout - should fail
readerResult := make(chan bool)
go func() {
readerLock := flock.New(lockFilePath)
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
locked, err := readerLock.TryRLockContext(ctx, 10*time.Millisecond)
if err != nil {
readerResult <- false
return
}
if locked {
readerLock.Unlock()
}
readerResult <- locked
}()
select {
case acquired := <-readerResult:
require.False(t, acquired, "shared lock should not be acquired while exclusive lock is held")
case <-time.After(2 * time.Second):
t.Fatal("test timed out")
}
// Release exclusive lock
err = writerLock.Unlock()
require.NoError(t, err)
})
t.Run("shared locks block exclusive lock", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-shared-block-test-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
lockFilePath := filepath.Join(tempDir, "test-chart.lock")
// First, acquire shared lock
readerLock := flock.New(lockFilePath)
err = readerLock.RLock()
require.NoError(t, err)
// Try to acquire exclusive lock with timeout - should fail
writerResult := make(chan bool)
go func() {
writerLock := flock.New(lockFilePath)
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
locked, err := writerLock.TryLockContext(ctx, 10*time.Millisecond)
if err != nil {
writerResult <- false
return
}
if locked {
writerLock.Unlock()
}
writerResult <- locked
}()
select {
case acquired := <-writerResult:
require.False(t, acquired, "exclusive lock should not be acquired while shared lock is held")
case <-time.After(2 * time.Second):
t.Fatal("test timed out")
}
// Release shared lock
err = readerLock.Unlock()
require.NoError(t, err)
})
t.Run("exclusive lock acquired after shared lock released", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-lock-release-test-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
lockFilePath := filepath.Join(tempDir, "test-chart.lock")
// Acquire shared lock
readerLock := flock.New(lockFilePath)
err = readerLock.RLock()
require.NoError(t, err)
// Start writer waiting for lock
writerDone := make(chan bool)
go func() {
writerLock := flock.New(lockFilePath)
// Use longer timeout
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
locked, err := writerLock.TryLockContext(ctx, 10*time.Millisecond)
if err == nil && locked {
writerLock.Unlock()
writerDone <- true
return
}
writerDone <- false
}()
// Release shared lock after a short delay
time.Sleep(50 * time.Millisecond)
err = readerLock.Unlock()
require.NoError(t, err)
// Writer should now succeed
select {
case success := <-writerDone:
require.True(t, success, "writer should acquire lock after reader releases")
case <-time.After(3 * time.Second):
t.Fatal("test timed out waiting for writer")
}
})
}
// TestChartLockResultRelease tests the Release method of chartLockResult
// handles both shared and exclusive locks correctly.
func TestChartLockResultRelease(t *testing.T) {
t.Run("release exclusive lock", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-release-excl-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
lockFilePath := filepath.Join(tempDir, "test.lock")
fileLock := flock.New(lockFilePath)
err = fileLock.Lock()
require.NoError(t, err)
rwMutex := &sync.RWMutex{}
rwMutex.Lock()
result := &chartLockResult{
fileLock: fileLock,
inProcessMutex: rwMutex,
isExclusive: true,
chartPath: lockFilePath,
}
// Release should not panic
result.Release(nil)
// Verify lock is released - should be able to acquire again
fileLock2 := flock.New(lockFilePath)
locked, err := fileLock2.TryLock()
require.NoError(t, err)
require.True(t, locked, "should be able to acquire lock after release")
fileLock2.Unlock()
})
t.Run("release shared lock", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-release-shared-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
lockFilePath := filepath.Join(tempDir, "test.lock")
fileLock := flock.New(lockFilePath)
err = fileLock.RLock()
require.NoError(t, err)
rwMutex := &sync.RWMutex{}
rwMutex.RLock()
result := &chartLockResult{
fileLock: fileLock,
inProcessMutex: rwMutex,
isExclusive: false,
chartPath: lockFilePath,
}
// Release should not panic
result.Release(nil)
// Verify lock is released - should be able to acquire exclusive lock
fileLock2 := flock.New(lockFilePath)
locked, err := fileLock2.TryLock()
require.NoError(t, err)
require.True(t, locked, "should be able to acquire exclusive lock after shared release")
fileLock2.Unlock()
})
t.Run("release nil result is safe", func(t *testing.T) {
var result *chartLockResult
// Should not panic
result.Release(nil)
})
}
// TestOCIChartDoubleCheckLocking verifies the double-check locking pattern
// works correctly to avoid unnecessary work when the cache is populated
// by another process while waiting for the lock.
func TestOCIChartDoubleCheckLocking(t *testing.T) {
t.Run("second waiter uses cache populated by first", func(t *testing.T) {
tempDir, err := os.MkdirTemp("", "helmfile-double-check-*")
require.NoError(t, err)
defer os.RemoveAll(tempDir)
chartPath := filepath.Join(tempDir, "myrepo", "mychart", "1.0.0")
lockFilePath := chartPath + ".lock"
// Create lock directory
err = os.MkdirAll(filepath.Dir(lockFilePath), 0755)
require.NoError(t, err)
var pullCount atomic.Int32
var wg sync.WaitGroup
// Simulate two processes trying to download the same chart
for i := 0; i < 2; i++ {
wg.Add(1)
go func() {
defer wg.Done()
fileLock := flock.New(lockFilePath)
err := fileLock.Lock()
require.NoError(t, err)
defer fileLock.Unlock()
// Double-check: after acquiring lock, check if directory exists
if _, err := os.Stat(chartPath); os.IsNotExist(err) {
// Simulate chart pull
time.Sleep(50 * time.Millisecond) // Simulate download time
err = os.MkdirAll(chartPath, 0755)
require.NoError(t, err)
err = os.WriteFile(filepath.Join(chartPath, "Chart.yaml"), []byte("name: mychart"), 0644)
require.NoError(t, err)
pullCount.Add(1)
}
// If directory exists, skip pull (use cached)
}()
}
wg.Wait()
// Only one process should have actually pulled the chart
require.Equal(t, int32(1), pullCount.Load(), "only one process should pull the chart")
// Chart should exist
_, err = os.Stat(filepath.Join(chartPath, "Chart.yaml"))
require.NoError(t, err, "chart should exist in cache")
})
}


@ -2,6 +2,7 @@ package state
import (
"bytes"
"context"
"crypto/sha1"
"encoding/hex"
"errors"
@ -21,6 +22,7 @@ import (
"dario.cat/mergo"
"github.com/Masterminds/semver/v3"
"github.com/gofrs/flock"
"github.com/helmfile/chartify"
"github.com/helmfile/vals"
"github.com/tatsushid/go-prettytable"
@ -608,7 +610,10 @@ func (st *HelmState) SyncRepos(helm RepoUpdater, shouldSkip map[string]bool) ([]
username, password := gatherUsernamePassword(repo.Name, repo.Username, repo.Password)
var err error
if repo.OCI {
err = helm.RegistryLogin(repo.URL, username, password, repo.CaFile, repo.CertFile, repo.KeyFile, repo.SkipTLSVerify)
// For OCI registries, we only need the registry host for login, not the full URL with chart path.
// e.g., "123456789012.dkr.ecr.us-east-1.amazonaws.com/charts" -> "123456789012.dkr.ecr.us-east-1.amazonaws.com"
registryHost := extractRegistryHost(repo.URL)
err = helm.RegistryLogin(registryHost, username, password, repo.CaFile, repo.CertFile, repo.KeyFile, repo.SkipTLSVerify)
} else {
err = helm.AddRepo(repo.Name, repo.URL, repo.CaFile, repo.CertFile, repo.KeyFile, username, password, repo.Managed, repo.PassCredentials, repo.SkipTLSVerify)
}
@ -642,6 +647,29 @@ func gatherUsernamePassword(repoName string, username string, password string) (
return user, pass
}
// extractRegistryHost extracts just the registry host from an OCI repository URL.
// For "helm registry login", we only need the registry host, not the full path including chart location.
// Examples:
// - "123456789012.dkr.ecr.us-west-2.amazonaws.com/charts" -> "123456789012.dkr.ecr.us-west-2.amazonaws.com"
// - "ghcr.io/deliveryhero/helm-charts" -> "ghcr.io"
// - "registry.example.com:5000/charts" -> "registry.example.com:5000"
//
// Note: This function does not handle URLs with query parameters or fragments, as these
// are not typically used in OCI registry URLs.
func extractRegistryHost(url string) string {
// Remove any protocol prefix if present
url = strings.TrimPrefix(url, "oci://")
url = strings.TrimPrefix(url, "https://")
url = strings.TrimPrefix(url, "http://")
// Find the first slash that separates the host from the path
if idx := strings.Index(url, "/"); idx != -1 {
return url[:idx]
}
// No path, return as-is
return url
}
type syncResult struct {
errors []*ReleaseError
}
@ -1294,17 +1322,19 @@ func (st *HelmState) GetRepositoryAndNameFromChartName(chartName string) (*Repos
return nil, chartName
}
var mutexMap sync.Map
// retrieves or creates a sync.Mutex for a given name
func (st *HelmState) getNamedMutex(name string) *sync.Mutex {
mu, ok := mutexMap.Load(name)
if ok {
return mu.(*sync.Mutex)
}
newMu := &sync.Mutex{}
actualMu, _ := mutexMap.LoadOrStore(name, newMu)
return actualMu.(*sync.Mutex)
}
var rwMutexMap sync.Map
// getNamedRWMutex retrieves or creates a sync.RWMutex for a given name.
// This is used for reader-writer locking where multiple readers can access
// the resource simultaneously, but writers need exclusive access.
func (st *HelmState) getNamedRWMutex(name string) *sync.RWMutex {
mu, ok := rwMutexMap.Load(name)
if ok {
return mu.(*sync.RWMutex)
}
newMu := &sync.RWMutex{}
actualMu, _ := rwMutexMap.LoadOrStore(name, newMu)
return actualMu.(*sync.RWMutex)
}
type PrepareChartKey struct {
@ -1427,9 +1457,13 @@ func (st *HelmState) processLocalChart(normalizedChart, dir string, release *Rel
return chartPath, nil
}
// forcedDownloadChart handles forced chart downloads
// forcedDownloadChart handles forced chart downloads.
// Locks are acquired during download and released immediately after.
func (st *HelmState) forcedDownloadChart(chartName, dir string, release *ReleaseSpec, helm helmexec.Interface, opts ChartPrepareOptions) (string, error) {
// Check global chart cache first for non-OCI charts
// If found, another worker in this process already downloaded the chart.
// We don't need to acquire a lock - the tempDir won't be deleted until
// after helm operations complete (cleanup is deferred in withPreparedCharts).
cacheKey := st.getChartCacheKey(release)
if cachedPath, exists := st.checkChartCache(cacheKey); exists && st.fs.DirectoryExistsAt(cachedPath) {
st.logger.Debugf("Chart %s:%s already downloaded, using cached version at %s", chartName, release.Version, cachedPath)
@ -1441,16 +1475,31 @@ func (st *HelmState) forcedDownloadChart(chartName, dir string, release *Release
return "", err
}
// only fetch chart if it is not already fetched
if _, err := os.Stat(chartPath); os.IsNotExist(err) {
var fetchFlags []string
fetchFlags = st.appendChartVersionFlags(fetchFlags, release)
fetchFlags = append(fetchFlags, "--untar", "--untardir", chartPath)
if err := helm.Fetch(chartName, fetchFlags...); err != nil {
return "", err
}
} else {
st.logger.Infof("\"%s\" has not been downloaded because the output directory \"%s\" already exists", chartName, chartPath)
// Acquire locks and determine action
// Pass a callback that checks if another worker in this process has already cached the chart.
// This prevents deleting and re-downloading a chart that was just downloaded by another worker.
lockResult, err := st.acquireChartLock(chartPath, opts, func() bool {
_, exists := st.checkChartCache(cacheKey)
return exists
})
if err != nil {
return "", err
}
// If cached on filesystem, release lock and return
if lockResult.action == chartActionUseCached {
st.addToChartCache(cacheKey, lockResult.cachedPath)
lockResult.Release(st.logger)
return lockResult.cachedPath, nil
}
// Download the chart
var fetchFlags []string
fetchFlags = st.appendChartVersionFlags(fetchFlags, release)
fetchFlags = append(fetchFlags, "--untar", "--untardir", chartPath)
if err := helm.Fetch(chartName, fetchFlags...); err != nil {
lockResult.Release(st.logger)
return "", err
}
// Set chartPath to be the path containing Chart.yaml, if found
@ -1462,6 +1511,10 @@ func (st *HelmState) forcedDownloadChart(chartName, dir string, release *Release
// Add to global chart cache
st.addToChartCache(cacheKey, chartPath)
// Release lock immediately after download - the tempDir cleanup is deferred
// until after helm operations complete, so the chart won't be deleted mid-use.
lockResult.Release(st.logger)
return chartPath, nil
}
@ -1525,6 +1578,12 @@ func (st *HelmState) prepareChartForRelease(release *ReleaseSpec, helm helmexec.
skipRefresh := !isLocal || skipRefreshGlobal || skipRefreshRelease || skipRefreshDefault
if chartification != nil && helmfileCommand != "pull" {
// Issue #2297: Normalize local chart paths before chartification
// When using transformers with local charts like "../chart", the chartify process
// needs an absolute path, otherwise it tries "helm pull ../chart" which fails
if isLocal {
chartPath = normalizeChart(st.basePath, chartPath)
}
chartPath, buildDeps, err = st.processChartification(chartification, release, chartPath, opts, skipDeps, helmfileCommand)
if err != nil {
return &chartPrepareResult{err: err}
@ -1566,6 +1625,12 @@ func (st *HelmState) prepareChartForRelease(release *ReleaseSpec, helm helmexec.
}
}
// PrepareCharts downloads and prepares all charts for the selected releases.
// Returns the chart paths and any errors encountered.
//
// Note: OCI chart locks are acquired and released during chart download within this function.
// The tempDir cleanup is deferred until after helm operations complete in the caller,
// so charts remain available during helm commands even though locks are released.
func (st *HelmState) PrepareCharts(helm helmexec.Interface, dir string, concurrency int, helmfileCommand string, opts ChartPrepareOptions) (map[PrepareChartKey]string, []error) {
if !opts.SkipResolve {
updated, err := st.ResolveDeps()
@ -1619,17 +1684,17 @@ func (st *HelmState) PrepareCharts(helm helmexec.Interface, dir string, concurre
if downloadRes.err != nil {
errs = append(errs, downloadRes.err)
return
}
func() {
prepareChartInfoMutex.Lock()
defer prepareChartInfoMutex.Unlock()
prepareChartInfo[PrepareChartKey{
Namespace: downloadRes.releaseNamespace,
KubeContext: downloadRes.releaseContext,
Name: downloadRes.releaseName,
}] = downloadRes.chartPath
key := PrepareChartKey{
Namespace: downloadRes.releaseNamespace,
KubeContext: downloadRes.releaseContext,
Name: downloadRes.releaseName,
}
prepareChartInfo[key] = downloadRes.chartPath
}()
if downloadRes.buildDeps {
@ -4294,10 +4359,6 @@ type ChartCacheKey struct {
var downloadedCharts = make(map[ChartCacheKey]string) // key -> chart path
var downloadedChartsMutex sync.RWMutex
// Legacy OCI-specific cache (kept for backward compatibility)
var downloadedOCICharts = make(map[string]bool)
var downloadedOCIMutex sync.RWMutex
// getChartCacheKey creates a cache key for a chart and version
func (st *HelmState) getChartCacheKey(release *ReleaseSpec) ChartCacheKey {
version := release.Version
@ -4326,6 +4387,222 @@ func (st *HelmState) addToChartCache(key ChartCacheKey, path string) {
downloadedCharts[key] = path
}
// chartDownloadAction represents what action should be taken after acquiring locks
type chartDownloadAction int
const (
chartActionUseCached chartDownloadAction = iota // Use the cached chart
chartActionDownload // Download the chart
chartActionRefresh // Delete and re-download (refresh)
)
// chartLockResult contains the result of acquiring locks and checking cache.
// It supports both shared (read) and exclusive (write) locks using reader-writer semantics.
// Multiple processes can hold shared locks simultaneously, but exclusive locks block all others.
type chartLockResult struct {
action chartDownloadAction
cachedPath string // Path to use if action is chartActionUseCached
fileLock *flock.Flock // File lock for cross-process synchronization
inProcessMutex *sync.RWMutex // In-process RWMutex for goroutine coordination
isExclusive bool // True if holding exclusive lock, false if shared
chartPath string // The chart path this lock is for (used for logging)
}
// Release releases all acquired locks (both file and in-process).
// This method is safe to call even if locks were not acquired.
func (r *chartLockResult) Release(logger *zap.SugaredLogger) {
if r == nil {
return
}
if r.inProcessMutex != nil {
if r.isExclusive {
r.inProcessMutex.Unlock()
} else {
r.inProcessMutex.RUnlock()
}
}
if r.fileLock != nil {
if err := r.fileLock.Unlock(); err != nil && logger != nil {
logger.Warnf("Failed to release file lock for %s: %v", r.chartPath, err)
}
}
}
// acquireChartLock acquires file and in-process locks for chart download coordination.
// - Shared (read) lock: Used when chart already exists in cache
// - Exclusive (write) lock: Used when downloading or refreshing a chart
//
// The optional skipRefreshCheck callback, if provided, is called before deleting an existing
// chart for refresh. If it returns true, the existing chart is used instead of being deleted.
// This allows callers to check in-memory caches to prevent redundant re-downloads within a process.
// NOTE: The callback is executed while holding an exclusive lock, so it should be fast and non-blocking.
//
// IMPORTANT: Locks are released immediately after download completes.
// The tempDir cleanup is deferred until after helm operations, so charts won't be deleted mid-use.
func (st *HelmState) acquireChartLock(chartPath string, opts ChartPrepareOptions, skipRefreshCheck func() bool) (*chartLockResult, error) {
result := &chartLockResult{
chartPath: chartPath,
}
// Ensure lock directory exists
lockFilePath := chartPath + ".lock"
lockFileDir := filepath.Dir(lockFilePath)
if err := os.MkdirAll(lockFileDir, 0755); err != nil {
return nil, fmt.Errorf("failed to create lock directory %s: %w", lockFileDir, err)
}
result.fileLock = flock.New(lockFilePath)
needsRefresh := !opts.SkipDeps && !opts.SkipRefresh
// Optimistic path: Try shared lock first if cache likely exists and no refresh needed
if st.fs.DirectoryExistsAt(chartPath) && !needsRefresh {
if err := st.acquireSharedLock(result, chartPath); err != nil {
// Failed to acquire shared lock, will try exclusive below
st.logger.Debugf("Failed to acquire shared lock for %s: %v, trying exclusive", chartPath, err)
} else {
// Got shared lock, validate cache
if fullChartPath, err := findChartDirectory(chartPath); err == nil {
result.action = chartActionUseCached
result.cachedPath = filepath.Dir(fullChartPath)
st.logger.Debugf("Using cached chart at %s (shared lock)", result.cachedPath)
return result, nil
}
// Invalid cache - release shared lock and acquire exclusive
st.logger.Debugf("Chart cache at %s is invalid, upgrading to exclusive lock", chartPath)
result.Release(st.logger)
// Reinitialize fileLock after releasing shared lock for exclusive lock acquisition
result.fileLock = flock.New(lockFilePath)
}
}
// Need exclusive lock for: download, refresh, or cache validation failure
if err := st.acquireExclusiveLock(result, chartPath); err != nil {
return nil, err
}
// Double-check after acquiring exclusive lock (another process may have populated cache)
if st.fs.DirectoryExistsAt(chartPath) {
fullChartPath, err := findChartDirectory(chartPath)
if err == nil {
// Valid cache exists now
if !needsRefresh {
// Another process populated the cache while we waited for lock
result.action = chartActionUseCached
result.cachedPath = filepath.Dir(fullChartPath)
st.logger.Debugf("Using cached chart at %s (populated by another process)", result.cachedPath)
return result, nil
}
// Refresh mode requested - check if we should skip (e.g., in-memory cache populated by another worker)
if skipRefreshCheck != nil && skipRefreshCheck() {
result.action = chartActionUseCached
result.cachedPath = filepath.Dir(fullChartPath)
st.logger.Debugf("Skipping refresh for chart at %s (another worker already cached)", result.cachedPath)
return result, nil
}
// Proceed with refresh: delete and re-download
st.logger.Debugf("Refreshing chart at %s (exclusive lock)", chartPath)
if err := os.RemoveAll(chartPath); err != nil {
result.Release(st.logger)
return nil, fmt.Errorf("failed to remove chart directory for refresh: %w", err)
}
result.action = chartActionRefresh
} else {
// Directory exists but invalid (no Chart.yaml) - corrupted cache
st.logger.Debugf("Chart directory at %s is corrupted: %v, will re-download", chartPath, err)
if err := os.RemoveAll(chartPath); err != nil {
result.Release(st.logger)
return nil, fmt.Errorf("failed to remove corrupted chart directory: %w", err)
}
result.action = chartActionDownload
}
} else {
result.action = chartActionDownload
}
return result, nil
}
// acquireSharedLock acquires a shared (read) lock for concurrent read access.
// Multiple processes can hold shared locks simultaneously, but shared locks
// block exclusive lock acquisition.
// Lock order: in-process mutex first, then file lock (to prevent deadlock).
func (st *HelmState) acquireSharedLock(result *chartLockResult, chartPath string) error {
const lockTimeout = 30 * time.Second
// Acquire in-process RLock FIRST (consistent ordering prevents deadlock)
result.inProcessMutex = st.getNamedRWMutex(chartPath)
result.inProcessMutex.RLock()
ctx, cancel := context.WithTimeout(context.Background(), lockTimeout)
defer cancel()
locked, err := result.fileLock.TryRLockContext(ctx, 500*time.Millisecond)
if err != nil {
result.inProcessMutex.RUnlock()
return fmt.Errorf("failed to acquire shared file lock for chart %s: %w", chartPath, err)
}
if !locked {
result.inProcessMutex.RUnlock()
return fmt.Errorf("timeout waiting for shared file lock on chart %s", chartPath)
}
result.isExclusive = false
return nil
}
// acquireExclusiveLock acquires an exclusive (write) lock with retry logic.
// Exclusive locks block all other locks (both shared and exclusive).
// Lock order: in-process mutex first, then file lock (to prevent deadlock).
func (st *HelmState) acquireExclusiveLock(result *chartLockResult, chartPath string) error {
const (
lockTimeout = 30 * time.Second
maxRetries = 3
retryBackoff = 1 * time.Second
)
// Acquire in-process Lock FIRST (consistent ordering prevents deadlock)
result.inProcessMutex = st.getNamedRWMutex(chartPath)
result.inProcessMutex.Lock()
var locked bool
var lockErr error
for attempt := 1; attempt <= maxRetries; attempt++ {
ctx, cancel := context.WithTimeout(context.Background(), lockTimeout)
locked, lockErr = result.fileLock.TryLockContext(ctx, 500*time.Millisecond)
cancel()
if locked {
break
}
if lockErr != nil {
// Actual file system error (not timeout) - worth retrying
st.logger.Warnf("Failed to acquire exclusive file lock (attempt %d/%d): %v", attempt, maxRetries, lockErr)
if attempt < maxRetries {
// Release in-process mutex during backoff to avoid blocking other goroutines
result.inProcessMutex.Unlock()
time.Sleep(retryBackoff)
result.inProcessMutex.Lock()
}
} else {
// Lock not acquired and no retryable error - treat this as a timeout.
// Don't retry on timeout - 30s is already a long wait, another process likely holds the lock
result.inProcessMutex.Unlock()
return fmt.Errorf("timeout waiting for exclusive file lock on chart %s after %v", chartPath, lockTimeout)
}
}
if !locked {
result.inProcessMutex.Unlock()
return fmt.Errorf("failed to acquire exclusive file lock for chart %s after %d attempts: %w", chartPath, maxRetries, lockErr)
}
result.isExclusive = true
return nil
}
// getOCIChart downloads or retrieves an OCI chart from cache.
// Locks are acquired during download and released immediately after.
func (st *HelmState) getOCIChart(release *ReleaseSpec, tempDir string, helm helmexec.Interface, opts ChartPrepareOptions) (*string, error) {
qualifiedChartName, chartName, chartVersion, err := st.getOCIQualifiedChartName(release)
if err != nil {
@ -4336,7 +4613,10 @@ func (st *HelmState) getOCIChart(release *ReleaseSpec, tempDir string, helm helm
return nil, nil
}
// Check global chart cache first
// Check global chart cache first (in-memory cache)
// If found, another worker in this process already downloaded the chart.
// We don't need to acquire a lock - the tempDir won't be deleted until
// after helm operations complete (cleanup is deferred in withPreparedCharts).
cacheKey := st.getChartCacheKey(release)
if cachedPath, exists := st.checkChartCache(cacheKey); exists && st.fs.DirectoryExistsAt(cachedPath) {
st.logger.Debugf("OCI chart %s:%s already downloaded, using cached version at %s", chartName, chartVersion, cachedPath)
@ -4349,63 +4629,64 @@ func (st *HelmState) getOCIChart(release *ReleaseSpec, tempDir string, helm helm
chartPath, _ := st.getOCIChartPath(tempDir, release, chartName, chartVersion, opts.OutputDirTemplate)
mu := st.getNamedMutex(chartPath)
mu.Lock()
defer mu.Unlock()
_, err = os.Stat(tempDir)
if err != nil {
err = os.MkdirAll(tempDir, 0755)
if err != nil {
return nil, err
}
}
downloadedOCIMutex.RLock()
alreadyDownloadedFlag := downloadedOCICharts[chartPath]
downloadedOCIMutex.RUnlock()
if !opts.SkipDeps && !opts.SkipRefresh && !alreadyDownloadedFlag {
err = os.RemoveAll(chartPath)
if err != nil {
return nil, err
}
}
if st.fs.DirectoryExistsAt(chartPath) {
st.logger.Debugf("chart already exists at %s", chartPath)
} else {
flags := st.chartOCIFlags(release)
flags = st.appendVerifyFlags(flags, release)
flags = st.appendKeyringFlags(flags, release)
flags = st.appendChartDownloadFlags(flags, release)
flags = st.appendChartVersionFlags(flags, release)
err := helm.ChartPull(qualifiedChartName, chartPath, flags...)
if err != nil {
return nil, err
}
err = helm.ChartExport(qualifiedChartName, chartPath)
if err != nil {
return nil, err
}
}
fullChartPath, err := findChartDirectory(chartPath)
// Acquire locks and determine action
// Pass a callback that checks if another worker in this process has already cached the chart.
// This prevents deleting and re-downloading a chart that was just downloaded by another worker.
lockResult, err := st.acquireChartLock(chartPath, opts, func() bool {
_, exists := st.checkChartCache(cacheKey)
return exists
})
if err != nil {
return nil, err
}
downloadedOCIMutex.Lock()
downloadedOCICharts[chartPath] = true
downloadedOCIMutex.Unlock()
// If cached on filesystem, release lock and return
if lockResult.action == chartActionUseCached {
st.addToChartCache(cacheKey, lockResult.cachedPath)
lockResult.Release(st.logger)
return &lockResult.cachedPath, nil
}
// Ensure temp directory exists
if _, err := os.Stat(tempDir); err != nil {
if err := os.MkdirAll(tempDir, 0755); err != nil {
lockResult.Release(st.logger)
return nil, err
}
}
// Download the chart
flags := st.chartOCIFlags(release)
flags = st.appendVerifyFlags(flags, release)
flags = st.appendKeyringFlags(flags, release)
flags = st.appendChartDownloadFlags(flags, release)
flags = st.appendChartVersionFlags(flags, release)
if err := helm.ChartPull(qualifiedChartName, chartPath, flags...); err != nil {
lockResult.Release(st.logger)
return nil, err
}
if err := helm.ChartExport(qualifiedChartName, chartPath); err != nil {
lockResult.Release(st.logger)
return nil, err
}
fullChartPath, err := findChartDirectory(chartPath)
if err != nil {
lockResult.Release(st.logger)
return nil, err
}
chartPath = filepath.Dir(fullChartPath)
// Add to global chart cache
st.addToChartCache(cacheKey, chartPath)
// Release lock immediately after download - the tempDir cleanup is deferred
// until after helm operations complete, so the chart won't be deleted mid-use.
lockResult.Release(st.logger)
return &chartPath, nil
}


@ -3274,6 +3274,118 @@ func Test_gatherUsernamePassword(t *testing.T) {
}
}
func Test_extractRegistryHost(t *testing.T) {
tests := []struct {
name string
url string
want string
}{
{
name: "ECR with chart path",
url: "123456789012.dkr.ecr.us-east-1.amazonaws.com/helm-charts",
want: "123456789012.dkr.ecr.us-east-1.amazonaws.com",
},
{
name: "GHCR with nested path",
url: "ghcr.io/deliveryhero/helm-charts",
want: "ghcr.io",
},
{
name: "registry with port and path",
url: "registry.example.com:5000/helm-charts",
want: "registry.example.com:5000",
},
{
name: "registry without path",
url: "registry.example.com",
want: "registry.example.com",
},
{
name: "with oci:// prefix and path",
url: "oci://ghcr.io/charts/nginx",
want: "ghcr.io",
},
{
name: "with https:// prefix and path",
url: "https://registry.example.com/helm-charts",
want: "registry.example.com",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := extractRegistryHost(tt.url); got != tt.want {
t.Errorf("extractRegistryHost() = %v, want %v", got, tt.want)
}
})
}
}
func TestHelmState_SyncRepos_OCI(t *testing.T) {
	tests := []struct {
		name                  string
		repos                 []RepositorySpec
		wantRegistryLoginHost string
	}{
		{
			name: "OCI registry with chart path should extract host only",
			repos: []RepositorySpec{
				{
					Name:     "ecr",
					URL:      "123456789012.dkr.ecr.us-east-1.amazonaws.com/helm-charts",
					OCI:      true,
					Username: "AWS",
					Password: "token",
				},
			},
			wantRegistryLoginHost: "123456789012.dkr.ecr.us-east-1.amazonaws.com",
		},
		{
			name: "OCI registry without path should pass URL as-is",
			repos: []RepositorySpec{
				{
					Name:     "ghcr",
					URL:      "ghcr.io",
					OCI:      true,
					Username: "user",
					Password: "pass",
				},
			},
			wantRegistryLoginHost: "ghcr.io",
		},
		{
			name: "OCI registry with nested path",
			repos: []RepositorySpec{
				{
					Name:     "docker",
					URL:      "registry-1.docker.io/bitnamicharts",
					OCI:      true,
					Username: "user",
					Password: "pass",
				},
			},
			wantRegistryLoginHost: "registry-1.docker.io",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			helm := &exectest.Helm{}
			state := &HelmState{
				ReleaseSetSpec: ReleaseSetSpec{
					Repositories: tt.repos,
				},
			}

			_, err := state.SyncRepos(helm, map[string]bool{})
			if err != nil {
				t.Errorf("SyncRepos() error = %v", err)
				return
			}

			if helm.RegistryLoginHost != tt.wantRegistryLoginHost {
				t.Errorf("RegistryLogin was called with host = %q, want %q", helm.RegistryLoginHost, tt.wantRegistryLoginHost)
			}
		})
	}
}
func TestGenerateOutputFilePath(t *testing.T) {
	tests := []struct {
		envName string


@ -117,6 +117,8 @@ ${kubectl} create namespace ${test_ns} || fail "Could not create namespace ${tes
. ${dir}/test-cases/issue-2281-array-merge.sh
. ${dir}/test-cases/issue-2247.sh
. ${dir}/test-cases/issue-2291.sh
. ${dir}/test-cases/oci-parallel-pull.sh
. ${dir}/test-cases/issue-2297-local-chart-transformers.sh
# ALL DONE -----------------------------------------------------------------------------------------------------------


@ -0,0 +1,56 @@
# Integration test for issue #2297: Local chart + transformers causes panic
# https://github.com/helmfile/helmfile/issues/2297
#
# This test verifies that local charts with relative paths (like "../chart")
# work correctly when transformers are present. The bug manifests when there
# are MULTIPLE release files in helmfile.d and one uses local chart + transformers.
# Before the fix, chartify would try "helm pull ../chart" which fails because
# the path wasn't normalized.
issue_2297_input_dir="${cases_dir}/issue-2297-local-chart-transformers/input"
issue_2297_tmp=""
# Cleanup function for this test
cleanup_issue_2297() {
  if [ -n "${issue_2297_tmp}" ] && [ -d "${issue_2297_tmp}" ]; then
    rm -rf "${issue_2297_tmp}"
  fi
}
trap cleanup_issue_2297 EXIT
issue_2297_tmp=$(mktemp -d)
# Convert helmfile to absolute path before cd (otherwise ./helmfile won't be found)
helmfile_real="$(pwd)/${helmfile}"
test_start "issue #2297: local chart with transformers"
info "Testing helmfile template with local chart and transformers"
# Run from the input directory where helmfile.d/ contains the helmfile with "../chart" reference
cd "${issue_2297_input_dir}"
# This should succeed. Before the fix, chartify ran "helm pull ../chart --untar",
# which failed with "repo .. not found".
${helmfile_real} template > "${issue_2297_tmp}/output.yaml" 2>&1
result=$?
cd - > /dev/null
if [ $result -ne 0 ]; then
  cat "${issue_2297_tmp}/output.yaml"
  fail "helmfile template with local chart and transformers should not fail"
fi
# Verify the output contains the expected configmap with transformer annotation
if ! grep -q "test-annotation" "${issue_2297_tmp}/output.yaml"; then
  cat "${issue_2297_tmp}/output.yaml"
  fail "Output should contain the transformer annotation"
fi
info "Local chart with transformers works correctly"
# Cleanup and restore the original trap
cleanup_issue_2297
trap - EXIT
test_pass "issue #2297: local chart with transformers"


@ -0,0 +1,4 @@
apiVersion: v2
name: test-chart
version: 0.1.0
description: A test chart for issue #2297


@ -0,0 +1,6 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-config
data:
  key: value


@ -0,0 +1,6 @@
# Issue #2297: Another release file in helmfile.d
# The bug manifests when there are MULTIPLE release files in helmfile.d
# This file represents another release that doesn't use transformers
releases:
  - name: simple-release
    chart: ../chart


@ -0,0 +1,17 @@
# Issue #2297: Local chart with transformers
# The bug manifests when there are MULTIPLE release files in helmfile.d
# and one of them uses a local chart with transformers.
# The chart path should be normalized before being passed to chartify.
releases:
  - name: test-with-transformers
    chart: ../chart
    transformers:
      - apiVersion: builtin
        kind: AnnotationsTransformer
        metadata:
          name: add-annotations
        annotations:
          test-annotation: "added-by-transformer"
        fieldSpecs:
          - path: metadata/annotations
            create: true


@ -0,0 +1,133 @@
# Integration test for issue #2295: Cache conflicts running multiple helmfile using the same chart in parallel
# This test verifies that file locking prevents race conditions when multiple helmfile processes
# try to pull the same OCI chart simultaneously.
oci_parallel_pull_case_input_dir="${cases_dir}/oci-parallel-pull/input"
config_file="helmfile.yaml"
oci_parallel_pull_cache_dir=""
oci_parallel_pull_output_dir=""
# Cleanup function for this test
cleanup_oci_parallel_pull() {
  if [ -n "${oci_parallel_pull_cache_dir}" ] && [ -d "${oci_parallel_pull_cache_dir}" ]; then
    rm -rf "${oci_parallel_pull_cache_dir}"
  fi
  if [ -n "${oci_parallel_pull_output_dir}" ] && [ -d "${oci_parallel_pull_output_dir}" ]; then
    rm -rf "${oci_parallel_pull_output_dir}"
  fi
}
trap cleanup_oci_parallel_pull EXIT
test_start "oci-parallel-pull: verify file locking prevents race conditions (issue #2295)"
# Create a temporary cache directory for this test
oci_parallel_pull_cache_dir=$(mktemp -d)
export HELMFILE_CACHE_HOME="${oci_parallel_pull_cache_dir}"
# Create a temporary output directory for test outputs
oci_parallel_pull_output_dir=$(mktemp -d)
info "Using temporary cache directory: ${HELMFILE_CACHE_HOME}"
info "Using temporary output directory: ${oci_parallel_pull_output_dir}"
# Function to check if failure is due to registry issues (not a race condition bug)
is_registry_error() {
  local output_dir="$1"
  # Check for common registry-related errors that are not race condition bugs
  if grep -iqE "(rate limit|too many requests|unauthorized|connection refused|timeout|no such host|i/o timeout)" "${output_dir}"/oci-parallel-*.out 2>/dev/null; then
    return 0 # true - it's a registry error
  fi
  return 1 # false - not a registry error
}
# Run multiple helmfile template commands in parallel using the same chart
# This simulates the scenario described in issue #2295
info "Running 3 parallel helmfile template commands..."
# Start all processes in background
${helmfile} -f ${oci_parallel_pull_case_input_dir}/${config_file} template > "${oci_parallel_pull_output_dir}/oci-parallel-1.out" 2>&1 &
pid1=$!
${helmfile} -f ${oci_parallel_pull_case_input_dir}/${config_file} template > "${oci_parallel_pull_output_dir}/oci-parallel-2.out" 2>&1 &
pid2=$!
${helmfile} -f ${oci_parallel_pull_case_input_dir}/${config_file} template > "${oci_parallel_pull_output_dir}/oci-parallel-3.out" 2>&1 &
pid3=$!
# Wait for all processes and capture exit codes
# Note: We capture exit codes using "|| exit1=$?" pattern to prevent set -e from exiting
# the script when wait returns non-zero (which happens when the process fails)
exit1=0
exit2=0
exit3=0
wait $pid1 || exit1=$?
wait $pid2 || exit2=$?
wait $pid3 || exit3=$?
info "Process 1 exit code: ${exit1}"
info "Process 2 exit code: ${exit2}"
info "Process 3 exit code: ${exit3}"
# Check for the specific error from issue #2295 (race condition bug)
# Use case-insensitive extended regex to catch variations from different tar/helm versions
if grep -iqE "(failed to untar.*already exists|file or directory.*already exists|a file.*already exists)" "${oci_parallel_pull_output_dir}"/oci-parallel-*.out 2>/dev/null; then
  warn "Race condition detected! Found 'file already exists' error in output"
  cat "${oci_parallel_pull_output_dir}"/oci-parallel-*.out
  fail "oci-parallel-pull test failed due to race condition (issue #2295)"
fi
# Check for failures
failed=0
if [ $exit1 -ne 0 ]; then
  warn "Process 1 failed:"
  cat "${oci_parallel_pull_output_dir}/oci-parallel-1.out"
  failed=1
fi
if [ $exit2 -ne 0 ]; then
  warn "Process 2 failed:"
  cat "${oci_parallel_pull_output_dir}/oci-parallel-2.out"
  failed=1
fi
if [ $exit3 -ne 0 ]; then
  warn "Process 3 failed:"
  cat "${oci_parallel_pull_output_dir}/oci-parallel-3.out"
  failed=1
fi
if [ $failed -eq 1 ]; then
  # Check if this is a registry error (rate limit, network issue, etc.).
  # These are not bugs in helmfile, so we skip the test rather than fail.
  if is_registry_error "${oci_parallel_pull_output_dir}"; then
    warn "Test skipped due to external registry issues (rate limit, network, etc.)"
    warn "This is not a helmfile bug - the file locking mechanism cannot be tested"
    # Clean up and exit successfully so CI does not fail on external issues
    cleanup_oci_parallel_pull
    trap - EXIT
    test_pass "oci-parallel-pull: skipped due to external registry issues"
    return 0 2>/dev/null || exit 0
  fi
  fail "oci-parallel-pull test failed"
fi
# Verify the chart was cached
if [ ! -d "${HELMFILE_CACHE_HOME}" ]; then
  fail "Cache directory was not created"
fi
# Check for lock files in the cache directory.
# NOTE: Lock files are ephemeral and may be cleaned up immediately after helmfile processes complete.
# Their absence does NOT mean locking was not used; this check is informational only.
lock_files=$(find "${HELMFILE_CACHE_HOME}" -name "*.lock" 2>/dev/null | wc -l)
info "Found ${lock_files} lock file(s) in cache directory"
if [ "${lock_files}" -lt 1 ]; then
  warn "No lock files found - this does NOT mean locking was not used (lock files are ephemeral)"
fi
# Cleanup and restore the original trap
cleanup_oci_parallel_pull
trap - EXIT
test_pass "oci-parallel-pull: file locking prevents race conditions"


@ -0,0 +1,14 @@
# Test helmfile for issue #2295: Parallel OCI chart pulls
# This helmfile uses an OCI chart to test that parallel pulls
# don't cause race conditions with the file locking mechanism.
repositories:
  - name: docker
    url: registry-1.docker.io
    oci: true

releases:
  # Using a chart that can be templated (not a library chart).
  # Library charts cannot be templated by Helm and would skip the file locking logic being tested.
  - name: test-parallel-pull
    chart: docker/bitnamicharts/nginx
    version: "18.2.5"