- Add IncludeNeeds to the ChartPrepareOptions of diff, template, lint, unittest, sync, and apply
- All ConfigProviders already have DAGConfig embedded which includes IncludeNeeds()
Signed-off-by: yxxhero <aiopsclub@163.com>
- Add IncludeNeeds() to WriteValuesConfigProvider interface
- Implement IncludeNeeds() in WriteValuesImpl
- Update WriteValues function to use IncludeNeeds and IncludeTransitiveNeeds
Signed-off-by: yxxhero <aiopsclub@163.com>
- Add IncludeNeeds() to DepsConfigProvider and ReposConfigProvider
- Update depsConfig in tests
- Use c.IncludeNeeds() in run.go instead of hardcoded false
Signed-off-by: yxxhero <aiopsclub@163.com>
- Add IncludeNeeds() method to DepsConfigProvider interface
- Implement IncludeNeeds() in DepsImpl
- Update depsConfig in tests
- Use c.IncludeNeeds() in run.go instead of hardcoded false
Signed-off-by: yxxhero <aiopsclub@163.com>
Fixes #1003
Previously, --include-needs was incorrectly including transitive
dependencies (dependencies of dependencies). This fix ensures that:
- --include-needs only includes direct needs
- --include-transitive-needs includes both direct and transitive needs
Changes:
- Add separate handling for direct vs transitive needs in state.go
- Add IncludeNeeds field to ChartPrepareOptions
- Add unmarkNeedsDirectOnly() and collectDirectNeedsOnly() functions
- Update ForEachState and related functions to accept both flags
- Fix incorrect usage of c.IncludeNeeds() for IncludeTransitiveNeeds
- Update tests to verify the correct behavior
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: use --force-replace flag for Helm 4 instead of deprecated --force
Helm 4 deprecated the --force flag in favor of --force-replace.
This fix detects the Helm version and uses the appropriate flag:
- Helm 4: --force-replace
- Helm 3: --force
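A minimal sketch of the flag selection, assuming the Helm major version has already been parsed into an int; `forceFlag` is a hypothetical name, while `--force` and `--force-replace` are the flags named above.

```go
package main

import "fmt"

// forceFlag (hypothetical) picks the flag that forces resource replacement.
// helmMajor is assumed to come from an already-parsed `helm version` result.
func forceFlag(helmMajor int) string {
	if helmMajor >= 4 {
		return "--force-replace" // Helm 4 deprecated --force in favour of this
	}
	return "--force" // Helm 3
}

func main() {
	fmt.Println(forceFlag(3)) // --force
	fmt.Println(forceFlag(4)) // --force-replace
}
```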
Also fixed a nil pointer panic in appendHideNotesFlags when called
with nil SyncOpts.
Fixes #2476
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix(ci): pin semver to v2.12.0 for Go 1.25 compatibility
semver@latest requires Go 1.26.1 but the project uses Go 1.25.4.
Pin to v2.12.0, which is compatible with Go 1.25.
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: add test cases for force flag from defaults with nil release
Add test cases to cover the scenario where release.Force is nil and
HelmDefaults.Force enables force for both Helm 3 and Helm 4.
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: add nil ops test and rename misleading test names
- Add test case for appendHideNotesFlags with ops=nil to prevent
regression
- Rename force-from-default-nil-release-* to
force-from-default-nil-force-* for clarity (release.Force is nil,
not the release itself)
Signed-off-by: yxxhero <aiopsclub@163.com>
* refactor: add explicit parentheses for force condition
Add explicit parentheses around the two disjuncts in the force
condition to make the intended grouping unambiguous and easier
to read.
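The grouped force condition, with the nil-release-Force fallback covered by the tests above, might look like the sketch below. `Release`, `HelmDefaults`, and `forceEnabled` are hypothetical, reduced stand-ins; the exact expression in helmfile may differ.

```go
package main

import "fmt"

// Release and HelmDefaults are hypothetical, reduced stand-ins.
type Release struct {
	Force *bool // nil means the release does not set force
}

type HelmDefaults struct {
	Force bool
}

// forceEnabled: an explicit release value wins; when the release leaves Force
// unset (nil), helmDefaults.force decides. The parentheses make the grouping
// explicit (Go already binds && tighter than ||).
func forceEnabled(r Release, d HelmDefaults) bool {
	return (r.Force != nil && *r.Force) || (r.Force == nil && d.Force)
}

func main() {
	on, off := true, false
	fmt.Println(forceEnabled(Release{}, HelmDefaults{Force: true}))            // true
	fmt.Println(forceEnabled(Release{Force: &off}, HelmDefaults{Force: true})) // false
	fmt.Println(forceEnabled(Release{Force: &on}, HelmDefaults{}))             // true
}
```

The parentheses do not change evaluation order in Go; they only document which two cases are being OR-ed together.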
Signed-off-by: yxxhero <aiopsclub@163.com>
* refactor: check ops nil before Helm version in appendHideNotesFlags
- Swap the order to check ops == nil first to avoid an unnecessary
IsVersionAtLeast call
- Restore the "see Helm release" comment for consistency with other
flag helpers
Signed-off-by: yxxhero <aiopsclub@163.com>
---------
Signed-off-by: yxxhero <aiopsclub@163.com>
PR #2367 introduced CLIOverrides to give --state-values-set element-by-element
array merge semantics. However, nested helmfile values (helmfiles[].values:)
were also routed into CLIOverrides, causing their arrays to merge instead of
replace. This broke the pre-v1.3.0 behavior where passing an array via
helmfiles[].values: would fully replace the child's default array.
Add OverrideValuesAreCLI flag to SubhelmfileEnvironmentSpec so the loader can
distinguish CLI flags from nested helmfile values. CLI values continue using
CLIOverrides (element-by-element merge); nested helmfile values now use Values
(Sparse merge strategy → full array replacement).
Fixes #2451
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
Add support for trackMode: helm-legacy to use Helm v4's --wait=legacy flag,
which maintains compatibility with Helm v3's wait behavior during migration.
Helm v4 changed the default --wait behavior from polling to a watcher-based
approach. This can cause issues with charts that have broken livenessProbe
configurations without startupProbe. The --wait=legacy flag preserves the
Helm v3 polling behavior for smoother migration.
Changes:
- Add TrackModeHelmLegacy constant in pkg/kubedog/options.go
- Use kubedog.TrackMode constants instead of raw strings in helmx.go
- Enhance appendWaitFlags to use --wait=legacy for Helm v4 when trackMode
is helm-legacy
- Add nil check for logger before logging warning
- Add version check with warning when helm-legacy is used with Helm v3
- Update validation in pkg/config to accept helm-legacy track mode
- Update command-line flags in cmd/apply.go and cmd/sync.go
- Add comprehensive documentation in docs/advanced-features.md
- Add thorough test coverage including warning message verification
Behavior:
- Helm v4 + helm-legacy: Uses --wait=legacy
- Helm v3 + helm-legacy: Falls back to --wait with warning
- Helm v4 + helm: Uses --wait (watcher mode)
- Any + kubedog: Skips --wait flag
Fixes #2464
Signed-off-by: yxxhero <aiopsclub@163.com>
Co-authored-by: Copilot <copilot@github.com>
When using jsonPatches or kustomize patches with helmfile, chartify runs
"helm template" internally to render the chart before applying patches.
The lookup() helm function requires cluster access (--dry-run=server).
Previously, --kubeconfig was passed to helm diff and helm upgrade commands,
but not to chartify's internal helm template call. This caused failures
when users specified --kubeconfig flag with a non-default kubeconfig location.
This fix ensures --kubeconfig is passed to chartify's TemplateArgs for
cluster-requiring commands (sync, apply, diff, etc.), alongside the existing
--kube-context and --dry-run=server flags.
Fixes #2444
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: local chart with external dependencies error when repos configured
When helm repo update was run, the code unconditionally set skipRefresh=true
for all builds, causing helm dep build --skip-refresh to fail for local charts
with external dependencies not listed in helmfile.yaml.
Now only non-local charts (precomputed skipRefresh=true) get --skip-refresh,
while local charts preserve their skipRefresh=false to allow refreshing repos
for external dependencies.
Fixes #2431
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: update snapshot tests for local chart refresh behavior
Local charts now run helm repo update during helm dep build to support
external dependencies not listed in helmfile.yaml (fixes #2431).
Signed-off-by: yxxhero <aiopsclub@163.com>
* refactor: remove redundant skipRefresh assignment
The condition 'if didUpdateRepo && r.skipRefresh { r.skipRefresh = true }'
was a no-op since setting true to true has no effect. The precomputed
skipRefresh value from prepareChartForRelease is already correct, so we
simply preserve it without modification.
Signed-off-by: yxxhero <aiopsclub@163.com>
* refactor: only call UpdateRepo when at least one build uses --skip-refresh
Avoid redundant helm repo update when all builds have skipRefresh=false,
as each helm dep build will refresh repos itself in that case.
Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: update release_template_inheritance snapshot for skipRefresh optimization
UpdateRepo is now only called when at least one build uses --skip-refresh,
so local charts without skipRefresh no longer trigger the global repo update.
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: add regression test for issue #2431
Add TestIssue2431_LocalChartWithExternalDependency to verify that local
charts with external dependencies on repos NOT in helmfile.yaml work
correctly. The test ensures:
- UpdateRepo is NOT called when all builds have skipRefresh=false
- helm dep build does NOT receive --skip-refresh flag
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: add integration test for issue #2431
Add test case to verify that local charts with repos configured in
helmfile.yaml work correctly. The test ensures that helmfile template
does not fail with 'no cached repository' or 'no repository definition'
errors when:
- helmfile.yaml has non-OCI repos configured
- Local chart is used (which may have external dependencies not in helmfile.yaml)
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: update issue #2431 integration test to match issue scenario
Add external dependency (karma chart from wiremind repo) to local chart's
Chart.yaml, matching the exact scenario described in issue #2431 where:
- helmfile.yaml has repos configured (vector)
- Local chart depends on a repo NOT in helmfile.yaml (wiremind)
Signed-off-by: yxxhero <aiopsclub@163.com>
* revert: remove unit tests and restore e2e snapshot outputs
Remove pkg/state/run_helm_dep_builds_skip_refresh_test.go and restore
chart_need snapshot outputs to original state. The fix is verified by
the integration test for issue #2431.
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: remove snapshot outputs to regenerate them
Remove chart_need snapshot outputs so they can be regenerated by tests.
Signed-off-by: yxxhero <aiopsclub@163.com>
* revert: restore release_template_inheritance snapshot output
Signed-off-by: yxxhero <aiopsclub@163.com>
* restore: add back unit tests for skipRefresh behavior
Signed-off-by: yxxhero <aiopsclub@163.com>
* restore: add back chart_need snapshot outputs
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: update snapshot outputs for skipRefresh optimization
- Remove TestIssue2431_LocalChartWithExternalDependency unit test
- Update chart_need outputs: local chart runs helm dep build with repo refresh
- Update release_template_inheritance: no deps so no repo refresh output
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: update test comments and names per review feedback
- Update TestRunHelmDepBuilds_MultipleBuilds comment to remove reference
to removed didUpdateRepo variable
- Rename test case to accurately describe condition being tested
(build with skipRefresh=true instead of misleading 'non-local chart')
Signed-off-by: yxxhero <aiopsclub@163.com>
---------
Signed-off-by: yxxhero <aiopsclub@163.com>
Co-authored-by: Copilot <copilot@github.com>
* Add IP Network to supported HCL Functions
This patch adds the CIDR functions from the `go-cty-funcs` package to the
supported HCL functions
Signed-off-by: Oleh Neichev <oleg.neichev@gmail.com>
* Test HCL CIDR Functions
Signed-off-by: Oleh Neichev <oleg.neichev@gmail.com>
---------
Signed-off-by: Oleh Neichev <oleg.neichev@gmail.com>
* fix: use absolute baseDir in sequential helmfiles for correct values path resolution (#2424)
PR #2410 introduced a regression where a relative directory was passed as
baseDir instead of an absolute one, causing values and secrets file paths
to resolve incorrectly when using --sequential-helmfiles with helmfile.d/.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: mirror reporter's bases/templates/inherit setup in issue-2424 integration test
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
---------
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
When no repositories are defined in helmfile.yaml, local charts with
external dependencies need to refresh the repo cache. Previously, we
always passed --skip-refresh to helm dep build, which broke this case.
Now --skip-refresh is only passed when we actually ran helm repo update,
meaning repos are configured AND no skip refresh flags are set. This
preserves the precomputed skipRefresh value from prepareChartForRelease
which accounts for CLI flags, helmDefaults.skipRefresh, and release.skipRefresh.
Fixes #2417
Signed-off-by: yxxhero <aiopsclub@163.com>
When using only OCI repositories, helmfile would attempt to run
'helm repo update' which fails with 'no repositories found' error.
OCI repositories don't need 'helm repo update' as they use
'helm registry login' instead.
This fix adds a HasNonOCIRepositories() helper function and uses it
to determine whether to run 'helm repo update'.
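A minimal version of the helper, assuming a simplified repository type with an `OCI` flag (helmfile's `repositories:` entries accept `oci: true`); the Go names here are illustrative, not helmfile's exact types.

```go
package main

import "fmt"

// RepositorySpec is a simplified, hypothetical stand-in for a
// helmfile `repositories:` entry.
type RepositorySpec struct {
	Name string
	URL  string
	OCI  bool
}

// hasNonOCIRepositories reports whether any repository needs
// `helm repo update`; OCI registries use `helm registry login` instead.
func hasNonOCIRepositories(repos []RepositorySpec) bool {
	for _, r := range repos {
		if !r.OCI {
			return true
		}
	}
	return false
}

func main() {
	ociOnly := []RepositorySpec{{Name: "ghcr", URL: "ghcr.io/example/charts", OCI: true}}
	fmt.Println(hasNonOCIRepositories(ociOnly)) // false: skip `helm repo update`
}
```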
Fixes #2418
Signed-off-by: yxxhero <aiopsclub@163.com>
* feat: allow for HCL values override
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* fix: ensure overridden HCL expression uses range from latest defined block vars
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: implement HCL cty values override tests
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* docs: better describe new behavior
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* fix: add extra parentheses for better readability
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: implement variable override in decodeGraph() function, AFTER interpolation, providing back access to hv.* and local.* accessors
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: implement better HCL test to override values using local.* and hv.* accessors and pre-processing function calls
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: remove deprecated hclParseError() function (and test)
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: don't let HCL override with null value win
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: proper test condition on HCL map type merge (and tests)
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: more accurate HCL test error statement
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: ensure HCL DAG graph collects dependencies from ALL definitions to ensure proper evaluation order even if only earlier definitions have dependencies
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: verify HCL mixed-types merges are correctly supported
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* docs: improved environment values precedence section with HCL override support
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: HCL test spell-check, linter failure
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: implement HCL override e2e tests
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* fix: correct hcl_loader test error message
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* fix: ensure correct cty type is returned in case of object/map hcl merge
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* fix: ensure hcl locals from a previous definition/file do not leak into this evaluation when merging
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* fix: correct e2e hcl_override test; missing line in output string comparison
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* docs: spell-check on HCL doc
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
* chore: update comment for accuracy in HCL read routine
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
---------
Signed-off-by: Benjamin Zores <benjamin.zores@gmail.com>
Co-authored-by: yxxhero <11087727+yxxhero@users.noreply.github.com>
fix: helmDefaults.skipRefresh ignored in runHelmDepBuilds (#2269)
`runHelmDepBuilds()` only checked the CLI flag (`opts.SkipRefresh`) when
deciding whether to run `helm repo update` before building dependencies.
This meant that setting `helmDefaults.skipRefresh: true` in helmfile.yaml
had no effect on the repo update call inside dep builds.
Add `!st.HelmDefaults.SkipRefresh` to the guard condition so that
`helmDefaults.skipRefresh: true` is respected alongside the CLI flag.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: eliminate os.Chdir in sequential helmfiles to fix relative path resolution
The sequential code path used within() → os.Chdir() to change the
process-wide working directory when processing helmfile.d files.
This broke relative environment variable paths (e.g. KUBECONFIG=kubeconfig.yaml)
because they resolved from the wrong directory after chdir.
Replace the chdir-based approach with the same baseDir parameter pattern
used by the parallel code path, passing explicit directory context through
loadDesiredStateFromYamlWithBaseDir() instead of mutating global process state.
Closes #2409
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: restore within() for single-file sequential to preserve chart path format
The previous approach used baseDir for all sequential processing, which
changed chart path format in output (e.g. from "../../../../charts/raw"
to "test/integration/charts/raw"). This broke integration tests that
compare chart paths in expected output.
Now the sequential branch uses two strategies:
- Single file: use os.Chdir via within() to preserve backward-compatible
relative chart paths in output
- Multiple files with --sequential-helmfiles: use baseDir parameter to
avoid os.Chdir, fixing relative env var paths like KUBECONFIG (#2409)
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: revert e2e snapshot outputs to match within() behavior
The previous commit restored within() for single-file sequential
processing, which produces relative chart paths (e.g. ../../charts/raw)
and filename-only FilePath. Revert the e2e snapshot expected outputs
to match main branch since single-file behavior is now identical.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: restructure integration test for multi-file sequential processing
- Point -f at helmfile.d/ directly (not parent dir) so findDesiredStateFiles
discovers the yaml files
- Add second helmfile to trigger baseDir path (len > 1)
- Inline environment config to avoid base file relative path issues
- Verify both releases appear in output instead of comparing with parallel
(which may differ in ordering)
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: reduce cognitive complexity and improve accuracy of sequential helmfiles
Replace inline visitSubHelmfiles closure with calls to the existing
processNestedHelmfiles() method, matching the parallel path. This
eliminates duplicated nested logic and reduces gocognit complexity
below the CI threshold of 110. Also fixes help text and docs to
accurately describe that single-file processing still uses within(),
and adds kubeContext verification to the integration test.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* test: validate kubeContext resolution in sequential helmfiles integration test
Restructure the integration test to replicate the exact user scenario
from issue #2409:
- Multiple files in helmfile.d/ using bases: with relative paths
(../bases/) for environments and defaults
- Environment values set kubeContext via .Environment.Values
- helmDefaults.kubeContext rendered from gotmpl
- Local chart references (../../../../charts/raw) from helmfile.d/
- Run diff against the minikube cluster to exercise kubeContext
resolution, which would fail with "context does not exist" if
os.Chdir() broke relative path resolution
- Also verify template output for both releases and relative values
file (values/common.yaml) resolution
Fix normalizeChart() in util.go to be idempotent — skip re-prefixing
when the chart path already starts with basePath. This prevents
double-prefixing of local chart paths (e.g. helmfile.d/test/.../raw)
when normalizeChart is called multiple times (once during chart
preparation and again during diff/sync).
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
---------
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: helmBinary setting ignored in multi-document YAML files
The helmBinary setting in helmfile.yaml was being ignored when using
multi-document YAML files (files with --- separators).
Root Cause:
When processing multi-document YAML files, the load() function splits
the file into parts and processes each part separately. Each part was
calling applyDefaultsAndOverrides() which would set an empty helmBinary
to the default 'helm'. When merging parts, the default value from a
later part would override the correct value from an earlier part.
Fix:
- Added a new applyDefaults parameter to ParseAndLoad() to control when
defaults are applied
- Modified rawLoad() to pass applyDefaults=false when processing
individual parts
- Added a call to ApplyDefaultsAndOverrides() after all parts are merged
to apply defaults once on the final merged state
- Exported ApplyDefaultsAndOverrides() method for use by the app package
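Why the order matters can be shown with a reduced state holding only `HelmBinary`. The `state`, `mergeParts`, and `applyDefaults` names are hypothetical simplifications of the multi-document load path.

```go
package main

import "fmt"

// state is a hypothetical, heavily reduced stand-in for the parsed helmfile state.
type state struct {
	HelmBinary string
}

// mergeParts merges documents in order; a later non-empty value overrides an
// earlier one, and an empty value leaves the earlier one alone.
func mergeParts(parts []state) state {
	var merged state
	for _, p := range parts {
		if p.HelmBinary != "" {
			merged.HelmBinary = p.HelmBinary
		}
	}
	return merged
}

// applyDefaults fills in the default helm binary when none is set.
func applyDefaults(s *state) {
	if s.HelmBinary == "" {
		s.HelmBinary = "helm"
	}
}

func main() {
	parts := []state{{HelmBinary: "/usr/local/bin/helm4"}, {}} // 2nd document sets nothing

	// Fixed order: merge all parts, then apply defaults once.
	merged := mergeParts(parts)
	applyDefaults(&merged)
	fmt.Println(merged.HelmBinary) // /usr/local/bin/helm4

	// Buggy order: defaulting each part first turns the empty second part
	// into "helm", which then overrides the earlier value during the merge.
	buggy := make([]state, len(parts))
	copy(buggy, parts)
	for i := range buggy {
		applyDefaults(&buggy[i])
	}
	fmt.Println(mergeParts(buggy).HelmBinary) // helm
}
```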
Fixes: #2319
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: update comment per PR review
Change 're-apply' to 'apply' since defaults are never applied during
part processing (applyDefaults=false is passed), so this is the first
and only time defaults are applied to the merged state.
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: clarify applyDefaults logic in test LoadFile callbacks
Add explicit applyDefaults variable with comment explaining why it
equals evaluateBases: base files shouldn't apply defaults, only the
main file should after all parts/bases are merged.
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: address PR review comments
- Remove applyDefaults parameter from rawLoad() since it's always false
- Add regression test for multi-document YAML with helmBinary (issue #2319)
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: add integration test for helmBinary in multi-document YAML
Add TestHelmBinaryPreservedInMultiDocumentYAML that exercises the full
loadDesiredStateFromYaml path to ensure helmBinary from the first
document is preserved when merging multi-document YAML files.
This is a regression test at the load() orchestration level for issue #2319.
Signed-off-by: yxxhero <aiopsclub@163.com>
---------
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: support XDG-style multiple paths in HELM_PLUGINS
Use filepath.SplitList to properly handle XDG-style paths with multiple
directories (e.g., HELM_PLUGINS=/path/one:/path/two) when looking up
plugin versions. Previously, the code only scanned a single directory.
Fixes #2411
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: address PR review comments for XDG plugins path support
- Track and return first non-IsNotExist error from os.ReadDir
- Skip empty path elements from filepath.SplitList
- Use os.PathListSeparator for cross-platform test compatibility
Signed-off-by: yxxhero <aiopsclub@163.com>
---------
Signed-off-by: yxxhero <aiopsclub@163.com>
toCLIError() panics on unhandled error types (e.g. helmexec.ExitError
from a failed helm plugin install). On Windows, plugin install hooks
often fail due to missing 'sh', causing helmfile init to crash even
when the plugin binary was placed correctly.
- Add helmexec.ExitError case to toCLIError and replace panic in the
default case with a graceful error return
- After AddPlugin/UpdatePlugin errors, verify whether the plugin is
actually present before failing; log a warning and continue if so
Fixes #1983
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
Adds a new `helmfile unittest` command that integrates the helm-unittest
plugin, allowing users to define unit test paths per release and run them
via helmfile.
Closes #2376
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* feat: support .Environment.* in --output-dir-template
This commit adds support for accessing environment values in the --output-dir-template flag.
Previously, users could only access .OutputDir, .State.*, and .Release.* in the template.
Now .Environment.* is also available, allowing users to use environment values in the
output directory path.
Example usage:
helmfile template -e test-1 --output-dir-template='{{ .OutputDir }}/{{ .Environment.cluster.name }}/{{ .Environment.Name }}/{{ .Release.Name }}'
This produces output like: ./gitops/my-test-cluster/test-1/release-name/
Changes:
- Add Environment field to GenerateOutputDir template data
- Add Environment field to generateChartPath template data (now a method on HelmState)
- Update help text for --output-dir-template flag in template and fetch commands
- Add test cases for Environment in template
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: address PR review comments for --output-dir-template
- Clarify .Environment.Name, .Environment.KubeContext, .Environment.Values.* in help text
- Update generateChartPath comment to reflect broader usage (fetch, pull, OCI)
- Add tests for GenerateOutputDir with Environment fields
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: address additional PR review comments
- Move HelmState setup outside test loop to reduce duplication
- Document Environment field (.Name, .KubeContext, .Values) in template data structs
Signed-off-by: yxxhero <aiopsclub@163.com>
---------
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: skip cache refresh for shared cache paths to prevent race conditions
When multiple helmfile processes run in parallel (e.g., as ArgoCD plugin),
they share the same OCI chart cache in ~/.cache/helmfile. One process could
delete and re-download (refresh) a cached chart while another process was
still using it, causing "path not found" errors.
This fix:
- Adds isSharedCachePath() helper to detect shared cache paths
- Skips chart deletion/refresh for paths in the shared cache directory
- Users can force refresh by running `helmfile cache cleanup` first
Fixes #2387
Signed-off-by: yxxhero <aiopsclub@163.com>
* docs: document OCI chart caching behavior and multi-process safety
Add documentation for:
- OCI chart cache location and behavior
- How to force cache refresh with `helmfile cache cleanup`
- Multi-process safety when using shared cache
- Cache management commands (info, cleanup)
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: address review comments for shared cache handling
- Return error instead of chartActionDownload for corrupted shared cache
- Change refresh skip log from Debugf to Infof for user visibility
- Add t.Helper() to createTestLogger test helper
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: handle symlinks and add debug logging in isSharedCachePath
- Use filepath.EvalSymlinks to resolve symlinks before path comparison
- Add debug logging when filepath.Abs fails
- Add test case for symlink to shared cache directory
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: address Copilot review comments
- Include underlying error in corrupted cache error message
- Add cleanup for test directories created in shared cache
- Clarify --skip-refresh flag documentation
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: handle edge case when chartPath equals sharedCacheDir
- isSharedCachePath now returns true for exact match with cache dir
- Add test case for exact match with shared cache directory
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: add integration test for acquireChartLock shared cache behavior
Add TestAcquireChartLockSharedCacheSkipRefresh to verify that
acquireChartLock returns chartActionUseCached instead of
chartActionRefresh when the chart exists in the shared cache,
even when refresh is requested. This tests the core fix for
the race condition issue #2387.
Signed-off-by: yxxhero <aiopsclub@163.com>
---------
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: include query params in HTTP getter cache key (#2103)
When helmfile caches remote HTTP files fetched via the "normal" getter
(plain https:// URLs without a git:: prefix), the cache key did not
include query parameters. This caused URLs that differ only in query
params (e.g. ?ref=commit1 vs ?ref=commit2) to share the same cache
directory, silently returning the wrong file version.
The root cause was in Fetch() where the "normal" getter branch
overwrote the cache key with only scheme + host, discarding query
params that were correctly computed earlier.
Fix: extract the query-params suffix into a reusable variable and
apply it in both the default and "normal" getter cache key paths.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* Update pkg/remote/remote.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* Update pkg/remote/remote_test.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
---------
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
fix: support OCI chart digest syntax in chart URLs and version fields
Helm supports pinning OCI chart images by digest (@sha256:...), version
tag (:version), or both (:version@sha256:digest) since helm/helm#12690.
Helmfile failed to parse these formats, incorrectly constructing helm
commands and losing version/digest information embedded in chart URLs.
Root causes:
- resolveOciChart() used the last ":" to find the version tag, but sha256:abc
contains ":", so digest URLs were split incorrectly
- getOCIQualifiedChartName() included :version and @digest in chartName
with no parsing of either source
- appendChartVersionFlags() passed release.Version verbatim to --version
flag, including any digest suffix
- ChartPull() discarded the tag from resolveOciChart but did not
preserve digest in the URL
This commit adds parseOCIChartRef() and parseVersionDigest() utilities,
then updates the OCI chart handling pipeline so that:
- Digests are preserved in the chart URL passed to helm pull
- Version tags are extracted cleanly for the --version flag
- Both chart URL and version field are parsed for version/digest info
Fixes #2097
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
Add filepath.IsAbs guard to IsRemote and Parse to prevent Windows drive
letter paths (e.g., C:\path) from being misinterpreted as remote URLs
by go-getter's url.Parse, which prepends file:// to drive letter paths.
Fixes #2384
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* feat: Ensure repo update is only run once
Perform a single
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "glm-bitnami" chart repository
...Unable to get an update from the "fluent" chart repository (https://fluent.github.io/helm-charts):
Get "https://fluent.github.io/helm-charts/index.yaml": read tcp 192.168.0.104:51893->185.199.108.153:443: read: connection reset by peer
...Unable to get an update from the "grafana" chart repository (https://grafana.github.io/helm-charts):
Get "https://grafana.github.io/helm-charts/index.yaml": read tcp 192.168.0.104:51897->185.199.109.153:443: read: connection reset by peer
...Unable to get an update from the "ingress-nginx" chart repository (https://kubernetes.github.io/ingress-nginx):
Get "https://kubernetes.github.io/ingress-nginx/index.yaml": read tcp 192.168.0.104:51894->185.199.110.153:443: read: connection reset by peer
...Unable to get an update from the "chartmuseum" chart repository (https://chartmuseum.github.io/charts):
Get "https://chartmuseum.github.io/charts/index.yaml": read tcp 192.168.0.104:51896->185.199.110.153:443: read: connection reset by peer
...Successfully got an update from the "glm-chartmuseum" chart repository
...Successfully got an update from the "apollo" chart repository
...Successfully got an update from the "kyverno" chart repository
...Unable to get an update from the "mysql-operator" chart repository (https://mysql.github.io/mysql-operator/):
Get "https://mysql.github.io/mysql-operator/index.yaml": read tcp 192.168.0.104:51903->185.199.111.153:443: read: connection reset by peer
...Unable to get an update from the "metallb" chart repository (https://metallb.github.io/metallb):
Get "https://metallb.github.io/metallb/index.yaml": read tcp 192.168.0.104:51904->185.199.111.153:443: read: connection reset by peer
...Unable to get an update from the "dragonfly" chart repository (https://dragonflyoss.github.io/helm-charts/):
Get "https://dragonflyoss.github.io/helm-charts/index.yaml": read tcp 192.168.0.104:51905->185.199.108.153:443: read: connection reset by peer
...Unable to get an update from the "openfga" chart repository (https://openfga.github.io/helm-charts):
Get "https://openfga.github.io/helm-charts/index.yaml": read tcp 192.168.0.104:51907->185.199.111.153:443: read: connection reset by peer
...Unable to get an update from the "cnpg" chart repository (https://cloudnative-pg.github.io/charts):
Get "https://cloudnative-pg.github.io/charts/index.yaml": read tcp 192.168.0.104:51910->185.199.111.153:443: read: connection reset by peer
...Unable to get an update from the "metrics-server" chart repository (https://kubernetes-sigs.github.io/metrics-server/):
Get "https://kubernetes-sigs.github.io/metrics-server/index.yaml": read tcp 192.168.0.104:51913->185.199.111.153:443: read: connection reset by peer
...Unable to get an update from the "ot-helm" chart repository (https://ot-container-kit.github.io/helm-charts/):
Get "https://ot-container-kit.github.io/helm-charts/index.yaml": read tcp 192.168.0.104:51914->185.199.111.153:443: read: connection reset by peer
...Unable to get an update from the "coredns" chart repository (https://coredns.github.io/helm):
Get "https://coredns.github.io/helm/index.yaml": read tcp 192.168.0.104:51917->185.199.111.153:443: read: connection reset by peer
...Unable to get an update from the "redis-operator" chart repository (https://ot-container-kit.github.io/helm-charts/):
Get "https://ot-container-kit.github.io/helm-charts/index.yaml": read tcp 192.168.0.104:51912->185.199.111.153:443: read: connection reset by peer
...Unable to get an update from the "andrcuns" chart repository (https://andrcuns.github.io/charts):
Get "https://andrcuns.github.io/charts/index.yaml": read tcp 192.168.0.104:51915->185.199.111.153:443: read: connection reset by peer
...Successfully got an update from the "gitlab-jh" chart repository
...Successfully got an update from the "hashicorp" chart repository
...Successfully got an update from the "incubator" chart repository
...Successfully got an update from the "jenkins" chart repository
...Successfully got an update from the "nvidia" chart repository
...Successfully got an update from the "elastic" chart repository
...Successfully got an update from the "projectcalico" chart repository
...Unable to get an update from the "juicefs" chart repository (https://juicedata.github.io/charts/):
Get "https://juicedata.github.io/charts/index.yaml": read tcp 192.168.0.104:51919->185.199.111.153:443: read: connection reset by peer
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈ before running any
commands, allowing us to safely pass --skip-refresh to avoid redundant
repo updates for each chart with external dependencies.
This reduces the number of repository refresh operations from O(n) to O(1)
where n is the number of charts with remote dependencies.
Co-authored-by: Javex <github@javex.eu>
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: ensure repo update only runs when repositories are configured
This fixes CI failures where tests fail with a 'no repositories found' error.
PR #2378 adds a single helm.UpdateRepo() call before running helm dep
build commands. However, when no repositories are configured, this call
fails. The fix adds a check for len(st.Repositories) > 0 before calling
UpdateRepo().
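A minimal sketch of that guard, assuming simplified stand-in types: `RepositorySpec`, `HelmState`, `RepoUpdater`, `prepareRepos` and the fake `countingHelm` are hypothetical names for illustration, not helmfile's actual API.

```go
package main

import "fmt"

// RepositorySpec is a hypothetical, trimmed-down repository entry.
type RepositorySpec struct {
	Name string
	URL  string
}

// HelmState is a hypothetical stand-in holding only the configured repositories.
type HelmState struct {
	Repositories []RepositorySpec
}

// RepoUpdater abstracts the single "helm repo update" call.
type RepoUpdater interface {
	UpdateRepo() error
}

// prepareRepos runs one repo update for all charts, but only when at least
// one repository is configured.
func prepareRepos(st *HelmState, helm RepoUpdater) (bool, error) {
	if len(st.Repositories) == 0 {
		// Nothing to refresh; calling helm here would fail with
		// "no repositories found".
		return false, nil
	}
	if err := helm.UpdateRepo(); err != nil {
		return false, fmt.Errorf("updating repos: %w", err)
	}
	return true, nil
}

// countingHelm is a fake helm used to count UpdateRepo calls.
type countingHelm struct{ calls int }

func (c *countingHelm) UpdateRepo() error {
	c.calls++
	return nil
}
```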
Additionally, updated snapshot files to reflect the new output ordering
where repo update happens before building dependencies.
Signed-off-by: yxxhero <aiopsclub@163.com>
* feat: Update test snapshots for single repo update
The code changes in PR #2378 ensure that helm repo update is only run once
before building dependencies. This requires updating test snapshots to include
the 'Updating repo' output that now appears before 'Building dependency' messages.
Updated snapshots:
- chart_need/output.yaml
- chart_need_enable_live_output/output.yaml
- release_template_inheritance/output.yaml
- environments_releases_without_same_yaml_part/output.yaml
- environment_missing_in_subhelmfile/output.yaml
- pr_560/output.yaml
- environments_values_gotmpl_with_environment_name/output.yaml
- postrenderer/output.yaml (fixed YAML structure)
- oci_need/output.yaml
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: Correctly update test snapshots based on repository configuration
Only update snapshots for tests that have repositories defined:
- chart_need/output.yaml (has repositories - shows 'Updating repo')
- chart_need_enable_live_output/output.yaml (has repositories - shows 'Updating repo')
- release_template_inheritance/output.yaml (has repositories - shows 'Updating repo')
Tests without repositories should NOT show 'Updating repo':
- environments_releases_without_same_yaml_part/output.yaml
- environments_values_gotmpl_with_environment_name/output.yaml
- pr_560/output.yaml
- environment_missing_in_subhelmfile/output.yaml
- postrenderer/output.yaml (uses OCI dependencies)
- oci_need/output.yaml (uses OCI dependencies)
This matches the conditional logic in the code that only runs
helm.UpdateRepo() when len(st.Repositories) > 0.
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: correct snapshot test expectations for repo update optimization
- Re-add trailing newlines to environment_missing_in_subhelmfile output
- Restore correct chart paths (/... instead of ../../...)
- Restore postrenderer output with cm2 ConfigMap and correct field order
- Fixes CI test failures introduced by incorrect snapshot updates
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: update integration test expected lint output for repo update
Include 'Updating repo' messages in expected lint output files
to match the new behavior where helm repo update is run once
before building dependencies.
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: remove extra blank line from lint output files
Integration test output files had an extra blank line that was
not present in the expected output, causing test failures.
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: update lint output for single repo update
With the repo update optimization, lint runs only once
with 'Updating repo' messages instead of running twice.
Update expected output to match new single-run behavior.
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: filter out repo update messages in lint test
Update test runner to filter out repo update messages that are
now generated by the single helm.UpdateRepo() call, keeping
the expected lint output consistent with the original behavior.
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: filter repo update messages from diff test
Filter out repo update messages in diff test output to
match new behavior where helm.UpdateRepo() is called once.
Signed-off-by: yxxhero <aiopsclub@163.com>
* Fix missing closing parenthesis in grep command
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: prevent --args flags from being passed to helm repo commands
When helmfile template --args is used, the extra flags were being
passed to helm repo update and helm repo add commands, which don't
support all flags that helm template/install support. This caused
failures when flags like --dry-run were passed via --args.
The fix saves the extra flags before executing helm repo commands,
clears them, and restores them afterwards to ensure repo commands
run without unsupported flags.
Fixes CI issue in PR #2378 where test issue-1749 fails
with "Error: unknown flag: --dry-run" during helm repo update.
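The save/clear/restore sequence can be sketched as below. The `execer` type and `withoutExtraArgs` helper are hypothetical stand-ins for helmfile's helm executor; using `defer` for the restore is a choice of this sketch so the flags come back even if the repo command panics.

```go
package main

import "strings"

// execer is a hypothetical stand-in for a helm executor; only the
// extra-args slice matters here.
type execer struct {
	extra []string
	log   []string // records each command line that "ran"
}

// run records the command along with the currently configured extra args.
func (e *execer) run(args ...string) {
	all := append(append([]string{}, args...), e.extra...)
	e.log = append(e.log, strings.Join(all, " "))
}

// withoutExtraArgs saves the extra flags, clears them while fn runs,
// and restores them afterwards.
func (e *execer) withoutExtraArgs(fn func()) {
	saved := e.extra
	e.extra = nil
	defer func() { e.extra = saved }()
	fn()
}
```

Repo commands run inside `withoutExtraArgs`, while template/install commands run outside it and keep the `--args` flags.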
Signed-off-by: yxxhero <aiopsclub@163.com>
---------
Signed-off-by: yxxhero <aiopsclub@163.com>
Co-authored-by: Javex <github@javex.eu>
* feat: upgrade Helm version to v3.20.0 and v4.1.0
This commit updates the recommended Helm version from v3.19.5/v4.0.5 to
v3.20.0/v4.1.0 across all workflows, Dockerfiles, and application constants.
Changes:
- Update CI matrix to test with Helm v3.20.0 and v4.1.0
- Update .github/workflows/Makefile HELM_VERSION to v4.1.0
- Update Dockerfiles with new version and SHA256 checksums
- Update pkg/app/init.go HelmRecommendedVersion to v4.1.0
- Update go.mod helm.sh/helm/v3 to v3.20.0 and helm.sh/helm/v4 to v4.1.0
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: remove source field from e2e test helm plugin configs
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: remove source field from integration test helm plugin config
Signed-off-by: yxxhero <aiopsclub@163.com>
---------
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: add --force-update flag for Helm 4 to prevent stale repository indexes
Fixes #2337
Problem:
Helmfile with Helm v4 doesn't update repository indexes when adding repos,
leading to stale indexes and errors like:
"chart matching version not found in example index. (try 'helm repo update')"
This happens because Helm 4 changed behavior compared to Helm 3:
- Helm 3: Always downloads index when running "helm repo add", even if repo exists
- Helm 4: Skips downloading index if repo already exists with same config
(see: https://github.com/helm/helm/blob/v4.0.4/pkg/cmd/repo_add.go#L200)
Without --force-update, helmfile only works initially: Helm 4 downloads
the index on a fresh repo setup, but subsequent "helmfile repos"
commands leave the indexes stale.
Root Cause:
The code only added --force-update for Helm 3.3.2+, but not for Helm 4,
since it was believed to be the default behavior in Helm 4. However, Helm 4
requires an explicit --force-update flag to update indexes for existing repos.
Solution:
Add the --force-update flag for Helm 4 in the AddRepo function to ensure
repository indexes are updated even when the repository already exists.
Refactoring:
Simplified the conditional logic from nested if statements to a single
readable condition using the existing IsVersionAtLeast() helper:
    if !helm.options.DisableForceUpdate &&
        (helm.IsHelm4() || helm.IsVersionAtLeast("3.3.2")) {
        args = append(args, "--force-update")
    }
Changes:
- pkg/helmexec/exec.go: Add --force-update for Helm 4
- pkg/helmexec/exec_test.go: Update test expectations for both Helm 3.3.2+ and Helm 4
- AGENTS.md: Add development guide for the repository
Testing:
- All helmexec package tests pass
- Verified build succeeds
- Tested against Helm 3.2.0 (no force-update)
- Tested against Helm 3.3.2+ (with force-update)
- Tested against Helm 4.0.1 (with force-update)
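A self-contained sketch of the same condition, checked against the versions listed under Testing. `parseVersion`, `versionAtLeast` and `repoAddArgs` are hypothetical helpers with deliberately simplified version parsing; the real code uses helmexec's own `IsHelm4()` and `IsVersionAtLeast()`.

```go
package main

import (
	"strconv"
	"strings"
)

// parseVersion turns "v3.3.2" or "3.3.2" into [major, minor, patch].
// Pre-release/build suffixes are dropped and missing or non-numeric
// parts count as 0. Simplified for illustration.
func parseVersion(v string) [3]int {
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	var out [3]int
	for i, p := range strings.SplitN(v, ".", 3) {
		n, err := strconv.Atoi(p)
		if err != nil {
			n = 0
		}
		out[i] = n
	}
	return out
}

// versionAtLeast reports whether have >= want, comparing component-wise.
func versionAtLeast(have, want string) bool {
	h, w := parseVersion(have), parseVersion(want)
	for i := 0; i < 3; i++ {
		if h[i] != w[i] {
			return h[i] > w[i]
		}
	}
	return true
}

// repoAddArgs mirrors the condition above: add --force-update for Helm 4,
// or for Helm 3 >= 3.3.2, unless force updates are disabled.
func repoAddArgs(helmVersion, name, url string, disableForceUpdate bool) []string {
	args := []string{"repo", "add", name, url}
	isHelm4 := parseVersion(helmVersion)[0] == 4
	if !disableForceUpdate && (isHelm4 || versionAtLeast(helmVersion, "3.3.2")) {
		args = append(args, "--force-update")
	}
	return args
}
```

With plain component-wise comparison, any 4.x version already satisfies `>= 3.3.2`; the explicit Helm 4 check is kept only to mirror the condition quoted above.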
Signed-off-by: opencode <opencode@users.noreply.github.com>
Signed-off-by: yxxhero <aiopsclub@163.com>
* test: update expected output for Helm 4 repo add message
Update integration test expectations to match Helm 4 behavior with the --force-update flag.
When --force-update is used, Helm 4 now outputs "has been added to your
repositories" instead of "already exists with the same configuration, skipping",
because it forcibly updates the repository index.
Related to #2337
Signed-off-by: opencode <opencode@users.noreply.github.com>
Signed-off-by: yxxhero <aiopsclub@163.com>
---------
Signed-off-by: opencode <opencode@users.noreply.github.com>
Signed-off-by: yxxhero <aiopsclub@163.com>
* fix: array merge regression - layer arrays now replace defaults (#2353)
PR #2288 introduced element-by-element array merging to fix #2281, but this
caused a regression where layer/environment arrays were merged instead of
replacing base arrays entirely.
This fix uses automatic sparse array detection:
- Arrays with nil values (from --state-values-set) merge element-by-element
- Arrays without nils (from layer YAML) replace entirely
This follows Helm's documented behavior where arrays replace rather than merge.
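A sketch of the sparse-detection rule on plain `[]any` slices, assuming nil entries mark positions the CLI did not set; `mergeSlices` here is illustrative, not helmfile's implementation. As the following commit message points out, a dense override such as `array[0]=value` contains no nils, so this heuristic alone replaces the whole array in that case.

```go
package main

// mergeSlices: if the override contains nil entries (positions not set,
// e.g. when only array[1] was given), merge element by element and keep
// the base value at nil positions; otherwise the override replaces base.
func mergeSlices(base, override []any) []any {
	sparse := false
	for _, v := range override {
		if v == nil {
			sparse = true
			break
		}
	}
	if !sparse {
		// Dense arrays (e.g. from a layer YAML file) replace the base
		// entirely; an empty override therefore clears the base.
		return append([]any{}, override...)
	}
	n := len(base)
	if len(override) > n {
		n = len(override)
	}
	out := make([]any, n)
	copy(out, base)
	for i, v := range override {
		if v != nil {
			out[i] = v
		}
	}
	return out
}
```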
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: use separate CLIOverrides field for element-by-element array merging
The previous approach using ArrayMergeStrategySparse detection didn't work
for --state-values-set array[0]=value because setting index 0 produces no
nils in the array.
This fix adds a CLIOverrides field to Environment that keeps CLI values
separate from layer values. CLI overrides are merged last using
ArrayMergeStrategyMerge (always element-by-element), while layer values
use the default strategy (arrays replace).
This ensures:
- --state-values-set array[0]=x only changes index 0, preserving other elements
- Layer/environment file arrays still replace base arrays entirely
- Issue #2281 fix is preserved (--state-values-set array[1].field=x works)
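The layering can be sketched as follows, assuming values are plain `map[string]any` trees. The `Environment` struct, `mergeMaps`, `mergeSliceElements` and `GetMergedValues` below are simplified stand-ins named after the commit's concepts, not helmfile's real code: layer values replace arrays, while CLI overrides, applied last, merge arrays element by element and recurse into map elements so that `array[1].field=x` keeps the element's other fields.

```go
package main

// Environment keeps layer values and CLI overrides apart (simplified).
type Environment struct {
	Defaults     map[string]any
	Values       map[string]any
	CLIOverrides map[string]any
}

// mergeMaps merges src over dst into a new map. Nested maps are merged
// recursively; slices are merged element-wise only when elementWise is true,
// otherwise the slice from src replaces the one in dst.
func mergeMaps(dst, src map[string]any, elementWise bool) map[string]any {
	out := make(map[string]any, len(dst)+len(src))
	for k, v := range dst {
		out[k] = v
	}
	for k, sv := range src {
		switch s := sv.(type) {
		case map[string]any:
			if d, ok := out[k].(map[string]any); ok {
				out[k] = mergeMaps(d, s, elementWise)
				continue
			}
		case []any:
			if d, ok := out[k].([]any); ok && elementWise {
				out[k] = mergeSliceElements(d, s)
				continue
			}
		}
		out[k] = sv
	}
	return out
}

// mergeSliceElements applies CLI slice entries index by index.
func mergeSliceElements(base, cli []any) []any {
	out := append([]any{}, base...)
	for i, v := range cli {
		switch {
		case i >= len(out):
			out = append(out, v)
		case v == nil:
			// nil means "not set on the CLI": keep the existing element.
		default:
			bm, bok := out[i].(map[string]any)
			cm, cok := v.(map[string]any)
			if bok && cok {
				out[i] = mergeMaps(bm, cm, true)
			} else {
				out[i] = v
			}
		}
	}
	return out
}

// GetMergedValues applies Defaults, then Values, then CLIOverrides last.
func (e Environment) GetMergedValues() map[string]any {
	merged := mergeMaps(e.Defaults, e.Values, false)
	return mergeMaps(merged, e.CLIOverrides, true)
}
```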
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: correct comment about array merge strategy in test
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: propagate Defaults in multi-part helmfiles and fix merge order
- Add Defaults field merging from ctxEnv to preserve base values across
helmfile parts separated by ---
- Fix merge order: current part values now correctly override previous
parts (was reversed, causing older values to win)
- Update 147 snapshot test files for new Environment log format with
CLIOverrides field
This completes the fix for issue #2353 by ensuring:
1. Layer arrays replace entirely (not element-by-element merge)
2. CLI --state-values-set sparse arrays still merge element-by-element
3. Multi-part helmfiles properly inherit and override values
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: address Copilot review comments
- Initialize EmptyEnvironment with empty maps to match New() constructor
- Update test comment to accurately describe ArrayMergeStrategySparse
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: ensure templates access merged values via .Environment.Values
This commit fixes a regression in the CLIOverrides integration where
templates accessing .Environment.Values couldn't see CLI override values.
Changes:
- Remove CLIOverrides-into-Values merge from Merge() to keep proper
layering order (Defaults → Values → CLIOverrides) in GetMergedValues()
- Update NewEnvironmentTemplateData to set envCopy.Values to the merged
values, ensuring templates see the same values via both .Values and
.Environment.Values
This ensures:
- Issue #2353: Layer arrays still replace entirely (Sparse strategy)
- Issue #2281: CLI sparse arrays still merge element-by-element
- Templates can access CLI overrides via .Environment.Values
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* docs: improve mergeSlices documentation per Copilot review
Address Copilot review comments on PR #2367:
- Document empty array edge case: explicitly setting [] clears base array
- Document recursive strategy propagation for nested map merging
- Add comprehensive behavior description for all array merge strategies
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: use merged values when rendering environment value files
Environment value files (*.yaml.gotmpl) can reference CLI values via
.Values. Previously, only env.Values was passed to template rendering,
which didn't include CLIOverrides.
Now we call env.GetMergedValues() to get Defaults + Values + CLIOverrides
before rendering, so templates can access CLI values like:
--state-values-set foo=bar
This fixes the state-values-set-cli-args-in-environments integration test.
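A minimal sketch of rendering a values template against already-merged values with Go's `text/template`. `renderValuesFile` is hypothetical, and the `missingkey=error` option is a choice made here so a value absent from the merged map fails loudly; it is not a claim about helmfile's template settings.

```go
package main

import (
	"strings"
	"text/template"
)

// renderValuesFile renders a values template (such as a *.yaml.gotmpl file)
// with the merged values exposed as .Values, so a CLI override like
// --state-values-set foo=bar is visible as .Values.foo.
func renderValuesFile(src string, merged map[string]any) (string, error) {
	tmpl, err := template.New("values").Option("missingkey=error").Parse(src)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, map[string]any{"Values": merged}); err != nil {
		return "", err
	}
	return b.String(), nil
}
```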
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
---------
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: resolve --validate flag conflict with kustomize in Helm 4
Fixes #2355
In Helm 4, the --validate and --dry-run flags are mutually exclusive.
When using kustomize/chartify charts with helmfile diff --validate,
the code was adding both --validate AND --dry-run=server to the
helm template command, causing the error:
Error: if any flags in the group [validate dry-run] are set none
of the others can be; [dry-run validate] were all set
The fix checks if --validate is already set before adding --dry-run=server.
Since --validate already provides server-side validation (it was deprecated
in favor of --dry-run=server in Helm 4), adding --dry-run=server is
redundant when --validate is present.
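The mutual-exclusion guard can be sketched as a small argument builder. `templateArgs` and its `needsCluster` parameter are hypothetical; they stand in for the cluster-requiring branch in `processChartification()`.

```go
package main

// templateArgs adds --dry-run=server for cluster-requiring chartify runs
// only when --validate is not already set, because Helm 4 rejects the two
// flags together and --validate already validates against the server.
func templateArgs(validate, needsCluster bool) []string {
	var args []string
	if validate {
		args = append(args, "--validate")
	}
	if needsCluster && !validate {
		args = append(args, "--dry-run=server")
	}
	return args
}
```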
Changes:
- Add !opts.Validate condition to processChartification() in state.go
- Add comprehensive unit tests for validate/dry-run mutual exclusion
- Add integration test with kustomize chart to prevent regression
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* address review feedback from Copilot
- Add missing test cases for destroy, delete, test, status WITH --validate
- Update integration test to use 'diff' instead of 'template' to properly
exercise the cluster-requiring code path that triggers --dry-run=server
- Add sync warning comments to the test helper function noting it must be
kept in sync with processChartification() in state.go
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* add missing 'build with validate' test case
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* address additional review feedback from Copilot
- Fix integration test to capture output and exit code in single execution
instead of running helmfile twice (more efficient)
- Add detailed documentation explaining why test helper duplication is
intentional: extracting shared function would require exposing internal
API and complex refactoring of processChartification dependencies
- Note that integration test exercises actual code path end-to-end
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: correct go doc comment formatting for gci linter
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: update line number reference from 1497-1523 to 1497-1524
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
---------
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
fix: pass --kube-context to helm template when using jsonPatches (#2309)
When using jsonPatches or strategicMergePatches in helmfile, the
`helm template` command was not receiving the `--kube-context` flag.
This caused issues when `--dry-run=server` was used (introduced in
PR #2271 to support lookup() functions), because helm would connect
to the wrong cluster context.
Root Cause:
1. `flagsForTemplate()` did not call `appendConnectionFlags()`, unlike
`flagsForUpgrade()` and `flagsForDiff()` which both include this call.
2. `processChartification()` did not include `--kube-context` when
setting `chartifyOpts.TemplateArgs` for internal helm template calls.
Fix:
1. Added `appendConnectionFlags()` call to `flagsForTemplate()` to ensure
kube-context and other connection flags are passed to helm template.
2. Added `getKubeContext()` helper function that resolves kube-context
with proper priority: release > environment > helmDefaults.
3. Modified `processChartification()` to include `--kube-context` in
chartifyOpts.TemplateArgs when chartify needs to run helm template.
4. Added compatibility check for `--validate` flag to avoid Helm 4
mutual exclusion error between --validate and --dry-run (Issue #2355).
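The kube-context resolution order can be sketched like this. `releaseCfg`, `envCfg`, `defaultsCfg`, `getKubeContext` and `withKubeContext` are simplified, hypothetical stand-ins, assuming an empty string means the context is not set at that level.

```go
package main

// Hypothetical, simplified config types; only the kube context matters.
type releaseCfg struct{ KubeContext string }
type envCfg struct{ KubeContext string }
type defaultsCfg struct{ KubeContext string }

// getKubeContext resolves the context with priority
// release > environment > helmDefaults.
func getKubeContext(r releaseCfg, env envCfg, d defaultsCfg) string {
	for _, c := range []string{r.KubeContext, env.KubeContext, d.KubeContext} {
		if c != "" {
			return c
		}
	}
	return ""
}

// withKubeContext appends --kube-context to helm template args when set.
func withKubeContext(args []string, ctx string) []string {
	if ctx == "" {
		return args
	}
	return append(args, "--kube-context", ctx)
}
```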
Fixes #2309
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* Initial plan
* Fix helmfile.d parallel processing to respect alphabetical order
Changed multiple helmfile.d file processing from parallel to sequential
to match the documented behavior. Files are now processed in alphabetical
order as stated in the documentation, ensuring predictable deployment order.
This fixes issues where deployments would break because parallel processing
did not respect the intended ordering (e.g., databases before applications).
- Modified visitStatesWithContext to process files sequentially
- Removed unused processStateFileParallel function
- Updated test snapshots to reflect deterministic ordering
Co-authored-by: yxxhero <11087727+yxxhero@users.noreply.github.com>
* Add --sequential-helmfiles flag for opt-in sequential processing
Based on feedback, added a flag to control processing mode instead of
forcing sequential behavior. Parallel processing is now the default for
performance, with an opt-in --sequential-helmfiles flag for when order matters.
Changes:
- Added SequentialHelmfiles field to GlobalOptions and App
- Added --sequential-helmfiles CLI flag
- Modified visitStatesWithContext to check flag and choose parallel vs sequential
- Updated documentation to clarify parallel is default, sequential is opt-in
- Restored original parallel processing as default behavior
This gives users control over the behavior based on their needs.
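A sketch of the two processing modes, assuming each helmfile.d file can be handled by an independent callback. `visitHelmfiles` is hypothetical; it sorts the file names, runs them one by one in sequential mode, and otherwise starts one goroutine per file, so completion order is not guaranteed.

```go
package main

import (
	"sort"
	"sync"
)

// visitHelmfiles processes files in alphabetical order when sequential is
// true; otherwise it processes them concurrently. Errors are returned in
// sorted-file order either way.
func visitHelmfiles(files []string, sequential bool, process func(string) error) []error {
	sorted := append([]string{}, files...)
	sort.Strings(sorted)
	errs := make([]error, len(sorted))
	if sequential {
		for i, f := range sorted {
			errs[i] = process(f)
		}
		return errs
	}
	var wg sync.WaitGroup
	for i, f := range sorted {
		wg.Add(1)
		go func(i int, f string) {
			defer wg.Done()
			errs[i] = process(f)
		}(i, f)
	}
	wg.Wait()
	return errs
}
```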
Co-authored-by: yxxhero <11087727+yxxhero@users.noreply.github.com>
* Add documentation and improve code readability
- Added documentation to processStateFileParallel function
- Extracted shouldProcessInParallel variable for clarity
- All tests pass, security scan clean
Co-authored-by: yxxhero <11087727+yxxhero@users.noreply.github.com>
* Fix lint issue: remove trailing whitespace
Removed trailing whitespace from line 983 in pkg/app/app.go to fix a
formatting lint error.
Co-authored-by: yxxhero <11087727+yxxhero@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yxxhero <11087727+yxxhero@users.noreply.github.com>
Fixes an issue where Chart.yaml dependencies with relative file:// paths
fail during chartification because the paths become invalid when the chart
is copied to chartify's temporary directory.
The rewriteChartDependencies function now converts relative file://
dependencies to absolute paths before chartification, then restores the
original Chart.yaml afterwards. Absolute file:// and other repository
types (https, oci) are left unchanged.
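The per-dependency path rule can be sketched as below. `absolutizeFileRepo` is a hypothetical helper covering only the rewrite of a single `repository` value; reading and restoring Chart.yaml is left out, and the expected results assume Unix-style paths.

```go
package main

import (
	"path/filepath"
	"strings"
)

// absolutizeFileRepo resolves a relative file:// dependency against the
// chart's own directory. Absolute file:// paths and non-file repositories
// (https://, oci://) are returned unchanged.
func absolutizeFileRepo(chartDir, repo string) string {
	const prefix = "file://"
	if !strings.HasPrefix(repo, prefix) {
		return repo
	}
	p := strings.TrimPrefix(repo, prefix)
	if filepath.IsAbs(p) {
		return repo
	}
	abs, err := filepath.Abs(filepath.Join(chartDir, p))
	if err != nil {
		return repo
	}
	return prefix + abs
}
```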
Includes comprehensive test coverage for various dependency scenarios.
Signed-off-by: Shane Starcher <shane.starcher@gmail.com>
Co-authored-by: Shane Starcher <shane.starcher@gmail.com>
* build(deps): update Helm v4 from 4.0.0 to 4.0.1
Update Helm v4 binary and Go library dependency to version 4.0.1.
Changes:
- Update helm.sh/helm/v4 Go module from v4.0.0 to v4.0.1
- Update Helm binary version in all Dockerfiles (alpine, ubuntu, debian)
- Update SHA256 checksums for linux/amd64 and linux/arm64
- Update CI workflow matrix to test against v4.0.1
- Update HelmRecommendedVersion constant in pkg/app/init.go
- Update test mocks to return v4.0.1 version string
- Update test plugin fixture version
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* build(deps): update helm-secrets from 4.7.0 to 4.7.4
Update helm-secrets plugin version across all configurations:
- Docker images (all 3 variants) - use ARG variable for version
- CI test matrix
- Integration test defaults
- Unit test fixtures and expectations
- HelmSecretsRecommendedVersion constant
- Dynamic plugin installation in exec.go
Also update plugin filename format from helm-secrets-*.tgz to
secrets-{version}.tgz to match the new release naming convention.
Update the suppress-output-line-regex test's expected output for Helm 4.0.1,
which now suppresses the Service diff after ipFamily normalization.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
---------
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: resolve issues #2295, #2296, #2297 and OCI registry login
This PR fixes four related bugs affecting chart preparation, caching,
and OCI registry authentication.
Issue #2295: OCI chart cache conflicts with parallel helmfile processes
- Added filesystem-level locking using flock for cross-process sync
- Implements double-check locking pattern for efficiency
- Retry logic with 5-minute timeout and 3 retries
- Refactored into reusable acquireChartLock() helper function
- Added refresh marker coordination for cross-process cache management
Issue #2296: helmDefaults.skipDeps and helmDefaults.skipRefresh were ignored
- Check both CLI options AND helmDefaults when deciding to skip repo sync
Issue #2297: Local chart + transformers causes panic
- Normalize local chart paths to absolute before calling chartify
OCI Registry Login URL Fix:
- Added extractRegistryHost() to extract just the registry host from URLs
- Fixed SyncRepos to use extracted host for OCI registry login
- e.g., "account.dkr.ecr.region.amazonaws.com/charts" ->
"account.dkr.ecr.region.amazonaws.com"
Test Plan:
- Unit tests for issues #2295, #2296, #2297
- Unit tests for OCI registry login (extractRegistryHost, SyncRepos_OCI)
- Integration tests for issues #2295 and #2297
- All existing unit tests pass (including TestLint)
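The registry-host extraction from the OCI login fix can be sketched with plain string handling. This `extractRegistryHost` is an illustrative reimplementation, assuming the host is everything before the first `/` after an optional `oci://` prefix; the real helper may handle more edge cases.

```go
package main

import "strings"

// extractRegistryHost keeps only the registry host (and optional port), e.g.
// "account.dkr.ecr.region.amazonaws.com/charts" -> "account.dkr.ecr.region.amazonaws.com".
func extractRegistryHost(repoURL string) string {
	u := strings.TrimPrefix(repoURL, "oci://")
	if i := strings.Index(u, "/"); i >= 0 {
		u = u[:i]
	}
	return u
}
```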
Fixes #2295
Fixes #2296
Fixes #2297
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: replace 60s timeout with reader-writer locks for OCI chart caching
Address PR review feedback from @champtar about the OCI chart caching
mechanism. The previous implementation used a 60-second timeout which
was arbitrary and caused race conditions when helm deployments took
longer (e.g., deployments triggering scaling up/down).
Changes:
- Replace 60s refresh marker timeout with proper reader-writer locks
- Use shared locks (RLock) when using cached charts (allows concurrent reads)
- Use exclusive locks (Lock) when refreshing/downloading charts
- Hold locks during entire helm operation lifecycle (not just during download)
- Add getNamedRWMutex() for in-process RW coordination
- Update PrepareCharts() to return locks map for lifecycle management
- Add chartLockReleaser in run.go to release locks after helm callback
- Remove unused mutexMap and getNamedMutex (replaced by RW versions)
- Add comprehensive tests for shared/exclusive lock behavior
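The in-process half of that scheme, one reader-writer lock per chart path, can be sketched with `sync.RWMutex`. The `rwMutexes` type is a hypothetical stand-in for `getNamedRWMutex()`; cross-process coordination via file locks is not shown.

```go
package main

import "sync"

// rwMutexes lazily creates one RWMutex per key (e.g. a chart cache path).
// Readers of a cached chart take RLock; a refresh/download takes Lock.
type rwMutexes struct {
	mu sync.Mutex
	m  map[string]*sync.RWMutex
}

// get returns the RWMutex for name, creating it on first use.
func (r *rwMutexes) get(name string) *sync.RWMutex {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.m == nil {
		r.m = make(map[string]*sync.RWMutex)
	}
	l, ok := r.m[name]
	if !ok {
		l = &sync.RWMutex{}
		r.m[name] = l
	}
	return l
}
```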
This eliminates the race condition where one process could delete a
cached chart while another process's helm command was still using it.
Fixes review comment on PR #2298
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: prevent deadlock when multiple releases share the same chart
When multiple releases use the same OCI chart (e.g., same chart different
values), workers in PrepareCharts would deadlock:
1. Worker 1 acquires lock for chart/path, downloads, adds to cache
2. Worker 2 finds chart in cache, tries to acquire lock on same path
3. Worker 2 blocks waiting for Worker 1's lock
4. Collector waits for Worker 2's result
5. Worker 1's lock held until PrepareCharts finishes -> deadlock
The fix: when using the in-memory chart cache (which means another worker
in the same process already downloaded the chart), don't acquire another
lock. This is safe because:
- The in-memory cache is only used within a single helmfile process
- The tempDir cleanup is deferred until after helm callback completes
- Cross-process coordination is still handled by file locks during downloads
This fixes the "signal: killed" test failures in CI for:
- oci_chart_pull_direct
- oci_chart_pull_once
- oci_chart_pull_once2
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: resolve deadlock by releasing OCI chart locks immediately after download
This commit simplifies the OCI chart locking mechanism to fix deadlock
issues that occurred when multiple releases shared the same chart.
Problem:
When multiple releases used the same OCI chart, workers in PrepareCharts
would deadlock because:
1. Worker 1 acquires lock for chart/path, downloads chart
2. Worker 2 tries to acquire lock on same path, blocks waiting
3. PrepareCharts waits for all workers to complete
4. Worker 1's lock held until PrepareCharts finishes -> deadlock
Solution:
Release locks immediately after chart download completes. This is safe
because:
- The tempDir cleanup is deferred until after helm operations complete
in withPreparedCharts(), so charts won't be deleted mid-use
- The in-memory chart cache prevents redundant downloads within a process
- Cross-process coordination via file locks still works during download
Changes:
- Remove chartLock field from chartPrepareResult struct
- Release locks immediately in getOCIChart() and forcedDownloadChart()
- Simplify PrepareCharts() by removing lock collection and release logic
- Update function signatures to return only (path, error)
This also fixes the "signal: killed" test failures in CI for:
- oci_chart_pull_direct
- oci_chart_pull_once
- oci_chart_pull_once2
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: add double-check locking for in-memory chart cache
When multiple workers concurrently process releases using the same chart,
they all check the in-memory cache before acquiring locks. If none have
populated the cache yet, all workers miss and try to download.
Previously, even after acquiring the exclusive lock, the code would
re-download the chart when needsRefresh=true (the default). This caused
multiple "Pulling" messages in tests like oci_chart_pull_once.
The fix adds a second in-memory cache check AFTER acquiring the lock.
This implements proper double-check locking:
1. Check cache (outside lock) → miss
2. Acquire lock
3. Check cache again (inside lock) → hit if another worker populated it
4. If still miss, download and add to cache
This ensures only one worker downloads the chart, while others use
the cached version populated by the first worker.
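The four steps above map onto Go roughly as follows. `chartCache`, `get` and the `download` callback are hypothetical; a per-chart `sync.Mutex` stands in for the exclusive lock, and the cross-process file lock is omitted.

```go
package main

import "sync"

// chartCache is an in-memory cache with one lock per chart key.
type chartCache struct {
	mu    sync.Mutex        // guards paths
	paths map[string]string // chart key -> local path
	locks sync.Map          // chart key -> *sync.Mutex
}

func (c *chartCache) lookup(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	p, ok := c.paths[key]
	return p, ok
}

func (c *chartCache) store(key, path string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.paths == nil {
		c.paths = make(map[string]string)
	}
	c.paths[key] = path
}

// get returns the cached path or downloads the chart exactly once per key.
func (c *chartCache) get(key string, download func(string) (string, error)) (string, error) {
	if p, ok := c.lookup(key); ok { // 1. check outside the lock
		return p, nil
	}
	l, _ := c.locks.LoadOrStore(key, &sync.Mutex{})
	lock := l.(*sync.Mutex)
	lock.Lock() // 2. acquire the per-chart lock
	defer lock.Unlock()
	if p, ok := c.lookup(key); ok { // 3. check again inside the lock
		return p, nil
	}
	p, err := download(key) // 4. still missing: download and cache
	if err != nil {
		return "", err
	}
	c.store(key, p)
	return p, nil
}
```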
Changes:
- Add in-memory cache double-check in getOCIChart() after acquiring lock
- Add in-memory cache double-check in forcedDownloadChart() after acquiring lock
This fixes the oci_chart_pull_once and oci_chart_pull_direct test failures
where charts were being pulled multiple times instead of once.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: use callback to prevent redundant chart downloads within a process
When multiple workers concurrently process releases using the same chart,
they need to coordinate to avoid redundant downloads. The previous fix
set SkipRefresh=true for OCI charts, which prevented legitimate refresh
scenarios (e.g., floating tags).
This commit implements a better solution using a callback mechanism:
1. acquireChartLock() now accepts an optional skipRefreshCheck callback
2. Before deleting a cached chart for refresh, the callback is invoked
3. If the callback returns true (in-memory cache has the chart), skip refresh
4. This allows deduplication within a process while respecting cross-run refresh
The flow is now:
- Worker 1 downloads chart, adds to in-memory cache, releases lock
- Worker 2 acquires lock, sees needsRefresh=true, but callback sees
in-memory cache is populated → uses cached instead of deleting
This correctly handles:
- Within-process deduplication: only one download per chart
- Cross-run refresh: respects --skip-refresh flag for floating tags
- Immutable versions: cached and reused as expected
Changes:
- Add skipRefreshCheck callback parameter to acquireChartLock()
- Update getOCIChart() to pass in-memory cache check callback
- Update forcedDownloadChart() to pass in-memory cache check callback
- Remove SkipRefresh=true workaround for OCI charts
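The refresh decision with the optional callback can be reduced to a small predicate. `shouldRedownload` and its parameters are hypothetical; `skipRefreshCheck` plays the role of the in-memory cache check passed to `acquireChartLock()`.

```go
package main

// shouldRedownload reports whether a chart must be (re)pulled: always when
// it is not cached; when cached, only if a refresh is requested and the
// optional skipRefreshCheck callback does not report that this process
// already holds a fresh copy.
func shouldRedownload(cached, needsRefresh bool, skipRefreshCheck func() bool) bool {
	if !cached {
		return true
	}
	if !needsRefresh {
		return false
	}
	if skipRefreshCheck != nil && skipRefreshCheck() {
		return false
	}
	return true
}
```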
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: address Copilot review comments on PR #2298
This commit addresses the automated review comments from GitHub Copilot:
1. pkg/state/state.go: Add nil check for logger in Release() method
to prevent potential nil pointer dereference when logger is nil.
2. pkg/state/state.go: Fix misleading comment about "external callers"
to accurately reflect that Logger() is used by the app package.
3. pkg/state/issue_2296_test.go: Add comment noting that boolPtr helper
is already defined in skip_test.go (shared across test files).
4. test/integration/test-cases/oci-parallel-pull.sh: Replace hardcoded
/tmp paths with a dedicated temp directory for test outputs. Add
cleanup for the output directory in the cleanup function.
5. test/integration/test-cases/issue-2297-local-chart-transformers.sh:
Add cleanup trap to remove temp directory on exit, preventing
leftover files from accumulating.
6. Remove dead code: The chartLocks map in PrepareCharts was always
empty since locks are released immediately after download. Removed
the unused return value and corresponding handling in run.go to
improve code clarity and maintainability.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: make oci-parallel-pull test resilient to registry issues
The integration test was intermittently failing in CI due to Docker Hub
rate limiting or network issues. These failures are not helmfile bugs.
Changes:
- Add is_registry_error() function to detect external registry issues
(rate limits, network timeouts, connection refused, etc.)
- Check for the race condition bug (issue #2295) first and fail fast
- If other failures occur, check if they're registry-related
- Skip test gracefully when registry issues are detected instead of
failing CI on external infrastructure problems
This ensures the test still catches the actual race condition bug while
not causing false failures due to Docker Hub rate limits in CI.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: make oci-parallel-pull test resilient to registry issues
The integration test was failing in CI for two reasons:
1. Docker Hub rate limiting or network issues causing helmfile to fail
2. The test script exits early due to `set -e` when `wait` returns non-zero
Changes:
- Use `wait $pid || exit=$?` pattern to capture exit codes without triggering
set -e. When wait returns non-zero, the || branch captures the exit code
into the variable, preventing script termination.
- Add is_registry_error() function to detect external registry issues
(rate limits, network timeouts, connection refused, etc.)
- Check for the race condition bug (issue #2295) first and fail fast
- Skip test gracefully when registry issues are detected instead of
failing CI on external infrastructure problems
This ensures the test still catches the actual race condition bug while
not causing false failures due to Docker Hub rate limits in CI.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
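The decision order described above (fail fast on the race-condition bug, otherwise skip only on recognizable registry errors) can be sketched in Go for illustration. The real is_registry_error() is a shell function using grep; the classify function and both regular expressions here are hypothetical placeholders, not the script's actual patterns. The (?i) flag gives the same case-insensitive matching as grep -i.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns; the real script's grep expressions may differ.
var (
	raceConditionRe = regexp.MustCompile(`(?i)(failed to untar|unexpected eof)`)
	registryErrorRe = regexp.MustCompile(`(?i)(toomanyrequests|rate limit|i/o timeout|connection refused|tls handshake timeout)`)
)

type verdict string

const (
	pass verdict = "pass"
	fail verdict = "fail"
	skip verdict = "skip"
)

// classify checks for the race-condition bug first (always a failure),
// then skips only when a non-zero exit looks like an external registry
// problem; any other failure still fails the test.
func classify(exitCode int, output string) verdict {
	if raceConditionRe.MatchString(output) {
		return fail
	}
	if exitCode == 0 {
		return pass
	}
	if registryErrorRe.MatchString(output) {
		return skip
	}
	return fail
}

func main() {
	fmt.Println(classify(1, "Error: read tcp: I/O Timeout")) // skip
	fmt.Println(classify(1, "Error: failed to untar chart")) // fail
	fmt.Println(classify(0, "Pulled: example/app:1.0.0"))    // pass
}
```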
* fix: address PR #2298 review - reinitialize fileLock after release
Address Copilot review comments:
1. pkg/state/state.go: Reinitialize fileLock after releasing shared lock
When upgrading from shared to exclusive lock, the fileLock needs to be
reinitialized with flock.New() after calling Release(). This ensures
a fresh flock object is used for the exclusive lock acquisition.
2. test/integration/test-cases/oci-parallel-pull.sh: Emit a warning if
no lock files are found, to help confirm that the locking mechanism is
actually being exercised.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: address PR #2298 Copilot review comments (round 4)
Address 8 Copilot review comments:
1. pkg/state/state.go: Release in-process mutex during retry backoff
to avoid blocking other goroutines for up to 90 seconds.
2. pkg/state/state.go: Include chartPath in shared lock error message
for better debugging.
3. pkg/state/state.go: Document that extractRegistryHost does not handle
URLs with query parameters or fragments (uncommon for OCI registries).
4. pkg/state/state.go: Document that the skipRefreshCheck callback should be
fast and non-blocking, since it runs while the exclusive lock is held.
5. oci-parallel-pull.sh: Use case-insensitive grep (-i flag) to catch
error variations like "I/O timeout".
6. helmfile.yaml: Expand comment explaining why library charts can't be
used for this test (they can't be templated by Helm).
Skipped (with justification):
- PrepareChartKey helper: Only 2 usages with different source structs
- Context reuse in retry: Per-attempt contexts provide clearer semantics
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: address PR #2298 Copilot review comments (round 5)
1. Make race condition detection grep more robust (oci-parallel-pull.sh)
- Use case-insensitive extended regex (-iqE)
- Add multiple pattern variations to catch different tar/helm versions
2. Remove unused Logger() method from HelmState (state.go)
- Method was never called; all lock releases use st.logger directly
3. Add clarifying comments for lock retry behavior (state.go)
- Document why file system errors are retried but timeouts are not
- Explain that flock returns (false, nil) when the context deadline is exceeded
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: clarify lock file check is informational only
Lock files are ephemeral and may be cleaned up immediately after
helmfile processes complete. Update comments and warning message
to make clear their absence doesn't indicate locking wasn't used.
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
* fix: add HELM_BIN env var to Dockerfiles
The helm-git plugin requires HELM_BIN environment variable to be set.
Without it, the plugin fails with "HELM_BIN: parameter not set".
Add HELM_BIN=/usr/local/bin/helm to all Dockerfile variants.
Fixes #2303
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
---------
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>