Commit Graph

113 Commits

Author SHA1 Message Date
Aditya Menon 534d0b618c
build(deps): update Helm v4 to 4.0.1 and helm-secrets to 4.7.4 (#2304)
* build(deps): update Helm v4 from 4.0.0 to 4.0.1

Update Helm v4 binary and Go library dependency to version 4.0.1.

Changes:
- Update helm.sh/helm/v4 Go module from v4.0.0 to v4.0.1
- Update Helm binary version in all Dockerfiles (alpine, ubuntu, debian)
- Update SHA256 checksums for linux/amd64 and linux/arm64
- Update CI workflow matrix to test against v4.0.1
- Update HelmRecommendedVersion constant in pkg/app/init.go
- Update test mocks to return v4.0.1 version string
- Update test plugin fixture version

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* build(deps): update helm-secrets from 4.7.0 to 4.7.4

Update helm-secrets plugin version across all configurations:
- Docker images (all 3 variants) - use ARG variable for version
- CI test matrix
- Integration test defaults
- Unit test fixtures and expectations
- HelmSecretsRecommendedVersion constant
- Dynamic plugin installation in exec.go

Also update plugin filename format from helm-secrets-*.tgz to
secrets-{version}.tgz to match the new release naming convention.

Update the expected output of the suppress-output-line-regex test for Helm 4.0.1,
which now suppresses the Service diff after ipFamily normalization.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

---------

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
2025-11-28 08:43:54 +08:00
Aditya Menon 9c70adc038
fix: resolve issues #2295, #2296, and #2297 (#2298)
* fix: resolve issues #2295, #2296, #2297 and OCI registry login

This PR fixes four related bugs affecting chart preparation, caching,
and OCI registry authentication.

Issue #2295: OCI chart cache conflicts with parallel helmfile processes
- Added filesystem-level locking using flock for cross-process sync
- Implemented a double-check locking pattern for efficiency
- Added retry logic with a 5-minute timeout and 3 retries
- Refactored the locking into a reusable acquireChartLock() helper function
- Added refresh marker coordination for cross-process cache management

Issue #2296: helmDefaults.skipDeps and helmDefaults.skipRefresh ignored
- Check both CLI options AND helmDefaults when deciding to skip repo sync

Issue #2297: Local chart + transformers causes panic
- Normalize local chart paths to absolute before calling chartify

OCI Registry Login URL Fix:
- Added extractRegistryHost() to extract just the registry host from URLs
- Fixed SyncRepos to use extracted host for OCI registry login
- e.g., "account.dkr.ecr.region.amazonaws.com/charts" ->
        "account.dkr.ecr.region.amazonaws.com"
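
Illustration only (not the code from this PR): a minimal sketch of the host
extraction described above. The helper name registryHost and the handling of
an "oci://" prefix are assumptions made for the example; like the real
extractRegistryHost(), it is not meant to handle query parameters or fragments.

```go
package main

import (
	"fmt"
	"strings"
)

// registryHost is a hypothetical stand-in for extractRegistryHost().
// It drops an optional "oci://" scheme and everything from the first "/".
func registryHost(url string) string {
	s := strings.TrimPrefix(url, "oci://")
	if i := strings.Index(s, "/"); i >= 0 {
		s = s[:i]
	}
	return s
}

func main() {
	// Example input taken from the commit message above.
	fmt.Println(registryHost("account.dkr.ecr.region.amazonaws.com/charts"))
}
```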

Test Plan:
- Unit tests for issues #2295, #2296, #2297
- Unit tests for OCI registry login (extractRegistryHost, SyncRepos_OCI)
- Integration tests for issues #2295 and #2297
- All existing unit tests pass (including TestLint)

Fixes #2295
Fixes #2296
Fixes #2297

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: replace 60s timeout with reader-writer locks for OCI chart caching

Address PR review feedback from @champtar about the OCI chart caching
mechanism. The previous implementation used an arbitrary 60-second timeout,
which caused race conditions when helm deployments took longer than that
(e.g., deployments that trigger scaling up or down).

Changes:
- Replace 60s refresh marker timeout with proper reader-writer locks
- Use shared locks (RLock) when using cached charts (allows concurrent reads)
- Use exclusive locks (Lock) when refreshing/downloading charts
- Hold locks during entire helm operation lifecycle (not just during download)
- Add getNamedRWMutex() for in-process RW coordination
- Update PrepareCharts() to return locks map for lifecycle management
- Add chartLockReleaser in run.go to release locks after helm callback
- Remove unused mutexMap and getNamedMutex (replaced by RW versions)
- Add comprehensive tests for shared/exclusive lock behavior
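
Illustration only: a rough sketch of what in-process, per-chart reader-writer
coordination in the spirit of getNamedRWMutex() could look like. The type
namedRWMutexes and its get method are hypothetical names; the cross-process
side (flock) is not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// namedRWMutexes hands out one RWMutex per key (here: a chart path).
// Hypothetical type for illustration only.
type namedRWMutexes struct {
	mu    sync.Mutex
	locks map[string]*sync.RWMutex
}

// get returns the mutex for key, creating it on first use.
func (n *namedRWMutexes) get(key string) *sync.RWMutex {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.locks == nil {
		n.locks = make(map[string]*sync.RWMutex)
	}
	l, ok := n.locks[key]
	if !ok {
		l = &sync.RWMutex{}
		n.locks[key] = l
	}
	return l
}

func main() {
	var reg namedRWMutexes
	l := reg.get("charts/app")

	// Shared lock: taken while using a cached chart.
	l.RLock()
	fmt.Println("exclusive lock available while reading:", l.TryLock())
	l.RUnlock()

	// Exclusive lock: taken while refreshing or downloading.
	l.Lock()
	fmt.Println("shared lock available while writing:", l.TryRLock())
	l.Unlock()
}
```

TryLock and TryRLock (Go 1.18+) are used here only to show the blocking
behavior without actually blocking.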

This eliminates the race condition where one process could delete a
cached chart while another process's helm command was still using it.

Fixes review comment on PR #2298

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: prevent deadlock when multiple releases share the same chart

When multiple releases use the same OCI chart (e.g., the same chart with
different values), workers in PrepareCharts would deadlock:

1. Worker 1 acquires lock for chart/path, downloads, adds to cache
2. Worker 2 finds chart in cache, tries to acquire lock on same path
3. Worker 2 blocks waiting for Worker 1's lock
4. Collector waits for Worker 2's result, so PrepareCharts cannot finish
5. Worker 1's lock is held until PrepareCharts finishes -> deadlock

The fix: when using the in-memory chart cache (which means another worker
in the same process already downloaded the chart), don't acquire another
lock. This is safe because:
- The in-memory cache is only used within a single helmfile process
- The tempDir cleanup is deferred until after helm callback completes
- Cross-process coordination is still handled by file locks during downloads

This fixes the "signal: killed" test failures in CI for:
- oci_chart_pull_direct
- oci_chart_pull_once
- oci_chart_pull_once2

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: resolve deadlock by releasing OCI chart locks immediately after download

This commit simplifies the OCI chart locking mechanism to fix deadlock
issues that occurred when multiple releases shared the same chart.

Problem:
When multiple releases used the same OCI chart, workers in PrepareCharts
would deadlock because:
1. Worker 1 acquires lock for chart/path, downloads chart
2. Worker 2 tries to acquire lock on same path, blocks waiting
3. PrepareCharts waits for all workers to complete
4. Worker 1's lock is held until PrepareCharts finishes -> deadlock

Solution:
Release locks immediately after chart download completes. This is safe
because:
- The tempDir cleanup is deferred until after helm operations complete
  in withPreparedCharts(), so charts won't be deleted mid-use
- The in-memory chart cache prevents redundant downloads within a process
- Cross-process coordination via file locks still works during download

Changes:
- Remove chartLock field from chartPrepareResult struct
- Release locks immediately in getOCIChart() and forcedDownloadChart()
- Simplify PrepareCharts() by removing lock collection and release logic
- Update function signatures to return only (path, error)

This also fixes the "signal: killed" test failures in CI for:
- oci_chart_pull_direct
- oci_chart_pull_once
- oci_chart_pull_once2

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: add double-check locking for in-memory chart cache

When multiple workers concurrently process releases using the same chart,
they all check the in-memory cache before acquiring locks. If none have
populated the cache yet, all workers miss and try to download.

Previously, even after acquiring the exclusive lock, the code would
re-download the chart when needsRefresh=true (the default). This caused
multiple "Pulling" messages in tests like oci_chart_pull_once.

The fix adds a second in-memory cache check AFTER acquiring the lock.
This implements proper double-check locking:

1. Check cache (outside lock) → miss
2. Acquire lock
3. Check cache again (inside lock) → hit if another worker populated it
4. If still miss, download and add to cache

This ensures only one worker downloads the chart, while others use
the cached version populated by the first worker.
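
Illustration only: the four steps above, sketched with a plain mutex standing
in for the exclusive chart lock. chartCache, get and pullConcurrently are
hypothetical names, and the "download" is simulated.

```go
package main

import (
	"fmt"
	"sync"
)

// chartCache is a hypothetical in-memory cache keyed by chart name.
type chartCache struct {
	mu        sync.RWMutex      // guards paths
	paths     map[string]string // chart name -> local path
	lock      sync.Mutex        // stands in for the exclusive chart lock
	downloads int               // only written while lock is held
}

func newChartCache() *chartCache {
	return &chartCache{paths: make(map[string]string)}
}

func (c *chartCache) lookup(chart string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.paths[chart]
	return p, ok
}

// get follows the four steps: check, lock, check again, download.
func (c *chartCache) get(chart string, download func(string) string) string {
	if p, ok := c.lookup(chart); ok { // 1. check outside the lock
		return p
	}
	c.lock.Lock() // 2. acquire lock
	defer c.lock.Unlock()
	if p, ok := c.lookup(chart); ok { // 3. check again inside the lock
		return p
	}
	p := download(chart) // 4. still a miss: download and add to cache
	c.mu.Lock()
	c.paths[chart] = p
	c.downloads++
	c.mu.Unlock()
	return p
}

// pullConcurrently starts n workers that all request the same chart.
func pullConcurrently(c *chartCache, chart string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get(chart, func(name string) string { return "/tmp/cache/" + name })
		}()
	}
	wg.Wait()
}

func main() {
	c := newChartCache()
	pullConcurrently(c, "example/chart", 8)
	fmt.Println("downloads:", c.downloads) // prints "downloads: 1"
}
```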

Changes:
- Add in-memory cache double-check in getOCIChart() after acquiring lock
- Add in-memory cache double-check in forcedDownloadChart() after acquiring lock

This fixes the oci_chart_pull_once and oci_chart_pull_direct test failures
where charts were being pulled multiple times instead of once.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: use callback to prevent redundant chart downloads within a process

When multiple workers concurrently process releases using the same chart,
they need to coordinate to avoid redundant downloads. The previous fix
set SkipRefresh=true for OCI charts, which prevented legitimate refresh
scenarios (e.g., floating tags).

This commit implements a better solution using a callback mechanism:

1. acquireChartLock() now accepts an optional skipRefreshCheck callback
2. Before deleting a cached chart for refresh, the callback is invoked
3. If the callback returns true (in-memory cache has the chart), skip refresh
4. This allows deduplication within a process while respecting cross-run refresh

The flow is now:
- Worker 1 downloads chart, adds to in-memory cache, releases lock
- Worker 2 acquires lock, sees needsRefresh=true, but callback sees
  in-memory cache is populated → uses cached instead of deleting

This correctly handles:
- Within-process deduplication: only one download per chart
- Cross-run refresh: respects --skip-refresh flag for floating tags
- Immutable versions: cached and reused as expected
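
Illustration only: the refresh decision reduced to a pure function. The name
decideAction and its parameters are assumptions for the example; the real
acquireChartLock() signature is not shown in this message. The caller is
assumed to already hold the exclusive lock.

```go
package main

import "fmt"

// decideAction is a hypothetical reduction of the refresh decision.
//   onDisk:           the chart already exists in the shared on-disk cache
//   needsRefresh:     a refresh was requested (the default)
//   skipRefreshCheck: optional callback; true means the in-memory cache
//                     already has the chart, so the refresh is skipped
func decideAction(onDisk, needsRefresh bool, skipRefreshCheck func() bool) string {
	if !onDisk {
		return "download"
	}
	if needsRefresh {
		if skipRefreshCheck != nil && skipRefreshCheck() {
			return "use-cached"
		}
		return "refresh"
	}
	return "use-cached"
}

func main() {
	inMemory := map[string]bool{}
	seen := func() bool { return inMemory["example/chart"] }

	// Worker 1: nothing cached yet, so it downloads and records the chart.
	fmt.Println("worker 1:", decideAction(false, true, seen))
	inMemory["example/chart"] = true

	// Worker 2: needsRefresh is true, but the callback reports a hit.
	fmt.Println("worker 2:", decideAction(true, true, seen))
}
```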

Changes:
- Add skipRefreshCheck callback parameter to acquireChartLock()
- Update getOCIChart() to pass in-memory cache check callback
- Update forcedDownloadChart() to pass in-memory cache check callback
- Remove SkipRefresh=true workaround for OCI charts

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: address Copilot review comments on PR #2298

This commit addresses the automated review comments from GitHub Copilot:

1. pkg/state/state.go: Add nil check for logger in Release() method
   to prevent potential nil pointer dereference when logger is nil.

2. pkg/state/state.go: Fix misleading comment about "external callers"
   to accurately reflect that Logger() is used by the app package.

3. pkg/state/issue_2296_test.go: Add comment noting that boolPtr helper
   is already defined in skip_test.go (shared across test files).

4. test/integration/test-cases/oci-parallel-pull.sh: Replace hardcoded
   /tmp paths with a dedicated temp directory for test outputs. Add
   cleanup for the output directory in the cleanup function.

5. test/integration/test-cases/issue-2297-local-chart-transformers.sh:
   Add cleanup trap to remove temp directory on exit, preventing
   leftover files from accumulating.

6. Remove dead code: The chartLocks map in PrepareCharts was always
   empty since locks are released immediately after download. Removed
   the unused return value and corresponding handling in run.go to
   improve code clarity and maintainability.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: make oci-parallel-pull test resilient to registry issues

The integration test was intermittently failing in CI due to Docker Hub
rate limiting or network issues. These failures are not helmfile bugs.

Changes:
- Add is_registry_error() function to detect external registry issues
  (rate limits, network timeouts, connection refused, etc.)
- Check for the race condition bug (issue #2295) first and fail fast
- If other failures occur, check if they're registry-related
- Skip test gracefully when registry issues are detected instead of
  failing CI on external infrastructure problems

This ensures the test still catches the actual race condition bug while
not causing false failures due to Docker Hub rate limits in CI.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: make oci-parallel-pull test resilient to registry issues

The integration test was failing in CI for two reasons:

1. Docker Hub rate limiting or network issues causing helmfile to fail
2. The test script exits early due to `set -e` when `wait` returns non-zero

Changes:
- Use `wait $pid || exit=$?` pattern to capture exit codes without triggering
  set -e. When wait returns non-zero, the || branch captures the exit code
  into the variable, preventing script termination.
- Add is_registry_error() function to detect external registry issues
  (rate limits, network timeouts, connection refused, etc.)
- Check for the race condition bug (issue #2295) first and fail fast
- Skip test gracefully when registry issues are detected instead of
  failing CI on external infrastructure problems

This ensures the test still catches the actual race condition bug while
not causing false failures due to Docker Hub rate limits in CI.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: address PR #2298 review - reinitialize fileLock after release

Address Copilot review comments:

1. pkg/state/state.go: Reinitialize fileLock after releasing shared lock
   When upgrading from shared to exclusive lock, the fileLock needs to be
   reinitialized with flock.New() after calling Release(). This ensures
   a fresh flock object is used for the exclusive lock acquisition.

2. test/integration/test-cases/oci-parallel-pull.sh: Add lock file
   verification warning if no lock files are found, to ensure the
   locking mechanism is actually being tested.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: address PR #2298 Copilot review comments (round 4)

Address 8 Copilot review comments:

1. pkg/state/state.go: Release in-process mutex during retry backoff
   to avoid blocking other goroutines for up to 90 seconds.

2. pkg/state/state.go: Include chartPath in shared lock error message
   for better debugging.

3. pkg/state/state.go: Document that extractRegistryHost does not handle
   URLs with query parameters or fragments (uncommon for OCI registries).

4. pkg/state/state.go: Document that skipRefreshCheck callback should be
   fast and non-blocking since it runs while holding exclusive lock.

5. oci-parallel-pull.sh: Use case-insensitive grep (-i flag) to catch
   error variations like "I/O timeout".

6. helmfile.yaml: Expand comment explaining why library charts can't be
   used for this test (they can't be templated by Helm).

Skipped (with justification):
- PrepareChartKey helper: Only 2 usages with different source structs
- Context reuse in retry: Per-attempt contexts provide clearer semantics

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: address PR #2298 Copilot review comments (round 5)

1. Make race condition detection grep more robust (oci-parallel-pull.sh)
   - Use case-insensitive extended regex (-iqE)
   - Add multiple pattern variations to catch different tar/helm versions

2. Remove unused Logger() method from HelmState (state.go)
   - Method was never called; all lock releases use st.logger directly

3. Add clarifying comments for lock retry behavior (state.go)
   - Document why file system errors are retried but timeouts are not
   - Explain flock returns (false, nil) on context deadline exceeded

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: clarify lock file check is informational only

Lock files are ephemeral and may be cleaned up immediately after
helmfile processes complete. Update comments and warning message
to make clear their absence doesn't indicate locking wasn't used.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: add HELM_BIN env var to Dockerfiles

The helm-git plugin requires HELM_BIN environment variable to be set.
Without it, the plugin fails with "HELM_BIN: parameter not set".

Add HELM_BIN=/usr/local/bin/helm to all Dockerfile variants.

Fixes #2303

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

---------

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
2025-11-27 22:13:03 +08:00
Aditya Menon c8bcbcd629
🐛 Fix four critical issues: environment merging, kubeVersion detection, lookup() with kustomize, and Helm 4 color flags (#2276)
* fix: deep merge environments from multiple bases (#2273)

Problem:
When using multiple base helmfiles, environment values were being
completely replaced instead of deep-merged due to mergo.WithOverride
introduced in PR #2228.

Solution:
- Created mergeEnvironments() function for proper deep merging
- Manually merge environment Values and Secrets slices before struct merge
- Preserves all environment values from both base and current helmfile
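
Illustration only: a simplified sketch of merging per-environment Values and
Secrets lists instead of replacing them. The environment type and mergeEnvs
are hypothetical and use string slices for brevity; they are not the
structures from this PR.

```go
package main

import "fmt"

// environment is a simplified, hypothetical stand-in for an environment spec.
type environment struct {
	Values  []string
	Secrets []string
}

// mergeEnvs appends src's Values and Secrets to dst's for every environment,
// so entries from the base and from the current file both survive.
// A nil dst is initialized first, avoiding "assignment to entry in nil map".
func mergeEnvs(dst, src map[string]environment) map[string]environment {
	if dst == nil {
		dst = make(map[string]environment)
	}
	for name, s := range src {
		d := dst[name]
		d.Values = append(append([]string{}, d.Values...), s.Values...)
		d.Secrets = append(append([]string{}, d.Secrets...), s.Secrets...)
		dst[name] = d
	}
	return dst
}

func main() {
	base := map[string]environment{
		"prod": {Values: []string{"base-prod.yaml"}},
	}
	current := map[string]environment{
		"prod": {Values: []string{"prod.yaml"}, Secrets: []string{"prod-secrets.yaml"}},
		"dev":  {Values: []string{"dev.yaml"}},
	}
	merged := mergeEnvs(base, current)
	fmt.Println(merged["prod"].Values, merged["prod"].Secrets, merged["dev"].Values)
}
```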

Testing:
- Added TestEnvironmentMergingWithBases with two scenarios:
  1. Multiple bases with overlapping environment values
  2. Environment values with array merging

Fixes #2273

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: auto-detect Kubernetes version for helm-diff (#2275)

Problem:
When helmfile runs helm-diff without specifying kubeVersion, helm-diff
falls back to v1.20.0. This causes chart compatibility checks to fail
for charts requiring newer Kubernetes versions (e.g., kubeVersion: ">=1.25.0").

Root Cause:
- flagsForDiff() was not passing kubeVersion to helm-diff plugin
- Without --kube-version flag, helm-diff uses default v1.20.0

Solution:
- Created pkg/cluster package with DetectServerVersion() function
- Auto-detect cluster version using k8s.io/client-go discovery API
- Pass detected version to helm-diff via --kube-version flag
- Priority: helmfile.yaml kubeVersion > auto-detected version
- Works with both Helm 3 and Helm 4

Implementation:
- pkg/cluster/version.go: Cluster version detection
- pkg/app/app.go: detectKubeVersion() helper used in diff() and apply()
- pkg/state/state.go: Added DetectedKubeVersion field to DiffOpts
- Integrated into flagsForDiff() with proper precedence

Testing:
- Unit tests for cluster version detection
- Unit tests for kubeVersion precedence logic
- Integration test with chart requiring Kubernetes >=1.25.0
- Tests verify upgrade scenario (critical failure case from issue)
- Validated with both Helm 3 and Helm 4

Fixes #2275

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: enable lookup() function with strategicMergePatches (#2271)

Problem:
When using strategicMergePatches (kustomize), Helm's lookup() function
stops working. Charts like Grafana use lookup() to preserve existing
resource values (e.g., PVC volumeName), which get lost when using patches.

Root Cause:
- Chartify runs "helm template" to render charts before applying patches
- By default, "helm template" runs client-side without cluster access
- The lookup() function requires cluster connectivity to query resources
- Without cluster access, lookup() returns empty values

Solution:
- Pass --dry-run=server to helm template when using kustomize patches
- This enables cluster connectivity for lookup() while keeping client-side rendering
- Only applied to commands requiring cluster access (diff, apply, sync, etc.)
- Offline commands (template, lint, build) remain cluster-independent

Implementation:
- Modified processChartification() to accept helmfileCommand parameter
- Added switch-based logic to determine cluster requirement per command
- Conditionally set chartifyOpts.TemplateArgs = "--dry-run=server"
- Safe default: unknown commands assume cluster access

Command Behavior:
- helmfile diff/apply/sync: Uses --dry-run=server, lookup() works
- helmfile template/lint/build: No cluster requirement, works offline
- Charts without lookup(): Unaffected
- Charts with lookup() + cluster: Lookup values preserved correctly
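
Illustration only: a minimal sketch of the per-command decision described
above. needsCluster and templateArgs are hypothetical helper names, and only
the commands named in this message are listed.

```go
package main

import "fmt"

// needsCluster reports whether a helmfile command is expected to talk to a
// cluster. Unknown commands fall through to the safe default (true).
func needsCluster(cmd string) bool {
	switch cmd {
	case "template", "lint", "build":
		return false // offline commands stay cluster-independent
	case "diff", "apply", "sync":
		return true
	default:
		return true // safe default: assume cluster access
	}
}

// templateArgs returns the extra argument for "helm template" when patches
// are in use and the command requires a cluster; otherwise it returns "".
func templateArgs(cmd string, hasPatches bool) string {
	if hasPatches && needsCluster(cmd) {
		return "--dry-run=server"
	}
	return ""
}

func main() {
	for _, cmd := range []string{"diff", "template"} {
		fmt.Printf("%s: %q\n", cmd, templateArgs(cmd, true))
	}
}
```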

Testing:
- Integration test with ConfigMap using lookup() to preserve values
- Verifies lookup works with strategicMergePatches
- Tests both with and without cluster access
- Validates offline template command still works

Fixes #2271

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: remove unnecessary error return from mergeEnvironments

The mergeEnvironments function always returns nil, making the error
return value unnecessary. This fixes the unparam linter warning.

- Changed function signature to not return error
- Updated call site to not handle error
- All tests still pass

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: handle nil Environments map in mergeEnvironments

Fixes panic when base helmfile has nil Environments map.
Initialize the destination map if nil before merging to prevent
"assignment to entry in nil map" panic.

- Added nil check in mergeEnvironments to return early
- Initialize layers[0].Environments before merge if nil
- Fixes TestVisitDesiredStatesWithReleasesFiltered_Issue1008_MissingNonDefaultEnvInBase

The panic occurred when a base helmfile didn't define any environments
but a subsequent layer did. Now we properly initialize an empty map
to merge into.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* test: disable kubeVersion auto-detection in unit tests

Add DisableKubeVersionAutoDetection field to App struct to prevent
unit tests from connecting to real Kubernetes clusters during testing.

The kubeVersion auto-detection feature (issue #2275) was causing
unit tests to fail because:
1. Tests use mock helm implementations without real cluster access
2. Auto-detection was connecting to local minikube cluster (v1.34.0)
3. Test expectations didn't include --kube-version flag in diff keys

Solution:
- Add DisableKubeVersionAutoDetection bool field to App struct
- Check this flag in detectKubeVersion() before attempting detection
- Set flag to true in all pkg/app/*_test.go files

This ensures unit tests remain isolated and don't depend on
external cluster state while preserving auto-detection for
production use.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* chore: upgrade helm-diff plugin to v3.14.1

Update helm-diff plugin from v3.14.0 to v3.14.1 across all environments:
- Dockerfiles (main, debian-stable-slim, ubuntu)
- CI workflow matrix configurations
- Integration test default version

This ensures consistency across development, testing, and production
environments.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* test: fix table formatting and improve E2E test infrastructure

This commit addresses multiple test failures and improves the testing
infrastructure for better reliability and maintainability.

Table Formatting Fixes:
- Added trimTrailingWhitespace() helper function to remove trailing
  whitespace from table output in both FormatAsTable() and printDAG()
- Fixes TestList and TestDAG failures caused by tabwriter padding
  empty columns with trailing spaces
- Updated golden file for table output test to match new behavior
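
Illustration only: one way such a helper could be written. This is a sketch,
not the implementation from pkg/app/formatters.go, and it treats only spaces
and tabs as trailing whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

// trimTrailingWhitespace strips trailing spaces and tabs from every line,
// which is the kind of padding tabwriter leaves behind for empty columns.
func trimTrailingWhitespace(s string) string {
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimRight(line, " \t")
	}
	return strings.Join(lines, "\n")
}

func main() {
	table := "NAME   NAMESPACE   \nfoo               \n"
	fmt.Printf("%q\n", trimTrailingWhitespace(table))
}
```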

E2E Test Infrastructure Improvements:
- Implemented dynamic port allocation for Docker registry tests to
  prevent port conflicts (replaced hardcoded port 5000/5001)
- Added getFreePort() function using kernel-allocated unused ports
- Added waitForRegistry() function with proper health check polling
  of Docker Registry /v2/ endpoint (replaces sleep hack)
- Added prepareInputFile() function to handle port substitution and
  path resolution when copying helmfile configs to temp directories
- Extracted setupLocalDockerRegistry() helper to reduce cognitive
  complexity from 111 to ≤110 (gocognit threshold)
- Added port normalization in test output to replace dynamic ports
  with $REGISTRY_PORT placeholder for deterministic comparisons

Test Configuration Updates:
- Updated OCI chart tests to use dynamic port allocation via
  $REGISTRY_PORT placeholder in helmfile configs
- Converted relative chart paths to absolute paths when input files
  are copied to temp directories (fixes path resolution issues)
- Left postrenderer paths as relative since they're resolved from
  working directory (works for both Helm 3 and Helm 4)

Golden File Updates:
- Updated all OCI-related test expected outputs to use $REGISTRY_PORT
  placeholder instead of hardcoded ports
- Removed trailing whitespace from issue_493 test expected output
- Updated postrenderer test outputs to reflect chart path normalization

Test Cleanup:
- Removed unused fakeInit struct and CheckHelmPlugins() call from
  snapshot tests (not needed for template/fetch/list commands)
- Removed unused imports (app, helmexec packages)

Technical Details:
- Port allocation uses net.Listen with port 0 for kernel assignment
- Registry health check polls with 500ms intervals and 30s timeout
- Chart paths: ../../charts/* → absolute paths (input file moves to temp)
- Postrenderer paths: remain relative (resolved from working directory)
- OCI cache paths normalized: oci__localhost_PORT → oci__localhost_$REGISTRY_PORT
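
Illustration only: a minimal sketch of kernel-assigned port allocation via
net.Listen with port 0. Binding to 127.0.0.1 and the error wording are
assumptions. The port is released before it is returned, so another process
can still grab it before the caller uses it.

```go
package main

import (
	"fmt"
	"net"
)

// getFreePort asks the kernel for an unused TCP port by listening on port 0,
// then closes the listener and returns the port number.
func getFreePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, fmt.Errorf("allocating free port: %w", err)
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := getFreePort()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("registry port:", port)
}
```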

All originally failing tests now pass:
- TestList ✓
- TestDAG ✓
- TestHelmfileTemplateWithBuildCommand (all OCI tests) ✓
- TestFormatAsTable ✓

Fixes three test failures reported in issue.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix(test): convert postrenderer paths to absolute for Helm 3

Helm 3 resolves postrenderer script paths relative to the helmfile
location. When the input file is copied to a temp directory for
port substitution, relative postrenderer paths break.

Solution:
- Added postrenderersDir parameter to prepareInputFile()
- Convert ../../postrenderers/* to absolute paths for Helm 3 only
- Use existing isHelm4() function to detect Helm version
- Helm 4 extracts plugin names from paths, so relative paths still work

This fixes the postrenderer test failure in CI where Helm 3 could
not find the postrenderer script at the relative path.

Fixes: Error: unable to find binary at ../../postrenderers/add-cm2.bash
Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix(test): remove remaining hardcoded port 5001 in OCI tests

Updated 4 remaining OCI chart tests that still had hardcoded port 5001:
- oci_chart_pull
- oci_chart_pull_once
- oci_chart_pull_once2
- oci_chart_pull_direct

Changes:
- config.yaml: Removed hardcoded port, use dynamic allocation
- input.yaml.gotmpl: Replaced localhost:5001 with localhost:$REGISTRY_PORT

This ensures all OCI chart tests use dynamic port allocation to
prevent port conflicts during parallel test execution.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: prevent helm-diff from normalizing server-side defaults

Problem:
The suppress-output-line-regex integration test was failing because
helm-diff was reporting "has changed, but diff is empty after suppression"
for Service resources when it should have shown ipFamilyPolicy and ipFamilies
fields being removed.

Root Cause:
When auto-detected kubeVersion (e.g., 1.34.0) is passed to helm-diff via
--kube-version flag, helm-diff normalizes server-side defaults. This makes
fields like ipFamilyPolicy and ipFamilies appear unchanged, even though they
don't exist in the chart template and will be removed by the upgrade.

After applying suppressOutputLineRegex patterns, only label changes remained
(helm.sh/chart and app.kubernetes.io/version). These were correctly suppressed,
leaving an empty diff - hence the "diff is empty after suppression" message.

Solution:
Added a new configuration option 'disableAutoDetectedKubeVersionForDiff' to allow
disabling auto-detected kubeVersion being passed to helm-diff. This prevents
helm-diff from normalizing server-side defaults when needed.

Default behavior: Pass auto-detected kubeVersion (fixes issue #2275, existing behavior)
Opt-out behavior: Set flag to true to only use explicit kubeVersion from helmfile.yaml

helmDefaults:
  disableAutoDetectedKubeVersionForDiff: true  # false by default

releases:
- name: myrelease
  disableAutoDetectedKubeVersionForDiff: true  # override per-release

Implementation:
- Added DisableAutoDetectedKubeVersionForDiff field to HelmSpec and ReleaseSpec
- Updated flagsForDiff() to check this flag before passing kubeVersion
- Default (false): pass auto-detected kubeVersion (fixes issue #2275)
- Opt-out (true): only pass explicit kubeVersion from helmfile.yaml
- Updated suppress-output-line-regex test to disable auto-detected kubeVersion

This approach:
- Maintains backward compatibility (default passes auto-detected kubeVersion)
- Fixes issue #2275 for charts requiring newer Kubernetes versions
- Allows users to opt out when server-side normalization causes issues
- Fixes suppress-output-line-regex test regression

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* test: update hash values in TestGenerateID after adding DisableAutoDetectedKubeVersionForDiff field

The hash values in TestGenerateID needed to be updated because adding the
DisableAutoDetectedKubeVersionForDiff field to ReleaseSpec changed the structure's
hash representation. This is expected behavior as generateValuesID() hashes the
entire ReleaseSpec structure.

Updated all expected hash values to match the new values:
- baseline: foo-values-66f7fd6f7b
- different bytes content: foo-values-6664979cd7
- different map content: foo-values-78897dfd49
- different chart: foo-values-64b7846cb7
- different name: bar-values-576cb7ddc7
- specific ns: myns-foo-values-6c567f54c

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: address PR review comments and resolve issue #2280

This commit addresses all review comments from GitHub Copilot and
resolves issue #2280 regarding --color flag conflict with Helm 4.

Changes:
1. Fixed documentation in pkg/cluster/version.go
   - Updated function comment to reflect error return behavior
   - Corrected version format example and comment

2. Added complete command categorization in pkg/state/state.go
   - Added all helmfile commands to cluster access switch statement
   - Properly categorized 15+ commands based on cluster requirements
   - Added clarifying comments for command groups

3. Resolved issue #2280: --color flag conflict with Helm 4
   - In Helm 4, --color expects a value (never/auto/always)
   - Converts --color to --color=always for Helm 4
   - Converts --no-color to --color=never for Helm 4
   - Prevents Helm from consuming next argument as color value
   - Added comprehensive unit tests
   - Added integration test (Helm 4 only)

Issue #2280 Details:
When running helmfile diff with --color and --context flags on Helm 4,
the --color flag would consume --context as its value, resulting in:
"invalid color mode '--context': must be one of: never, auto, always"

The fix detects Helm 4 and converts boolean color flags to the format
Helm 4 expects, preventing the argument consumption issue.
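
Illustration only: a sketch of the flag conversion described in this commit.
convertColorFlags is a hypothetical name, and detection of the Helm version
is left out.

```go
package main

import "fmt"

// convertColorFlags rewrites boolean color flags into the value form that
// Helm 4 expects, leaving all other flags untouched.
func convertColorFlags(flags []string) []string {
	out := make([]string, 0, len(flags))
	for _, f := range flags {
		switch f {
		case "--color":
			out = append(out, "--color=always")
		case "--no-color":
			out = append(out, "--color=never")
		default:
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fmt.Println(convertColorFlags([]string{"--color", "--context", "minikube"}))
}
```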

Fixes #2280

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: correct kubeVersion precedence comment in test

The comment incorrectly stated that state.KubeVersion takes precedence
over paramKubeVersion, but the actual implementation (getKubeVersion in
state.go:3354-3364) shows the correct order is:

1. paramKubeVersion (auto-detected from cluster)
2. release.KubeVersion (per-release override)
3. state.KubeVersion (helmfile.yaml global setting)

Updated the comment to match the implementation and the test cases.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* fix: resolve Helm 4 --color flag conflict (issue #2280)

This commit resolves issue #2280 where the --color flag causes Helm 4
to consume the next argument, resulting in errors like:
"invalid color mode '--context': must be one of: never, auto, always"

Root Cause:
In Helm 4, the --color flag is parsed by the Helm binary before being
passed to plugins like helm-diff. This causes Helm to interpret the
next argument (e.g., --context) as the value for --color.

Solution:
Remove --color and --no-color flags from helm-diff commands when using
Helm 4, and instead use the HELM_DIFF_COLOR environment variable.
The helm-diff plugin supports HELM_DIFF_COLOR=[true|false] as an
alternative to the --color/--no-color flags.
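
Illustration only: a sketch of the filtering idea. The name filterColorFlags
and its return shape are assumptions; the real filterColorFlagsForHelm4() in
pkg/helmexec/exec.go may differ.

```go
package main

import "fmt"

// filterColorFlags removes --color / --no-color from flags and returns the
// remaining flags plus the environment variable that replaces them.
func filterColorFlags(flags []string) ([]string, map[string]string) {
	rest := make([]string, 0, len(flags))
	env := map[string]string{}
	for _, f := range flags {
		switch f {
		case "--color":
			env["HELM_DIFF_COLOR"] = "true"
		case "--no-color":
			env["HELM_DIFF_COLOR"] = "false"
		default:
			rest = append(rest, f)
		}
	}
	return rest, env
}

func main() {
	rest, env := filterColorFlags([]string{"--color", "--context", "minikube"})
	fmt.Println(rest, env)
}
```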

Changes:
1. Added filterColorFlagsForHelm4() function in pkg/helmexec/exec.go
   - Removes --color and --no-color flags from flags slice
   - Sets HELM_DIFF_COLOR=true for --color
   - Sets HELM_DIFF_COLOR=false for --no-color

2. Modified DiffRelease() to call filterColorFlagsForHelm4() on Helm 4

3. Added comprehensive unit tests in pkg/helmexec/exec_test.go
   - Test_DiffRelease_ColorFlagHelm4: Verifies flags are filtered
   - Test_FilterColorFlagsForHelm4: Tests all flag combinations

4. Added integration test in test/integration/test-cases/issue-2280.sh
   - Tests the exact scenario from issue #2280
   - Verifies --color and --context flags work together
   - Helm 4 only test (skipped on Helm 3)

Fixes #2280

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* refactor: apply Copilot code review nitpicks

This commit addresses minor code quality improvements suggested by
GitHub Copilot's automated review.

Changes:
1. pkg/app/formatters.go - Optimize trimTrailingWhitespace()
   - Only modify lines that actually have trailing whitespace
   - Avoids unnecessary string allocations for clean lines
   - Performance optimization for table formatting

2. test/e2e/template/helmfile/snapshot_test.go
   - Use 0600 permissions for temporary input files (was 0644)
   - Improves security by making temp files owner-only read/write
   - Prevents potential exposure of sensitive test data

   - Improve error messages in getFreePort()
   - Wrap errors with context using fmt.Errorf("%w")
   - Better error debugging when port allocation fails

   - Add retry logic to setupLocalDockerRegistry()
   - Handles race condition where port gets taken between allocation and use
   - Retries up to 3 times with new ports on "address already in use" errors
   - Fails fast on other Docker errors for better test diagnostics
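
The port allocation and retry behaviour described above can be sketched
roughly as follows. This is not the code in snapshot_test.go:
startWithRetry, errInUse, and the start callback are hypothetical names,
and matching the error by its message text is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// errInUse simulates the error seen when another process grabbed the port.
var errInUse = errors.New("bind: address already in use")

// getFreePort asks the OS for an unused TCP port by listening on port 0.
// The port can be taken by another process after the listener is closed,
// which is why callers need retry logic.
func getFreePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, fmt.Errorf("listening on an ephemeral port: %w", err)
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

// startWithRetry is a hypothetical helper. start stands in for whatever
// launches the registry; it is retried with a new port only when the
// failure is "address already in use", and fails fast on any other error.
func startWithRetry(attempts int, start func(port int) error) (int, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		port, err := getFreePort()
		if err != nil {
			return 0, err
		}
		err = start(port)
		if err == nil {
			return port, nil
		}
		if !strings.Contains(err.Error(), "address already in use") {
			return 0, fmt.Errorf("starting on port %d: %w", port, err)
		}
		lastErr = err
	}
	return 0, fmt.Errorf("no usable port after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	port, err := startWithRetry(3, func(port int) error {
		calls++
		if calls < 2 {
			return errInUse // first attempt loses the race
		}
		return nil
	})
	fmt.Println(port > 0, calls, err)
}
```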

All tests passing. These are non-functional improvements that enhance
code quality, performance, security, and test reliability.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* docs: improve code comments based on Copilot feedback

This commit addresses documentation nitpicks from GitHub Copilot's
automated review to improve code clarity and maintainability.

Changes:
1. pkg/app/app.go - Clarify detectKubeVersion() return conditions
   - Updated comment to explicitly list all three cases when empty
     string is returned: kubeVersion already set, auto-detection
     disabled, or detection fails
   - Improves function documentation clarity

2. test/e2e/template/helmfile/snapshot_test.go
   - Added reference to retry logic in getFreePort() comment
   - Points callers to setupLocalDockerRegistry() for proper race
     condition handling example
   - Better guidance for future code maintainers

3. pkg/state/state.go - Explain patches check rationale
   - Added comment explaining why --dry-run=server is only enabled
     when patches are used
   - Clarifies that this is a conservative approach to minimize
     unnecessary cluster connections
   - Documents primary use case (Grafana chart with PVC preservation)

All changes are documentation-only with no functional impact.
All tests passing.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

* refactor: enable lookup() for all cluster commands and add defensive check

This commit addresses two Copilot review suggestions to improve code
robustness and functionality.

Changes:
1. pkg/state/state.go - Remove patches requirement for lookup()
   - Previously only enabled --dry-run=server when patches were present
   - Now enables it for ALL cluster-requiring commands
   - Rationale: lookup() function can be used without patches
   - Improves compatibility with charts using lookup() standalone
   - Trade-off: Slightly more cluster connections vs broader support

2. pkg/helmexec/exec.go - Add defensive check for HELM_DIFF_COLOR
   - Only set environment variable if not already present
   - Makes code more defensive for future implementation changes
   - Note: Changes behavior from "last wins" to "first wins"
   - In practice, env map is freshly created so check is precautionary

3. pkg/helmexec/exec_test.go - Update test expectations
   - Changed test case to reflect "first wins" behavior
   - Updated test name and comment for clarity

Breaking behavior change:
- When both --color and --no-color are present, the FIRST flag now
  wins instead of the LAST flag
- This deviates from standard CLI conventions where later flags
  override earlier ones
- However, this is unlikely to affect real usage as users rarely
  specify conflicting flags
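
To illustrate the "first wins" behaviour described above, here is a small
sketch. colorEnvFirstWins is a hypothetical function, not the code in
pkg/helmexec/exec.go; it only shows the effect of setting the variable
when it is not already present.

```go
package main

import "fmt"

// colorEnvFirstWins is a hypothetical illustration of the defensive check:
// HELM_DIFF_COLOR is set only if it is not already present in the map,
// so the first color flag encountered decides the value.
func colorEnvFirstWins(flags []string) map[string]string {
	env := map[string]string{}
	for _, f := range flags {
		value := ""
		switch f {
		case "--color":
			value = "true"
		case "--no-color":
			value = "false"
		default:
			continue // not a color flag
		}
		if _, exists := env["HELM_DIFF_COLOR"]; !exists {
			env["HELM_DIFF_COLOR"] = value
		}
	}
	return env
}

func main() {
	// Conflicting flags: the first one (--color) wins.
	fmt.Println(colorEnvFirstWins([]string{"--color", "--no-color"})["HELM_DIFF_COLOR"])
}
```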

All tests passing.

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>

---------

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
2025-11-21 08:32:54 +08:00
Aditya Menon 4f275b3667
feat: add Helm 4 support while maintaining Helm 3 compatibility (#2262)
This commit adds comprehensive support for Helm 4 while maintaining
full backward compatibility with Helm 3. The implementation includes:

- Updated helm version detection to support both Helm 3 and Helm 4
- Added HELMFILE_HELM4 environment variable to control Helm version
- Modified helm execution paths to handle version-specific binaries
- Updated helm plugin installation to support split architecture

- Helm 4: Uses split plugin architecture (3 separate .tgz files)
  - helm-secrets.tgz
  - helm-secrets-getter.tgz
  - helm-secrets-post-renderer.tgz
- Helm 3: Continues using single plugin installation
- Updated Dockerfiles, CI workflows, and core installation code

- Helm 4 requires post-renderers to be plugins, not executable scripts
- Created Helm plugin structure for integration tests
- Updated helmfile.yaml templates to dynamically select renderer type
- Added test plugins: add-cm, add-cm1, add-cm2

- Updated integration tests for Helm 3/4 compatibility
- Created Helm 4 variant expected output files
- Fixed test determinism issues (repo cleanup between iterations)
- Added version-specific output filtering for warnings/messages

- Updated workflows to test both Helm 3 and Helm 4
- Matrix testing across Helm versions
- Updated helm-diff to v3.14.0 for compatibility

- Updated README and docs with Helm 4 information
- Added migration guidance
- Updated version requirements

All changes are backward compatible - existing Helm 3 users will
see no behavior changes.

fix: update Helm 4 lint expected output to match filtered output

The grep filter removes the semver warning, so the expected output
should not include it. Updated lint-helm4 files to match the filtered
output (warning removed, no extra blank line).

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
2025-11-19 07:49:30 +08:00
Copilot 377ca5c1a2
Bump helm-diff to v3.13.1 (#2223)
* Initial plan

* Bump helm-diff to v3.13.1

Co-authored-by: yxxhero <11087727+yxxhero@users.noreply.github.com>

* Update Dockerfiles to use helm-diff v3.13.1

Co-authored-by: yxxhero <11087727+yxxhero@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yxxhero <11087727+yxxhero@users.noreply.github.com>
2025-10-16 17:55:49 +08:00
Ori Shamir 1b8f2871f6
Add yq to Dockerfile (#2208)
Signed-off-by: Ori Shamir <orishamir04@gmail.com>
2025-10-01 21:51:45 +08:00
yxxhero 3f5d4110f6
build: update helm-diff plugin to v3.13.0 (#2189)
2025-09-13 10:08:15 +08:00
yxxhero c443baa103
build: update Helm to v3.19.0 across all components (#2187)
Signed-off-by: yxxhero <aiopsclub@163.com>
2025-09-12 13:58:08 -04:00
yxxhero a05b93de5c
build: update helm to v3.18.6 (#2144)
Signed-off-by: yxxhero <aiopsclub@163.com>
2025-08-22 14:10:01 +08:00
yxxhero 444275281f
Update recommended Helm versions in init.go and run.sh (#2129)
- Bump HelmDiffRecommendedVersion from v3.12.3 to v3.12.5 in pkg/app/init.go
- Bump default HELM_DIFF_VERSION from 3.12.3 to 3.12.5 in test/integration/run.sh
- Update HelmRecommendedVersion from v3.18.4 to v3.18.5 in pkg/app/init.go

Signed-off-by: yxxhero <aiopsclub@163.com>
2025-08-14 08:41:43 +08:00
yxxhero 687159a65b
build: update Helm and plugin versions to v3.18.4 and v3.12.3 (#2093)
Signed-off-by: yxxhero <aiopsclub@163.com>
2025-07-09 18:13:21 +08:00
yxxhero c03f86de0f
build: update Helm to v3.18.3 and related dependencies (#2082)
Signed-off-by: yxxhero <aiopsclub@163.com>
2025-06-17 15:09:42 +08:00
yxxhero 131e3f3f04
fix: update helm-diff to version 3.12.2 in CI and Dockerfiles (#2073)
Signed-off-by: yxxhero <aiopsclub@163.com>
2025-06-11 21:18:31 +08:00
yxxhero 74985fc54c
build: update Helm to v3.18.2 and adjust related configurations (#2064)
* build: update Helm to v3.18.2 and adjust related configurations

Signed-off-by: yxxhero <aiopsclub@163.com>

* fix tests

Signed-off-by: yxxhero <aiopsclub@163.com>

---------

Signed-off-by: yxxhero <aiopsclub@163.com>
2025-06-04 21:33:00 +08:00
yxxhero fe1e51e5ef
build: update Helm and plugin versions in CI and Dockerfiles (#2059)
* build: update Helm and plugin versions in CI and Dockerfiles

Signed-off-by: yxxhero <aiopsclub@163.com>
2025-05-30 11:45:28 +08:00
yxxhero e197a90597
build(helm) update to v3.18.0 (#2044)
* build(helm) update to v3.18.0

Signed-off-by: yxxhero <aiopsclub@163.com>
2025-05-21 16:57:36 +08:00
Quan TRAN 84bc096576
[sops, age] update to have SSH key support with sops (#2036)
Signed-off-by: Quan TRAN <itscaro@users.noreply.github.com>
2025-05-12 21:22:04 +08:00
yxxhero 7624697b68
build: update Helm to v3.17.3 and update related Dockerfiles (#1993)
fix conflicts

Signed-off-by: yxxhero <aiopsclub@163.com>
2025-04-11 10:13:22 -04:00
yxxhero aa6af7c272
build: update Helm plugin versions in CI and Dockerfiles (#1995)
Signed-off-by: yxxhero <aiopsclub@163.com>
2025-04-11 19:10:56 +08:00
yxxhero a1f2cb3877
build: update Helm to v3.17.2 and related dependencies (#1965)
Signed-off-by: yxxhero <aiopsclub@163.com>
2025-03-17 10:51:10 -04:00
yxxhero 489b6c9362
build: update golang version to 1.24 and golangci-lint to v1.64.5 (#1949)
* build: update golang version to 1.24 and golangci-lint to v1.64.5

Signed-off-by: yxxhero <aiopsclub@163.com>

* build: update golang version to 1.24 in Dockerfiles

Signed-off-by: yxxhero <aiopsclub@163.com>

* fix more issues

Signed-off-by: yxxhero <aiopsclub@163.com>

---------

Signed-off-by: yxxhero <aiopsclub@163.com>
2025-02-28 12:10:16 +08:00
yxxhero 9c380668ec
build: update Helm to v3.17.1 and related dependencies (#1928)
Signed-off-by: yxxhero <aiopsclub@163.com>
2025-02-13 15:48:05 +08:00
Daniel Kugler 486134ca4e
Bump kubectl to current version (1.32.1) (#1924)
2025-02-13 07:33:08 +08:00
yxxhero 2784a4fbd7
build: update helm-diff to v3.9.14 in Dockerfiles and init.go (#1877)
Signed-off-by: yxxhero <aiopsclub@163.com>
2025-01-17 19:54:04 -05:00
yxxhero d0b75412d1
update helm and k8s versions in ci, dockerfiles, and go.mod (#1872)
2025-01-16 09:36:39 +08:00
Zubair Haque 4429e41e1f
update kubectl version (1.30) to stay up to date with new releases (#1867)
2025-01-14 09:40:16 +08:00
Zubair Haque b58ad9e514
update sops versions to 3.9.3 (#1861)
Signed-off-by: zhaque44 <haque.zubair@gmail.com>
2025-01-10 20:33:12 +08:00
yxxhero 14677e288f
build: update helm-diff to v3.9.13 in Dockerfiles and init.go (#1841)
Signed-off-by: yxxhero <aiopsclub@163.com>
2024-12-20 20:21:52 -05:00
yxxhero 53f25a1fd3
build: update Helm version to v3.16.4 in CI and Dockerfiles (#1837)
Signed-off-by: yxxhero <aiopsclub@163.com>
2024-12-18 15:50:35 +08:00
Zubair Haque d383a0fcb6
feat: updating sops version to 3.9.2 (#1834)
updating sops version to 3.9.2

Signed-off-by: zhaque44 <haque.zubair@gmail.com>
2024-12-16 08:49:47 +08:00
yxxhero 0b1746bdf3
build: update Helm version to v3.16.3 in CI and Dockerfiles (#1791)
* build: update Helm version to v3.16.3 in CI and Dockerfiles

Signed-off-by: yxxhero <aiopsclub@163.com>

* fix: update Helm SHA256 checksums in Dockerfiles

Signed-off-by: yxxhero <aiopsclub@163.com>

---------

Signed-off-by: yxxhero <aiopsclub@163.com>
Co-authored-by: Zubair Haque <haque.zubair@gmail.com>
2024-11-17 17:35:51 -05:00
yxxhero 96d716ae00
fix: update helm-diff to version 3.9.12 in CI and Dockerfiles (#1792)
Signed-off-by: yxxhero <aiopsclub@163.com>
2024-11-17 17:34:10 -05:00
Zubair Haque 922cc15c50
feat: update sops version to 3.9.1 (#1742)
Signed-off-by: zhaque44 <haque.zubair@gmail.com>
2024-10-16 18:31:25 +08:00
yxxhero cd0f603d4f
feat(helm-version): Update helm version to v3.16.2 (#1733)
Signed-off-by: yxxhero <aiopsclub@163.com>
2024-10-11 08:03:02 -05:00
Zubair Haque 9471fa29fa
chore: Update AGE var version (#1731)
2024-10-11 06:01:58 +08:00
yxxhero d6f5dbd2a9
feat: Update Docker image to ubuntu:24.10 (#1726)
Signed-off-by: yxxhero <aiopsclub@163.com>
2024-10-06 20:22:12 -05:00
yxxhero b375a31f20
feat: update go version and adjust dependencies in Dockerfile and go.mod (#1722)
* feat: update go version and adjust dependencies in Dockerfile and go.mod

Signed-off-by: yxxhero <aiopsclub@163.com>

* fix lint

Signed-off-by: yxxhero <aiopsclub@163.com>

* fix lint

Signed-off-by: yxxhero <aiopsclub@163.com>

---------

Signed-off-by: yxxhero <aiopsclub@163.com>
2024-09-30 09:21:44 -04:00
yxxhero 04b5151285
feat(pkg/app): Update Helm Diff version to v3.9.11 (#1720)
Signed-off-by: yxxhero <aiopsclub@163.com>
2024-09-26 09:46:58 -05:00
yxxhero 2d863a7910
bump all helm to 3.16.1 (#1708)
* bump all helm to 3.16.1

Signed-off-by: yxxhero <aiopsclub@163.com>

* bump all helm to 3.16.1

Signed-off-by: yxxhero <aiopsclub@163.com>

---------

Signed-off-by: yxxhero <aiopsclub@163.com>
2024-09-13 08:37:33 +08:00
Patrick Hobusch 2ec61d8491
chore: Update Ubuntu image to LTS version 24.04 (#1696)
Signed-off-by: Patrick Hobusch <patrick@hobusch.net>
2024-09-09 21:11:53 +08:00
yxxhero d9eb271ab7
feat: upgrade helm-diff plugin version to 3.9.10 (#1688)
2024-09-03 09:05:22 +08:00
Zubair Haque 55681c206c
feat: update kustomize version (#1677)
2024-08-26 06:26:26 +08:00
Zubair Haque 6374c6368d
fix: CI linting issues with dockerfiles (#1671)
2024-08-18 05:50:28 +08:00
yxxhero 118b949787
build(deps): update Helm version to v3.15.4 (#1668)
2024-08-16 10:07:39 +08:00
yxxhero 86664f57f6
build(deps): helm-s3: v0.16.0 -> v0.16.2 (#1652)
build(deps): update helm plugins to latest versions

Update the versions of the helm plugins in the Dockerfile to the latest stable versions:
- helm-s3: v0.16.0 -> v0.16.2

This update ensures that the Docker image uses the latest version of the helm-s3 plugin.

Signed-off-by: yxxhero <aiopsclub@163.com>
2024-08-03 08:53:06 +09:00
Zubair Haque fec88ed9d8
feat: Update sops version 3.9.0 (#1634)
2024-07-18 10:30:53 +08:00
yxxhero d61df9253d
feat: bump helm to 3.15.3 (#1627)
2024-07-12 08:45:21 +08:00
yxxhero c7f0fe5d14
bump helm-diff to 3.9.8 (#1582)
Signed-off-by: yxxhero <aiopsclub@163.com>
2024-06-21 13:45:16 +08:00
yxxhero a9bf399fa8
bump helm to 3.15.2 (#1581)
2024-06-21 09:27:47 +08:00
yxxhero 2f408163cf
bump helm-diff to 3.9.7 (#1536)
Signed-off-by: yxxhero <aiopsclub@163.com>
2024-05-27 08:48:56 +08:00