Commit Graph

2 Commits

Author SHA1 Message Date
Aditya Menon 7bb2a1f19b
fix: replace 60s timeout with reader-writer locks for OCI chart caching
Address PR review feedback from @champtar about the OCI chart caching
mechanism. The previous implementation used a 60-second timeout which
was arbitrary and caused race conditions when helm deployments took
longer (e.g., deployments triggering scaling up/down).

Changes:
- Replace 60s refresh marker timeout with proper reader-writer locks
- Use shared locks (RLock) when using cached charts (allows concurrent reads)
- Use exclusive locks (Lock) when refreshing/downloading charts
- Hold locks during entire helm operation lifecycle (not just during download)
- Add getNamedRWMutex() for in-process RW coordination
- Update PrepareCharts() to return locks map for lifecycle management
- Add chartLockReleaser in run.go to release locks after helm callback
- Remove unused mutexMap and getNamedMutex (replaced by RW versions)
- Add comprehensive tests for shared/exclusive lock behavior

This eliminates the race condition where one process could delete a
cached chart while another process's helm command was still using it.

Fixes review comment on PR #2298

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
2025-11-27 11:08:47 +01:00
Aditya Menon 5d9db12a99
fix: resolve issues #2295, #2296, #2297 and OCI registry login
This PR fixes four related bugs affecting chart preparation, caching,
and OCI registry authentication.

Issue #2295: OCI chart cache conflicts with parallel helmfile processes
- Added filesystem-level locking using flock for cross-process sync
- Implements double-check locking pattern for efficiency
- Retry logic with 5-minute timeout and 3 retries
- Refactored into reusable acquireChartLock() helper function
- Added refresh marker coordination for cross-process cache management

Issue #2296: helmDefaults.skipDeps and helmDefaults.skipRefresh ignored
- Check both CLI options AND helmDefaults when deciding to skip repo sync

Issue #2297: Local chart + transformers causes panic
- Normalize local chart paths to absolute before calling chartify

OCI Registry Login URL Fix:
- Added extractRegistryHost() to extract just the registry host from URLs
- Fixed SyncRepos to use extracted host for OCI registry login
- e.g., "account.dkr.ecr.region.amazonaws.com/charts" ->
        "account.dkr.ecr.region.amazonaws.com"

Test Plan:
- Unit tests for issues #2295, #2296, #2297
- Unit tests for OCI registry login (extractRegistryHost, SyncRepos_OCI)
- Integration tests for issues #2295 and #2297
- All existing unit tests pass (including TestLint)

Fixes #2295
Fixes #2296
Fixes #2297

Signed-off-by: Aditya Menon <amenon@canarytechnologies.com>
2025-11-27 11:08:47 +01:00