This change improves the responsiveness of the operator when handling
deletion requests by running sync operations in the background and
using context cancellation to interrupt stuck operations.
Changes:
- Add context field to Cluster struct, passed through New()
- Add Cancel() method to cancel cluster's context
- Add StartSync/EndSync/NeedsResync for managing background sync state
- Run Sync() in a background goroutine so worker can process other events
- Add context-aware DB connection methods (initDbConnWithContext)
- Add RetryWithContext() that respects context cancellation
- Cancel cluster context immediately when DeletionTimestamp detected
- Use context-aware connections in syncRoles/syncDatabases
- StartSync/NeedsResync check context cancellation to prevent new syncs
during deletion (no need for separate deleted flag)
Flow:
1. Sync event spawns background goroutine and returns immediately
2. If another sync arrives while one is running, needsResync flag is set
3. When sync completes, it checks needsResync and requeues if needed
4. Delete cancels context -> stuck DB operations return early -> mutex released
5. StartSync/NeedsResync return false when context cancelled
6. Delete proceeds without waiting for slow/stuck sync operations
* extend and improve hasSlotsInSync unit test
* fix sync streams and add diffs for annotations and owner references
* incl. current annotations as desired where we do not fully control them
* added one more unit test and fixed sub test names
* pass maintenance windows to function and update unit test
* sync all resources to cluster fields (CronJob, Streams, Patroni resources)
* separated sync and delete logic for Patroni resources
* align delete streams and secrets logic with other resources
* rename gatherApplicationIds to getDistinctApplicationIds
* improve slot check before syncing streams CRD
* add ownerReferences and annotations diff to Patroni objects
* add extra sync code for config service so it does not get too ugly
* some bugfixes when comparing annotations and return err on found
* sync Patroni resources on update event and extended unit tests
* add config service/endpoint owner references check to e2e tes
* refactor syncing publication section
* update createOrUpdateStream function to allow resource deletion when removed from manifest
* add minimal FES CRD to enable FES resources creation for E2E test
* fix bug of removing manifest slots in syncStream
* e2e test: fixing typo with major upgrade test
* e2e test: should create and delete FES resource
* e2e test: should not delete manual created resources
* e2e test: enable cluster role for FES with patching instead of deploying in manifest
* new config options to specify resources for logical backup jobs
* bug in logical backup script for s3 dumps
* define enum for logical_backup_provider
* changed order of logical backup azure options
* fix unit test for stream comparison
* provide event stream API
* check manifest settings for logical decoding before creating streams
* operator updates Postgres config and creates replication user
* name FES like the Postgres cluster
* add delete case and fix updating streams + update unit test
* check if fes CRD exists before syncing
* existing slot must use the same plugin
* make id and payload columns configurable
* sync streams only when they are defined in manifest
* introduce applicationId for separate stream CRDs
* add FES to RBAC in chart
* disable streams in chart
* switch to pgoutput plugin and let operator create publications
* reflect code review and additional refactoring
Co-authored-by: Paŭlo Ebermann <paul.ebermann@zalando.de>