This change improves the responsiveness of the operator when handling
deletion requests by running sync operations in the background and
using context cancellation to interrupt stuck operations.
Changes:
- Add context field to Cluster struct, passed through New()
- Add Cancel() method to cancel cluster's context
- Add StartSync/EndSync/NeedsResync for managing background sync state
- Run Sync() in a background goroutine so worker can process other events
- Add context-aware DB connection methods (initDbConnWithContext)
- Add RetryWithContext() that respects context cancellation
- Cancel cluster context immediately when DeletionTimestamp detected
- Use context-aware connections in syncRoles/syncDatabases
- StartSync/NeedsResync check context cancellation to prevent new syncs
during deletion (no need for separate deleted flag)
Flow:
1. Sync event spawns background goroutine and returns immediately
2. If another sync arrives while one is running, needsResync flag is set
3. When sync completes, it checks needsResync and requeues if needed
4. Delete cancels context -> stuck DB operations return early -> mutex released
5. StartSync/NeedsResync return false when context cancelled
6. Delete proceeds without waiting for slow/stuck sync operations
When a Postgres cluster has a finalizer, deleting it sets a DeletionTimestamp
but doesn't remove the object until the finalizer is cleared. The operator
was not properly handling these DeletionTimestamp changes:
1. postgresqlUpdate() was filtering out events where only DeletionTimestamp
changed (it only checked Spec and Annotations), causing the delete to
never be processed.
2. EventUpdate case in processEvent() didn't check for DeletionTimestamp,
so even if the event reached the processor, it would run Update() instead
of Delete().
3. removeFinalizer() used a cached object with stale resourceVersion,
causing "object has been modified" errors.
Fixes:
- Add explicit DeletionTimestamp check in postgresqlUpdate() to queue the event
- Add DeletionTimestamp check in EventUpdate to call Delete() when set
- Fetch latest object from API before removing finalizer to avoid conflicts
* fix switchover schedule tests
Previously the tests would fail depending on the local time zone and the
time of day the test was being run.
---------
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
Co-authored-by: Mikkel Oscar Lyderik Larsen <mikkeloscar@users.noreply.github.com>
Add service selector comparison to compareServices
This is necessary for the proper switch of `kubernetes_use_configmaps` configuration value, as master service should have different label selector setup for those.
* do not remove publications of slot defined in manifest
* improve condition to sync streams
* init publication tables map when adding manifest slots
* need to update c.Stream when there is no update
Create the second PDB to cover Pods with a special "critical operation" label set.
This label is going to be assigned to all pg cluster's Pods by the Operator during a PG major version upgrade, by Patroni during a cluster/replica bootstrap. It can also be set manually or by any other automation tool.
* sync all resources to cluster fields (CronJob, Streams, Patroni resources)
* separated sync and delete logic for Patroni resources
* align delete streams and secrets logic with other resources
* rename gatherApplicationIds to getDistinctApplicationIds
* improve slot check before syncing streams CRD
* add ownerReferences and annotations diff to Patroni objects
* add extra sync code for config service so it does not get too ugly
* some bugfixes when comparing annotations and return err on found
* sync Patroni resources on update event and extended unit tests
* add config service/endpoint owner references check to e2e tes
* feat(498): Add ownerReferences to managed entities
* empty owner reference for cross namespace secret and more tests
* update ownerReferences of existing resources
* removing ownerReference requires Update API call
* CR ownerReference on PVC blocks pvc retention policy of statefulset
* make ownerreferences optional and disabled by default
* update unit test to check len ownerReferences
* update codegen
* add owner references e2e test
* update unit test
* add block_owner_deletion field to test owner reference
* fix typos and update docs once more
* reflect code feedback
---------
Co-authored-by: Max Begenau <max@begenau.com>
* Annotate PVC on Sync/Update, not only change PVC template
* Don't rotate pods when only annotations changed
* Annotate Logical Backup's and Pooler's pods
* Annotate PDB, Endpoints created by the Operator, Secrets, Logical Backup jobs
Inherited annotations are only added/updated, not removed
* Secrets deletion config
* Update e2e/tests/test_e2e.py
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
---------
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* make bucket prefix for logical backup configurable
* include container comparison in logical backup diff
* add unit test and update description for compareContainers
* don't rely on users putting / in the config - reflect other comments from review
* add unit test and documentation for finalizers
* error msg with lower case and cover sync case
* try to avoid adding json-patch dependency
* use Update to remove finalizer
* changing status and finalizer during create
* do not call Delete() twice
* Add Finalizer functions to Cluster; add/remove finalizer on Create/Delete events
* Check if clusters have a deletion timestamp and we missed that event. Run Delete() and remove finalizer when done.
* Fix nil handling when using Service from map; Remove Service, Endpoint entries from their maps - just like with Secrets
* Add handling of ResourceNotFound to all delete functions (Service, Endpoint, LogicalBackup CronJob, PDB and Secret) - this is not a real error when deleting things
* Emit events when there are issues deleting resources to the user is informed
* Depend the removal of the Finalizer on all resources being deleted successfully first. Otherwise the next sync run should let us try again
* Add config option to enable finalizers
* Removed dangling whitespace at EOL
* config.EnableFinalizers is a bool pointer
---------
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* Allow drop slots when it gets deleted from the manifest
* use leader instead replica to query slots
* fix and extend unit tests for config update checks
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* add toggle to turn off readiness probes
* include PodManagementPolicy and ReadinessProbe in stateful set comparison
* add URI scheme to generated readiness probe
* deprecate ClusterName field of Postgresql type
* remove for teamId from operator API endpints /status /logs /history
* update dns_format_string and yaml template in UI
* allow in place pw rotation of system users
* block postgres user from rotation
* mark pooler pods for replacement
* adding podsGetter where pooler is synced in unit tests
* move rotation code in extra function
* removing inner goroutine in cluster.Switchover and resolve race between processPodEvent and unregisterPodSubscriber
* unlock mutex after handling event, now with non-blocking default case
* reverse membership for additional owner roles
* remove type RoleOriginSpilo
* use e2e images with cron_admin inside
* let operator resolve reversed membership
* make additional owner roles part of the sync user strategy
* add more context in the docs about additional_owner_roles