Commit Graph

22 Commits

Author SHA1 Message Date
Thomas Rosenstein 0affa9425a Improve sync responsiveness with background execution and context cancellation
This change improves the responsiveness of the operator when handling
deletion requests by running sync operations in the background and
using context cancellation to interrupt stuck operations.

Changes:
- Add context field to Cluster struct, passed through New()
- Add Cancel() method to cancel cluster's context
- Add StartSync/EndSync/NeedsResync for managing background sync state
- Run Sync() in a background goroutine so worker can process other events
- Add context-aware DB connection methods (initDbConnWithContext)
- Add RetryWithContext() that respects context cancellation
- Cancel cluster context immediately when DeletionTimestamp detected
- Use context-aware connections in syncRoles/syncDatabases
- StartSync/NeedsResync check context cancellation to prevent new syncs
  during deletion (no need for separate deleted flag)

Flow:
1. Sync event spawns background goroutine and returns immediately
2. If another sync arrives while one is running, needsResync flag is set
3. When sync completes, it checks needsResync and requeues if needed
4. Delete cancels context -> stuck DB operations return early -> mutex released
5. StartSync/NeedsResync return false when context cancelled
6. Delete proceeds without waiting for slow/stuck sync operations
2025-12-14 20:43:53 +00:00
Felix Kunde 34df486f00
fix flaky comparison unit test of retruned errors (#2822) 2024-12-19 17:35:01 +01:00
Felix Kunde 8cc6796537
fix comparing stream annotations and improve unit test (#2820) 2024-12-18 11:22:08 +01:00
Felix Kunde d44bfabe78
do not use extra labels to list stream CRDs (#2803)
* do not use extra labels to list stream CRDs
* add diff on labels for streams + unit test coverage
2024-12-17 08:54:37 +01:00
Felix Kunde 80ef38f7f0
add resource annotation and ignore recovery type (#2817)
* add resource annotation and ignore recovery type
* Update docs/reference/cluster_manifest.md

---------

Co-authored-by: Ida Novindasari <idanovinda@gmail.com>
2024-12-16 18:17:19 +01:00
Felix Kunde 301462c415
remove streams delete and extend unit tests (#2737) 2024-12-16 18:13:52 +01:00
Felix Kunde 2ae51fb9ce
reflect linter feedback, remove unused argumnents and redundant type from arrays (#2739)
* reflect linter feedback, remove unused argumnents and redundant literal definitions
* add logical backup to TestCreate unit test
2024-08-27 17:56:07 +02:00
Felix Kunde 2f7e3ee847
fix stream duplication on operator restart (#2733)
* fix stream duplication on operator restart
* add try except to streams e2e test
2024-08-20 14:38:07 +02:00
Felix Kunde c7ee34ed12
fix sync streams and add diffs for annotations and owner references (#2728)
* extend and improve hasSlotsInSync unit test
* fix sync streams and add diffs for annotations and owner references
* incl. current annotations as desired where we do not fully control them
* added one more unit test and fixed sub test names
* pass maintenance windows to function and update unit test
2024-08-14 12:56:14 +02:00
Felix Kunde 25ccc87317
sync all resources to cluster fields (#2713)
* sync all resources to cluster fields (CronJob, Streams, Patroni resources)
* separated sync and delete logic for Patroni resources
* align delete streams and secrets logic with other resources
* rename gatherApplicationIds to getDistinctApplicationIds
* improve slot check before syncing streams CRD
* add ownerReferences and annotations diff to Patroni objects
* add extra sync code for config service so it does not get too ugly
* some bugfixes when comparing annotations and return err on found
* sync Patroni resources on update event and extended unit tests
* add config service/endpoint owner references check to e2e tes
2024-08-13 10:06:46 +02:00
Ida Novindasari 94d36327ba
stream: slot and FES should not be created if the publication creation fails (#2704)
* slot should not be created if the publication creation fails
* not create FES resource when slot doesn't exist
2024-08-02 15:09:37 +02:00
Ida Novindasari 31f474a95c
Enable slot and publication deletion when stream application is removed (#2684)
* refactor syncing publication section
* update createOrUpdateStream function to allow resource deletion when removed from manifest
* add minimal FES CRD to enable FES resources creation for E2E test
* fix bug of removing manifest slots in syncStream
* e2e test: fixing typo with major upgrade test
* e2e test: should create and delete FES resource
* e2e test: should not delete manual created resources
* e2e test: enable cluster role for FES with patching instead of deploying in manifest
2024-07-25 12:00:23 +02:00
Felix Kunde 37d6993439
remove stream resources after drop from Postgres manifest (#2563)
* remove stream resources after drop from Postgres manifest
2024-06-27 14:30:52 +02:00
Felix Kunde 3d9d0258f0
patch event stream CRD instead of update (#2535) 2024-02-14 18:20:02 +01:00
Felix Kunde e03fdaaa51
add support for recovery section in event streams (#2421) 2023-09-19 17:15:50 +02:00
Felix Kunde 29cec0ceda
configurable resources for logical backup pod template (#710)
* new config options to specify resources for logical backup jobs
* bug in logical backup script for s3 dumps
* define enum for logical_backup_provider
* changed order of logical backup azure options
* fix unit test for stream comparison
2023-01-05 15:19:36 +01:00
Felix Kunde 4534a4cd9e
fix syncing of stream CRDs (#2152)
* fix syncing of stream CRDs and improve corresponding unit tests
2022-12-30 13:09:15 +01:00
Felix Kunde e80cccb93b
use random short name for stream CRDs (#2137)
* use random short name for stream CRDs
2022-12-27 16:52:01 +01:00
Felix Kunde ef324494a0
fetch pooler and fes_user system user only when corresponding features are used (#2009)
* fetch pooler and fes_user system user only when corresponding features are used
* cover error case in unit test
* use string formatting instead of +
2022-08-24 16:28:49 +02:00
Felix Kunde 2333d531d3
Fix deletion of event streams resources (#1831)
* fix deletion of event streams
* create cluster field to store stream application ids
2022-03-31 11:48:37 +02:00
Felix Kunde 1d88009ec4
fix comparison of event stream array (#1817)
* fix comparison of event stream array
* turn optional stream fields to pointers
2022-03-18 15:06:17 +01:00
Felix Kunde d8a159ef1a
create CDC event stream CRD (#1570)
* provide event stream API
* check manifest settings for logical decoding before creating streams
* operator updates Postgres config and creates replication user
* name FES like the Postgres cluster
* add delete case and fix updating streams + update unit test
* check if fes CRD exists before syncing
* existing slot must use the same plugin
* make id and payload columns configurable
* sync streams only when they are defined in manifest
* introduce applicationId for separate stream CRDs
* add FES to RBAC in chart
* disable streams in chart
* switch to pgoutput plugin and let operator create publications
* reflect code review and additional refactoring

Co-authored-by: Paŭlo Ebermann <paul.ebermann@zalando.de>
2022-02-28 10:09:42 +01:00