Commit Graph

315 Commits

Author SHA1 Message Date
Thomas Rosenstein 0affa9425a Improve sync responsiveness with background execution and context cancellation
This change improves the responsiveness of the operator when handling
deletion requests by running sync operations in the background and
using context cancellation to interrupt stuck operations.

Changes:
- Add context field to Cluster struct, passed through New()
- Add Cancel() method to cancel cluster's context
- Add StartSync/EndSync/NeedsResync for managing background sync state
- Run Sync() in a background goroutine so worker can process other events
- Add context-aware DB connection methods (initDbConnWithContext)
- Add RetryWithContext() that respects context cancellation
- Cancel cluster context immediately when DeletionTimestamp detected
- Use context-aware connections in syncRoles/syncDatabases
- StartSync/NeedsResync check context cancellation to prevent new syncs
  during deletion (no need for separate deleted flag)

Flow:
1. Sync event spawns background goroutine and returns immediately
2. If another sync arrives while one is running, needsResync flag is set
3. When sync completes, it checks needsResync and requeues if needed
4. Delete cancels context -> stuck DB operations return early -> mutex released
5. StartSync/NeedsResync return false when context cancelled
6. Delete proceeds without waiting for slow/stuck sync operations
2025-12-14 20:43:53 +00:00
Thomas Rosenstein 9ca87d9db4 Fix deletion timestamp handling for clusters with finalizers
When a Postgres cluster has a finalizer, deleting it sets a DeletionTimestamp
but doesn't remove the object until the finalizer is cleared. The operator
was not properly handling these DeletionTimestamp changes:

1. postgresqlUpdate() was filtering out events where only DeletionTimestamp
   changed (it only checked Spec and Annotations), causing the delete to
   never be processed.

2. EventUpdate case in processEvent() didn't check for DeletionTimestamp,
   so even if the event reached the processor, it would run Update() instead
   of Delete().

3. removeFinalizer() used a cached object with stale resourceVersion,
   causing "object has been modified" errors.

Fixes:
- Add explicit DeletionTimestamp check in postgresqlUpdate() to queue the event
- Add DeletionTimestamp check in EventUpdate to call Delete() when set
- Fetch latest object from API before removing finalizer to avoid conflicts
2025-12-14 20:29:01 +01:00
Felix Kunde 1af4c50ed0
bump to v1.15.0 (#2965)
* bump to v1.15.0
* more linter hints
* update dependencies of kubectl-pg plugin
2025-10-21 11:56:33 +02:00
Felix Kunde 3bc244fe39
bump dependencies and reflect linter suggestions (#2963) 2025-10-16 10:23:36 +02:00
Jociele Padilha fa4bc21538
upgrade Go from 1.23.4 to 1.25.0 (#2945)
* upgrade go to 1.25
* add minor version to be Go 1.25.0
* revert the Go version on README to keep the history of the release
2025-08-19 14:40:39 +02:00
Felix Kunde 6035fdd58e
bump operator to 1.14.0 (#2827) 2024-12-23 12:12:33 +01:00
Ida Novindasari 470a1eab89
Add support for pg17 and remove pg12 (#2773)
* Add support for pg17
* use new gcov2lcov-action
* Use ghcr spilo-17
* Update SPILO_CURRENT and SPILO_LAZY
* Update e2e/run.sh

---------

Co-authored-by: Polina Bungina <27892524+hughcapet@users.noreply.github.com>
2024-12-20 11:22:52 +01:00
Felix Kunde 2ae51fb9ce
reflect linter feedback, remove unused argumnents and redundant type from arrays (#2739)
* reflect linter feedback, remove unused argumnents and redundant literal definitions
* add logical backup to TestCreate unit test
2024-08-27 17:56:07 +02:00
Felix Kunde a08d1679f2
align sync and update logs (#2738) 2024-08-27 09:58:32 +02:00
Felix Kunde cc9074c184
Bump operator to v1.13.0 (#2729)
* bump operator to v1.13.0
* align configmap with CRD config
* remove default from CRD config option additional_secret_mount_path
* enable automatic major version upgrades by default
2024-08-22 12:16:27 +02:00
fahed dorgaa aad03f71ea
fix golangci-lint issues (#2715)
Signed-off-by: fahed dorgaa <fahed.dorgaa@gmail.com>
Co-authored-by: fahed dorgaa <fahed.dorgaa.ext@corp.ovh.com>
Co-authored-by: Matthias Adler <macedigital@users.noreply.github.com>
2024-08-14 12:54:44 +02:00
Felix Kunde a87307e56b
Feat: enable owner references (#2688)
* feat(498): Add ownerReferences to managed entities
* empty owner reference for cross namespace secret and more tests
* update ownerReferences of existing resources
* removing ownerReference requires Update API call
* CR ownerReference on PVC blocks pvc retention policy of statefulset
* make ownerreferences optional and disabled by default
* update unit test to check len ownerReferences
* update codegen
* add owner references e2e test
* update unit test
* add block_owner_deletion field to test owner reference
* fix typos and update docs once more
* reflect code feedback

---------

Co-authored-by: Max Begenau <max@begenau.com>
2024-08-09 17:58:25 +02:00
Felix Kunde 85b8058029
bump spilo to 16-3.3, drop support for pg11 (#2706)
* bump spilo to 16-3.3, drop support for pg11
* update README
2024-08-09 14:47:23 +02:00
Ida Novindasari e6ae9e3772
Implement per-cluster maintenance window for Postgres automatic upgrade (#2710)
* implement maintenance window for major version upgrade 
* e2e test: fix major version upgrade test and extend with the time window
* unit test: add iteration to test isInMaintenanceWindow
* UI: show the window and enable edit via UI
2024-08-09 14:07:35 +02:00
Felix Kunde 7c7aa96935
bump to v1.12.2 (#2664) 2024-06-14 10:53:17 +02:00
Felix Kunde 2e1583e9c0
bump to v1.12.1 (#2658)
* bump to v1.12.1
* align Python version in setup.py with base image
2024-06-13 10:40:07 +02:00
Felix Kunde 6cde8e8c0b
Bump to v1.12.0 (#2639)
* bump tp v1.12.0
* code-generator and apiextensions-apiserver still on to 0.25.9 to allow code-generation on GH
* bump go in github action and mini fix in UI
* update UI Dockerfile

---------

Co-authored-by: Ida Novindasari <idanovinda@gmail.com>
2024-05-31 15:29:29 +02:00
Felix Kunde 1b08ee1acf
switch to ghcr image in helm chart and examples (#2634)
* switch to ghcr image in helm chart and examples
* change logical backup config for helm chart
* change internal default for logical backup image config to ghcr, too
2024-05-21 17:43:37 +02:00
Motte 13d6594cdf
Secrets deletion config (#2582)
* Secrets deletion config
* Update e2e/tests/test_e2e.py

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>

---------

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2024-05-10 16:31:21 +02:00
Felix Kunde 83878fe447
make bucket prefix for logical backup configurable (#2609)
* make bucket prefix for logical backup configurable
* include container comparison in logical backup diff
* add unit test and update description for compareContainers
* don't rely on users putting / in the config - reflect other comments from review
2024-04-23 14:24:04 +02:00
Felix Kunde 0f96eb20bf
bump tp v1.11.0 (#2580) 2024-03-15 15:02:39 +01:00
Felix Kunde 08089ed4b4
add option to prevent PVC removal on cluster deletion (#2579)
* add option to prevent PVC removal on cluster deletion
* Update docs/reference/operator_parameters.md

Co-authored-by: Motte <37443982+dmotte@users.noreply.github.com>
2024-03-14 17:01:26 +01:00
Jociele Padilha a5663da64f
add the pg version 16 (#2557)
* add the pg version 16

* add comma after pg16 in crds api

* change minimal_major_version to 12

* add new spilo image for pg16

* edit the registry from current and lazy spilo

* Update e2e/run.sh

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>

* Update README.md

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>

* add pg 11 to be compatible for the existing DBs

* update pq, pyyaml,k8s and kind version

* skip test_infrastructure_roles

* skip another test

* remove the skipping

* adjust the verification of new Patroni version states

---------

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2024-03-13 16:43:25 +01:00
Felix Kunde 8bd9080798
return create and sync error, not setStatus error (#2574)
* return create and sync error, not possible status set error
* update documentation and improve deletion logs
2024-03-12 16:31:59 +01:00
Felix Kunde 23f4fdb327
update go and dependencies (#2554) 2024-02-23 13:58:11 +01:00
Felix Kunde 29ea863faf
allow empty resources when defaults are empty (#2524)
* allow empty resources when defaults are empty
* update codegen
* add more unit tests and remove internal resources defaults
* a unit test for min limit and raising to request
* uncomment defaults in example configmap
* simplifying pooler pod generation unit test
2024-02-09 07:35:53 +01:00
Felix Kunde 4a0c483514
add unit test and documentation for finalizers (#2509)
* add unit test and documentation for finalizers
* error msg with lower case and cover sync case
* try to avoid adding json-patch dependency
* use Update to remove finalizer
* changing status and finalizer during create
* do not call Delete() twice
2024-01-22 12:13:40 +01:00
Christian Rohmann 743aade45f
Use finalizers to avoid losing delete events and to ensure full resource cleanup (#941)
* Add Finalizer functions to Cluster; add/remove finalizer on Create/Delete events
* Check if clusters have a deletion timestamp and we missed that event. Run Delete() and remove finalizer when done.
* Fix nil handling when using Service from map; Remove Service, Endpoint entries from their maps - just like with Secrets
* Add handling of ResourceNotFound to all delete functions (Service, Endpoint, LogicalBackup CronJob, PDB and Secret) - this is not a real error when deleting things
* Emit events when there are issues deleting resources to the user is informed
* Depend the removal of the Finalizer on all resources being deleted successfully first. Otherwise the next sync run should let us try again
* Add config option to enable finalizers
* Removed dangling whitespace at EOL
* config.EnableFinalizers is a bool pointer

---------

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2024-01-04 16:22:53 +01:00
Davide Bizzarri 3ca26d0dc8
Make PodDisruptionBudget master label selector optional (#2364)
* Make PDB master label selector optional

* Update pkg/apis/acid.zalan.do/v1/crds.go

---------

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2024-01-04 15:58:24 +01:00
Stef Graces bbba15f9bf
Logical backup secret (#2051)
* Add logical backup secret
2024-01-04 11:09:16 +01:00
Ida Novindasari 36389b27bc
Enable specifying PVC retention policy for auto deletion (#2343)
* Enable specifying PVC retention policy for auto deletion
* enable StatefulSetAutoDeletePVC in featureGates
* skip node affinity test
2023-09-08 13:17:37 +02:00
Felix Kunde 552bd26c0f
bump to v1.10.1 (#2410)
* bump to v1.10.1
2023-09-07 22:46:26 +02:00
Felix Kunde c580e509d3
Bump v1.10.0 (#2299)
* bump to v1.9.1
* update year in license and add links to more blog posts
* bump go to 1.19 and update dependencies
* go for 1.10.0 instead of 1.9.1
* fix unit test - removed obsolete ClusterName field
* fix DNS template in UI helm chart deployment file
2023-04-20 18:21:43 +02:00
drivebyer fc86c44ec3
Fix potential panic (#2289)
Signed-off-by: drivebyer <yang.wu@daocloud.io>
2023-04-17 14:34:18 +02:00
Felix Kunde 30b612489a
bump to v1.9.0 (#2177)
* bump to v1.9.0
* some minor UI config updates
* bump UI package.json to 1.9.0, too
2023-01-30 10:15:16 +01:00
Felix Kunde 28cd2f188a
better backwards compatibility with old DNS name format for LBs (#2171)
* better backwards compatibility with legacy DNS name format for LBs
* improve docs on DNS string
2023-01-17 10:06:11 +01:00
Felix Kunde 29cec0ceda
configurable resources for logical backup pod template (#710)
* new config options to specify resources for logical backup jobs
* bug in logical backup script for s3 dumps
* define enum for logical_backup_provider
* changed order of logical backup azure options
* fix unit test for stream comparison
2023-01-05 15:19:36 +01:00
Stef Graces bb2617a53f
Add logical backup for azure (#2052)
* Add logical backup for azure
2023-01-05 12:16:41 +01:00
yoshihikoueno becf8a4715
Bump spilo and target version for PostgreSQL 15 (#2139)
* Bumped Spilo image tag to the one that supports PostgreSQL 15. Using CDP version temporarily until non-CDP one is released.
* Added support for PostgreSQL 15 and made it default. 9.5 and 9.6 are now no longer supported
* Bumped spilo image tag to 2.1-p9
* Bumped spilo image in test launcher

Co-authored-by: yoshihiko <ariyoshi10@gmail.com>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2023-01-04 12:01:30 +01:00
Felix Kunde d7e1fb57f1
polish global config about sharing postgresql-run socket (#2155)
* polish global config about sharing postgresql-run socket
2023-01-02 18:28:48 +01:00
Francois Parquet be7b52db92
add preferred during scheduling pod anti affinity (#2048)
* add preferred during scheduling pod anti affinity

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2023-01-02 18:22:47 +01:00
Christian Rohmann 024aab1f13
Add config switch to share pg_socket in /var/run/postgresql via an emptyDir with the sidecar containers (#962) 2023-01-02 12:57:36 +01:00
Felix Kunde 3e148ea57e
enable operator support for pg15 and drop support for 9.5 and 9.6 (#2140)
* enable operator support for pg15 and drop support for 9.5 and 9.6
* not offer 15 in UI before spilo-15 is available
2022-12-15 12:17:27 +01:00
Polina Bungina 4d585250db
Add Patroni failsafe_mode parameter (#2076)
This commit adds support of a not-yet-released Patroni feature that allows postgres to run as primary in case of a failed leader lock update.
* Add Patroni 'failsafe_mode' local parameter (enable for a single PG cluster)
* Allow configuring Patroni 'failsafe_mode' parameter globally
2022-12-02 13:33:02 +01:00
yoshihikoueno c895e8f61f
Bumped Spilo 14 image tag to 2.1-p7 (#2096)
* Bumped Spilo 14 image tag to 2.1-p7

Co-authored-by: yoshihiko <ariyoshi10@gmail.com>
2022-11-03 10:57:33 +01:00
Felix Kunde a119772efb
add toggle to turn off readiness probes (#2004)
* add toggle to turn off readiness probes
* include PodManagementPolicy and ReadinessProbe in stateful set comparison
* add URI scheme to generated readiness probe
2022-10-05 18:25:24 +02:00
Felix Kunde 4c07494ac7
deprecate ClusterName field of Postgresql type and remove team from REST endpoints (#2015)
* deprecate ClusterName field of Postgresql type
* remove for teamId from operator API endpints /status /logs /history
* update dns_format_string and yaml template in UI
2022-08-29 15:00:25 +02:00
Felix Kunde 89375186b3
use old LB DNS format when teamId prefix is disabled (#2011)
* use old LB DNS format when teamId prefix is disabled
* support both old and new format in external-dns
* switch dns template from team to namespace
2022-08-25 18:29:54 +02:00
Felix Kunde 3bfd63cbe6
Make teamId in cluster name optional (#2001)
* making teamId in clustername optional
* move teamId check to addCluster function
2022-08-24 10:12:50 +02:00
Jociele Padilha b41daf4f76
Set maximum CPU and Memory requests on K8s (#1959)
* Set maximum CPU and Memory requests on K8s
2022-07-28 14:18:27 +02:00