173 Commits

Author SHA1 Message Date
Alec Thomas 0f2cb12219
Fix nil pointer dereference in syncCriticalOpPodDisruptionBudget
When the PDB creation fails with "already exists" error, the pdb
variable is nil since the initial Get failed. Using pdb.ObjectMeta
would cause a panic. Use the cluster method to get the PDB name instead.
2026-01-18 01:09:55 -06:00
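The failure mode described in this commit can be sketched in isolation. This is a minimal illustration, not the operator's code: `store`, `budget`, `syncBudget`, and both error values are hypothetical stand-ins for the Kubernetes client and the PDB type. The point is only that after a failed Get the returned pointer is nil, so the "already exists" branch must take the name from somewhere else.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the API errors returned by a real client.
var (
	errNotFound      = errors.New("not found")
	errAlreadyExists = errors.New("already exists")
)

// budget is a hypothetical, minimal stand-in for a PodDisruptionBudget.
type budget struct{ Name string }

// store is a hypothetical in-memory API. hideOnGet simulates a read that
// fails even though the object exists (for example, a race with another writer).
type store struct {
	existing  map[string]bool
	hideOnGet bool
}

func (s *store) get(name string) (*budget, error) {
	if s.hideOnGet || !s.existing[name] {
		return nil, errNotFound
	}
	return &budget{Name: name}, nil
}

func (s *store) create(name string) (*budget, error) {
	if s.existing[name] {
		return nil, errAlreadyExists
	}
	s.existing[name] = true
	return &budget{Name: name}, nil
}

// syncBudget ensures the budget exists and returns its name.
func syncBudget(s *store, desiredName string) (string, error) {
	pdb, err := s.get(desiredName)
	if err == nil {
		return pdb.Name, nil
	}
	// From here on pdb is nil; dereferencing it (pdb.Name) would panic.
	if _, err = s.create(desiredName); err != nil {
		if errors.Is(err, errAlreadyExists) {
			// Use the name computed independently of the failed Get.
			return desiredName, nil
		}
		return "", err
	}
	return desiredName, nil
}

func main() {
	s := &store{existing: map[string]bool{"critical-op-pdb": true}, hideOnGet: true}
	name, err := syncBudget(s, "critical-op-pdb")
	fmt.Println(name, err)
}
```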
Alec Thomas 513291c58d
Create critical-op PDB on-demand to avoid false monitoring alerts
The critical-op PodDisruptionBudget was previously created permanently,
but its selector (critical-operation=true) matched no pods during normal
operation. This caused false alerts in monitoring systems like
kube-prometheus-stack because the PDB expected healthy pods but none
matched.

Changes:
- Modified syncCriticalOpPodDisruptionBudget to check if any pods have
  the critical-operation label before creating/keeping the PDB
- PDB is now created on-demand when pods are labeled (e.g., during
  major version upgrades) and deleted when labels are removed
- Updated majorVersionUpgrade to explicitly create/delete the PDB
  around the critical operation for immediate protection
- Removed automatic critical-op PDB creation from initial cluster setup
- Added test to verify on-demand PDB creation and deletion behavior,
  including edge cases for idempotent create/delete operations

The explicit PDB creation in majorVersionUpgrade ensures immediate
protection before the critical operation starts. The sync function
serves as a safety net for edge cases like bootstrap (where Patroni
applies labels) or operator restarts during critical operations.

Fixes #3020
2026-01-02 16:05:32 -05:00
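The on-demand behavior described in this commit boils down to a small decision: keep the PDB only while at least one pod carries the critical-operation label. A minimal sketch of that decision follows, assuming the label key and value from the commit message (`critical-operation=true`); the `pod` and `action` types and the `decide` function are hypothetical and do not come from the operator.

```go
package main

import "fmt"

// Label key taken from the commit message; the selector is critical-operation=true.
const criticalOpLabel = "critical-operation"

// pod is a hypothetical, minimal stand-in for a Kubernetes Pod.
type pod struct {
	Name   string
	Labels map[string]string
}

// action is a hypothetical result type describing what a sync should do.
type action string

const (
	actionNone   action = "none"
	actionCreate action = "create"
	actionDelete action = "delete"
)

// anyCriticalOp reports whether at least one pod is labeled critical-operation=true.
func anyCriticalOp(pods []pod) bool {
	for _, p := range pods {
		if p.Labels[criticalOpLabel] == "true" {
			return true
		}
	}
	return false
}

// decide is idempotent: calling it again after the action was applied yields actionNone.
func decide(pods []pod, pdbExists bool) action {
	wanted := anyCriticalOp(pods)
	switch {
	case wanted && !pdbExists:
		return actionCreate
	case !wanted && pdbExists:
		return actionDelete
	default:
		return actionNone
	}
}

func main() {
	pods := []pod{{Name: "pg-0", Labels: map[string]string{criticalOpLabel: "true"}}}
	fmt.Println(decide(pods, false))
}
```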
Felix Kunde 04ad66f701
stop retention user cleanup early again when DB connection attempt fails (#2999)
* stop retention user cleanup early again when DB connection attempt fails
* add unit test and new returned error from updateSecret
2025-12-10 10:01:07 +01:00
Felix Kunde 2c57498e43
skip db user actions when its secret failed to sync on update (#2969)
* skip db user actions when its secret failed to sync on update
* need to add new pgUser field to e2e test
* let's collect errors of syncSecret so we still get status updateFailed
2025-11-05 16:28:37 +01:00
Felix Kunde 3bc244fe39
bump dependencies and reflect linter suggestions (#2963)
2025-10-16 10:23:36 +02:00
Eng Zer Jun eddf521227
Replace `golang.org/x/exp` with stdlib (#2857)
* Replace `golang.org/x/exp` with stdlib

These experimental packages are now available in the Go standard
library since Go 1.21.

	1. golang.org/x/exp/slices -> slices [1]
	2. golang.org/x/exp/maps -> maps [2]

[1]: https://go.dev/doc/go1.21#slices
[2]: https://go.dev/doc/go1.21#maps

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* Run go mod tidy

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

---------

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2025-10-14 11:59:48 +02:00
Jociele Padilha fa4bc21538
upgrade Go from 1.23.4 to 1.25.0 (#2945)
* upgrade go to 1.25
* add minor version to be Go 1.25.0
* revert the Go version on README to keep the history of the release
2025-08-19 14:40:39 +02:00
Felix Kunde 746df0d33d
do not remove publications of slot defined in manifest (#2868)
* do not remove publications of slot defined in manifest
* improve condition to sync streams
* init publication tables map when adding manifest slots
* need to update c.Stream when there is no update
2025-02-26 17:31:37 +01:00
Polina Bungina a56ecaace7
Critical operation PDB (#2830)
Create the second PDB to cover Pods with a special "critical operation" label set.

The Operator assigns this label to all of a PG cluster's Pods during a PG major version upgrade, and Patroni assigns it during a cluster/replica bootstrap. It can also be set manually or by any other automation tool.
2025-01-29 12:41:08 +01:00
Polina Bungina f49b4f1e97
Ensure podAnnotations are removed from pods if reset in the config (#2826)
2025-01-24 16:53:14 +01:00
Polina Bungina b0cfeb30ea
Partially revert #2810 (#2849)
Only schedule switchover for pod migration, consider mainWindow for PGVERSION env change
2025-01-23 16:35:33 +01:00
Polina Bungina 8522331cf2
Extend MaintenanceWindows parameter usage (#2810)
Consider maintenance window when migrating master pods and replacing pods (rolling update)
2025-01-15 18:04:36 +01:00
Felix Kunde a08d1679f2
align sync and update logs (#2738)
2024-08-27 09:58:32 +02:00
Felix Kunde c7ee34ed12
fix sync streams and add diffs for annotations and owner references (#2728)
* extend and improve hasSlotsInSync unit test
* fix sync streams and add diffs for annotations and owner references
* incl. current annotations as desired where we do not fully control them
* added one more unit test and fixed sub test names
* pass maintenance windows to function and update unit test
2024-08-14 12:56:14 +02:00
Felix Kunde 25ccc87317
sync all resources to cluster fields (#2713)
* sync all resources to cluster fields (CronJob, Streams, Patroni resources)
* separated sync and delete logic for Patroni resources
* align delete streams and secrets logic with other resources
* rename gatherApplicationIds to getDistinctApplicationIds
* improve slot check before syncing streams CRD
* add ownerReferences and annotations diff to Patroni objects
* add extra sync code for config service so it does not get too ugly
* some bugfixes when comparing annotations and return err on found
* sync Patroni resources on update event and extended unit tests
* add config service/endpoint owner references check to e2e test
2024-08-13 10:06:46 +02:00
Felix Kunde a87307e56b
Feat: enable owner references (#2688)
* feat(498): Add ownerReferences to managed entities
* empty owner reference for cross namespace secret and more tests
* update ownerReferences of existing resources
* removing ownerReference requires Update API call
* CR ownerReference on PVC blocks pvc retention policy of statefulset
* make ownerreferences optional and disabled by default
* update unit test to check len ownerReferences
* update codegen
* add owner references e2e test
* update unit test
* add block_owner_deletion field to test owner reference
* fix typos and update docs once more
* reflect code feedback

---------

Co-authored-by: Max Begenau <max@begenau.com>
2024-08-09 17:58:25 +02:00
Polina Bungina 47efca33c9
Improve inherited annotations (#2657)
* Annotate PVC on Sync/Update, not only change PVC template
* Don't rotate pods when only annotations changed
* Annotate Logical Backup's and Pooler's pods
* Annotate PDB, Endpoints created by the Operator, Secrets, Logical Backup jobs

Inherited annotations are only added/updated, not removed
2024-06-26 13:10:37 +02:00
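The "added/updated, not removed" rule for inherited annotations can be illustrated with a small map merge. This is a sketch under that stated rule only; `mergeInherited` is a hypothetical helper, not a function from the operator.

```go
package main

import "fmt"

// mergeInherited overlays inherited annotations onto the current ones.
// Keys are only added or updated; keys present in current but absent from
// inherited are kept. The boolean reports whether anything changed.
func mergeInherited(current, inherited map[string]string) (map[string]string, bool) {
	merged := make(map[string]string, len(current)+len(inherited))
	for k, v := range current {
		merged[k] = v
	}
	changed := false
	for k, v := range inherited {
		if old, ok := merged[k]; !ok || old != v {
			merged[k] = v
			changed = true
		}
	}
	return merged, changed
}

func main() {
	current := map[string]string{"owner": "team-a", "legacy": "keep-me"}
	inherited := map[string]string{"owner": "team-b", "cost-center": "42"}
	merged, changed := mergeInherited(current, inherited)
	fmt.Println(merged, changed)
}
```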
Felix Kunde 83878fe447
make bucket prefix for logical backup configurable (#2609)
* make bucket prefix for logical backup configurable
* include container comparison in logical backup diff
* add unit test and update description for compareContainers
* don't rely on users putting / in the config - reflect other comments from review
2024-04-23 14:24:04 +02:00
Felix Kunde 8bd9080798
return create and sync error, not setStatus error (#2574)
* return create and sync error, not possible status set error
* update documentation and improve deletion logs
2024-03-12 16:31:59 +01:00
cstohr1 9bb5d8add7
Fix updating SynchronousNodeCount (#2552) (#2558)
CRD support for synchronous_node_count was previously added in #1484, however the desired SynchronousNodeCount was not compared to the actual patroni configuration, which meant it was never updated.
2024-03-05 09:37:06 +01:00
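The bug described in this commit is a missing desired-versus-actual comparison. A minimal sketch of such a comparison follows, assuming only the Patroni setting name `synchronous_node_count` mentioned in the commit message; the `patroniConfig` type and `configPatch` function are hypothetical.

```go
package main

import "fmt"

// patroniConfig is a hypothetical, minimal view of the Patroni configuration.
type patroniConfig struct {
	SynchronousNodeCount uint32
}

// configPatch returns the settings that must be patched so that the actual
// configuration matches the desired one. An empty map means no update is needed.
func configPatch(desired, actual patroniConfig) map[string]interface{} {
	patch := map[string]interface{}{}
	// Without this comparison the desired value would never be applied.
	if desired.SynchronousNodeCount != actual.SynchronousNodeCount {
		patch["synchronous_node_count"] = desired.SynchronousNodeCount
	}
	return patch
}

func main() {
	desired := patroniConfig{SynchronousNodeCount: 2}
	actual := patroniConfig{SynchronousNodeCount: 1}
	fmt.Println(configPatch(desired, actual))
}
```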
Felix Kunde 3fb3b34094
change username in secret when switching rotation mode (#2549)
2024-02-22 10:26:13 +01:00
Felix Kunde e34f19be01
update spec when updating status (#2546)
* update spec when updating status
* only setSpec if pg resource is not empty
2024-02-20 10:24:24 +01:00
Felix Kunde 886cb86797
allow users to opt out from globally enabled secret rotation (#2528)
* allow users to opt out from globally enabled secret rotation
* cover new option also in e2e test
* change ignore test to existing user
2024-02-09 12:19:06 +01:00
Felix Kunde 4a0c483514
add unit test and documentation for finalizers (#2509)
* add unit test and documentation for finalizers
* error msg with lower case and cover sync case
* try to avoid adding json-patch dependency
* use Update to remove finalizer
* changing status and finalizer during create
* do not call Delete() twice
2024-01-22 12:13:40 +01:00
Felix Kunde 3bad9aaded
fix when syncing standby description (#2513)
2024-01-12 10:41:17 +01:00
Felix Kunde dad5b132ec
Standby cluster promotion by changing manifest (#2472)
* Standby cluster promotion by changing manifest
* Updated the documentation

---------

Co-authored-by: Senthilnathan M <snathanm@vmware.com>
2024-01-04 12:33:50 +01:00
Felix Kunde 1105228d3a
in sync mode select only syncStandby as switchover candidate (#2278)
* in sync mode select only syncStandby as switchover candidate
* do not exit retry with err
* unit test: use error from reading byte stream twice
2023-04-06 12:04:55 +02:00
Felix Kunde 80fee5bda4
continue syncing databases and extensions on err (#2262)
2023-03-14 10:58:54 +01:00
Felix Kunde e6fb57a6bd
add c.replicationSlots on sync (#2238)
2023-02-23 13:19:35 +01:00
Felix Kunde 7a90fbcb00
fix sync of stream slots (#2194)
2023-01-27 18:03:37 +01:00
Felix Kunde 7887ebbbce
set wal_level config not on empty parameters map (#2189)
* set wal_level config not on empty parameters map
* UPDATE event must trigger statefulSet sync when streams are added
2023-01-26 09:43:03 +01:00
Felix Kunde b9165190e1
set wal_level for streams in statefulSet sync (#2187)
* set wal_level for streams in statefulSet sync
2023-01-25 17:06:31 +01:00
Felix Kunde 4741b3f734
copy rolconfig during password rotation (#2183)
* copy rolconfig during password rotation

Co-authored-by: idanovinda <idanovinda@gmail.com>
2023-01-25 10:48:23 +01:00
Felix Kunde a4f95e97e0
do not rotate secrets for standby clusters (#2175)
2023-01-17 12:58:14 +01:00
idanovinda 486d5d66e0
Allow drop slots when it gets deleted from the manifest (#2089)
* Allow drop slots when it gets deleted from the manifest
* use leader instead of replica to query slots
* fix and extend unit tests for config update checks

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2023-01-03 15:46:59 +01:00
Polina Bungina 4d585250db
Add Patroni failsafe_mode parameter (#2076)
This commit adds support for a not-yet-released Patroni feature that allows postgres to run as primary in case of a failed leader lock update.
* Add Patroni 'failsafe_mode' local parameter (enable for a single PG cluster)
* Allow configuring Patroni 'failsafe_mode' parameter globally
2022-12-02 13:33:02 +01:00
Felix Kunde 70f3ee8e36
skip db sync on failed initUsers during UPDATE (#2083)
* skip db sync on failed initUsers during UPDATE
* provide unit test for teams API being unavailable
* add test for 404 case
2022-10-21 17:50:14 +02:00
Felix Kunde 4786f53f03
Fix password rotation (#2043)
* fix password rotation
* test connection with rotation user in e2e test + minor changes
2022-10-13 11:33:26 +02:00
Felix Kunde ce8b009c66
fix team member deprecation (#2072)
2022-10-11 18:02:41 +02:00
Philipp B 84fe38a069
switch to batch API v1 for Jobs (#2066)
2022-10-07 11:27:58 +02:00
Felix Kunde 2aa52094db
switch to policy API v1 for PDBs (#2008)
* switch to policy API v1 for PDBs
* update e2e test dependencies
* use kind 0.14.0
* bump K8s client in e2e docker image
* bump e2e tests-runner
2022-10-06 09:43:17 +02:00
Felix Kunde e0c4603057
create streams only after postgres instances were restarted (#2034)
* create streams only after postgres instances were restarted
* checkAndSetGlobalPostgreSQLConfiguration returns if config has been patched
* restart can be pending even without a config patch
2022-09-19 15:25:55 +02:00
Felix Kunde d209612b18
use correct keys in updateSecret (#2029)
2022-09-01 10:58:42 +02:00
Felix Kunde 21d00e2ed7
rework map selection in updateSecret (#2010)
2022-08-24 17:33:39 +02:00
Felix Kunde ef324494a0
fetch pooler and fes_user system user only when corresponding features are used (#2009)
* fetch pooler and fes_user system user only when corresponding features are used
* cover error case in unit test
* use string formatting instead of +
2022-08-24 16:28:49 +02:00
Felix Kunde b2642fa2fc
allow in place pw rotation of system users (#1953)
* allow in place pw rotation of system users
* block postgres user from rotation
* mark pooler pods for replacement
* adding podsGetter where pooler is synced in unit tests
* move rotation code in extra function
2022-08-18 14:14:31 +02:00
Felix Kunde 532772c5cd
do not call EBS api when there are no pvs (#1851)
* do not call EBS api when there are no pvs
* no extra aws api call in executeEBSMigration, operate on fetched cluster.EBSVolumes
2022-04-20 12:12:02 +02:00
Felix Kunde 654d22d04a
Configure annotations to be ignored in comparisons during sync (#1823)
* feat: add ignored annotations when comparing during sync

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
Co-authored-by: Moshe Immerman <moshe@flanksource.com>
2022-03-24 18:38:37 +01:00
Felix Kunde 36df1bc87c
refactor GenerateResourceRequirements and provide unit tests (#1822)
* refactor GenerateResourceRequirements and provide unit tests
2022-03-24 17:35:00 +01:00
Felix Kunde d032e4783e
LoadBalancer toggles for master and replica pooler pods (#1799)
* Add support for pooler load balancer

Signed-off-by: Sergey Shatunov <me@prok.pw>

* Rename to enable_master_pooler_load_balancer

Signed-off-by: Sergey Shatunov <me@prok.pw>

* target port should be intval
* enhance pooler e2e test
* add new options to crds.go

Co-authored-by: Sergey Shatunov <me@prok.pw>
2022-03-04 13:36:17 +01:00