Commit Graph

112 Commits

Author SHA1 Message Date
Polina Bungina a56ecaace7
Critical operation PDB (#2830)
Create the second PDB to cover Pods with a special "critical operation" label set.

This label is going to be assigned to all pg cluster's Pods by the Operator during a PG major version upgrade, by Patroni during a cluster/replica bootstrap. It can also be set manually or by any other automation tool.
2025-01-29 12:41:08 +01:00
Polina Bungina f49b4f1e97
Ensure podAnnotations are removed from pods if reset in the config (#2826) 2025-01-24 16:53:14 +01:00
Polina Bungina 8522331cf2
Extend MaintenanceWindows parameter usage (#2810)
Consider maintenance window when migrating master pods and replacing pods (rolling update)
2025-01-15 18:04:36 +01:00
Felix Kunde 8231797efa
add cluster field for PVCs (#2785)
* add cluster field for PVCs
* sync volumes on cluster creation
* fully spell pvc in log messages
2024-10-31 14:08:50 +01:00
Felix Kunde a08d1679f2
align sync and update logs (#2738) 2024-08-27 09:58:32 +02:00
Felix Kunde 25ccc87317
sync all resources to cluster fields (#2713)
* sync all resources to cluster fields (CronJob, Streams, Patroni resources)
* separated sync and delete logic for Patroni resources
* align delete streams and secrets logic with other resources
* rename gatherApplicationIds to getDistinctApplicationIds
* improve slot check before syncing streams CRD
* add ownerReferences and annotations diff to Patroni objects
* add extra sync code for config service so it does not get too ugly
* some bugfixes when comparing annotations and return err on found
* sync Patroni resources on update event and extended unit tests
* add config service/endpoint owner references check to e2e tes
2024-08-13 10:06:46 +02:00
Felix Kunde a87307e56b
Feat: enable owner references (#2688)
* feat(498): Add ownerReferences to managed entities
* empty owner reference for cross namespace secret and more tests
* update ownerReferences of existing resources
* removing ownerReference requires Update API call
* CR ownerReference on PVC blocks pvc retention policy of statefulset
* make ownerreferences optional and disabled by default
* update unit test to check len ownerReferences
* update codegen
* add owner references e2e test
* update unit test
* add block_owner_deletion field to test owner reference
* fix typos and update docs once more
* reflect code feedback

---------

Co-authored-by: Max Begenau <max@begenau.com>
2024-08-09 17:58:25 +02:00
Polina Bungina 47efca33c9
Improve inherited annotations (#2657)
* Annotate PVC on Sync/Update, not only change PVC template
* Don't rotate pods when only annotations changed
* Annotate Logical Backup's and Pooler's pods
* Annotate PDB, Endpoints created by the Operator, Secrets, Logical Backup jobs

Inherited annotations are only added/updated, not removed
2024-06-26 13:10:37 +02:00
Felix Kunde 08089ed4b4
add option to prevent PVC removal on cluster deletion (#2579)
* add option to prevent PVC removal on cluster deletion
* Update docs/reference/operator_parameters.md

Co-authored-by: Motte <37443982+dmotte@users.noreply.github.com>
2024-03-14 17:01:26 +01:00
Felix Kunde c1bfc2c2c0
do not block delete because of emtpy child resources (#2538) 2024-02-15 14:52:52 +01:00
Felix Kunde 4a0c483514
add unit test and documentation for finalizers (#2509)
* add unit test and documentation for finalizers
* error msg with lower case and cover sync case
* try to avoid adding json-patch dependency
* use Update to remove finalizer
* changing status and finalizer during create
* do not call Delete() twice
2024-01-22 12:13:40 +01:00
Christian Rohmann 743aade45f
Use finalizers to avoid losing delete events and to ensure full resource cleanup (#941)
* Add Finalizer functions to Cluster; add/remove finalizer on Create/Delete events
* Check if clusters have a deletion timestamp and we missed that event. Run Delete() and remove finalizer when done.
* Fix nil handling when using Service from map; Remove Service, Endpoint entries from their maps - just like with Secrets
* Add handling of ResourceNotFound to all delete functions (Service, Endpoint, LogicalBackup CronJob, PDB and Secret) - this is not a real error when deleting things
* Emit events when there are issues deleting resources to the user is informed
* Depend the removal of the Finalizer on all resources being deleted successfully first. Otherwise the next sync run should let us try again
* Add config option to enable finalizers
* Removed dangling whitespace at EOL
* config.EnableFinalizers is a bool pointer

---------

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2024-01-04 16:22:53 +01:00
Felix Kunde 1d44dd4694
delete secret resource from map (#2123) 2022-12-02 10:09:19 +01:00
Philipp B 84fe38a069
switch to batch API v1 for Jobs (#2066) 2022-10-07 11:27:58 +02:00
Felix Kunde 2aa52094db
switch to policy API v1 for PDBs (#2008)
* switch to policy API v1 for PDBs
* update e2e test dependencies
* use kind 0.14.0
* bump K8s client in e2e docker image
* bump e2e tests-runner
2022-10-06 09:43:17 +02:00
Felix Kunde 60e0685c32
define readinessProbe on statefulSet (#1825)
* define readinessProbe on statefulSet 
* do not error out on deleting Patroni cluster objects
* change delete order for patroni objects
2022-03-30 18:19:34 +02:00
Felix Kunde d032e4783e
LoadBalancer toggles for master and replica pooler pods (#1799)
* Add support for pooler load balancer

Signed-off-by: Sergey Shatunov <me@prok.pw>

* Rename to enable_master_pooler_load_balancer

Signed-off-by: Sergey Shatunov <me@prok.pw>

* target port should be intval
* enhance pooler e2e test
* add new options to crds.go

Co-authored-by: Sergey Shatunov <me@prok.pw>
2022-03-04 13:36:17 +01:00
Felix Kunde f7858ffb70
Initialize arrays of errors / error messages + minor refactoring (#1701)
* init error arrays correctly
* avoid nilPointer when syncing connectionPooler
* getInfrastructureRoles should return error
* fix unit tests and return type for getInfrastructureRoles
2021-11-29 12:49:12 +01:00
Rafia Sabih 75a9e2be38
Create cross namespace secrets (#1490)
* Create cross namespace secrets

* add test cases

* fixes

* Fixes
- include namespace in secret name only when namespace is provided
- use username.namespace as key to pgUsers only when namespace is
  provided
- avoid conflict in the role creation in db by checking namespace
  alongwith the username

* Update unit tests

* Fix test case

* Fixes

- update regular expression for usernames
- add test to allow check for valid usernames
- create pg roles with namespace (if any) appended in rolename

* add more test cases for valid usernames

* update docs

* fixes as per review comments

* update e2e

* fixes

* Add toggle to allow namespaced secrets

* update docs

* comment update

* Update e2e/tests/test_e2e.py

* few minor fixes

* fix unit tests

* fix e2e

* fix e2e attempt 2

* fix e2e

Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2021-06-11 10:35:30 +02:00
Felix Kunde 32e6c135b9
replace statefulset on annotation diff (#1449)
* replace statefulset on annotation diff
* remove update annotation function for statefulset
* add unit test for syncing annotations
* add inherited annotation to unit test
2021-04-22 11:22:52 +02:00
Felix Kunde ff8143770c
Improve rolling upgrades and rolling upgrade continue (#1341)
* add TODOs for moving rooling update label on pods
* steer rolling update via pod annotation
* rename patch method and fix reading flag on sync
* pass only pods to recreatePods function
* do not take address of iterator if you use it later
* add e2e test and pass switchover targets to recreatePods
* add wait_for_pod_failover for e2e test
* add one more e2e test case
* helm chart remove 1.6.0 archive from 1.6.0 archive
* reflect code review feedback
2021-02-26 15:38:58 +01:00
Rafia Sabih 49158ecb68
Connection pooler for replica (#1127)
* Enable connection pooler for replica
* Refactor code for connection pooler
  - Move all the relevant code to a separate file
  - Move all the related tests to a separate file
  - Avoid using cluster where not required
  - Simplify the logic in sync and other methods
  - Cleanup of duplicated or unused code
* Fix labels for the replica pods
* Update deleteConnectionPooler to include role
* Adding test cases and other changes
   - Fix unit test and delete secret when required only
   - Make sure we use empty fresh cluster for every test case.
* enhance e2e test
* Disable pooler in complete manifest as this is source for e2e too an creates unnecessary pooler setups.

Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Jan Mussler <janm81@gmail.com>
2020-11-13 14:52:21 +01:00
Jan Mussler 3a86dfc8bb
End 2 End tests speedup (#1180)
* Improving end 2 end tests, especially speed of execution and error, by implementing proper eventual asserts and timeouts.
* Add documentation for running individual tests
* Fixed String encoding in Patorni state check and error case
* Printing config as multi log line entity, makes it readable and grepable on startup
* Cosmetic changes to logs. Removed quotes from diff. Move all object diffs to text diff. Enabled padding for log level.
* Mount script with tools for easy logaccess and watching objects.
* Set proper update strategy for Postgres operator deployment.
* Move long running test to end. Move pooler test to new functions.
* Remove quote from valid K8s identifiers.
2020-10-28 10:04:33 +01:00
Felix Kunde 0508266219
Remove all secrets on delete incl. pooler (#1091)
* fix syncSecrets and remove pooler secret

* update log for deleteSecret

* use c.credentialSecretName(username)

* minor fix
2020-08-10 18:26:26 +02:00
Felix Kunde 43163cf83b
allow using both infrastructure_roles_options (#1090)
* allow using both infrastructure_roles_options

* new default values for user and role definition

* use robot_zmon as parent role

* add operator log to debug

* right name for old secret

* only extract if rolesDefs is empty

* set password1 in old infrastructure role

* fix new infra rile secret

* choose different role key for new secret

* set memberof everywhere

* reenable all tests

* reflect feedback

* remove condition for rolesDefs
2020-08-10 15:08:03 +02:00
Felix Kunde 375963424d
delete secrets the right way (#1054)
* delete secrets the right way

* make a one function

* continue deleting secrets even if one delete fails

Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
2020-07-10 15:07:42 +02:00
Christian Rohmann 8ff7658ed3
Fix pooler delete (#960)
deleteConnectionPooler function incorrectly checks that the delete api response is ResourceNotFound. Looks like the only consequence is a confusing log message, but obviously it's wrong. Remove negation, since having ResourceNotFound as error is the good case.

Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
2020-05-13 14:55:54 +02:00
Rafia Sabih d52296c323
Propagate annotations to the StatefulSet (#932)
* Initial commit

* Corrections

- set the type of the new  configuration parameter to be array of
  strings
- propagate the annotations to statefulset at sync

* Enable regular expression matching

* Improvements

-handle rollingUpdate flag
-modularize code
-rename config parameter name

* fix merge error

* Pass annotations to connection pooler deployment

* update code-gen

* Add documentation and update manifests

* add e2e test and introduce option in configmap

* fix service annotations test

* Add unit test

* fix e2e tests

* better key lookup of annotations tests

* add debug message for annotation tests

* Fix typos

* minor fix for looping

* Handle update path and renaming

- handle the update path to update sts and connection pooler deployment.
  This way no need to wait for sync
- rename the parameter to downscaler_annotations
- handle other review comments

* another try to fix python loops

* Avoid unneccessary update events

* Update manifests

* some final polishing

* fix cluster_test after polishing

Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-05-04 14:46:56 +02:00
Dmitry Dolgov a1f2bd05b9
Prevent superuser from being a connection pool user (#906)
* Protected and system users can't be a connection pool user

It's not supported, neither it's a best practice. Also fix potential null
pointer access. For protected users it makes sense by intent of protecting this
users (e.g. from being overriden or used as something else than supposed). For
system users the reason is the same as for superuser, it's about replicastion
user and it's under patroni control.

This is implemented on both levels, operator config and postgresql manifest.
For the latter we just use default name in this case, assuming that operator
config is always correct. For the former, since it's a serious
misconfiguration, operator panics.
2020-04-09 09:21:45 +02:00
Felix Kunde b43b22dfcc
Call me pooler, not pool (#883)
* rename pooler parts and add example to manifest
* update codegen
* fix manifest and add more details to docs
* reflect renaming also in e2e tests
2020-04-01 10:34:03 +02:00
Felix Kunde 66f2cda87f
Move operator to go 1.14 (#882)
* update go modules march 2020
* update to GO 1.14
* reflect k8s client API changes
2020-03-30 15:50:17 +02:00
Dmitry Dolgov 9dfa433363
Connection pooler (#799)
Connection pooler support

Add support for a connection pooler. The idea is to make it generic enough to
be able to switch between different implementations (e.g. pgbouncer or
odyssey). Operator needs to create a deployment with pooler and a service for
it to access.

For connection pool to work properly, a database needs to be prepared by
operator, namely a separate user have to be created with an access to an
installed lookup function (to fetch credential for other users).

This setups is supposed to be used only by robot/application users. Usually a
connection pool implementation is more CPU bounded, so it makes sense to create
several pods for connection pool with more emphasize on cpu resources. At the
moment there are no special affinity or tolerations assigned to bring those
pods closer to the database. For availability purposes minimal number of
connection pool pods is 2, ideally they have to be distributed between
different nodes/AZ, but it's not enforced in the operator itself. Available
configuration supposed to be ergonomic and in the normal case require minimum
changes to a manifest to enable connection pool. To have more control over the
configuration and functionality on the pool side one can customize the
corresponding docker image.

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-03-25 12:57:26 +01:00
Felix Kunde 3b10dc645d
patch/update services on type change (#824)
* use Update when disabling LoadBalancer + added e2e test
2020-02-13 16:24:15 +01:00
Felix Kunde 107334fe71
Add global option to enable/disable init containers and sidecars (#478)
* Add global option to enable/disable init containers and sidecars
* update dependencies
2019-12-10 15:45:54 +01:00
Felix Kunde 2ce602fcd7 fix errors when changing service type (#716)
* fix errors when changing service type

* nullify service and endpoint before recreation

* improve wait for delete logic and reuse config parameters
2019-11-26 10:28:32 +01:00
Eric 6e682fd6b5 Fixing spelling mistake in delete PVC function name (#691) 2019-10-18 16:41:56 +02:00
Felix Kunde f0e29060b1
move StatefulSet to apps/v1 (#675) 2019-09-30 16:42:04 +02:00
Rafia Sabih 2886027516
Some typos/spelling mistakes fix (#580)
Harmless typos fix.
2019-06-06 14:20:15 +02:00
Sergey Dudoladov f3e1e80aaf
Add logical backup (#442)
* Add k8s cron job to spawn logical backups

* Minor doc updates
2019-05-16 15:52:01 +02:00
Felix Kunde 31e568157b reflect change in github url (#496)
Project was moved from the incubator to the Zalando main org, hence the rename
2019-02-25 11:26:55 +01:00
Maxim Ivanov 8330905ce7 Don't panic if Service for the role was not found (#451) 2019-01-18 13:38:47 +01:00
zerg-junior 7907f95d2f
Improve reporting about rolling updates (#391) 2018-09-24 11:57:43 +02:00
Oleksii Kliukin e933908084
Configure pg_hba in the local postgresql configuration of Patroni. (#361)
Previously, the operator put pg_hba into the bootstrap/pg_hba key of
Patroni. That had 2 adverse effects:
 - pg_hba.conf was shadowed by Spilo default section in the local
   postgresql configuration
 - when updating pg_hba in the cluster manifest, the updated lines were
   not propagated to DCS, since the key was defined in the boostrap
   section of Patroni.

Include some minor refactoring, moving methods to unexported when
possible and commenting out usage of md5, so that gosec won't complain.

Per https://github.com/zalando-incubator/postgres-operator/issues/330

Review by @zerg-junior
2018-08-08 11:01:26 +02:00
Oleksii Kliukin b06186eb41
Linter-induced code refactoring, run round 2. (#360)
Run more linters in the gometalinter, i.e. deadcode, megacheck,
nakedret, dup.

More consistent code formatting, remove two dead functions, eliminate
naked a bunch of naked returns, refactor a few functions to avoid code
duplication.
2018-08-06 12:09:19 +02:00
Oleksii Kliukin 59f0c5551e
Allow configuring pod priority globally and per cluster. (#353)
* Allow configuring pod priority globally and per cluster.

Allow to specify pod priority class for all pods managed by the operator,
as well as for those belonging to individual clusters.

Controlled by the pod_priority_class_name operator configuration
parameter and the podPriorityClassName manifest option.

See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
for the explanation on how to define priority classes since Kubernetes 1.8.

Some import order changes are due to go fmt.
Removal of OrphanDependents deprecated field.

Code review by @zerg-junior
2018-08-03 14:03:37 +02:00
Oleksii Kliukin ac7b132314
Refactoring inspired by gometalinter. (#357)
Among other things, fix a few issues with deepcopy implementation.
2018-08-03 11:09:45 +02:00
Oleksii Kliukin d2d3f21dc2 Client go upgrade v6 (#352)
There are shortcuts in this code, i.e. we created the deepcopy function
by using the deepcopy package instead of the generated code, that will
be addressed once migrated to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta, this will also
be addressed in further commits once the changes are stabilized.
2018-08-01 11:08:01 +02:00
Oleksii Kliukin 48a5744314
Use Patroni API to set bootstrap-only options. (#299)
Call Patroni API /config in order to set special options that are
ignored when set in the configuration file, such as max_connections.
Per https://github.com/zalando-incubator/postgres-operator/issues/297

* Some minor refacoring:

Rename Cluster ManualFailover to Swithover
Rename Patroni Failover to Switchover
Add more details to error messages and comments introduced in this PR.

Review by @zerg-junior
2018-05-29 12:35:25 +02:00
Oleksii Kliukin 11d568bf65 Address code review by @zerg-junior
- new info messages, rename the annotation flag.
2018-05-15 16:50:03 +02:00
Oleksii Kliukin 0c616a802f Merge branch 'master' into rolling_updates_with_statefulset_annotations
# Conflicts:
#	pkg/cluster/k8sres.go
2018-05-15 15:33:34 +02:00