24 Commits

Author SHA1 Message Date
Felix Kunde 3ebe4ffb99
reflect new replica states in unit tests (#2581)
* reflect new replica states in unit tests
2024-03-15 14:46:36 +01:00
Felix Kunde bf5db676b1
replace deprecated ioutil (#2531)
* replace deprecated ioutil
* replace ioutil also in kubectl plugin
2024-02-05 11:58:36 +01:00
Felix Kunde dad5b132ec
Standby cluster promotion by changing manifest (#2472)
* Standby cluster promotion by changing manifest
* Updated the documentation

---------

Co-authored-by: Senthilnathan M <snathanm@vmware.com>
2024-01-04 12:33:50 +01:00
Felix Kunde 97be5ee1cb
use uint64 for replication lag from Patroni's member endpoint (#1893)
* use uint64 for replication lag from Patroni's member endpoint
2022-05-19 09:39:56 +02:00
Felix Kunde 60e0685c32
define readinessProbe on statefulSet (#1825)
* define readinessProbe on statefulSet 
* do not error out on deleting Patroni cluster objects
* change delete order for patroni objects
2022-03-30 18:19:34 +02:00
Felix Kunde d8a159ef1a
create CDC event stream CRD (#1570)
* provide event stream API
* check manifest settings for logical decoding before creating streams
* operator updates Postgres config and creates replication user
* name FES like the Postgres cluster
* add delete case and fix updating streams + update unit test
* check if fes CRD exists before syncing
* existing slot must use the same plugin
* make id and payload columns configurable
* sync streams only when they are defined in manifest
* introduce applicationId for separate stream CRDs
* add FES to RBAC in chart
* disable streams in chart
* switch to pgoutput plugin and let operator create publications
* reflect code review and additional refactoring

Co-authored-by: Paŭlo Ebermann <paul.ebermann@zalando.de>
2022-02-28 10:09:42 +01:00
Felix Kunde 411abbe31e
handle case when Patroni returns that lag is unknown (#1724)
* handle case when Patroni returns that lag is unknown
* remove some prints from e2e test
2021-12-17 12:36:23 +01:00
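A minimal sketch of how a non-numeric lag could be decoded, assuming the member JSON carries `"lag"` either as a number or as the string `"unknown"`. The names `replicationLag`, `lagUnknown`, `memberData` and `parseMember` are hypothetical, and mapping "unknown" to the largest possible value is an illustrative choice, not necessarily what this commit does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// replicationLag is a hypothetical type for the lag reported per member.
type replicationLag uint64

// lagUnknown is an assumed sentinel: an unknown lag sorts after every real lag.
const lagUnknown replicationLag = math.MaxUint64

// UnmarshalJSON accepts a non-negative number or the string "unknown".
func (l *replicationLag) UnmarshalJSON(data []byte) error {
	var n uint64
	if err := json.Unmarshal(data, &n); err == nil {
		*l = replicationLag(n)
		return nil
	}
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return fmt.Errorf("lag is neither a number nor a string: %s", data)
	}
	if s != "unknown" {
		return fmt.Errorf("unexpected lag value %q", s)
	}
	*l = lagUnknown
	return nil
}

// memberData is a trimmed-down, hypothetical view of one cluster member.
type memberData struct {
	Name string         `json:"name"`
	Lag  replicationLag `json:"lag"`
}

// parseMember decodes one member document.
func parseMember(data []byte) (memberData, error) {
	var m memberData
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	m, err := parseMember([]byte(`{"name":"pg-1","lag":"unknown"}`))
	fmt.Println(m.Name, m.Lag == lagUnknown, err)
}
```

With a sentinel like this, a caller that compares lags does not need a separate code path for members whose lag could not be determined.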
Felix Kunde 07fd4ec00b
choose switchover candidate based on lag and role (#1700)
* choose switchover candidate based on lowest lag in MB and role (in synchronous mode)
2021-12-14 10:35:21 +01:00
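The selection rule described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the operator's code: the `member` type, the `pickCandidate` function and the role strings `"leader"` and `"sync_standby"` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// member is a hypothetical, trimmed-down view of a Patroni cluster member.
type member struct {
	Name  string
	Role  string // assumed values: "leader", "replica", "sync_standby"
	LagMB uint64
}

// pickCandidate returns the non-leader member with the lowest lag.
// In synchronous mode only members with the sync_standby role qualify.
// The second return value is false when no member qualifies.
func pickCandidate(members []member, syncMode bool) (string, bool) {
	best := -1
	for i, m := range members {
		if m.Role == "leader" {
			continue
		}
		if syncMode && m.Role != "sync_standby" {
			continue
		}
		if best == -1 || m.LagMB < members[best].LagMB {
			best = i
		}
	}
	if best == -1 {
		return "", false
	}
	return members[best].Name, true
}

func main() {
	members := []member{
		{Name: "pg-0", Role: "leader"},
		{Name: "pg-1", Role: "replica", LagMB: 5},
		{Name: "pg-2", Role: "sync_standby", LagMB: 20},
	}
	fmt.Println(pickCandidate(members, false))
	fmt.Println(pickCandidate(members, true))
}
```

In this sketch the role filter takes precedence over lag in synchronous mode, so a synchronous standby is chosen even when an asynchronous replica reports a lower lag.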
Felix Kunde 1eafd688d0
restart master first in some edge cases (#1655)
* restart master first in some edge cases

* edge case is when desired is lower than effective

* wait after config patch and restart on sync whenever we see pending_restart

* convert options to int to check decrease and add unit test

* minor update to e2e tests

* wait only after restart not every sync

* using spilo 14 e2e images
2021-10-26 16:43:19 +02:00
Felix Kunde 2a33bf3313
improve Patroni config sync (#1635)
* improve Patroni config sync
* collect new and updated slots to patch patroni
* refactor httpGet in Patroni and extend unit tests
* GetMemberData should call the patroni endpoint
* add PATCH test
2021-10-13 17:17:26 +02:00
Felix Kunde 66620d5049
refactor restarting instances (#1535)
* refactor restarting instances and reduce listPods calls
* only add parameters to set if it differs from effective config
* update e2e test for updating Postgres config
* patch config only once
2021-08-09 16:23:41 +02:00
Igor Yanchenko ebb3204cdd
restart instances via rest api instead of recreating pods, fixes bug with being unable to decrease some values, like max_connections (#1103)
* restart instances via rest api instead of recreating pods
* Ignore differences in bootstrap.dcs when compare SPILO_CONFIGURATION
* isBootstrapOnlyParameter is rewritten, instead of whitelist it uses blacklist
* added e2e test for max_connections decreasing
* documentation updated
* pending_restart flag added to restart api call, wait for ttl seconds after restart
* refactoring, /restart returns error if pending_restart is set to true and patroni is not pending restart
* restart postgresql instances within pods only if pod's restart is not required
* patroni might need to restart postgresql after pods were recreated if values like max_connections decreased
* instancesRestart is not critical, try to restart pods if not successful
* cleanup

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2021-06-14 11:00:58 +02:00
Jan Mussler 636a9a8191
Support major version upgrade via manifest and global upgrades via min version (#1372)
Support major version upgrade trigger via manifest. There are three modes: `off`, `manual` and `full`. Manual is what you expect, and full will auto upgrade clusters below a certain threshold.
2021-02-25 11:42:43 +01:00
Sergey Dudoladov 3c91bdeffa
Re-create pods only if all replicas are running (#903)
* adds a Get call to Patroni interface to fetch state of a Patroni member
* postpones re-creating pods if at least one replica is currently being created  

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-04-20 15:14:11 +02:00
Weilu Jia e00b37fc17
Handle IPv6 k8s pods in Patroni URLs (#671)
* Handle IPv6 Patroni URLs
2019-09-30 10:14:27 +02:00
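One common way to build a URL that works for both IPv4 and IPv6 pod addresses is `net.JoinHostPort`, which wraps IPv6 literals in square brackets. The sketch below assumes Patroni's default REST API port 8008; the helper name `patroniURL` is hypothetical and the commit itself may solve this differently.

```go
package main

import (
	"fmt"
	"net"
)

// patroniPort is Patroni's default REST API port (assumed to be unchanged).
const patroniPort = "8008"

// patroniURL is a hypothetical helper that builds the base URL of a pod's
// Patroni REST API. net.JoinHostPort adds brackets around IPv6 literals.
func patroniURL(podIP string) string {
	return "http://" + net.JoinHostPort(podIP, patroniPort)
}

func main() {
	fmt.Println(patroniURL("10.2.3.4")) // http://10.2.3.4:8008
	fmt.Println(patroniURL("fd00::1"))  // http://[fd00::1]:8008
}
```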
Noah Kantrowitz 0b75a89920
Fix the casing of github.com/Sirupsen/logrus to match what the project itself uses. (#380)
Dep enforces this.
2018-09-06 10:26:48 +02:00
Oleksii Kliukin b06186eb41
Linter-induced code refactoring, run round 2. (#360)
Run more linters in the gometalinter, i.e. deadcode, megacheck,
nakedret, dup.

More consistent code formatting, remove two dead functions, eliminate
a bunch of naked returns, refactor a few functions to avoid code
duplication.
2018-08-06 12:09:19 +02:00
Oleksii Kliukin ac7b132314
Refactoring inspired by gometalinter. (#357)
Among other things, fix a few issues with deepcopy implementation.
2018-08-03 11:09:45 +02:00
Oleksii Kliukin d2d3f21dc2
Client go upgrade v6 (#352)
There are shortcuts in this code, i.e. we created the deepcopy function
by using the deepcopy package instead of the generated code, that will
be addressed once migrated to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta, this will also
be addressed in further commits once the changes are stabilized.
2018-08-01 11:08:01 +02:00
Oleksii Kliukin 48a5744314
Use Patroni API to set bootstrap-only options. (#299)
Call Patroni API /config in order to set special options that are
ignored when set in the configuration file, such as max_connections.
Per https://github.com/zalando-incubator/postgres-operator/issues/297

* Some minor refactoring:

Rename Cluster ManualFailover to Switchover
Rename Patroni Failover to Switchover
Add more details to error messages and comments introduced in this PR.

Review by @zerg-junior
2018-05-29 12:35:25 +02:00
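A rough sketch of such a call, assuming Patroni's REST API accepts a `PATCH` on `/config` with the options nested under `postgresql.parameters`. The helper names `configPatchBody` and `newConfigPatch`, the base URL and the example value are hypothetical; the operator's real client may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// configPatchBody wraps Postgres parameters in the nested document
// that the /config endpoint is assumed to expect.
func configPatchBody(params map[string]string) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"postgresql": map[string]interface{}{"parameters": params},
	})
}

// newConfigPatch builds (but does not send) the PATCH request.
func newConfigPatch(baseURL string, params map[string]string) (*http.Request, error) {
	body, err := configPatchBody(params)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPatch, baseURL+"/config", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The address and the value are placeholders for illustration.
	req, err := newConfigPatch("http://10.2.3.4:8008", map[string]string{"max_connections": "200"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```

Sending the request is left to the caller, for example with `http.Client.Do`, so that timeouts and retries can be chosen separately.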
Oleksii Kliukin cca73e30b7
Make code around recreating pods and creating objects in the database less brittle (#213)
There used to be a masterLess flag that was supposed to indicate whether the cluster it belongs to runs without the acting master by design. At some point, as we didn't really have support for such clusters, the flag has been misused to indicate there is no master in the cluster. However, that was not done consistently (a cluster without all pods running would never be masterless, even when the master is not among the running pods) and it was based on the wrong assumption that the masterless cluster will remain masterless until the next attempt to change that flag, ignoring the possibility of master coming up or some node doing a successful promotion. Therefore, this PR gets rid of that flag completely.

When the cluster is running with 0 instances, there is obviously no master and it makes no sense to create any database objects inside the non-existing master. Therefore, this PR introduces an additional check for that.

recreatePods was assuming that the roles of the pods recorded when the function has started will not change; for instance, terminated replica pods should start as replicas. Revisit that assumption by looking at the actual role of the re-spawned pods; that avoids a failover if some replica has promoted to the master role while being re-spawned. In addition, if the failover from the old master was unsuccessful, we used to stop and leave the old master running on an old pod, without recording this fact anywhere. This PR makes the failover failure emit a warning, but not stop recreating the last master pod; in the worst case, the running master will be terminated, however, this case is rather unlikely.

As a side effect, make waitForPodLabel return the pod definition it waited for, avoiding extra API calls in recreatePods and movePodFromEndOfLifeNode
2018-02-22 10:42:05 +01:00
Murat Kabilov 202f2de988
Retry connecting to pg
2017-10-17 17:03:50 +02:00
Murat Kabilov 6c4cb4e9da
Perform manual failover during the scale down
2017-10-16 17:41:23 +02:00
Murat Kabilov 8aa11ecee2
Add patroni api client
2017-08-30 16:01:18 +02:00