Commit Graph

58 Commits

Author SHA1 Message Date
Jan Mußler 85ae41b897 Wait more before we delete. 2020-10-27 22:43:28 +01:00
Jan Mußler 067c7b5558 Adding message to verify manifest was in fact deleted. 2020-10-27 22:22:26 +01:00
Jan Mußler 30cd4edd09 Giving operator 1 second to startup. 2020-10-27 21:05:27 +01:00
Jan Mußler cabb7bc59f Catch possible pods count error. 2020-10-27 20:31:16 +01:00
Jan Mußler 9a7fc85d3b Merge master. 2020-10-27 17:25:57 +01:00
Jan Mußler 60cbd4ebbe Skip failing test for now. 2020-10-27 16:53:09 +01:00
Jan Mußler b606b6f77b Wait for pods to run. 2020-10-27 15:19:19 +01:00
Jan Mußler 326c67b670 Proper f() wrapper for taint test. 2020-10-27 14:14:30 +01:00
Jan Mußler 826d7c0c1e Typos. 2020-10-27 13:19:05 +01:00
Jan Mußler 8dc6c08cf7 Changes to tolerations test. Make it complete quicker. 2020-10-27 13:14:25 +01:00
Jan Mußler e6b71cbb98 Taints and tolerations test. 2020-10-27 12:40:21 +01:00
Jan Mußler eb8df06af5 Tiny changes. 2020-10-26 23:35:03 +01:00
Jan Mußler 88e89956e0 Verify explicit sync of deployment. 2020-10-26 23:24:03 +01:00
Jan Mußler 89741c4d60 Fix annotation error case. 2020-10-26 22:17:52 +01:00
Jan Mußler acc1d5e0b9 Pooler cleanup in spec in extra step. 2020-10-26 21:57:43 +01:00
Jan Mußler 5294995b19 Move long running test to end. Move pooler test to new functions. 2020-10-26 21:38:06 +01:00
Jan Mußler 8b057d4e43 Updating readme. 2020-10-24 00:16:27 +02:00
Jan Mußler 0c0474c95c Fix missing message. 2020-10-23 01:18:43 +02:00
Jan Mußler d88e62fc79 Fixing yaml dump. Removing restart pending between tests. 2020-10-23 01:09:02 +02:00
Jan Mußler 2aeaad03f3 Minor changes around running pods and catching error in infrastructure roles. 2020-10-22 16:12:12 +02:00
Jan Mußler 1f3730b2b4 More tests and more nice diff. 2020-10-21 23:30:35 +02:00
Jan Mußler 6b91bd3282 More e2e changes for scale up and down. 2020-10-21 17:58:16 +02:00
Jan Mußler f03409da06 Fix min resources end to end test. 2020-10-21 17:27:00 +02:00
Jan Mußler 9b596f1eb7 Extending timeout, allow one sync. 2020-10-21 15:39:09 +02:00
Jan Mußler 2066256a17 Progressing on faster e2e tests. 2020-10-21 15:23:17 +02:00
Jan Mußler 668ef51d9f Printing config as a multi-line log entity, making it readable and grepable on startup 2020-10-21 08:33:31 +02:00
Jan Mußler c6c4c4cc8a * Make lazy upgrade test work reliably
* Allow Docker image to take parameters to overwrite unittest execution
* Add documentation for running individual tests
* Fix string encoding in Patroni state check and error case
2020-10-20 19:20:38 +02:00
Jan Mußler 4fc8ca384d Fix distribution call. 2020-10-19 23:47:40 +02:00
Jan Mußler 966575dd4b * Patroni state function added in k8s
* Lazy upgrade now properly covered with eventual asserts and waiting for pod start
* Patching config now updates the deployment, patching an annotation, allowing to trace each change step
* run.sh now takes NOCLEANUP to stop kind from being deleted
* If a kind config is present, run will not install kind
* Fast e2e local execution now possible once kind is up
2020-10-19 23:35:08 +02:00
Dmitry Dolgov 1f5d0995a5
Lookup function installation (#1171)
* Lookup function installation

Due to reusing a previous database connection without closing it, the lookup
function installation process was skipping the first database in the
list, installing twice into the postgres db instead. To prevent that, make
the internal initDbConnWithName overwrite the connection object, and return
the same object only from initDbConn, which is a sort of public interface.

Another solution would be to modify initDbConnWithName to
return a connection object and then generate one temporary connection
for each db. It sounds feasible, but after one attempt it seems to
require a few more changes around it (init, close connections) and
doesn't bring anything significantly better to the table. If
future changes prove this wrong, do not hesitate to refactor.

Change the retry strategy to a more insistent one, namely:

* retry on the next sync even if we failed to process one database and
install the pooler appliance.

* perform the whole installation unconditionally on update, since the
list of target databases could have changed.

And for the sake of making it even more robust, also log the case when the
operator decides to skip the installation.

Extend connection pooler e2e test with verification that all dbs have
required schema installed.
2020-10-19 16:18:58 +02:00
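The connection-reuse bug described above is language-agnostic; the operator itself is written in Go, but a minimal Python sketch (all names hypothetical) shows why keeping a stale connection skips the first database and why overwriting the connection object fixes it:

```python
# Hypothetical sketch of the bug and fix; the real operator code is Go and
# the method names below only mirror initDbConn/initDbConnWithName loosely.

class ConnCache:
    def __init__(self):
        self.conn = None  # currently held connection (modelled as a db name)

    def init_conn_buggy(self, dbname):
        # Old behaviour: reuse whatever connection already exists, so the
        # first target db in the loop is silently skipped.
        if self.conn is None:
            self.conn = dbname
        return self.conn

    def init_conn_fixed(self, dbname):
        # Fixed behaviour: always overwrite the connection object.
        self.conn = dbname
        return self.conn


def install_lookup(dbs, init_conn):
    """Simulate installing the lookup function into each target db;
    returns the db each installation actually went to."""
    return [init_conn(db) for db in dbs]
```

With a pre-existing connection to `postgres`, the buggy variant installs into `postgres` for every target db, while the fixed variant visits each db in turn.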
Jan Mußler c1ad71668b WIP 2020-10-19 14:09:22 +02:00
Jan Mußler ccde8c6bf6 More fixes for e2e tests. 2020-10-19 13:53:48 +02:00
Jan Mußler 38e6261d64 Loadbalancer test now uses eventualEqual properly. 2020-10-19 10:26:41 +02:00
Jan Mussler 21afc07d9f Improving end-to-end tests by implementing proper eventual asserts and timeouts. 2020-10-18 19:23:17 +02:00
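The "eventual assert with timeout" pattern that several of these e2e commits introduce can be sketched as a small polling helper (names are illustrative, not the actual helpers from the test suite):

```python
import time

def eventual_equal(get_value, expected, timeout=30, interval=1):
    """Poll get_value() until it equals expected or the timeout expires.
    Minimal sketch of an eventual assert; the real e2e helpers differ."""
    deadline = time.monotonic() + timeout
    while True:
        actual = get_value()
        if actual == expected:
            return
        if time.monotonic() >= deadline:
            raise AssertionError(f"expected {expected!r}, last value {actual!r}")
        time.sleep(interval)
```

A test would call it as, e.g., `eventual_equal(lambda: count_running_pods(), 3, timeout=120)`, so slow cluster convergence does not flake the assertion.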
Sergey Dudoladov 3b6dc4f92d
Improve e2e tests (#1111)
* increase vm size

* cache deps

* switch to the absolute cache path as cdp does not support shell expansion

* do not pull non-existing image

* manually install kind

* add alias to kind

* use full kind name

* one more name change

* install kind with other tools

* add bind mounts instead of copying files

* test fetching the runner image

* build image for pierone

* bump up the client-go version to match the master

* bump up go version

* install pinned version of kind before any test run

* do not overwrite local ./manifests during test run

* update the docs

* fix kind name

* update go.* files

* fix deps

* avoid unnecessary image upload

* properly install kind

* Change network to host to make it reachable within e2e runner. May not be the right solution though.

* Small changes. Also use entrypoint vs cmd.

* Bumping spilo. Load before test.

* undo incorrect merge from the master

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Jan Mußler <janm81@gmail.com>
2020-09-25 14:14:19 +02:00
Felix Kunde 5e93aabea6
improve e2e test debugging (#1107)
* print operator log in most tests when they time out
2020-08-28 14:57:19 +02:00
Felix Kunde 30c86758a3
update kind and use with old storage class (#1121)
* update kind and use with old storage class
* specify standard storage class in minimal manifest
* remove existing local storage class in kind
* fix pod distribution test
* exclude k8s master from nodes of interest
2020-08-28 12:16:37 +02:00
Felix Kunde 3ddc56e5b9
allow delete only if annotations meet configured criteria (#1069)
* define annotations for delete protection

* change log level and reduce log lines for e2e tests

* reduce wait_for_pod_start even further
2020-08-13 16:36:22 +02:00
Felix Kunde 43163cf83b
allow using both infrastructure_roles_options (#1090)
* allow using both infrastructure_roles_options

* new default values for user and role definition

* use robot_zmon as parent role

* add operator log to debug

* right name for old secret

* only extract if rolesDefs is empty

* set password1 in old infrastructure role

* fix new infra role secret

* choose different role key for new secret

* set memberof everywhere

* reenable all tests

* reflect feedback

* remove condition for rolesDefs
2020-08-10 15:08:03 +02:00
Dmitry Dolgov 7cf2fae6df
[WIP] Extend infrastructure roles handling (#1064)
Extend infrastructure roles handling

Postgres Operator uses infrastructure roles to provide access to a database for
external users e.g. for monitoring purposes. Such infrastructure roles are
expected to be present in the form of k8s secrets with the following content:

    inrole1: some_encrypted_role
    password1: some_encrypted_password
    user1: some_encrypted_name

    inrole2: some_encrypted_role
    password2: some_encrypted_password
    user2: some_encrypted_name

The format of this content is only implied, not stated explicitly, and is not
flexible enough. If we cannot change the format of a secret we want to
use in the Operator, we need to recreate it in this format.

To address this, let's make the format of the secret content explicit. The idea is
to introduce a new configuration option for the Operator:

    infrastructure_roles_secrets:
    - secretname: k8s_secret_name
      userkey: some_encrypted_name
      passwordkey: some_encrypted_password
      rolekey: some_encrypted_role

    - secretname: k8s_secret_name
      userkey: some_encrypted_name
      passwordkey: some_encrypted_password
      rolekey: some_encrypted_role

This would allow the Operator to use any available secrets to prepare infrastructure
roles. To make it backward compatible, the old behaviour is simulated if the new
option is not present.

The new configuration option is intended to be used mainly from the CRD, but it's also
available via the Operator ConfigMap in a limited fashion. In the ConfigMap one can
only put a string with a single secret definition in the following format:

    infrastructure_roles_secrets: |
        secretname: k8s_secret_name,
        userkey: some_encrypted_name,
        passwordkey: some_encrypted_password,
        rolekey: some_encrypted_role

Note that only one secret can be specified this way; multiple secrets are not
allowed.

Eventually the resulting list of infrastructure roles is the union of
all supported ways to describe them, namely the legacy
infrastructure_roles_secret_name and infrastructure_roles_secrets from both
ConfigMap and CRD.
2020-08-05 14:18:56 +02:00
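The single-secret ConfigMap string format described above (comma-separated `key: value` pairs) can be parsed with a few lines; this is a Python sketch of the stated format, not the operator's actual Go parser:

```python
def parse_infrastructure_roles_secret(value):
    """Parse the one-secret ConfigMap string format, e.g.
    'secretname: x, userkey: y, passwordkey: z, rolekey: r'.
    Illustrative sketch only; the operator implements this in Go."""
    result = {}
    for part in value.split(","):
        key, _, val = part.strip().partition(":")
        if key:
            result[key.strip()] = val.strip()
    return result
```

For example, `parse_infrastructure_roles_secret("secretname: my-secret, userkey: user1, passwordkey: password1, rolekey: inrole1")` yields a dict with the four keys from the configuration example.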
Rafia Sabih d52296c323
Propagate annotations to the StatefulSet (#932)
* Initial commit

* Corrections

- set the type of the new  configuration parameter to be array of
  strings
- propagate the annotations to statefulset at sync

* Enable regular expression matching

* Improvements

-handle rollingUpdate flag
-modularize code
-rename config parameter name

* fix merge error

* Pass annotations to connection pooler deployment

* update code-gen

* Add documentation and update manifests

* add e2e test and introduce option in configmap

* fix service annotations test

* Add unit test

* fix e2e tests

* better key lookup of annotations tests

* add debug message for annotation tests

* Fix typos

* minor fix for looping

* Handle update path and renaming

- handle the update path to update sts and connection pooler deployment.
  This way no need to wait for sync
- rename the parameter to downscaler_annotations
- handle other review comments

* another try to fix python loops

* Avoid unneccessary update events

* Update manifests

* some final polishing

* fix cluster_test after polishing

Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-05-04 14:46:56 +02:00
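The "regular expression matching" of annotation keys mentioned in this PR can be sketched as a filter over an annotations map; the pattern list stands in for the `downscaler_annotations` option, and all names here are illustrative (the operator does this in Go):

```python
import re

def filter_annotations(annotations, patterns):
    """Keep only annotations whose keys fully match one of the configured
    regular expressions; a sketch of the propagation filter, not the
    operator's actual implementation."""
    compiled = [re.compile(p) for p in patterns]
    return {k: v for k, v in annotations.items()
            if any(rx.fullmatch(k) for rx in compiled)}
```

The filtered map would then be copied onto the StatefulSet and connection pooler deployment on sync and update.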
Sergey Dudoladov cc635a02e3
Lazy upgrade of the Spilo image (#859)
* initial implementation

* describe forcing the rolling upgrade

* make parameter name more descriptive

* add missing pieces

* address review

* address review

* fix bug in e2e tests

* fix cluster name label in e2e test

* raise test timeout

* load spilo test image

* use available spilo image

* delete replica pod for lazy update test

* fix e2e

* fix e2e with a vengeance

* lets wait for another 30m

* print pod name in error msg

* print pod name in error msg 2

* raise timeout, comment other tests

* subsequent updates of config

* add comma

* fix e2e test

* run unit tests before e2e

* remove conflicting dependency

* Revert "remove conflicting dependency"

This reverts commit 65fc09054b.

* improve cdp build

* dont run unit before e2e tests

* Revert "improve cdp build"

This reverts commit e2a8fa12aa.

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-04-29 10:07:14 +02:00
Sergey Dudoladov 3c91bdeffa
Re-create pods only if all replicas are running (#903)
* adds a Get call to Patroni interface to fetch state of a Patroni member
* postpones re-creating pods if at least one replica is currently being created  

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-04-20 15:14:11 +02:00
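The guard this PR adds — postpone pod re-creation while any replica is still being created — reduces to a simple predicate over Patroni member states. A minimal sketch (the state strings and names are assumptions; the real data comes from a Get call against the Patroni API):

```python
def can_recreate_pods(member_states):
    """Only proceed with pod re-creation when every Patroni member
    reports a running state; otherwise postpone. Illustrative sketch."""
    return all(state == "running" for state in member_states)
```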
Felix Kunde b43b22dfcc
Call me pooler, not pool (#883)
* rename pooler parts and add example to manifest
* update codegen
* fix manifest and add more details to docs
* reflect renaming also in e2e tests
2020-04-01 10:34:03 +02:00
Felix Kunde 64d816c556
add short sleep before redistributing pods (#891) 2020-03-31 11:47:39 +02:00
Dmitry Dolgov 9dfa433363
Connection pooler (#799)
Connection pooler support

Add support for a connection pooler. The idea is to make it generic enough to
be able to switch between different implementations (e.g. pgbouncer or
odyssey). The operator needs to create a deployment with the pooler and a service
to access it.

For the connection pooler to work properly, a database needs to be prepared by the
operator, namely a separate user has to be created with access to an
installed lookup function (to fetch credentials for other users).

This setup is supposed to be used only by robot/application users. Usually a
connection pooler implementation is more CPU-bound, so it makes sense to create
several pods for the connection pooler with more emphasis on CPU resources. At the
moment there are no special affinity or toleration rules assigned to bring those
pods closer to the database. For availability purposes the minimal number of
connection pooler pods is 2; ideally they should be distributed between
different nodes/AZs, but this is not enforced by the operator itself. The available
configuration is supposed to be ergonomic and in the normal case requires minimal
changes to a manifest to enable the connection pooler. To have more control over the
configuration and functionality on the pooler side one can customize the
corresponding Docker image.

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-03-25 12:57:26 +01:00
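The design above (dedicated deployment, availability floor of two pods, CPU-leaning resources) can be sketched as a spec builder; every name and default value here is an illustrative assumption, not the operator's actual defaults:

```python
def pooler_deployment_spec(cluster, replicas=2,
                           cpu_request="500m", memory_request="100Mi"):
    """Minimal sketch of the pooler deployment described above: at least
    two replicas for availability and a CPU-leaning resource profile.
    All field names and values are hypothetical."""
    return {
        "name": f"{cluster}-pooler",
        "replicas": max(replicas, 2),  # never fewer than 2 pods
        "resources": {"requests": {"cpu": cpu_request,
                                   "memory": memory_request}},
    }
```

A manifest asking for one replica would still get two, matching the stated availability minimum.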
Felix Kunde 07c5da35e3
fix minor issues in docs and manifests (#866)
* fix minor issues in docs and manifests
* double retry_timeout_sec
2020-03-18 15:02:13 +01:00
Felix Kunde cde61f3f0b
e2e: wait for pods after disabling anti affinity (#862) 2020-03-11 14:08:54 +01:00
Felix Kunde ae2a38d62a
add e2e test for node readiness label (#846)
* add e2e test for node readiness label
* refactoring and order tests alphabetically
* always wait for replica after failover
2020-03-06 12:55:34 +01:00
Felix Kunde 742d7334a1
use cluster-name as default label everywhere (#782)
* use cluster-name as default label everywhere
* fix e2e test
2020-02-19 15:01:01 +01:00