Commit Graph

394 Commits

Author SHA1 Message Date
Pavel Tumik fbd04896c2
Add ability to upload logical backup to gcs (#1173)
Support logical backup provider/storage S3 and GCS equivalent
2020-12-16 10:41:08 +01:00
Felix Kunde 929075814a
diff SecurityContext of containers (#1255)
* diff SecurityContext of containers
* change log messages to use "does not" vs "doesn't"
2020-12-15 10:06:53 +01:00
Felix Kunde 83fbccac5a
new env var for backwards compatability between spilo 12 and 13 (#1254) 2020-12-14 18:43:53 +01:00
Jan Mussler b88d8e34e1
Fix function name in test (#1250)
* Fix function name in test

Error was somehow introduced in last 2 PRs merged.

* Update volumes_test.go
2020-12-12 00:35:27 +01:00
Felix Kunde 6a97316a69
Support inherited annotations for all major objects (#1236)
* add comments where inherited annotations could be added

* add inheritedAnnotations feature

* return nil if no annotations are set

* minor changes

* first downscaler then inherited annotations

* add unit test for inherited annotations

* add pvc to test + minor changes

* missing comma

* fix nil map assignment

* set annotations in the same order it is done in other places

* replace acidClientSet with acid getters in K8s client

* more fixes on clientSet vs getters

* minor changes

* remove endpoints from annotation test

* refine unit test - but deployment and sts are still empty

* fix checkinng sts and deployment

* make annotations setter one liners

* no need for len check anymore

Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
2020-12-11 16:34:01 +01:00
Jan Mussler 549f71bb49
Support EBS gp2 to gp3 migration on sync for below 1tb volumes (#1242)
* initial commit for gp3 migration.

* Default volume migration done.

* Added Gomock and one test case with mock.

* Dep update.

* more changes for code gen.

* push fake package.

* Rename var.

* Changes to Makefile and return value.

* Macke mocks phony due to overlap in foldername.

* Learning as one goes. Initialize map.

* Wrong toggle.

* Expect modify call.

* Fix mapping of ids in test.

* Fix volume id.

* volume ids.

* Fixing test setup. Late night...

* create all pvs.

* Fix test case config.

* store volumes and compare.

* More logs.

* Logging of migration action.

* Ensure to log errors.

* Log warning if modify failed, e.g. due to ebs volume state.

* Add more output.

* Skip local e2e tests.

* Reflect k8s volume id in test data. Extract aws volume id from k8s value.

* Finalizing ebs migration.

* More logs. describe fails.

* Fix non existing fields in gp2 discovery.

* Remove nothing to do flag for migration.

* Final commit for migration.

* add new options to all places

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-12-11 15:52:32 +01:00
Rafia Sabih 5a6da7275f
avoid hard-codeed spilo-role (#1246)
Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
2020-12-09 13:00:06 +01:00
Sergey Dudoladov dc9a5b1e61
Introduce PGVERSION (#1172)
* introduce PGVERSION

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
2020-11-27 18:49:49 +01:00
Sergey Dudoladov 6f5751fe55
raise log level for malformed secrets (#1235)
Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
2020-11-27 18:47:50 +01:00
Boyan Bonev 85d1a72cd6
Add scheduler name support - [Update #990] (#1226)
* Add ability to specify alternative schedulers via schedulerName.

Co-authored-by: micah.coletti@gmail.com <micah.coletti@gmail.com>
2020-11-25 10:55:05 +01:00
Jan Mussler c4ae11629b
Fix connection pooler deployment selectors (#1213)
Stick with the existing pooler deployment selector labels to make it compatible with existing deployments.

Make the use of additional labels clear and avoid where not needed.

Deployment Selector and Service Selector now do not use extra labels, pod spec does.
2020-11-23 17:18:18 +01:00
Rafia Sabih 49158ecb68
Connection pooler for replica (#1127)
* Enable connection pooler for replica
* Refactor code for connection pooler
  - Move all the relevant code to a separate file
  - Move all the related tests to a separate file
  - Avoid using cluster where not required
  - Simplify the logic in sync and other methods
  - Cleanup of duplicated or unused code
* Fix labels for the replica pods
* Update deleteConnectionPooler to include role
* Adding test cases and other changes
   - Fix unit test and delete secret when required only
   - Make sure we use empty fresh cluster for every test case.
* enhance e2e test
* Disable pooler in complete manifest as this is source for e2e too an creates unnecessary pooler setups.

Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Jan Mussler <janm81@gmail.com>
2020-11-13 14:52:21 +01:00
Felix Kunde 3fed565328
check resize mode on update events (#1194)
* check resize mode on update events

* add unit test for PVC resizing

* set resize mode to pvc in charts and manifests

* add test for quantityToGigabyte

* just one debug line for syncing volumes

* extend test and update log msg
2020-11-11 13:22:43 +01:00
Sergey Dudoladov e779eab22f
Update e2e pipeline (#1202)
* clean up after test_multi_namespace test

* see the PR description for complete list of changes

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
2020-11-11 10:21:46 +01:00
Pavel Tumik db0d089e75
Fix cloning from GCS (#1176)
* Fix clone from gcs

* pass google credentials env var if using GS bucket

* remove requirement for timezone as GCS returns timestamp in local time to the region it is in

* Revert "remove requirement for timezone as GCS returns timestamp in local time to the region it is in"

This reverts commit ac4eb350d9.

* update GCS documentation

* remove sentence about logical backups

* reword pod environment configmap section

* fix documentation
2020-11-03 15:05:44 +01:00
Sergey Dudoladov 4f3bb6aa8c
Remove operator checks that prevent PG major version upgrade (#1160)
* remove checks that prevent major version upgrade

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
2020-11-02 16:49:29 +01:00
Jan Mussler c694a72352
Make failure in retry a warning not an error. (#1188) 2020-10-29 13:12:25 +01:00
Felix Kunde d658b9672e
PostgresTeam CRD for advanced team management (#1165)
* PostgresTeamCRD for advanced team management

* rework internal structure to be closer to CRD

* superusers instead of admin

* add more util functions and unit tests

* fix initHumanUsers

* check for superusers when creating normal teams

* polishing and fixes

* adding the essential missing pieces

* add documentation and update rbac

* reflect some feedback

* reflect more feedback

* fixing debug logs and raise QueueResyncPeriodTPR

* add two more flags to disable CRD and its superuser support

* fix chart

* update go modules

* move to client 1.19.3 and update codegen
2020-10-28 10:40:10 +01:00
Jan Mussler 3a86dfc8bb
End 2 End tests speedup (#1180)
* Improving end 2 end tests, especially speed of execution and error, by implementing proper eventual asserts and timeouts.
* Add documentation for running individual tests
* Fixed String encoding in Patorni state check and error case
* Printing config as multi log line entity, makes it readable and grepable on startup
* Cosmetic changes to logs. Removed quotes from diff. Move all object diffs to text diff. Enabled padding for log level.
* Mount script with tools for easy logaccess and watching objects.
* Set proper update strategy for Postgres operator deployment.
* Move long running test to end. Move pooler test to new functions.
* Remove quote from valid K8s identifiers.
2020-10-28 10:04:33 +01:00
preved911 d9f5d1c9df
changed PodEnvironmentSecret location namespace (#1177)
Signed-off-by: Ildar Valiullin <preved.911@gmail.com>
2020-10-22 08:49:30 +02:00
Dmitry Dolgov 1f5d0995a5
Lookup function installation (#1171)
* Lookup function installation

Due to reusing a previous database connection without closing it, lookup
function installation process was skipping the first database in the
list, installing twice into postgres db instead. To prevent that, make
internal initDbConnWithName to overwrite a connection object, and return
the same object only from initDbConn, which is sort of public interface.

Another solution for this would be to modify initDbConnWithName to
return a connection object and then generate one temporary connection
for each db. It sound feasible but after one attempt it seems it
requires a bit more changes around (init, close connections) and
doesn't bring anything significantly better on the table. In case if
some future changes will prove this wrong, do not hesitate to refactor.

Change retry strategy to more insistive one, namely:

* retry on the next sync even if we failed to process one database and
install pooler appliance.

* perform the whole installation unconditionally on update, since the
list of target databases could be changed.

And for the sake of making it even more robust, also log the case when
operator decides to skip installation.

Extend connection pooler e2e test with verification that all dbs have
required schema installed.
2020-10-19 16:18:58 +02:00
Dmitry Dolgov d15f2d3392
Readiness probe (#1169)
Right now there are no readiness probes defined for connection pooler,
which means after a pod restart there is a short time window (between a
container start and connection pooler starting listening to a socket)
when a service can send queries to a new pod, but connection will be
refused. The pooler container is rather lightweight and it start to
listen immediately, so the time window is small, but still.

To fix this add a readiness probe for tcp socket opened by connection
pooler.
2020-10-15 10:16:42 +02:00
Sergey Dudoladov 2a21cc4393
Compare Postgres pod priority on Sync (#1144)
* compare Postgres pod priority on Sync

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
2020-09-23 17:26:56 +02:00
neelasha-09 ab95eaa6ef
Fixes #1130 (#1139)
* Fixes #1130

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-09-22 17:16:05 +02:00
Rico Berger d09e418b56
Set user and group in security context (#1083)
* Set user and group in security context
2020-09-15 13:27:59 +02:00
Igor Yanchenko d8884a4003
Allow to overwrite default ExternalTrafficPolicy for the service (#1136)
* Allow to overwrite default ExternalTrafficPolicy for the service
2020-09-15 13:19:22 +02:00
Felix Kunde dfd0dd90ed
set search_path for default roles (#1065)
* set search_path for default roles

* deployment back to 1.5.0

Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
2020-08-11 10:42:31 +02:00
Felix Kunde 0508266219
Remove all secrets on delete incl. pooler (#1091)
* fix syncSecrets and remove pooler secret

* update log for deleteSecret

* use c.credentialSecretName(username)

* minor fix
2020-08-10 18:26:26 +02:00
Felix Kunde 43163cf83b
allow using both infrastructure_roles_options (#1090)
* allow using both infrastructure_roles_options

* new default values for user and role definition

* use robot_zmon as parent role

* add operator log to debug

* right name for old secret

* only extract if rolesDefs is empty

* set password1 in old infrastructure role

* fix new infra rile secret

* choose different role key for new secret

* set memberof everywhere

* reenable all tests

* reflect feedback

* remove condition for rolesDefs
2020-08-10 15:08:03 +02:00
Felix Kunde f3ddce81d5
fix random order for pod environment tests (#1085) 2020-07-30 17:48:15 +02:00
hlihhovac 47b11f7f89
change Clone attribute of PostgresSpec to *CloneDescription (#1020)
* change Clone attribute of PostgresSpec to *ConnectionPooler

* update go.mod from master

* fix TestConnectionPoolerSynchronization()

* Update pkg/apis/acid.zalan.do/v1/postgresql_type.go

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>

Co-authored-by: Pavlo Golub <pavlo.golub@gmail.com>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-07-30 16:31:29 +02:00
Felix Kunde 3bee590d43
fix index in TestGenerateSpiloPodEnvVarswq (#1084)
Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
2020-07-30 13:35:37 +02:00
Christian Rohmann ece341d516
Allow pod environment variables to also be sourced from a secret (#946)
* Extend operator configuration to allow for a pod_environment_secret just like pod_environment_configmap

* Add all keys from PodEnvironmentSecrets as ENV vars (using SecretKeyRef to protect the value)

* Apply envVars from pod_environment_configmap and pod_environment_secrets before doing the global settings from the operator config. This allows them to be overriden by the user (via configmap / secret)

* Add ability use a Secret for custom pod envVars (via pod_environment_secret) to admin documentation

* Add pod_environment_secret to Helm chart values.yaml

* Add unit tests for PodEnvironmentConfigMap and PodEnvironmentSecret - highly inspired by @kupson and his very similar PR #481

* Added new parameter pod_environment_secret to operatorconfig CRD and configmap examples

* Add pod_environment_secret to the operationconfiguration CRD

Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
2020-07-30 10:48:16 +02:00
Igor Yanchenko 002b47ec32
Use scram-sha-256 hash if postgresql parameter password_encryption set to do so. (#995)
* Use scram-sha-256 hash if postgresql parameter password_encryption set to do so.

* test fixed

* Refactoring

* code style
2020-07-16 14:43:57 +02:00
Felix Kunde 375963424d
delete secrets the right way (#1054)
* delete secrets the right way

* make a one function

* continue deleting secrets even if one delete fails

Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
2020-07-10 15:07:42 +02:00
Igor Yanchenko 88735a798a
Resize volume by changing pvc size if enabled in config. (#958)
* Try to resize pvc if resizing pv has failed

* added config option to switch between storage resize strategies

* changes according to requests

* Update pkg/controller/operator_config.go

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>

* enable_storage_resize documented

added examples to the default configuration and helm value files

* enable_storage_resize renamed to volume_resize_mode, off by default

* volume_resize_mode renamed to storage_resize_mode

* Update pkg/apis/acid.zalan.do/v1/crds.go

* pkg/cluster/volumes.go updated

* Update docs/reference/operator_parameters.md

* Update manifests/postgresql-operator-default-configuration.yaml

* Update pkg/controller/operator_config.go

* Update pkg/util/config/config.go

* Update charts/postgres-operator/values-crd.yaml

* Update charts/postgres-operator/values.yaml

* Update docs/reference/operator_parameters.md

* added logging if no changes required

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-07-03 10:53:37 +02:00
Felix Kunde 0c6655a22d
skip creation later to improve visibility of errors (#1013)
* try to emit error for missing team name in cluster name

* skip creation after new cluster object

* move SetStatus to k8sclient and emit event when skipping creation and rename to SetPostgresCRDStatus

Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
2020-06-17 13:32:16 +02:00
Felix Kunde fa6929f028
do not block rolling updates with lazy spilo update enabled (#1012)
* do not block rolling updates with lazy spilo update enabled

* treat initContainers like Spilo image

Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
2020-06-11 12:23:39 +02:00
Felix Kunde fe7ffaa112
trigger rolling update when securityContext of PodTemplate changes (#1007)
Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
2020-06-09 10:27:57 +02:00
alfredw33 2b0def5bc8
Support for GCS WAL-E backups (#620)
* Support for WAL_GS_BUCKET and GOOGLE_APPLICATION_CREDENTIALS environtment variables

* Fixed merge issue but also removed all changes to support macos.

* Updated test to new format

* Missed macos specific changes

* Added documentation and addressed comments

* Update docs/administrator.md

* Update docs/administrator.md

* Update e2e/run.sh

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-06-03 17:33:48 +02:00
Steffen Pøhner Henriksen 0fa61a6ab3
Changed order of sidecar env vars (#980)
* Changed order of sidecar env vars

* Cleaned up test code
2020-05-25 16:32:33 +02:00
Felix Kunde 3a49b485e5
delete secrets of system users too (#974) 2020-05-14 11:34:02 +02:00
Christian Rohmann 8ff7658ed3
Fix pooler delete (#960)
deleteConnectionPooler function incorrectly checks that the delete api response is ResourceNotFound. Looks like the only consequence is a confusing log message, but obviously it's wrong. Remove negation, since having ResourceNotFound as error is the good case.

Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
2020-05-13 14:55:54 +02:00
Ask Bjørn Hansen 852f29274a
Fix typo in error message (#969) 2020-05-12 10:05:42 +02:00
Rafia Sabih d52296c323
Propagate annotations to the StatefulSet (#932)
* Initial commit

* Corrections

- set the type of the new  configuration parameter to be array of
  strings
- propagate the annotations to statefulset at sync

* Enable regular expression matching

* Improvements

-handle rollingUpdate flag
-modularize code
-rename config parameter name

* fix merge error

* Pass annotations to connection pooler deployment

* update code-gen

* Add documentation and update manifests

* add e2e test and introduce option in configmap

* fix service annotations test

* Add unit test

* fix e2e tests

* better key lookup of annotations tests

* add debug message for annotation tests

* Fix typos

* minor fix for looping

* Handle update path and renaming

- handle the update path to update sts and connection pooler deployment.
  This way no need to wait for sync
- rename the parameter to downscaler_annotations
- handle other review comments

* another try to fix python loops

* Avoid unneccessary update events

* Update manifests

* some final polishing

* fix cluster_test after polishing

Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-05-04 14:46:56 +02:00
Felix Kunde d76203b3f9
Bootstrapped databases with best practice role setup (#843)
* PreparedDatabases with default role setup

* merge changes from master

* include preparedDatabases spec check when syncing databases

* create a default preparedDB if not specified

* add more default privileges for schemas

* use empty brackets block for undefined objects

* cover more default privilege scenarios and always define admin role

* add DefaultUsers flag

* support extensions and defaultUsers for preparedDatabases

* remove exact version in deployment manifest

* enable CRD validation for new field

* update generated code

* reflect code review

* fix typo in SQL command

* add documentation for preparedDatabases feature + minor changes

* some datname should stay

* add unit tests

* reflect some feedback

* init users for preparedDatabases also on update

* only change DB default privileges on creation

* add one more section in user docs

* one more sentence
2020-04-29 10:56:06 +02:00
Sergey Dudoladov cc635a02e3
Lazy upgrade of the Spilo image (#859)
* initial implementation

* describe forcing the rolling upgrade

* make parameter name more descriptive

* add missing pieces

* address review

* address review

* fix bug in e2e tests

* fix cluster name label in e2e test

* raise test timeout

* load spilo test image

* use available spilo image

* delete replica pod for lazy update test

* fix e2e

* fix e2e with a vengeance

* lets wait for another 30m

* print pod name in error msg

* print pod name in error msg 2

* raise timeout, comment other tests

* subsequent updates of config

* add comma

* fix e2e test

* run unit tests before e2e

* remove conflicting dependency

* Revert "remove conflicting dependency"

This reverts commit 65fc09054b.

* improve cdp build

* dont run unit before e2e tests

* Revert "improve cdp build"

This reverts commit e2a8fa12aa.

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-04-29 10:07:14 +02:00
Sergey Dudoladov 0ca30ba3d9
fix params in function call (#939)
Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
2020-04-28 09:31:41 +02:00
Björn Fischer 168abfe37b
Fully speced global sidecars (#890)
* implement fully speced global sidecars

* fix issue #924
2020-04-27 17:40:22 +02:00
Christian Rohmann 21b9b6fcbe
Emit K8S events to the postgresql CR as feedback to the requestor / user (#896)
* Add EventsGetter to KubeClient to enable to sending K8S events

* Add eventRecorder to the controller, initialize it and hand it down to cluster via its constructor to enable it to emit events this way

* Add first set of events which then go to the postgresql custom resource the user interacts with to provide some feedback

* Add right to "create" events to operator cluster role

* Adapt cluster tests to new function sigurature with eventRecord (via NewFakeRecorder)

* Get a proper reference before sending events to a resource

Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
2020-04-27 08:22:07 +02:00