225 Commits

Author SHA1 Message Date
Felix Kunde d76203b3f9
Bootstrapped databases with best practice role setup (#843)
* PreparedDatabases with default role setup

* merge changes from master

* include preparedDatabases spec check when syncing databases

* create a default preparedDB if not specified

* add more default privileges for schemas

* use empty brackets block for undefined objects

* cover more default privilege scenarios and always define admin role

* add DefaultUsers flag

* support extensions and defaultUsers for preparedDatabases

* remove exact version in deployment manifest

* enable CRD validation for new field

* update generated code

* reflect code review

* fix typo in SQL command

* add documentation for preparedDatabases feature + minor changes

* some datname should stay

* add unit tests

* reflect some feedback

* init users for preparedDatabases also on update

* only change DB default privileges on creation

* add one more section in user docs

* one more sentence
2020-04-29 10:56:06 +02:00
Sergey Dudoladov cc635a02e3
Lazy upgrade of the Spilo image (#859)
* initial implementation

* describe forcing the rolling upgrade

* make parameter name more descriptive

* add missing pieces

* address review

* address review

* fix bug in e2e tests

* fix cluster name label in e2e test

* raise test timeout

* load spilo test image

* use available spilo image

* delete replica pod for lazy update test

* fix e2e

* fix e2e with a vengeance

* lets wait for another 30m

* print pod name in error msg

* print pod name in error msg 2

* raise timeout, comment other tests

* subsequent updates of config

* add comma

* fix e2e test

* run unit tests before e2e

* remove conflicting dependency

* Revert "remove conflicting dependency"

This reverts commit 65fc09054b.

* improve cdp build

* dont run unit before e2e tests

* Revert "improve cdp build"

This reverts commit e2a8fa12aa.

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-04-29 10:07:14 +02:00
Felix Kunde 1d009d9595
bump spilo and pooler version + update docs (#945) 2020-04-28 16:01:13 +02:00
Björn Fischer 168abfe37b
Fully speced global sidecars (#890)
* implement fully speced global sidecars

* fix issue #924
2020-04-27 17:40:22 +02:00
Christian Rohmann 21b9b6fcbe
Emit K8S events to the postgresql CR as feedback to the requestor / user (#896)
* Add EventsGetter to KubeClient to enable sending K8S events

* Add eventRecorder to the controller, initialize it and hand it down to cluster via its constructor to enable it to emit events this way

* Add first set of events which then go to the postgresql custom resource the user interacts with to provide some feedback

* Add right to "create" events to operator cluster role

* Adapt cluster tests to new function signature with eventRecord (via NewFakeRecorder)

* Get a proper reference before sending events to a resource

Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
2020-04-27 08:22:07 +02:00
Sergey Dudoladov 3c91bdeffa
Re-create pods only if all replicas are running (#903)
* adds a Get call to Patroni interface to fetch state of a Patroni member
* postpones re-creating pods if at least one replica is currently being created  

Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-04-20 15:14:11 +02:00
Dmitry Dolgov a1f2bd05b9
Prevent superuser from being a connection pool user (#906)
* Protected and system users can't be a connection pool user

It's not supported, nor is it a best practice. Also fix potential null
pointer access. For protected users it makes sense by intent of protecting these
users (e.g. from being overridden or used as something other than intended). For
system users the reason is the same as for superuser: it's about the replication
user, which is under Patroni control.

This is implemented on both levels, operator config and postgresql manifest.
For the latter we just use the default name in this case, assuming that the
operator config is always correct. For the former, since it's a serious
misconfiguration, the operator panics.
2020-04-09 09:21:45 +02:00
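The two-level behaviour described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the operator's code: the function names, the `reserved` set and the user names `postgres`, `standby` and `pooler` are all assumptions made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// resolvePoolerUser mimics the manifest-level rule: if the requested pooler
// user is empty or reserved (superuser, protected or system user), fall back
// to the default name from the operator configuration.
// Hypothetical helper, not part of the operator's API.
func resolvePoolerUser(requested, defaultUser string, reserved map[string]bool) string {
	if requested == "" || reserved[requested] {
		return defaultUser
	}
	return requested
}

// checkOperatorPoolerUser mimics the operator-config-level rule: a reserved
// default user is a misconfiguration. The commit message says the operator
// panics in that case; this sketch returns an error so the caller decides.
func checkOperatorPoolerUser(defaultUser string, reserved map[string]bool) error {
	if reserved[defaultUser] {
		return errors.New("connection pooler user must not be a reserved user: " + defaultUser)
	}
	return nil
}

func main() {
	// Illustrative reserved names only.
	reserved := map[string]bool{"postgres": true, "standby": true}

	if err := checkOperatorPoolerUser("pooler", reserved); err != nil {
		panic(err)
	}
	fmt.Println(resolvePoolerUser("postgres", "pooler", reserved))
	fmt.Println(resolvePoolerUser("app_user", "pooler", reserved))
}
```

Keeping the fallback and the hard failure in separate functions mirrors the distinction the message draws between a per-cluster manifest and the global operator configuration.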
ReSearchITEng 1249626a60
kubernetes_use_configmap (#887)
* kubernetes_use_configmap

* Update manifests/postgresql-operator-default-configuration.yaml

Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>

* Update manifests/configmap.yaml

Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>

* Update charts/postgres-operator/values.yaml

Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>

* go.fmt

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-04-02 13:20:45 +02:00
Felix Kunde b43b22dfcc
Call me pooler, not pool (#883)
* rename pooler parts and add example to manifest
* update codegen
* fix manifest and add more details to docs
* reflect renaming also in e2e tests
2020-04-01 10:34:03 +02:00
Felix Kunde 66f2cda87f
Move operator to go 1.14 (#882)
* update go modules march 2020
* update to GO 1.14
* reflect k8s client API changes
2020-03-30 15:50:17 +02:00
Felix Kunde ba9cf68650
Change type of pod environment config map to NamespacedName (#870)
* allow PodEnvironmentConfigMap in other namespaces
* update codegen
* update docs and comments
2020-03-25 15:59:31 +01:00
Dmitry Dolgov 9dfa433363
Connection pooler (#799)
Connection pooler support

Add support for a connection pooler. The idea is to make it generic enough to
be able to switch between different implementations (e.g. pgbouncer or
odyssey). The operator needs to create a deployment with the pooler and a
service to access it.

For the connection pool to work properly, a database needs to be prepared by
the operator, namely a separate user has to be created with access to an
installed lookup function (to fetch credentials for other users).

This setup is supposed to be used only by robot/application users. Usually a
connection pool implementation is more CPU bound, so it makes sense to create
several pods for the connection pool with more emphasis on CPU resources. At the
moment there are no special affinity or tolerations assigned to bring those
pods closer to the database. For availability purposes the minimal number of
connection pool pods is 2; ideally they should be distributed between
different nodes/AZs, but this is not enforced in the operator itself. The
available configuration is supposed to be ergonomic and in the normal case
require minimal changes to a manifest to enable the connection pool. To have
more control over the configuration and functionality on the pool side one can
customize the corresponding docker image.

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-03-25 12:57:26 +01:00
Fredrik Østrem 9ddee8f302
Use cryptographically secure password generation (#854)
The current password generation algorithm is extremely deterministic, due to being based on the standard random number generator with a deterministic seed based on the current Unix timestamp (in seconds).

This can lead to a number of security issues, including:

* The same passwords being used in different Kubernetes clusters if the operator is deployed in parallel. (This issue was discovered because of four deployments having the same generated passwords due to automatically being deployed in parallel.)
* The passwords being easily guessable based on the time the operator pod started when the database was created. (This would typically be present in logs, metrics, etc., that may typically be accessible to more people than should have database access.)

Fix this issue by replacing the current randomness source with crypto/rand, which should produce cryptographically secure random data that is virtually unguessable. This will avoid both of the above problems as each deployment will be guaranteed to have unique, indeterministic passwords.
2020-03-18 10:28:39 +01:00
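As an illustration of the approach named in this commit message, a password generator built on `crypto/rand` might look like the sketch below. The alphabet, the function name `randomPassword` and the length are assumptions for the example; the operator's actual implementation may differ.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// passwordChars is an illustrative alphabet, not necessarily the one the
// operator uses.
const passwordChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// randomPassword returns n characters drawn uniformly from passwordChars,
// using the operating system's cryptographically secure random source
// instead of a time-seeded math/rand generator.
func randomPassword(n int) (string, error) {
	limit := big.NewInt(int64(len(passwordChars)))
	buf := make([]byte, n)
	for i := range buf {
		// rand.Int returns a uniform value in [0, limit).
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		buf[i] = passwordChars[idx.Int64()]
	}
	return string(buf), nil
}

func main() {
	pw, err := randomPassword(64)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(pw))
}
```

Because no seed is involved, two operators started in the same second no longer derive the same sequence of passwords.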
Felix Kunde cf829df1a4
define ownership between operator and clusters via annotation (#802)
* define ownership between operator and postgres clusters
* add documentation
* add unit test
2020-03-17 16:34:31 +01:00
Felix Kunde b24da3201c
bump version to 1.4.0 + some polishing (#839)
* bump version to 1.4.0 + some polishing
* align version for UI chart
* update user docs to warn for standby replicas
* minor log message changes for RBAC resources
2020-02-25 09:50:54 +01:00
Felix Kunde e2a9b03913
bump spilo version to latest release (#836) 2020-02-20 16:21:21 +01:00
Felix Kunde aea9e9bd33
postgres-pod clusterrole (#832)
* define postgres-pod clusterrole and align rbac in chart
* align UI chart rbac with operator and update doc
* operator RBAC needs podsecuritypolicy to grant it to postgres-pod
2020-02-19 12:32:54 +01:00
Jonathan Juares Beber 4b440e59de
Fix test flakiness on TestSameService (#833)
The code added in #818 depends on map ordering to return a static reason
for service annotation changes. To avoid test flakiness without sorting
maps, the tests use `strings.HasPrefix` instead of comparing the whole
string. One of the test cases,
`service_removes_a_custom_annotation,_adds_a_new_one_and_change_another`,
still tries to test the whole reason string.

This commit replaces that test case's expected reason with only the
reason prefix, which removes the flakiness from the tests. As all the
cases (annotation adding, removing and value changing) are tested before,
it's safe to test only prefixes.

Also, it renames the test from `TestServiceAnnotations` to
`TestSameService` and introduces a better description in case of test
failure, describing that only prefixes are tested.
2020-02-18 16:45:44 +01:00
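A small sketch of why prefix matching sidesteps the flakiness: Go does not guarantee map iteration order, so any string assembled while ranging over a map can vary between runs. The function names and the reason text below are hypothetical and do not reproduce the operator's messages.

```go
package main

import (
	"fmt"
	"strings"
)

// reasonPrefix is an illustrative, stable beginning of the reason string.
const reasonPrefix = "service annotations changed: "

// annotationChangeReason builds a reason by ranging over a map, so the order
// of the details after the prefix is not deterministic.
func annotationChangeReason(changes map[string]string) string {
	parts := make([]string, 0, len(changes))
	for key, kind := range changes {
		parts = append(parts, fmt.Sprintf("%s %q", kind, key))
	}
	return reasonPrefix + strings.Join(parts, ", ")
}

// reasonMatches is what a test can assert on safely: only the stable prefix.
func reasonMatches(got, wantPrefix string) bool {
	return strings.HasPrefix(got, wantPrefix)
}

func main() {
	reason := annotationChangeReason(map[string]string{
		"foo": "added",
		"bar": "removed",
		"baz": "changed",
	})
	// The tail may be ordered differently on each run; the prefix is not.
	fmt.Println(reasonMatches(reason, reasonPrefix))
}
```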
Felix Kunde 702a194c41
switch to rbac/v1 (#829)
* switch to rbac/v1
2020-02-17 11:25:07 +01:00
Jonathan Juares Beber 744c71d16b
Allow services update when changing annotations (#818)
The current implementation of `pkg.util.k8sutil.SameService` only detects
changes to the default annotations created by the operator. Custom
annotations are not compared and consequently not applied after the first
service creation.

This commit introduces a complete annotations comparison between the
current service created by the operator and the new one generated based on
the configs. Also, it adds tests for the above-mentioned function.
2020-02-13 10:55:30 +01:00
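A complete comparison of two annotation maps, in the spirit of what this commit message describes, could be sketched as follows. `diffAnnotations` and its reason text are assumptions for illustration and are not the actual `SameService` code; sorting the details is a choice made here to keep the output stable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// diffAnnotations compares every key of both maps, not just a known subset,
// and reports whether they match along with a human-readable reason.
// Hypothetical helper, not the operator's SameService implementation.
func diffAnnotations(current, desired map[string]string) (bool, string) {
	var changes []string
	for key, cur := range current {
		want, ok := desired[key]
		switch {
		case !ok:
			changes = append(changes, fmt.Sprintf("removed %q", key))
		case want != cur:
			changes = append(changes, fmt.Sprintf("changed %q from %q to %q", key, cur, want))
		}
	}
	for key := range desired {
		if _, ok := current[key]; !ok {
			changes = append(changes, fmt.Sprintf("added %q", key))
		}
	}
	if len(changes) == 0 {
		return true, ""
	}
	// Map iteration order is random, so sort to get a deterministic reason.
	sort.Strings(changes)
	return false, "annotations differ: " + strings.Join(changes, ", ")
}

func main() {
	current := map[string]string{"a": "1", "b": "2"}
	desired := map[string]string{"a": "9", "c": "3"}
	same, reason := diffAnnotations(current, desired)
	fmt.Println(same, reason)
}
```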
Vito Botta a660d758a5 Add region setting for logical backups to non-AWS storage (#813)
* Add region setting for logical backups to non-AWS storage
2020-02-10 11:48:24 +01:00
Felix Kunde 1f0312a014
make minimum limits boundaries configurable (#808)
* make minimum limits boundaries configurable
* add e2e test
2020-02-03 11:43:18 +01:00
Felix Kunde 7af1de890c
bump operator v1.3.0 with Spilo 12 image (#770) 2019-12-17 17:13:56 +01:00
Felix Kunde 182e3bc7db
add missing fields to OperatorConfiguration CRD validation (#767) 2019-12-16 17:08:09 +01:00
Felix Kunde 97e0d6d388
extend docs and polish manifest examples (#762) 2019-12-12 17:55:41 +01:00
Felix Kunde cd110aabf4
Enforce minimum cpu and memory limits (#731)
* add validation for PG resources and volume size
* check resource requests also on UPDATE and SYNC + update docs
* if cluster was running don't error on sync
2019-12-12 16:43:55 +01:00
Felix Kunde 107334fe71
Add global option to enable/disable init containers and sidecars (#478)
* Add global option to enable/disable init containers and sidecars
* update dependencies
2019-12-10 15:45:54 +01:00
Felix Kunde a3b34f146f
Add CRD validation (#599)
* add CRD manifests with validation
* update documentation
* patroni slots is not an array but a nested hash map
* make deps call tools
* cover validation in docs and export it in crds.go
* add toggle to disable creation of CRD validation and document it
* use templated service account also for CRD-configured helm deployment
2019-11-28 12:02:05 +01:00
Armin Nesiren 5f87384d7f Passing endpoint, access and secret key to logical-backup container (#628)
* Added possibility to add custom annotations to LoadBalancer service.

* Added parameters for custom endpoint, access and secret key for logical backup.

* Modified dump.sh so it knows how to handle new features. Configurable S3 SSE
2019-11-26 10:40:49 +01:00
Felix Kunde 2ce602fcd7 fix errors when changing service type (#716)
* fix errors when changing service type

* nullify service and endpoint before recreation

* improve wait for delete logic and reuse config parameters
2019-11-26 10:28:32 +01:00
Thomas Runyon 535517cd1b Custom annotations 329 (#657)
* Add ability for custom annotations to database pods
2019-11-11 10:45:35 +01:00
Felix Kunde f0e29060b1
move StatefulSet to apps/v1 (#675) 2019-09-30 16:42:04 +02:00
Weilu Jia e00b37fc17 Handle IPv6 k8s pods in Patroni URLs (#671)
* Handle IPv6 Patroni URLs
2019-09-30 10:14:27 +02:00
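The underlying issue is that an IPv6 literal must be wrapped in square brackets when a port follows it in a URL. One common way to handle both address families in Go is `net.JoinHostPort`; the helper name `patroniAPIURL` below is hypothetical, and port 8008 is used only as an example value.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// patroniAPIURL builds a base URL for a pod IP and port. net.JoinHostPort
// adds the brackets that IPv6 literals need and leaves IPv4 untouched.
// Hypothetical helper for illustration only.
func patroniAPIURL(podIP string, port int) string {
	return "http://" + net.JoinHostPort(podIP, strconv.Itoa(port))
}

func main() {
	fmt.Println(patroniAPIURL("10.2.3.4", 8008))    // http://10.2.3.4:8008
	fmt.Println(patroniAPIURL("fd00::1:2", 8008))   // http://[fd00::1:2]:8008
}
```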
Sergey Dudoladov cf97ebb2b8 fix e2e tests (#672)
* fix e2e tests
* change Spilo version everywhere
2019-09-23 17:48:53 +02:00
Felix Kunde 4a863d2280 Avoid orphaned objects on delete (#654)
* Make setSpec function work correctly when updating cluster status fails
2019-08-27 12:54:35 +02:00
Felix Kunde cd350a4bc1
make run.sh executable from within e2e (#619) 2019-07-24 15:07:32 +02:00
Felix Kunde 7c19cf50db
align config map, operator config, helm chart values and templates (#595)
* align config map, operator config, helm chart values and templates
* follow helm chart conventions also in CRD templates
* split up values files and add comments
* avoid yaml confusion in postgres manifests
* bump spilo version and use example for logical_backup_s3_bucket
* add ConfigTarget switch to values
2019-07-08 17:49:25 +02:00
Felix Kunde 36003b8264
enable shmVolume setting in OperatorConfiguration (#605)
* enable shmVolume setting in OperatorConfiguration
2019-07-05 16:48:37 +02:00
Markus 93bfed3e75 Add secret mount to operator (#535)
* add secret mount to operator
2019-06-19 12:40:49 +02:00
Felix Kunde 6918394562
Add PDB configuration toggle (#583)
* Don't create an impossible disruption budget for smaller clusters.
* sync PDB also on update
2019-06-18 10:48:21 +02:00
Aaron Miller ec5b1d4d58 StatefulSet fsGroup config option to allow non-root spilo (#531)
* StatefulSet fsGroup config option to allow non-root spilo

* Allow Postgres CRD to override SpiloFSGroup of the Operator.

* Document that the FSGroup of a Pod cannot be changed after creation.
2019-06-04 16:38:26 +02:00
Felix Kunde 5a0e95ac45
Add CRD configuration to Helm chart values.yaml (#559)
* add templates for CRDs incl. crd-install hooks
* support both config styles in values.yaml
* fix ServiceAccount naming in values.yaml
2019-06-03 14:48:32 +02:00
Erik Inge Bolsø ebda39368e database.go: remove hardcoded .svc.cluster.local dns suffix (#561)
* database.go: substitute hardcoded .svc.cluster.local dns suffix with config parameter

Use the pod's configured dns search path, for clusters where .svc.cluster.local is not correct.
2019-05-31 16:32:00 +02:00
Sergey Dudoladov f3e1e80aaf
Add logical backup (#442)
* Add k8s cron job to spawn logical backups

* Minor doc updates
2019-05-16 15:52:01 +02:00
Felix Kunde 0fbfbb23bb
Use /status subresource instead of plain manifest field (#534)
* turns PostgresStatus type into a struct with field PostgresClusterStatus
* setStatus patch target is now /status subresource
* unmarshalling PostgresStatus takes care of previous status field convention
* new simple bool functions status.Running(), status.Creating()
2019-05-07 12:01:45 +02:00
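The backward-compatible unmarshalling mentioned in this commit message could work along these lines. This is a sketch under assumptions: the type and field names come from the message, but the JSON tag, the status values `"Running"` and `"Creating"`, and the decoding strategy are guesses for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PostgresStatus is a struct wrapping the cluster status, as the commit
// message describes. The JSON tag is an assumption.
type PostgresStatus struct {
	PostgresClusterStatus string `json:"PostgresClusterStatus"`
}

// UnmarshalJSON accepts both the new object form and the previous convention
// where the status was a plain string.
func (s *PostgresStatus) UnmarshalJSON(data []byte) error {
	// An alias type without methods avoids recursing into UnmarshalJSON.
	type plain PostgresStatus
	var obj plain
	if err := json.Unmarshal(data, &obj); err == nil {
		*s = PostgresStatus(obj)
		return nil
	}
	// Fall back to the legacy form: a bare JSON string.
	var legacy string
	if err := json.Unmarshal(data, &legacy); err != nil {
		return err
	}
	s.PostgresClusterStatus = legacy
	return nil
}

// Running and Creating are simple boolean helpers; the literal status values
// are assumed here.
func (s PostgresStatus) Running() bool  { return s.PostgresClusterStatus == "Running" }
func (s PostgresStatus) Creating() bool { return s.PostgresClusterStatus == "Creating" }

func main() {
	var oldStyle, newStyle PostgresStatus
	if err := json.Unmarshal([]byte(`"Running"`), &oldStyle); err != nil {
		panic(err)
	}
	if err := json.Unmarshal([]byte(`{"PostgresClusterStatus":"Creating"}`), &newStyle); err != nil {
		panic(err)
	}
	fmt.Println(oldStyle.Running(), newStyle.Creating())
}
```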
Aaron Miller 15ec6a920d Config option to allow Spilo container to run non-privileged. (#525)
* Config option to allow Spilo container to run non-privileged.

Runs non-privileged by default.

Fixes #395

* add spilo_privileged to manifests/configmap.yaml

* add spilo_privileged to helm chart's values.yaml
2019-04-03 17:13:39 +02:00
Sergey Dudoladov 0b53dbe5dc
Set statefulset update and management policy explicitly (#515)
* fix logging in retry

* explicitly set the stateful set update strategy to onDelete

* add podManagementPolicy
2019-03-13 11:49:18 +01:00
Vineeth Reddy db72d82f14 gofmt and golint fixes (#506)
* fix gofmt and golint issues
2019-03-04 13:13:55 +01:00
Sergey Dudoladov f400539b69
Retry moving master pods (#463)
* Retry moving master pods

* bump up master pod wait timeout
2019-02-28 16:19:27 +01:00
Felix Kunde 31e568157b reflect change in github url (#496)
Project was moved from the incubator to the Zalando main org, hence the rename
2019-02-25 11:26:55 +01:00