Commit Graph

269 Commits

Author SHA1 Message Date
Dmitry Dolgov a1f2bd05b9
Prevent superuser from being a connection pool user (#906)
* Protected and system users can't be a connection pool user

It's not supported, neither it's a best practice. Also fix potential null
pointer access. For protected users it makes sense by intent of protecting this
users (e.g. from being overriden or used as something else than supposed). For
system users the reason is the same as for superuser, it's about replicastion
user and it's under patroni control.

This is implemented on both levels, operator config and postgresql manifest.
For the latter we just use default name in this case, assuming that operator
config is always correct. For the former, since it's a serious
misconfiguration, operator panics.
2020-04-09 09:21:45 +02:00
ReSearchITEng 1249626a60
kubernetes_use_configmap (#887)
* kubernetes_use_configmap

* Update manifests/postgresql-operator-default-configuration.yaml

Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>

* Update manifests/configmap.yaml

Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>

* Update charts/postgres-operator/values.yaml

Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>

* go.fmt

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-04-02 13:20:45 +02:00
Felix Kunde b43b22dfcc
Call me pooler, not pool (#883)
* rename pooler parts and add example to manifest
* update codegen
* fix manifest and add more details to docs
* reflect renaming also in e2e tests
2020-04-01 10:34:03 +02:00
Felix Kunde 66f2cda87f
Move operator to go 1.14 (#882)
* update go modules march 2020
* update to GO 1.14
* reflect k8s client API changes
2020-03-30 15:50:17 +02:00
Felix Kunde ba9cf68650
Change type of pod environment config map to NamespacedName (#870)
* allow PodEnvironmentConfigMap in other namespaces
* update codegen
* update docs and comments
2020-03-25 15:59:31 +01:00
Dmitry Dolgov 9dfa433363
Connection pooler (#799)
Connection pooler support

Add support for a connection pooler. The idea is to make it generic enough to
be able to switch between different implementations (e.g. pgbouncer or
odyssey). Operator needs to create a deployment with pooler and a service for
it to access.

For connection pool to work properly, a database needs to be prepared by
operator, namely a separate user have to be created with an access to an
installed lookup function (to fetch credential for other users).

This setups is supposed to be used only by robot/application users. Usually a
connection pool implementation is more CPU bounded, so it makes sense to create
several pods for connection pool with more emphasize on cpu resources. At the
moment there are no special affinity or tolerations assigned to bring those
pods closer to the database. For availability purposes minimal number of
connection pool pods is 2, ideally they have to be distributed between
different nodes/AZ, but it's not enforced in the operator itself. Available
configuration supposed to be ergonomic and in the normal case require minimum
changes to a manifest to enable connection pool. To have more control over the
configuration and functionality on the pool side one can customize the
corresponding docker image.

Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
2020-03-25 12:57:26 +01:00
Fredrik Østrem 9ddee8f302
Use cryptographically secure password generation (#854)
The current password generation algorithm is extremely deterministic, due to being based on the standard random number generator with a deterministic seed based on the current Unix timestamp (in seconds).

This can lead to a number of security issues, including:

The same passwords being used in different Kubernetes clusters if the operator is deployed in parallel. (This issue was discovered because of four deployments having the same generated passwords due to automatically being deployed in parallel.)
The passwords being easily guessable based on the time the operator pod started when the database was created. (This would typically be present in logs, metrics, etc., that may typically be accessible to more people than should have database access.)
Fix this issue by replacing the current randomness source with crypto/rand, which should produce cryptographically secure random data that is virtually unguessable. This will avoid both of the above problems as each deployment will be guaranteed to have unique, indeterministic passwords.
2020-03-18 10:28:39 +01:00
Felix Kunde cf829df1a4
define ownership between operator and clusters via annotation (#802)
* define ownership between operator and postgres clusters
* add documentation
* add unit test
2020-03-17 16:34:31 +01:00
Felix Kunde b24da3201c
bump version to 1.4.0 + some polishing (#839)
* bump version to 1.4.0 + some polishing
* align version for UI chart
* update user docs to warn for standby replicas
* minor log message changes for RBAC resources
2020-02-25 09:50:54 +01:00
Felix Kunde e2a9b03913
bump spilo version to latest release (#836) 2020-02-20 16:21:21 +01:00
Felix Kunde aea9e9bd33
postgres-pod clusterrole (#832)
* define postgres-pod clusterrole and align rbac in chart
* align UI chart rbac with operator and update doc
* operator RBAC needs podsecuritypolicy to grant it to postgres-pod
2020-02-19 12:32:54 +01:00
Jonathan Juares Beber 4b440e59de
Fix test flakiness on TestSameService (#833)
The code added on #818 depends on map sorting to return a static reason
for service annotation changes. To avoid tests flakiness and map sorting
the tests include a `strings.HasPrefix` instead of comparing the whole
string. One of the test cases,
`service_removes_a_custom_annotation,_adds_a_new_one_and_change_another`,
is trying to test the whole reason string.

This commit replaces the test case reason, for only the reason prefix.
It removes the flakiness from the tests. As all the cases (annotation
adding, removing and value changing) are tested before, it's safe to
test only prefixes.

Also, it renames the test name from `TestServiceAnnotations` to
`TestSameService` and introduces a better description in case of test
failure, describing that only prefixes are tested.
2020-02-18 16:45:44 +01:00
Felix Kunde 702a194c41
switch to rbac/v1 (#829)
* switch to rbac/v1
2020-02-17 11:25:07 +01:00
Jonathan Juares Beber 744c71d16b
Allow services update when changing annotations (#818)
The current implementations for `pkg.util.k8sutil.SameService` considers
only service annotations change on the default annotations created by the
operator. Custom annotations are not compared and consequently not
applied after the first service creation.

This commit introduces a complete annotations comparison between the
current service created by the operator and the new one generated based on
the configs. Also, it adds tests on the above-mentioned function.
2020-02-13 10:55:30 +01:00
Vito Botta a660d758a5 Add region setting for logical backups to non-AWS storage (#813)
* Add region setting for logical backups to non-AWS storage
2020-02-10 11:48:24 +01:00
Felix Kunde 1f0312a014
make minimum limits boundaries configurable (#808)
* make minimum limits boundaries configurable
* add e2e test
2020-02-03 11:43:18 +01:00
Felix Kunde 7af1de890c
bump operator v1.3.0 with Spilo 12 image (#770) 2019-12-17 17:13:56 +01:00
Felix Kunde 182e3bc7db
add missing fields to OperatorConfiguration CRD validation (#767) 2019-12-16 17:08:09 +01:00
Felix Kunde 97e0d6d388
extend docs and polish manifest examples (#762) 2019-12-12 17:55:41 +01:00
Felix Kunde cd110aabf4
Enforce minimum cpu and memory limits (#731)
* add validation for PG resources and volume size
* check resource requests also on UPDATE and SYNC + update docs
* if cluster was running don't error on sync
2019-12-12 16:43:55 +01:00
Felix Kunde 107334fe71
Add global option to enable/disable init containers and sidecars (#478)
* Add global option to enable/disable init containers and sidecars
* update dependencies
2019-12-10 15:45:54 +01:00
Felix Kunde a3b34f146f
Add CRD validation (#599)
* add CRD manifests with validation
* update documentation
* patroni slots is not an array but a nested hash map
* make deps call tools
* cover validation in docs and export it in crds.go
* add toggle to disable creation of CRD validation and document it
* use templated service account also for CRD-configured helm deployment
2019-11-28 12:02:05 +01:00
Armin Nesiren 5f87384d7f Passing endpoint, access and secret key to logical-backup container (#628)
* Added possibility to add custom annotations to LoadBalancer service.

* Added parameters for custom endpoint, access and secret key for logical backup.

* Modified dump.sh so it knows how to handle new features. Configurable S3 SSE
2019-11-26 10:40:49 +01:00
Felix Kunde 2ce602fcd7 fix errors when changing service type (#716)
* fix errors when changing service type

* nullify service and endpoint before recreation

* improve wait for delete logic and reuse config parameters
2019-11-26 10:28:32 +01:00
Thomas Runyon 535517cd1b Custom annotations 329 (#657)
* Add ability for custom annotations to database pods
2019-11-11 10:45:35 +01:00
Felix Kunde f0e29060b1
move StatefulSet to apps/v1 (#675) 2019-09-30 16:42:04 +02:00
Weilu Jia e00b37fc17 Handle IPv6 k8s pods in Patroni URLs (#671)
* Handle IPv6 Patroni URLs
2019-09-30 10:14:27 +02:00
Sergey Dudoladov cf97ebb2b8 fix e2e tests (#672)
* fix e2e tests
* change Spilo version everywhere
2019-09-23 17:48:53 +02:00
Felix Kunde 4a863d2280 Avoid orphaned objects on delete (#654)
* Make setSpec function work correctly when updating cluster status fails
2019-08-27 12:54:35 +02:00
Felix Kunde cd350a4bc1
make run.sh executable from within e2e (#619) 2019-07-24 15:07:32 +02:00
Felix Kunde 7c19cf50db
align config map, operator config, helm chart values and templates (#595)
* align config map, operator config, helm chart values and templates
* follow helm chart conventions also in CRD templates
* split up values files and add comments
* avoid yaml confusion in postgres manifests
* bump spilo version and use example for logical_backup_s3_bucket
* add ConfigTarget switch to values
2019-07-08 17:49:25 +02:00
Felix Kunde 36003b8264
enable shmVolume setting in OperatorConfiguration (#605)
* enable shmVolume setting in OperatorConfiguration
2019-07-05 16:48:37 +02:00
Markus 93bfed3e75 Add secret mount to operator (#535)
* add secret mount to operator
2019-06-19 12:40:49 +02:00
Felix Kunde 6918394562
Add PDB configuration toggle (#583)
* Don't create an impossible disruption budget for smaller clusters.
* sync PDB also on update
2019-06-18 10:48:21 +02:00
Aaron Miller ec5b1d4d58 StatefulSet fsGroup config option to allow non-root spilo (#531)
* StatefulSet fsGroup config option to allow non-root spilo

* Allow Postgres CRD to overide SpiloFSGroup of the Operator.

* Document FSGroup of a Pod cannot be changed after creation.
2019-06-04 16:38:26 +02:00
Felix Kunde 5a0e95ac45
Add CRD configuration to Helm chart values.yaml (#559)
* add templates for CRDs incl. crd-install hooks
* support both config styles in values.yaml
* fix ServiceAccount naming in values.yaml
2019-06-03 14:48:32 +02:00
Erik Inge Bolsø ebda39368e database.go: remove hardcoded .svc.cluster.local dns suffix (#561)
* database.go: substitute hardcoded .svc.cluster.local dns suffix with config parameter

Use the pod's configured dns search path, for clusters where .svc.cluster.local is not correct.
2019-05-31 16:32:00 +02:00
Sergey Dudoladov f3e1e80aaf
Add logical backup (#442)
* Add k8s cron job to spawn logical backups

* Minor doc updates
2019-05-16 15:52:01 +02:00
Felix Kunde 0fbfbb23bb
Use /status subresource instead of plain manifest field (#534)
* turns PostgresStatus type into a struct with field PostgresClusterStatus
* setStatus patch target is now /status subresource
* unmarshalling PostgresStatus takes care of previous status field convention
* new simple bool functions status.Running(), status.Creating()
2019-05-07 12:01:45 +02:00
Aaron Miller 15ec6a920d Config option to allow Spilo container to run non-privileged. (#525)
* Config option to allow Spilo container to run non-privileged.

Runs non-privileged by default.

Fixes #395

* add spilo_privileged to manifests/configmap.yaml

* add spilo_privileged to helm chart's values.yaml
2019-04-03 17:13:39 +02:00
Sergey Dudoladov 0b53dbe5dc
Set statefulset update and management policy explicitly (#515)
* fix logging in retry

* explicitly set the stateful set update strategy to onDelete

* add podManagementPolicy
2019-03-13 11:49:18 +01:00
Vineeth Reddy db72d82f14 gofmt and golint fixes (#506)
* fix gofmt and golint issues
2019-03-04 13:13:55 +01:00
Sergey Dudoladov f400539b69
Retry moving master pods (#463)
* Retry moving master pods

* bump up master pod wait timeout
2019-02-28 16:19:27 +01:00
Felix Kunde 31e568157b reflect change in github url (#496)
Project was moved from the incubator to the Zalando main org, hence the rename
2019-02-25 11:26:55 +01:00
teuto.net Netzdienste GmbH 26a7fdfa9f Add Pod Anti Affinity (#489)
* Add Pod Anti Affinity
2019-02-21 16:37:03 +01:00
Stephane T d11b23bd71 Add inherited_labels (#459)
* add support for inherited_labels

Signed-off-by: Stephane Tang <hi@stang.sh>

* update docs with inherited_labels

Signed-off-by: Stephane Tang <hi@stang.sh>
2019-02-14 12:29:06 +01:00
Armin Nesiren 6f6a599c90 Added possibility to add custom annotations to LoadBalancer service. (#461)
* Added possibility to add custom annotations to LoadBalancer service.
2019-01-25 11:35:27 +01:00
zerg-junior 4b5d3cd121
Fix golint failures
* Fix golint fails based on the original work from  the user u5surf

* Skip installing Docker as CDP now have one pre-installed (repairs builds on CDP)
2019-01-08 13:04:48 +01:00
zerg-junior c0b0b9a832
[WIP] Add 'admin' option to create role (#425)
* Add 'admin' option to create role

* Fix run_locally_script
2018-12-27 10:14:33 +01:00
Dmitry Dolgov d6e6b00770
Add shm_volume option (#427)
Add possibility to mount a tmpfs volume to /dev/shm to avoid issues like
[this](https://github.com/docker-library/postgres/issues/416). To achieve that
two new options were introduced:

* `enableShmVolume` to PostgreSQL manifest, to specify whether or not mount
this volume per database cluster

* `enable_shm_volume` to operator configuration, to specify whether or not mount
per operator.

The first one, `enableShmVolume` takes precedence to allow us to be more flexible.
2018-12-21 16:22:30 +01:00
zerg-junior 45c89b3da4
[WIP] Add set_memory_request_to_limit option (#406)
* Add set_memory_request_to_limit option
2018-11-15 14:00:08 +01:00
Noah Kantrowitz 0b75a89920 Fix the casing of github.com/Sirupsen/logrus to match what the project itself uses. (#380)
Dep enforces this.
2018-09-06 10:26:48 +02:00
zerg-junior 25fa45fd58 [WIP] Grant 'superuser' to the members of Postgres admin teams (#371)
Added support for superuser team in addition to the admin team that owns the postgres cluster.
2018-08-30 10:51:37 +02:00
Oleksii Kliukin e1ed4b847d
Use code-generation for CRD API and deepcopy methods (#369)
Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on.

Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API.

All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1

Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator.

Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into

Change the type of  the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it.

Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with
the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes.

Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files.

Update client-go dependencies.

Minor reformatting and renaming.
2018-08-15 17:22:25 +02:00
Oleksii Kliukin e933908084
Configure pg_hba in the local postgresql configuration of Patroni. (#361)
Previously, the operator put pg_hba into the bootstrap/pg_hba key of
Patroni. That had 2 adverse effects:
 - pg_hba.conf was shadowed by Spilo default section in the local
   postgresql configuration
 - when updating pg_hba in the cluster manifest, the updated lines were
   not propagated to DCS, since the key was defined in the boostrap
   section of Patroni.

Include some minor refactoring, moving methods to unexported when
possible and commenting out usage of md5, so that gosec won't complain.

Per https://github.com/zalando-incubator/postgres-operator/issues/330

Review by @zerg-junior
2018-08-08 11:01:26 +02:00
Oleksii Kliukin 199aa6508c
Populate list of clusters in the controller at startup. (#364)
Assign the list of clusters in the controller with the up-to-date list
of Postgres manifests on Kubernetes during the startup.

Node migration routines launched asynchronously to the cluster
processing rely on an up-to-date list of clusters in the controller to
detect clusters affected by the migration of the node and lock them
when doing migration of master pods. Without the initial list the
operator was subject to race conditions like the one described at
https://github.com/zalando-incubator/postgres-operator/issues/363

Restructure the code to decouple list cluster function required by the
postgresql informer from the one that emits cluster sync events. No
extra work is introduced, since cluster sync already runs in a separate
goroutine (clusterResync).

Introduce explicit initial cluster sync at the end of
acquireInitialListOfClusters instead of relying on an implicit one
coming from list function of the PostgreSQL informer.

Some minor refactoring.

Review by @zerg-junior
2018-08-08 11:00:56 +02:00
Oleksii Kliukin b06186eb41
Linter-induced code refactoring, run round 2. (#360)
Run more linters in the gometalinter, i.e. deadcode, megacheck,
nakedret, dup.

More consistent code formatting, remove two dead functions, eliminate
naked a bunch of naked returns, refactor a few functions to avoid code
duplication.
2018-08-06 12:09:19 +02:00
Oleksii Kliukin 59f0c5551e
Allow configuring pod priority globally and per cluster. (#353)
* Allow configuring pod priority globally and per cluster.

Allow to specify pod priority class for all pods managed by the operator,
as well as for those belonging to individual clusters.

Controlled by the pod_priority_class_name operator configuration
parameter and the podPriorityClassName manifest option.

See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
for the explanation on how to define priority classes since Kubernetes 1.8.

Some import order changes are due to go fmt.
Removal of OrphanDependents deprecated field.

Code review by @zerg-junior
2018-08-03 14:03:37 +02:00
Oleksii Kliukin ac7b132314
Refactoring inspired by gometalinter. (#357)
Among other things, fix a few issues with deepcopy implementation.
2018-08-03 11:09:45 +02:00
Oleksii Kliukin d2d3f21dc2 Client go upgrade v6 (#352)
There are shortcuts in this code, i.e. we created the deepcopy function
by using the deepcopy package instead of the generated code, that will
be addressed once migrated to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta, this will also
be addressed in further commits once the changes are stabilized.
2018-08-01 11:08:01 +02:00
Oleksii Kliukin 0181a1b5b1
Introduce a repair scan to fix failing clusters (#304)
A repair is a sync scan that acts only on those clusters that indicate
that the last add, update or sync operation on them has failed. It is
supposed to kick in more frequently than the repair scan. The repair
scan still remains to be useful to fix the consequences of external
actions (i.e. someone deletes a postgres-related service by mistake)
unbeknownst to the operator.

The repair scan is controlled by the new repair_period parameter in the
operator configuration. It has to be at least 2 times more frequent than
a sync scan to have any effect (a normal sync scan will update both last
synced and last repaired attributes of the controller, since repair is
just a sync underneath).

A repair scan could be queued for a cluster that is already being synced
if the sync period exceeds the interval between repairs. In that case a
repair event will be discarded once the corresponding worker finds out
that the cluster is not failing anymore.

Review by @zerg-junior
2018-07-24 11:21:45 +02:00
zerg-junior 417f13c0bd
Submit RBAC credentials during initial Event processing (#344)
* During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.
2018-07-19 16:40:40 +02:00
Oleksii Kliukin 3a9378d3b8
Allow configuring the operator via the YAML manifest. (#326)
* Up until now, the operator read its own configuration from the
configmap.  That has a number of limitations, i.e. when the
configuration value is not a scalar, but a map or a list. We use a
custom code based on github.com/kelseyhightower/envconfig to decode
non-scalar values out of plain text keys, but that breaks when the data
inside the keys contains both YAML-special elememtns (i.e. commas) and
complex quotes, one good example for that is search_path inside
`team_api_role_configuration`. In addition, reliance on the configmap
forced a flag structure on the configuration, making it hard to write
and to read (see
https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778).

The changes allow to supply the operator configuration in a proper YAML
file. That required registering a custom CRD to support the operator
configuration and provide an example at
manifests/postgresql-operator-default-configuration.yaml. At the moment,
both old configmap and the new CRD configuration is supported, so no
compatibility issues, however, in the future I'd like to deprecate the
configmap-based configuration altogether. Contrary to the
configmap-based configuration, the CRD one doesn't embed defaults into
the operator code, however, one can use the
manifests/postgresql-operator-default-configuration.yaml as a starting
point in order to build a custom configuration.

Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters
used to create the CRD were taken from the operator configuration, which
is not possible if the configuration itself is stored in the CRD object,
I've added the ability to specify them as environment variables
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively.

Per review by @zerg-junior  and  @Jan-M.
2018-07-16 16:20:46 +02:00
Oleksii Kliukin 25a306244f
Support for per-cluster and operator global sidecars (#331)
* Define sidecars in the operator configuration.

Right now only the name and the docker image can be defined, but with
the help of the pod_environment_configmap parameter arbitrary
environment variables can be passed to the sidecars.

* Refactoring around generatePodTemplate.

Original implementation of per-cluster sidecars by @theRealWardo 

Per review by @zerg-junior and @Jan-M
2018-07-02 16:25:27 +02:00
zerg-junior 7394c15d0a
Make AWS region configurable in the operator cofig map (#333) 2018-06-27 17:29:02 +02:00
Oleksii Kliukin 9cb48e0889
Document operator configuration parameters. (#313) 2018-06-08 13:21:57 +02:00
Oleksii Kliukin 04b660519a Fix exec into pods to resize volumes for multi-container pods.
The original code assumed only one container per pod.
2018-06-04 14:51:39 +02:00
Oleksii Kliukin 48a5744314
Use Patroni API to set bootstrap-only options. (#299)
Call Patroni API /config in order to set special options that are
ignored when set in the configuration file, such as max_connections.
Per https://github.com/zalando-incubator/postgres-operator/issues/297

* Some minor refacoring:

Rename Cluster ManualFailover to Swithover
Rename Patroni Failover to Switchover
Add more details to error messages and comments introduced in this PR.

Review by @zerg-junior
2018-05-29 12:35:25 +02:00
Sergey Dudoladov 2e041c50e6 Bump up default Spilo image 2018-05-28 16:54:27 +02:00
Manuel Gómez 32a1456a68
Update config.go 2018-05-24 16:58:46 +02:00
Sergey Dudoladov 749d723f55 Shorten the commen 2018-05-24 16:22:13 +02:00
Sergey Dudoladov 9824ddae5e Fix etcd_host default 2018-05-24 16:05:45 +02:00
Oleksii Kliukin 11d568bf65 Address code review by @zerg-junior
- new info messages, rename the annotation flag.
2018-05-15 16:50:03 +02:00
Oleksii Kliukin 0c616a802f Merge branch 'master' into rolling_updates_with_statefulset_annotations
# Conflicts:
#	pkg/cluster/k8sres.go
2018-05-15 15:33:34 +02:00
Oleksii Kliukin 987b43456b
Deprecate old LB options, fix endpoint sync. (#287)
* Depreate old LB options, fix endpoint sync.

- deprecate useLoadBalancer, replicaLoadBalancer from the manifest
  and enable_load_balancer from the operator configuration. The old
  operator configuration options become no-op with this commit. For
  the old manifest options, `useLoadBalancer` and `replicaLoadBalancer`
  are still consulted,  but only in the absense of the new ones
  (enableMasterLoadBalancer and enableReplicaLoadBalancer).

- Make sure the endpoint being created during the sync receives proper
  addresses subset. This is more critical for the replicas, as for the
  masters Patroni will normally re-create the endpoint before the
  operator.

- Avoid creating the replica endpoint, since it will be created automatically
  by the corresponding service.
- Update the README and unit tests.

Code review by @mgomezch and @zerg-junior
2018-05-15 15:19:18 +02:00
Oleksii Kliukin 332dab5237 Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations 2018-05-08 14:51:10 +02:00
Sergey Dudoladov 59ded0c212 Shorten bucket name 2018-05-02 14:05:57 +02:00
Sergey Dudoladov c45219bafa Set up an S3 bucket for the postgres daily logs 2018-05-02 12:52:42 +02:00
Sergey Dudoladov d99b553ec1 Convert default account definiton into JSON 2018-04-25 12:35:16 +02:00
Sergey Dudoladov e3f7fac443 Comment on the default value for pod service account name 2018-04-24 15:41:28 +02:00
Sergey Dudoladov 485ec4b8ea Move service account to Controller 2018-04-24 15:13:08 +02:00
Sergey Dudoladov c31c76281c Make operator unaware of its own service account 2018-04-23 14:38:20 +02:00
Sergey Dudoladov bd51d2922b Turn ServiceAccount into struct value to avoid race conditon during account creation 2018-04-20 13:05:05 +02:00
Sergey Dudoladov 214ae04aa7 Deploy service account for pod creation on demand 2018-04-18 16:20:20 +02:00
Sergey Dudoladov 96d46252f5 Change the default values to closer match previous behaviour 2018-03-26 11:43:46 +02:00
Sergey Dudoladov a8862aeee1 Enable backward compatibility for enable_load_balancer setting from operator configmap 2018-03-19 17:19:50 +01:00
Sergey Dudoladov 145689c950 Disable load balancer for master service by default (it may cost money) 2018-03-16 13:18:13 +01:00
Sergey Dudoladov 0986e56226 Add separate params for master and replica load balancers to operator configuration 2018-03-14 12:12:28 +01:00
Dmitry Dolgov bf4b0f0f33
Merge pull request #240 from zalando-incubator/feature/goreport-improvements
Some improvements for golint, ineffassign and misspell
2018-02-22 11:31:08 +01:00
Oleksii Kliukin cca73e30b7
Make code around recreating pods and creating objects in the database less brittle (#213)
There used to be a masterLess flag that was supposed to indicate whether the cluster it belongs to runs without the acting master by design. At some point, as we didn't really have support for such clusters, the flag has been misused to indicate there is no master in the cluster. However, that was not done consistently (a cluster without all pods running would never be masterless, even when the master is not among the running pods) and it was based on the wrong assumption that the masterless cluster will remain masterless until the next attempt to change that flag, ignoring the possibility of master coming up or some node doing a successful promotion. Therefore, this PR gets rid of that flag completely.

When the cluster is running with 0 instances, there is obviously no master and it makes no sense to create any database objects inside the non-existing master. Therefore, this PR introduces an additional check for that.

recreatePods were assuming that the roles of the pods recorded when the function has stared will not change; for instance, terminated replica pods should start as replicas. Revisit that assumption by looking at the actual role of the re-spawned pods; that avoids a failover if some replica has promoted to the master role while being re-spawned. In addition, if the failover from the old master was unsuccessful, we used to stop and leave the old master running on an old pod, without recording this fact anywhere. This PR makes the failover failure emit a warning, but not stop recreating the last master pod; in the worst case, the running master will be terminated, however, this case is rather unlikely one.

As a side effect, make waitForPodLabel return the pod definition it waited for, avoiding extra API calls in recreatePods and movePodFromEndOfLifeNode
2018-02-22 10:42:05 +01:00
Oleksii Kliukin 85f7c944c2 Improve the condition check. 2018-02-22 10:13:46 +01:00
Sergey Dudoladov e048328d6a Comment on special values for watched namespace 2018-02-20 17:26:17 +01:00
Sergey Dudoladov dcfc9925f6 Respond to code review 2018-02-20 14:43:02 +01:00
Dmitrii Dolgov a7cd859919 Some improvements for golint, ineffassign and misspell 2018-02-19 17:46:31 +01:00
Sergey Dudoladov 088bf70e7d Merge branch 'master' into support-many-namespaces 2018-02-16 15:06:10 +01:00
Sergey Dudoladov 06fd9e33f5 Watch the namespace where operator deploys to unless told otherwise 2018-02-13 18:17:47 +01:00
Dmitrii Dolgov 4c1db33c27 Change the order of arguments 2018-02-08 10:43:27 +01:00
Sergey Dudoladov de2a028592 Warn if the watched namespace does not exist 2018-02-07 17:43:05 +01:00
Dmitrii Dolgov dd79fcd036 Tests for retry_utils
One can argue about how necessary they are,
but at least I remembered how to do golang.
2018-02-07 17:04:43 +01:00
Sergey Dudoladov 74fa7b9492 Restrict operator to single watched namespace via env var 2018-02-07 16:44:49 +01:00