postgres-operator

Commit Graph

Author	SHA1	Message	Date
Felix Kunde	43163cf83b	allow using both infrastructure_roles_options (#1090 ) * allow using both infrastructure_roles_options * new default values for user and role definition * use robot_zmon as parent role * add operator log to debug * right name for old secret * only extract if rolesDefs is empty * set password1 in old infrastructure role * fix new infra rile secret * choose different role key for new secret * set memberof everywhere * reenable all tests * reflect feedback * remove condition for rolesDefs	2020-08-10 15:08:03 +02:00
Felix Kunde	f3ddce81d5	fix random order for pod environment tests (#1085 )	2020-07-30 17:48:15 +02:00
hlihhovac	47b11f7f89	change Clone attribute of PostgresSpec to CloneDescription (#1020 ) change Clone attribute of PostgresSpec to ConnectionPooler update go.mod from master * fix TestConnectionPoolerSynchronization() * Update pkg/apis/acid.zalan.do/v1/postgresql_type.go Co-authored-by: Felix Kunde <felix-kunde@gmx.de> Co-authored-by: Pavlo Golub <pavlo.golub@gmail.com> Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-07-30 16:31:29 +02:00
Felix Kunde	3bee590d43	fix index in TestGenerateSpiloPodEnvVarswq (#1084 ) Co-authored-by: Felix Kunde <felix.kunde@zalando.de>	2020-07-30 13:35:37 +02:00
Christian Rohmann	ece341d516	Allow pod environment variables to also be sourced from a secret (#946 ) * Extend operator configuration to allow for a pod_environment_secret just like pod_environment_configmap * Add all keys from PodEnvironmentSecrets as ENV vars (using SecretKeyRef to protect the value) * Apply envVars from pod_environment_configmap and pod_environment_secrets before doing the global settings from the operator config. This allows them to be overriden by the user (via configmap / secret) * Add ability use a Secret for custom pod envVars (via pod_environment_secret) to admin documentation * Add pod_environment_secret to Helm chart values.yaml * Add unit tests for PodEnvironmentConfigMap and PodEnvironmentSecret - highly inspired by @kupson and his very similar PR #481 * Added new parameter pod_environment_secret to operatorconfig CRD and configmap examples * Add pod_environment_secret to the operationconfiguration CRD Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>	2020-07-30 10:48:16 +02:00
Igor Yanchenko	002b47ec32	Use scram-sha-256 hash if postgresql parameter password_encryption set to do so. (#995 ) * Use scram-sha-256 hash if postgresql parameter password_encryption set to do so. * test fixed * Refactoring * code style	2020-07-16 14:43:57 +02:00
Felix Kunde	375963424d	delete secrets the right way (#1054 ) * delete secrets the right way * make a one function * continue deleting secrets even if one delete fails Co-authored-by: Felix Kunde <felix.kunde@zalando.de>	2020-07-10 15:07:42 +02:00
Igor Yanchenko	88735a798a	Resize volume by changing pvc size if enabled in config. (#958 ) * Try to resize pvc if resizing pv has failed * added config option to switch between storage resize strategies * changes according to requests * Update pkg/controller/operator_config.go Co-authored-by: Felix Kunde <felix-kunde@gmx.de> * enable_storage_resize documented added examples to the default configuration and helm value files * enable_storage_resize renamed to volume_resize_mode, off by default * volume_resize_mode renamed to storage_resize_mode * Update pkg/apis/acid.zalan.do/v1/crds.go * pkg/cluster/volumes.go updated * Update docs/reference/operator_parameters.md * Update manifests/postgresql-operator-default-configuration.yaml * Update pkg/controller/operator_config.go * Update pkg/util/config/config.go * Update charts/postgres-operator/values-crd.yaml * Update charts/postgres-operator/values.yaml * Update docs/reference/operator_parameters.md * added logging if no changes required Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-07-03 10:53:37 +02:00
Felix Kunde	0c6655a22d	skip creation later to improve visibility of errors (#1013 ) * try to emit error for missing team name in cluster name * skip creation after new cluster object * move SetStatus to k8sclient and emit event when skipping creation and rename to SetPostgresCRDStatus Co-authored-by: Felix Kunde <felix.kunde@zalando.de>	2020-06-17 13:32:16 +02:00
Felix Kunde	fa6929f028	do not block rolling updates with lazy spilo update enabled (#1012 ) * do not block rolling updates with lazy spilo update enabled * treat initContainers like Spilo image Co-authored-by: Felix Kunde <felix.kunde@zalando.de>	2020-06-11 12:23:39 +02:00
Felix Kunde	fe7ffaa112	trigger rolling update when securityContext of PodTemplate changes (#1007 ) Co-authored-by: Felix Kunde <felix.kunde@zalando.de>	2020-06-09 10:27:57 +02:00
alfredw33	2b0def5bc8	Support for GCS WAL-E backups (#620 ) * Support for WAL_GS_BUCKET and GOOGLE_APPLICATION_CREDENTIALS environment variables * Fixed merge issue but also removed all changes to support macos. * Updated test to new format * Missed macos specific changes * Added documentation and addressed comments * Update docs/administrator.md * Update docs/administrator.md * Update e2e/run.sh Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-06-03 17:33:48 +02:00
Steffen Pøhner Henriksen	0fa61a6ab3	Changed order of sidecar env vars (#980 ) * Changed order of sidecar env vars * Cleaned up test code	2020-05-25 16:32:33 +02:00
Felix Kunde	3a49b485e5	delete secrets of system users too (#974 )	2020-05-14 11:34:02 +02:00
Christian Rohmann	8ff7658ed3	Fix pooler delete (#960 ) deleteConnectionPooler function incorrectly checks that the delete api response is ResourceNotFound. Looks like the only consequence is a confusing log message, but obviously it's wrong. Remove negation, since having ResourceNotFound as error is the good case. Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>	2020-05-13 14:55:54 +02:00
Ask Bjørn Hansen	852f29274a	Fix typo in error message (#969 )	2020-05-12 10:05:42 +02:00
Rafia Sabih	d52296c323	Propagate annotations to the StatefulSet (#932 ) * Initial commit * Corrections - set the type of the new configuration parameter to be array of strings - propagate the annotations to statefulset at sync * Enable regular expression matching * Improvements -handle rollingUpdate flag -modularize code -rename config parameter name * fix merge error * Pass annotations to connection pooler deployment * update code-gen * Add documentation and update manifests * add e2e test and introduce option in configmap * fix service annotations test * Add unit test * fix e2e tests * better key lookup of annotations tests * add debug message for annotation tests * Fix typos * minor fix for looping * Handle update path and renaming - handle the update path to update sts and connection pooler deployment. This way no need to wait for sync - rename the parameter to downscaler_annotations - handle other review comments * another try to fix python loops * Avoid unneccessary update events * Update manifests * some final polishing * fix cluster_test after polishing Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de> Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-05-04 14:46:56 +02:00
Felix Kunde	d76203b3f9	Bootstrapped databases with best practice role setup (#843 ) * PreparedDatabases with default role setup * merge changes from master * include preparedDatabases spec check when syncing databases * create a default preparedDB if not specified * add more default privileges for schemas * use empty brackets block for undefined objects * cover more default privilege scenarios and always define admin role * add DefaultUsers flag * support extensions and defaultUsers for preparedDatabases * remove exact version in deployment manifest * enable CRD validation for new field * update generated code * reflect code review * fix typo in SQL command * add documentation for preparedDatabases feature + minor changes * some datname should stay * add unit tests * reflect some feedback * init users for preparedDatabases also on update * only change DB default privileges on creation * add one more section in user docs * one more sentence	2020-04-29 10:56:06 +02:00
Sergey Dudoladov	cc635a02e3	Lazy upgrade of the Spilo image (#859 ) * initial implementation * describe forcing the rolling upgrade * make parameter name more descriptive * add missing pieces * address review * address review * fix bug in e2e tests * fix cluster name label in e2e test * raise test timeout * load spilo test image * use available spilo image * delete replica pod for lazy update test * fix e2e * fix e2e with a vengeance * lets wait for another 30m * print pod name in error msg * print pod name in error msg 2 * raise timeout, comment other tests * subsequent updates of config * add comma * fix e2e test * run unit tests before e2e * remove conflicting dependency * Revert "remove conflicting dependency" This reverts commit `65fc09054b`. * improve cdp build * dont run unit before e2e tests * Revert "improve cdp build" This reverts commit `e2a8fa12aa`. Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de> Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-04-29 10:07:14 +02:00
Sergey Dudoladov	0ca30ba3d9	fix params in function call (#939 ) Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>	2020-04-28 09:31:41 +02:00
Björn Fischer	168abfe37b	Fully speced global sidecars (#890 ) * implement fully speced global sidecars * fix issue #924	2020-04-27 17:40:22 +02:00
Christian Rohmann	21b9b6fcbe	Emit K8S events to the postgresql CR as feedback to the requestor / user (#896 ) * Add EventsGetter to KubeClient to enable to sending K8S events * Add eventRecorder to the controller, initialize it and hand it down to cluster via its constructor to enable it to emit events this way * Add first set of events which then go to the postgresql custom resource the user interacts with to provide some feedback * Add right to "create" events to operator cluster role * Adapt cluster tests to new function sigurature with eventRecord (via NewFakeRecorder) * Get a proper reference before sending events to a resource Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>	2020-04-27 08:22:07 +02:00
Sergey Dudoladov	3c91bdeffa	Re-create pods only if all replicas are running (#903 ) * adds a Get call to Patroni interface to fetch state of a Patroni member * postpones re-creating pods if at least one replica is currently being created Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de> Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-04-20 15:14:11 +02:00
ReSearchITEng	5014eebfb2	when kubernetes_use_configmaps -> skip further endpoints actions even delete (#921 ) * further compatibility with k8sUseConfigMaps - skip further endpoints related actions * Update pkg/cluster/cluster.go thanks! Co-Authored-By: Felix Kunde <felix-kunde@gmx.de> * Update pkg/cluster/cluster.go Co-Authored-By: Felix Kunde <felix-kunde@gmx.de> * Update pkg/cluster/cluster.go Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-04-16 16:47:59 +02:00
Dmitry Dolgov	6a689cdc1c	Prevent empty syncs (#922 ) There is a possibility to pass nil as one of the specs and an empty spec into syncConnectionPooler. In this case it will perform a synchronization because nil != empty struct. Avoid such cases and make it testable by returning list of synchronization reasons on top together with the final error.	2020-04-16 15:14:31 +02:00
ReSearchITEng	7e8f6687eb	make tls pr798 use additionalVolumes capability from pr736 (#920 ) * make tls pr798 use additionalVolumes capability from pr736 * move the volume* sections lower * update helm chart crds and docs * fix user.md typos	2020-04-15 15:24:55 +02:00
Thierry Sallé	ea3eef45d9	Additional volumes capability (#736 ) * Allow additional Volumes to be mounted * added TargetContainers option to determine if additional volume need to be mounter or not * fixed dependencies * updated manifest additional volume example * More validation Check that there are no volume mount path clashes or "all" vs ["a", "b"] mixtures. Also change the default behaviour to mount to "postgres" container. * More documentation / example about additional volumes * Revert go.sum and go.mod from origin/master * Declare addictionalVolume specs in CRDs * fixed k8sres after rebase * resolv conflict Co-authored-by: Dmitrii Dolgov <9erthalion6@gmail.com> Co-authored-by: Thierry <thierry@malt.com>	2020-04-15 09:13:35 +02:00
Dmitry Dolgov	a1f2bd05b9	Prevent superuser from being a connection pool user (#906 ) * Protected and system users can't be a connection pool user It's not supported, neither it's a best practice. Also fix potential null pointer access. For protected users it makes sense by intent of protecting these users (e.g. from being overridden or used as something else than supposed). For system users the reason is the same as for superuser, it's about replication user and it's under patroni control. This is implemented on both levels, operator config and postgresql manifest. For the latter we just use default name in this case, assuming that operator config is always correct. For the former, since it's a serious misconfiguration, operator panics.	2020-04-09 09:21:45 +02:00
Leon Albers	4dee8918bd	Allow configuration of patroni's replication mode (#869 ) * Add patroni parameters for `synchronous_mode` * Update complete-postgres-manifest.yaml, removed quotation marks * Update k8sres_test.go, adjust result for `Patroni configured` * Update k8sres_test.go, adjust result for `Patroni configured` * Update complete-postgres-manifest.yaml, set synchronous mode to false in this example * Update pkg/cluster/k8sres.go Does the same but is shorter. So we fix that it if you like. Co-Authored-By: Felix Kunde <felix-kunde@gmx.de> * Update docs/reference/cluster_manifest.md Co-Authored-By: Felix Kunde <felix-kunde@gmx.de> * Add patroni's `synchronous_mode_strict` * Extend `TestGenerateSpiloConfig` with `SynchronousModeStrict` Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-04-06 14:27:17 +02:00
ReSearchITEng	1249626a60	kubernetes_use_configmap (#887 ) * kubernetes_use_configmap * Update manifests/postgresql-operator-default-configuration.yaml Co-Authored-By: Felix Kunde <felix-kunde@gmx.de> * Update manifests/configmap.yaml Co-Authored-By: Felix Kunde <felix-kunde@gmx.de> * Update charts/postgres-operator/values.yaml Co-Authored-By: Felix Kunde <felix-kunde@gmx.de> * go.fmt Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-04-02 13:20:45 +02:00
Felix Kunde	b43b22dfcc	Call me pooler, not pool (#883 ) * rename pooler parts and add example to manifest * update codegen * fix manifest and add more details to docs * reflect renaming also in e2e tests	2020-04-01 10:34:03 +02:00
Felix Kunde	e6eb10d28a	fix TestTLS (#894 )	2020-04-01 10:31:31 +02:00
ReSearchITEng	6ed1030838	TLS - add OpenShift compatibility (#885 ) * solves https://github.com/zalando/postgres-operator/pull/798#issuecomment-605201260 Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-04-01 09:39:54 +02:00
Felix Kunde	66f2cda87f	Move operator to go 1.14 (#882 ) * update go modules march 2020 * update to GO 1.14 * reflect k8s client API changes	2020-03-30 15:50:17 +02:00
Felix Kunde	ba9cf68650	Change type of pod environment config map to NamespacedName (#870 ) * allow PodEnvironmentConfigMap in other namespaces * update codegen * update docs and comments	2020-03-25 15:59:31 +01:00
Dmitry Dolgov	9dfa433363	Connection pooler (#799 ) Connection pooler support Add support for a connection pooler. The idea is to make it generic enough to be able to switch between different implementations (e.g. pgbouncer or odyssey). Operator needs to create a deployment with pooler and a service for it to access. For connection pool to work properly, a database needs to be prepared by operator, namely a separate user have to be created with an access to an installed lookup function (to fetch credential for other users). This setups is supposed to be used only by robot/application users. Usually a connection pool implementation is more CPU bounded, so it makes sense to create several pods for connection pool with more emphasize on cpu resources. At the moment there are no special affinity or tolerations assigned to bring those pods closer to the database. For availability purposes minimal number of connection pool pods is 2, ideally they have to be distributed between different nodes/AZ, but it's not enforced in the operator itself. Available configuration supposed to be ergonomic and in the normal case require minimum changes to a manifest to enable connection pool. To have more control over the configuration and functionality on the pool side one can customize the corresponding docker image. Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-03-25 12:57:26 +01:00
Felix Kunde	579f78864b	pass cluster labels as JSON to Spilo (#877 )	2020-03-25 09:59:54 +01:00
Felix Kunde	b66734a0a9	omit PgVersion diff on sync (#860 ) * use PostgresParam.PgVersion everywhere * on sync compare pgVersion with SpiloConfiguration * update getNewPgVersion and added tests	2020-03-13 11:48:19 +01:00
zimbatm	65fb2ce1a6	add support for custom TLS certificates (#798 ) * add support for custom TLS certificates	2020-03-13 11:44:38 +01:00
Felix Kunde	b997e3682f	be more permissive with standbys (#842 ) * be more permissive with standbys * reflect feedback and updated docs	2020-02-24 15:14:14 +01:00
Felix Kunde	742d7334a1	use cluster-name as default label everywhere (#782 ) * use cluster-name as default label everywhere * fix e2e test	2020-02-19 15:01:01 +01:00
Felix Kunde	702a194c41	switch to rbac/v1 (#829 ) * switch to rbac/v1	2020-02-17 11:25:07 +01:00
Felix Kunde	3b10dc645d	patch/update services on type change (#824 ) * use Update when disabling LoadBalancer + added e2e test	2020-02-13 16:24:15 +01:00
Jonathan Juares Beber	ba60e15d07	Add ServiceAnnotations cluster config (#803 ) The [operator parameters][1] already support the `custom_service_annotations` config. With this parameter is possible to define custom annotations that will be used on the services created by the operator. The `custom_service_annotations` as all the other [operator parameters][1] are defined on the operator level and do not allow customization on the cluster level. A cluster may require different service annotations, as for example, set up different cloud load balancers timeouts, different ingress annotations, and/or enable more customizable environments. This commit introduces a new parameter on the cluster level, called `serviceAnnotations`, responsible for defining custom annotations just for the services created by the operator to the specifically defined cluster. It allows a mix of configuration between `custom_service_annotations` and `serviceAnnotations` where the latest one will have priority. In order to allow custom service annotations to be used on services without LoadBalancers (as for example, service mesh services annotations) both `custom_service_annotations` and `serviceAnnotations` are applied independently of load-balancing configuration. For retro-compatibility purposes, `custom_service_annotations` is still under [Load balancer related options][2]. The two default annotations when using LoadBalancer services, `external-dns.alpha.kubernetes.io/hostname` and `service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout` are still defined by the operator. `service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout` can be overridden by `custom_service_annotations` or `serviceAnnotations`, allowing a more customizable environment. `external-dns.alpha.kubernetes.io/hostname` can not be overridden once there is no differentiation between custom service annotations for replicas and masters. It updates the documentation and creates the necessary unit and e2e tests to the above-described feature too. [1]: https://github.com/zalando/postgres-operator/blob/master/docs/reference/operator_parameters.md [2]: https://github.com/zalando/postgres-operator/blob/master/docs/reference/operator_parameters.md#load-balancer-related-options	2020-02-10 12:03:25 +01:00
Vito Botta	a660d758a5	Add region setting for logical backups to non-AWS storage (#813 ) * Add region setting for logical backups to non-AWS storage	2020-02-10 11:48:24 +01:00
Felix Kunde	1f0312a014	make minimum limits boundaries configurable (#808 ) * make minimum limits boundaries configurable * add e2e test	2020-02-03 11:43:18 +01:00
Felix Kunde	7fb163252c	standby clusters can only have 1 pod for now (#797 )	2020-01-16 10:47:34 +01:00
Felix Kunde	cd110aabf4	Enforce minimum cpu and memory limits (#731 ) * add validation for PG resources and volume size * check resource requests also on UPDATE and SYNC + update docs * if cluster was running don't error on sync	2019-12-12 16:43:55 +01:00
Felix Kunde	107334fe71	Add global option to enable/disable init containers and sidecars (#478 ) * Add global option to enable/disable init containers and sidecars * update dependencies	2019-12-10 15:45:54 +01:00
Armin Nesiren	5f87384d7f	Passing endpoint, access and secret key to logical-backup container (#628 ) * Added possibility to add custom annotations to LoadBalancer service. * Added parameters for custom endpoint, access and secret key for logical backup. * Modified dump.sh so it knows how to handle new features. Configurable S3 SSE	2019-11-26 10:40:49 +01:00
Felix Kunde	2ce602fcd7	fix errors when changing service type (#716 ) * fix errors when changing service type * nullify service and endpoint before recreation * improve wait for delete logic and reuse config parameters	2019-11-26 10:28:32 +01:00
Felix Kunde	f9487e41c1	inject cluster name label into logical backup pod (#725 ) * inject cluster name label into logical backup pod	2019-11-20 13:58:41 +01:00
Felix Kunde	0b544ae43f	pass additionalSecretMount to logical backup pod (#714 )	2019-11-19 18:06:55 +01:00
Thomas Runyon	535517cd1b	Custom annotations 329 (#657 ) * Add ability for custom annotations to database pods	2019-11-11 10:45:35 +01:00
Eric	6e682fd6b5	Fixing spelling mistake in delete PVC function name (#691 )	2019-10-18 16:41:56 +02:00
Felix Kunde	f0e29060b1	move StatefulSet to apps/v1 (#675 )	2019-09-30 16:42:04 +02:00
Felix Kunde	4a863d2280	Avoid orphaned objects on delete (#654 ) * Make setSpec function work correctly when updating cluster status fails	2019-08-27 12:54:35 +02:00
Felix Kunde	1d45a6aec3	change app label for logical backup pod (#621 ) * change app label for logical backup pod	2019-07-23 15:43:07 +02:00
Felix Kunde	2c3c7fd244	query namespaced K8s API in logical backup script (#623 )	2019-07-18 14:00:30 +02:00
Felix Kunde	3a914f9a3c	camelCasing all manifest parameters (#602 ) * deprecate snake_case manifest parameters * move backward compatible check and update test	2019-07-05 18:14:03 +02:00
Felix Kunde	36003b8264	enable shmVolume setting in OperatorConfiguration (#605 ) * enable shmVolume setting in OperatorConfiguration	2019-07-05 16:48:37 +02:00
Rafia Sabih	540d58d5bd	Adding the support for standby cluster This will set up a continuous wal streaming cluster, by adding the corresponding section in postgres manifest. Instead of having a full-fledged standby cluster as in Patroni, here we use only the wal path of the source cluster and stream from there. Since, standby cluster is streaming from the master and does not require to create or use databases of it's own. Hence, it bypasses the creation of users or databases. There is a separate sample manifest added to set up a standby-cluster.	2019-06-21 10:11:39 +02:00
Markus	93bfed3e75	Add secret mount to operator (#535 ) * add secret mount to operator	2019-06-19 12:40:49 +02:00
Felix Kunde	6918394562	Add PDB configuration toggle (#583 ) * Don't create an impossible disruption budget for smaller clusters. * sync PDB also on update	2019-06-18 10:48:21 +02:00
Maxim Ivanov	3553144cda	Support subPath in generated container (#452 ) * mounted volumes now provide a subPath	2019-06-17 15:49:01 +02:00
Erik Inge Bolsø	c65a9baedf	specify ReadOnlyRootFilesystem: false for pod security policies (#560 ) Explicitly specify ReadOnlyRootFilesystem: false so kubernetes can pick a less restrictive policy the operator has access to.	2019-06-17 14:03:33 +02:00
teuto.net Netzdienste GmbH	bbf28c4df7	Add additional S3 settings for cloning (#497 )	2019-06-14 12:28:00 +02:00
Rafia Sabih	2886027516	Some typos/spelling mistakes fix (#580 ) Harmless typos fix.	2019-06-06 14:20:15 +02:00
Aaron Miller	ec5b1d4d58	StatefulSet fsGroup config option to allow non-root spilo (#531 ) * StatefulSet fsGroup config option to allow non-root spilo * Allow Postgres CRD to override SpiloFSGroup of the Operator. * Document FSGroup of a Pod cannot be changed after creation.	2019-06-04 16:38:26 +02:00
Erik Inge Bolsø	ebda39368e	database.go: remove hardcoded .svc.cluster.local dns suffix (#561 ) * database.go: substitute hardcoded .svc.cluster.local dns suffix with config parameter Use the pod's configured dns search path, for clusters where .svc.cluster.local is not correct.	2019-05-31 16:32:00 +02:00
Felix Kunde	24d412a562	generate spilo config can return error (with test) (#570 ) * fix: raise explicit error when failing to generate spilo config Signed-off-by: Stephane Tang <hi@stang.sh>	2019-05-22 17:35:03 +02:00
Stephane T	1f4267eb05	fix: remove headless service config when deleting cluster (#567 ) see: https://github.com/zalando/postgres-operator/issues/566 Signed-off-by: Stephane Tang <hi@stang.sh>	2019-05-21 13:49:34 +02:00
Sergey Dudoladov	f3e1e80aaf	Add logical backup (#442 ) * Add k8s cron job to spawn logical backups * Minor doc updates	2019-05-16 15:52:01 +02:00
Sergey Dudoladov	2c02b371e2	fix statefulset sync (#563 )	2019-05-14 11:15:47 +02:00
Dmitry Dolgov	f29bdaf96a	Override clone s3 bucket path (#487 ) Override clone s3 bucket path Add possibility to use a custom s3 bucket path for cloning a cluster from an arbitrary bucket (e.g. from another k8s cluster). For that a new config options is introduced `s3_wal_path`, that should point to a location that spilo would understand.	2019-05-10 12:52:42 +02:00
Felix Kunde	0fbfbb23bb	Use /status subresource instead of plain manifest field (#534 ) * turns PostgresStatus type into a struct with field PostgresClusterStatus * setStatus patch target is now /status subresource * unmarshalling PostgresStatus takes care of previous status field convention * new simple bool functions status.Running(), status.Creating()	2019-05-07 12:01:45 +02:00
Aaron Miller	15ec6a920d	Config option to allow Spilo container to run non-privileged. (#525 ) * Config option to allow Spilo container to run non-privileged. Runs non-privileged by default. Fixes #395 * add spilo_privileged to manifests/configmap.yaml * add spilo_privileged to helm chart's values.yaml	2019-04-03 17:13:39 +02:00
Stephane T	edeb06d39c	fix: update init_containers (#518 ) * fix: PATH expansion in Makefile Signed-off-by: Stephane Tang <hi@stang.sh> * refact: pass list of containers to compareContainers() Signed-off-by: Stephane Tang <hi@stang.sh> * compare initContainers while comparing StatefulSet Fixes #517 Signed-off-by: Stephane Tang <hi@stang.sh> * refact: compareContainers() Signed-off-by: Stephane Tang <hi@stang.sh>	2019-03-19 17:46:12 +01:00
Sergey Dudoladov	0b53dbe5dc	Set statefulset update and management policy explicitly (#515 ) * fix logging in retry * explicitly set the stateful set update strategy to onDelete * add podManagementPolicy	2019-03-13 11:49:18 +01:00
Vineeth Reddy	db72d82f14	gofmt and golint fixes (#506 ) * fix gofmt and golint issues	2019-03-04 13:13:55 +01:00
Sergey Dudoladov	587d9091e7	Set HUMAN_ROLE Spilo env var (#409 ) * Set HUMAN_ROLE Spilo env var	2019-02-27 13:40:42 +01:00
Felix Kunde	31e568157b	reflect change in github url (#496 ) Project was moved from the incubator to the Zalando main org, hence the rename	2019-02-25 11:26:55 +01:00
teuto.net Netzdienste GmbH	26a7fdfa9f	Add Pod Anti Affinity (#489 ) * Add Pod Anti Affinity	2019-02-21 16:37:03 +01:00
Stephane T	d11b23bd71	Add inherited_labels (#459 ) * add support for inherited_labels Signed-off-by: Stephane Tang <hi@stang.sh> * update docs with inherited_labels Signed-off-by: Stephane Tang <hi@stang.sh>	2019-02-14 12:29:06 +01:00
Maxim Ivanov	ed6acc1178	Correctly report success in .status on Update (#469 )	2019-01-31 13:09:17 +01:00
Maxim Ivanov	3544cc90fa	Allow specifying init_containers in Postgres CRD (#445 ) * Add support for init_containers	2019-01-29 11:08:44 +01:00
Armin Nesiren	6f6a599c90	Added possibility to add custom annotations to LoadBalancer service. (#461 ) * Added possibility to add custom annotations to LoadBalancer service.	2019-01-25 11:35:27 +01:00
Maxim Ivanov	8330905ce7	Don't panic if Service for the role was not found (#451 )	2019-01-18 13:38:47 +01:00
Jan Mussler	c70905ae8b	Modifying some of the logging to be more descriptive. (#440 ) * Modifying some of the logging to be more descriptive.	2019-01-08 13:07:36 +01:00
zerg-junior	4b5d3cd121	Fix golint failures * Fix golint fails based on the original work from the user u5surf * Skip installing Docker as CDP now have one pre-installed (repairs builds on CDP)	2019-01-08 13:04:48 +01:00
Arve Knudsen	f7058c754d	Pass more variables to Spilo container (#437 ) Pass KUBERNETES_SCOPE_LABEL, KUBERNETES_ROLE_LABEL and KUBERNETES_LABELS to spilo container, so that they could be changed. Fix for #411	2019-01-04 13:42:52 +01:00
zerg-junior	5cfcc453a9	Update CRD configuration docs and fix the CDP build (#414 ) * Update CRD configuration docs * document resource consumption of the operator * Add talks by Oleksii	2019-01-02 12:01:47 +01:00
zerg-junior	c0b0b9a832	[WIP] Add 'admin' option to create role (#425 ) * Add 'admin' option to create role * Fix run_locally_script	2018-12-27 10:14:33 +01:00
Dmitry Dolgov	d6e6b00770	Add shm_volume option (#427 ) Add possibility to mount a tmpfs volume to /dev/shm to avoid issues like [this](https://github.com/docker-library/postgres/issues/416). To achieve that two new options were introduced: * `enableShmVolume` to PostgreSQL manifest, to specify whether or not mount this volume per database cluster * `enable_shm_volume` to operator configuration, to specify whether or not mount per operator. The first one, `enableShmVolume` takes precedence to allow us to be more flexible.	2018-12-21 16:22:30 +01:00
zerg-junior	45c89b3da4	[WIP] Add set_memory_request_to_limit option (#406 ) * Add set_memory_request_to_limit option	2018-11-15 14:00:08 +01:00
zerg-junior	96e3ea9511	Properly overwrite empty allowed source ranges for load balancers (#392 ) * Properly overwrite empty allowed source ranges for load balancers	2018-11-06 11:08:45 +01:00
zerg-junior	86ba92ad02	Rename 'permanent_slots' field to 'slots' (#401 )	2018-10-31 16:11:28 +01:00
zerg-junior	1b4181a724	[WIP] Add the ability to configure replications slots in Patroni (#398 ) * Add the ability to configure replication slots in Patroni * Add debugging to Makefile for CDP builds	2018-10-31 13:10:56 +01:00
zerg-junior	7907f95d2f	Improve reporting about rolling updates (#391 )	2018-09-24 11:57:43 +02:00
Noah Kantrowitz	688d252752	Some tweaks to ensure compat with newer Go. (#383 )	2018-09-17 10:13:07 +02:00
Noah Kantrowitz	0b75a89920	Fix the casing of github.com/Sirupsen/logrus to match what the project itself uses. (#380 ) Dep enforces this.	2018-09-06 10:26:48 +02:00
zerg-junior	25fa45fd58	[WIP] Grant 'superuser' to the members of Postgres admin teams (#371 ) Added support for superuser team in addition to the admin team that owns the postgres cluster.	2018-08-30 10:51:37 +02:00
zerg-junior	aeae0a6ef2	Use cluster's own namespace to patch the cluster manifest (#373 )	2018-08-22 11:07:12 +02:00
Oleksii Kliukin	e1ed4b847d	Use code-generation for CRD API and deepcopy methods (#369 ) Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on. Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API. All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1 Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator. Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it. Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes. Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files. Update client-go dependencies. Minor reformatting and renaming.	2018-08-15 17:22:25 +02:00
Oleksii Kliukin	e933908084	Configure pg_hba in the local postgresql configuration of Patroni. (#361 ) Previously, the operator put pg_hba into the bootstrap/pg_hba key of Patroni. That had 2 adverse effects: - pg_hba.conf was shadowed by Spilo default section in the local postgresql configuration - when updating pg_hba in the cluster manifest, the updated lines were not propagated to DCS, since the key was defined in the bootstrap section of Patroni. Include some minor refactoring, moving methods to unexported when possible and commenting out usage of md5, so that gosec won't complain. Per https://github.com/zalando-incubator/postgres-operator/issues/330 Review by @zerg-junior	2018-08-08 11:01:26 +02:00
Oleksii Kliukin	acf46bfa62	Include CREATEROLE to the list of allowed flags. (#365 ) Previously it has been supported by the operator, but the validity check excluded it for no reason.	2018-08-08 10:53:08 +02:00
Oleksii Kliukin	b06186eb41	Linter-induced code refactoring, run round 2. (#360 ) Run more linters in the gometalinter, i.e. deadcode, megacheck, nakedret, dup. More consistent code formatting, remove two dead functions, eliminate a bunch of naked returns, refactor a few functions to avoid code duplication.	2018-08-06 12:09:19 +02:00
Oleksii Kliukin	59f0c5551e	Allow configuring pod priority globally and per cluster. (#353 ) * Allow configuring pod priority globally and per cluster. Allow to specify pod priority class for all pods managed by the operator, as well as for those belonging to individual clusters. Controlled by the pod_priority_class_name operator configuration parameter and the podPriorityClassName manifest option. See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass for the explanation on how to define priority classes since Kubernetes 1.8. Some import order changes are due to go fmt. Removal of OrphanDependents deprecated field. Code review by @zerg-junior	2018-08-03 14:03:37 +02:00
Oleksii Kliukin	ac7b132314	Refactoring inspired by gometalinter. (#357 ) Among other things, fix a few issues with deepcopy implementation.	2018-08-03 11:09:45 +02:00
Oleksii Kliukin	d2d3f21dc2	Client go upgrade v6 (#352 ) There are shortcuts in this code, i.e. we created the deepcopy function by using the deepcopy package instead of the generated code, that will be addressed once migrated to client-go v8. Also, some objects, particularly statefulsets, are still taken from v1beta, this will also be addressed in further commits once the changes are stabilized.	2018-08-01 11:08:01 +02:00
Oleksii Kliukin	0181a1b5b1	Introduce a repair scan to fix failing clusters (#304 ) A repair is a sync scan that acts only on those clusters that indicate that the last add, update or sync operation on them has failed. It is supposed to kick in more frequently than the sync scan. The sync scan still remains useful to fix the consequences of external actions (i.e. someone deletes a postgres-related service by mistake) unbeknownst to the operator. The repair scan is controlled by the new repair_period parameter in the operator configuration. It has to be at least 2 times more frequent than a sync scan to have any effect (a normal sync scan will update both last synced and last repaired attributes of the controller, since repair is just a sync underneath). A repair scan could be queued for a cluster that is already being synced if the sync period exceeds the interval between repairs. In that case a repair event will be discarded once the corresponding worker finds out that the cluster is not failing anymore. Review by @zerg-junior	2018-07-24 11:21:45 +02:00
Oleksii Kliukin	1a0e5357dc	Improve generation of Scalyr container environment. (#346 ) * Improve generation of Scalyr container environment. Avoid duplicating POD_NAME and POD_NAMESPACE that are already bundled with every sidecar. Do not complain about the lack of SCALYR_SERVER_HOST, since it is set to https://upload.eu.scalyr.com in the container we use. Do not mention SCALYR_SERVER_HOST in the error messages, since it is derived from the cluster name automatically.	2018-07-24 11:16:24 +02:00
Oleksii Kliukin	12871aad1a	Avoid showing an extra error when resizing volume fails (#350 ) Do not show 'persistent volumes are not compatible' errors for the volumes that failed to be resized because of the other reasons (i.e. the new size is smaller than the existing one).	2018-07-20 14:12:25 +02:00
zerg-junior	417f13c0bd	Submit RBAC credentials during initial Event processing (#344 ) * During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.	2018-07-19 16:40:40 +02:00
Oleksii Kliukin	3a9378d3b8	Allow configuring the operator via the YAML manifest. (#326 ) * Up until now, the operator read its own configuration from the configmap. That has a number of limitations, i.e. when the configuration value is not a scalar, but a map or a list. We use a custom code based on github.com/kelseyhightower/envconfig to decode non-scalar values out of plain text keys, but that breaks when the data inside the keys contains both YAML-special elements (i.e. commas) and complex quotes, one good example for that is search_path inside `team_api_role_configuration`. In addition, reliance on the configmap forced a flag structure on the configuration, making it hard to write and to read (see https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778). The changes allow to supply the operator configuration in a proper YAML file. That required registering a custom CRD to support the operator configuration and provide an example at manifests/postgresql-operator-default-configuration.yaml. At the moment, both old configmap and the new CRD configuration are supported, so no compatibility issues, however, in the future I'd like to deprecate the configmap-based configuration altogether. Contrary to the configmap-based configuration, the CRD one doesn't embed defaults into the operator code, however, one can use the manifests/postgresql-operator-default-configuration.yaml as a starting point in order to build a custom configuration. Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters used to create the CRD were taken from the operator configuration, which is not possible if the configuration itself is stored in the CRD object, I've added the ability to specify them as environment variables `CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively. Per review by @zerg-junior and @Jan-M.	2018-07-16 16:20:46 +02:00
Oleksii Kliukin	e90a01050c	Switchover must wait for the inner goroutine before it returns. (#343 ) * Switchover must wait for the inner goroutine before it returns. Otherwise, two corner cases may happen: - waitForPodLabel writes to the podLabelErr channel that has been already closed by the outer routine - the outer routine exits and the caller subscribes to the pod the inner goroutine has already subscribed to, resulting in panic. The previous commit `fe47f9ebea` that touched that code added the cancellation channel, but didn't bother to actually wait for the goroutine to be cancelled. Per report and review from @valer-cara. Original issue: https://github.com/zalando-incubator/postgres-operator/issues/342	2018-07-16 11:50:35 +02:00
Oleksii Kliukin	b7b950eb28	Use the StorageClassName field of the volumeClaimTemplate. (#338 ) The old way of specifying it with the annotation is deprecated and not available in recent Kubernetes versions. We will keep it there anyway until upgrading to the new go-client that is incompatible with those versions. Per report from @schmitch	2018-07-16 11:49:58 +02:00
Oleksii Kliukin	25a306244f	Support for per-cluster and operator global sidecars (#331 ) * Define sidecars in the operator configuration. Right now only the name and the docker image can be defined, but with the help of the pod_environment_configmap parameter arbitrary environment variables can be passed to the sidecars. * Refactoring around generatePodTemplate. Original implementation of per-cluster sidecars by @theRealWardo Per review by @zerg-junior and @Jan-M	2018-07-02 16:25:27 +02:00
zerg-junior	7394c15d0a	Make AWS region configurable in the operator config map (#333 )	2018-06-27 17:29:02 +02:00
Oleksii Kliukin	04b660519a	Fix exec into pods to resize volumes for multi-container pods. The original code assumed only one container per pod.	2018-06-04 14:51:39 +02:00
Oleksii Kliukin	48a5744314	Use Patroni API to set bootstrap-only options. (#299 ) Call Patroni API /config in order to set special options that are ignored when set in the configuration file, such as max_connections. Per https://github.com/zalando-incubator/postgres-operator/issues/297 * Some minor refactoring: Rename Cluster ManualFailover to Switchover Rename Patroni Failover to Switchover Add more details to error messages and comments introduced in this PR. Review by @zerg-junior	2018-05-29 12:35:25 +02:00
Oleksii Kliukin	76ea754fc3	Be lenient when asked to shrink a persistent volume. Do not hard error, emit a warning instead. The cluster is not going to be broken because of our refusal to shrink a volume.	2018-05-24 11:17:42 +02:00
Oleksii Kliukin	1ea8b3bbe6	Fix a crash on node migration. After an unsuccessful initial cluster sync it may happen that the cluster statefulset is empty. This has been made more likely since `88d6a7be3`, since it has introduced syncing volumes before statefulsets, and the volume sync may fail for different reasons (i.e. the volume has been shrunk, or too many calls to Amazon).	2018-05-24 11:05:19 +02:00
Oleksii Kliukin	e84ecb1d03	Address code review by @zerg-junior	2018-05-23 11:36:38 +02:00
Oleksii Kliukin	f5550c337b	Put special patroni parameters to the bootstrap. Some special patroni postgresql parameters, like max_connections, should reside in the bootstrap.dcs.postgresql.parameters section to come into effect.	2018-05-22 18:27:12 +02:00
zerg-junior	e6d12b3480	Merge pull request #295 from zalando-incubator/continue_on_delete_errors Avoid terminating delete on errors.	2018-05-22 10:44:43 +02:00
Oleksii Kliukin	27c7245fed	Avoid terminating delete on errors. When there is an error happening upon deletion of the Kubernetes object belonging to the cluster being removed, it makes no sense to abort the deletion: the manifest will be removed anyway, therefore all the objects after the one we aborted at will stay forever.	2018-05-18 18:10:37 +02:00
Oleksii Kliukin	a8fdd3f2db	Fix crash during sync. Do not use statefulset number of pods to figure out running ones for volume resizing, since the statefulset pointer could be nil. Instead, look at the actual running pods.	2018-05-18 14:42:20 +02:00
Oleksii Kliukin	88d6a7be3f	Sync persistent volumes before statefulsets. (#293 ) Avoid the condition of waiting for the pod that cannot start PostgreSQL because it ran out of disk space.	2018-05-18 12:01:43 +02:00
Oleksii Kliukin	52ddcd25cc	Sync persistent volumes before statefulsets. Avoid the condition of waiting for the pod that cannot start PostgreSQL because it ran out of disk space.	2018-05-18 11:43:45 +02:00
Oleksii Kliukin	cf800aef90	Minor import fix	2018-05-15 16:53:12 +02:00
Oleksii Kliukin	11d568bf65	Address code review by @zerg-junior - new info messages, rename the annotation flag.	2018-05-15 16:50:03 +02:00
Oleksii Kliukin	0c616a802f	Merge branch 'master' into rolling_updates_with_statefulset_annotations # Conflicts: # pkg/cluster/k8sres.go	2018-05-15 15:33:34 +02:00
Oleksii Kliukin	987b43456b	Deprecate old LB options, fix endpoint sync. (#287 ) * Deprecate old LB options, fix endpoint sync. - deprecate useLoadBalancer, replicaLoadBalancer from the manifest and enable_load_balancer from the operator configuration. The old operator configuration options become no-op with this commit. For the old manifest options, `useLoadBalancer` and `replicaLoadBalancer` are still consulted, but only in the absence of the new ones (enableMasterLoadBalancer and enableReplicaLoadBalancer). - Make sure the endpoint being created during the sync receives proper addresses subset. This is more critical for the replicas, as for the masters Patroni will normally re-create the endpoint before the operator. - Avoid creating the replica endpoint, since it will be created automatically by the corresponding service. - Update the README and unit tests. Code review by @mgomezch and @zerg-junior	2018-05-15 15:19:18 +02:00
Oleksii Kliukin	332dab5237	Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations	2018-05-08 14:51:10 +02:00
Oleksii Kliukin	f41a42f922	Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations	2018-05-07 10:16:30 +02:00
Oleksii Kliukin	ce0d4af91c	Initial implementation for the statefulset annotations indicating rolling updates.	2018-05-07 08:07:37 +02:00
Oleksii Kliukin	1a20362c5b	Initial implementation for the statefulset annotations indicating rolling updates.	2018-05-04 18:59:23 +02:00
Oleksii Kliukin	43a1db2128	Merge branch 'master' into pending_rolling_updates	2018-05-03 11:27:16 +02:00
Oleksii Kliukin	fe47f9ebea	Improve the pod moving behavior during the Kubernetes cluster upgrade. (#281 ) * Improve the pod moving behavior during the Kubernetes cluster upgrade. Fix an issue of not waiting for at least one replica to become ready (if the Statefulset indicates there are replicas) when moving the master pod off the decommissioned node. Resolves the first part of #279. Small fixes to error messages. * Eliminate a race condition during the switchover. When the operator initiates the failover (switchover) that fails and then retries it for a second time it may happen that the previous waitForPodChannel is still active. As a result, the operator subscribes to the former master pod two times, causing a panic. The problem was that the original code didn't bother to cancel the waitForPodLabel for the new master pod in the case when the failover fails. This commit fixes it by adding a stop channel to that function. Code review by @zerg-junior	2018-05-03 10:20:24 +02:00
Sergey Dudoladov	59ded0c212	Shorten bucket name	2018-05-02 14:05:57 +02:00
Sergey Dudoladov	c45219bafa	Set up an S3 bucket for the postgres daily logs	2018-05-02 12:52:42 +02:00
Oleksii Kliukin	37caa3f60b	Fix a bug with syncing services Avoid showing "there is no service in the cluster" when syncing a service for the cluster if the operator has been restarted after the cluster had been created.	2018-04-27 12:35:25 +02:00
zerg-junior	8f08bef67c	Merge pull request #277 from zalando-incubator/automatically-deploy-service-account Deploy service account for pod creation on demand	2018-04-26 14:44:37 +02:00
Sergey Dudoladov	1b718fd4c2	Minor improvements in reporting service account creation	2018-04-26 13:47:25 +02:00
Sergey Dudoladov	d99b553ec1	Convert default account definition into JSON	2018-04-25 12:35:16 +02:00
Sergey Dudoladov	485ec4b8ea	Move service account to Controller	2018-04-24 15:13:08 +02:00
Sergey Dudoladov	bc8b950da4	Tolerate issues of the Teams API	2018-04-23 16:31:53 +02:00
Sergey Dudoladov	c31c76281c	Make operator unaware of its own service account	2018-04-23 14:38:20 +02:00
Sergey Dudoladov	5daf0a4172	Fix error reporting during pod service account creation	2018-04-20 14:20:38 +02:00
Sergey Dudoladov	bd51d2922b	Turn ServiceAccount into struct value to avoid race condition during account creation	2018-04-20 13:05:05 +02:00
Sergey Dudoladov	23f893647c	Remove sync of pod service accounts	2018-04-19 15:48:58 +02:00
Sergey Dudoladov	214ae04aa7	Deploy service account for pod creation on demand	2018-04-18 16:20:20 +02:00
Oleksii Kliukin	0618723a61	Check rolling updates using controller revisions. Compare pods controller revisions with the one for the statefulset to determine whether the pod is running the latest revision and, therefore, no rolling update is necessary. This is performed only during the operator start, afterwards the rolling update status that is stored locally in the cluster structure is used for all rolling update decisions.	2018-04-09 18:07:24 +02:00
Manuel Gómez	88c68712b6	Fix statefulset label selector diffing (#273 ) Otherwise, rolling updates are done unnecessarily.	2018-04-06 17:21:57 +02:00
Oleksii Kliukin	9bf80afa6b	Remove team from statefulset selector (#271 ) * Remove 'team' label from the statefulset selector. It was never supposed to be there, but implicitly statefulset creates a selector out of meta.labels field. That is the problem with recent Kubernetes, since statefulset cannot pick up pods with non-matching label selectors, and we rely on statefulset picking up old pods after statefulset replacement. Make sure selector changes trigger replacement of the statefulset. In the case new selector has more labels than the old one nothing should be done with a statefulset, otherwise the new statefulset won't see orphaned pods from the old one, as they won't match the selector. See https://github.com/kubernetes/kubernetes/issues/46901#issuecomment-356418393	2018-04-06 13:58:47 +02:00
Oleksii Kliukin	26db91c53e	Improve infrastructure role definitions (#208 ) Enhance definitions of infrastructure roles by allowing membership in multiple roles, role options and per-role configuration to be specified in the infrastructure role configmap, which must have the same name as the infrastructure role secret. See manifests/infrastructure-roles-configmap.yaml for the examples and updated README for the description of different types of database roles supported by the operator and their purposes. Change the logic of merging infrastructure roles with the manifest roles when they have the same name, to return the infrastructure role unchanged instead of merging. Previously, we used to propagate flags from the manifest role to the resulting infrastructure one, as there was no way to define flags for the infrastructure role; however, this is not the case anymore. Code review and tests by @erthalion	2018-04-04 17:21:36 +02:00
zerg-junior	d264be9faa	Merge pull request #261 from zalando-incubator/wal_bucket_scope_prefix Fix clone for origins in non-default namespaces.	2018-04-03 17:47:18 +02:00
zerg-junior	ff5793b584	Merge pull request #258 from zalando-incubator/always-create-replica-service [WIP] Always create replica service	2018-03-29 14:42:26 +02:00
erthalion	8967a3be2c	Add tests for load balancer function logic	2018-03-27 12:11:46 +02:00
Sergey Dudoladov	ced770a827	Respond to code review	2018-03-26 11:07:32 +02:00
Sergey Dudoladov	a8862aeee1	Enable backward compatibility for enable_load_balancer setting from operator configmap	2018-03-19 17:19:50 +01:00
Sergey Dudoladov	386d7b6bdb	Implement backward compatibility with older load balancer settings	2018-03-16 13:27:38 +01:00
Sergey Dudoladov	20f30d3739	Update the method for deciding about load balancers	2018-03-14 12:46:58 +01:00
Sergey Dudoladov	0986e56226	Add separate params for master and replica load balancers to operator configuration	2018-03-14 12:12:28 +01:00
Sergey Dudoladov	ac6c5bcf09	Explicitly name replica and master load balancer params in PostgresSpec	2018-03-14 12:03:27 +01:00
Sergey Dudoladov	5bc5e70c81	Log if replica service has no load balancer	2018-03-12 16:48:44 +01:00
Sergey Dudoladov	5ff562a607	Minor improvements	2018-03-02 14:03:41 +01:00
Sergey Dudoladov	2aeff096f7	Make ReplicaLoadBalancer a separate toggler	2018-03-02 13:35:25 +01:00
Oleksii Kliukin	59a214727c	Fix clone for origins in non-default namespaces. By default, spilo sets WAL_BUCKET_SCOPE_PREFIX depending on the cluster namespace, possibly to a non-empty string. However, we won't be able to clone those clusters, as the clone prefix is always set to an empty string. We could go the other way around and set both WAL_BUCKET_SCOPE_PREFIX and CLONE_WAL_BUCKET_SCOPE_PREFIX to a non-default value that depends on the cluster's namespace, but it seems that we don't need this feature for now (no conflict will occur even for clusters with the same name and different namespaces because of the SCOPE_SUFFIX) and it requires some additional testing first.	2018-03-01 12:26:09 +01:00
Sergey Dudoladov	35104cb72b	Add CLONE_ prefix to the env var	2018-03-01 11:19:15 +01:00
Sergey Dudoladov	bcb8caeddf	Set WAL_BUCKET_SCOPE_PREFIX to the empty string	2018-03-01 11:16:47 +01:00
Sergey Dudoladov	fb21246fcd	Remove early stopping conditions that rely on the replica service being absent	2018-02-27 17:21:51 +01:00
Sergey Dudoladov	28fed26845	Do not delete an endpoint for the replica service w/o load balancer during sync	2018-02-27 17:18:30 +01:00
Sergey Dudoladov	b107d781e8	Do not delete replica service w/o load balancer during sync	2018-02-27 17:16:00 +01:00
Sergey Dudoladov	2ef069ee93	Create/delete replica service regardless of load balancer setup	2018-02-27 17:10:49 +01:00
zerg-junior	0f392c2007	Merge pull request #252 from zalando-incubator/label-teams Add 'team' label to pods, stateful sets, secrets and pod disruption budgets	2018-02-26 12:57:26 +01:00
Sergey Dudoladov	071547e5bf	Modify to add extra labels only during resource creation	2018-02-26 11:11:50 +01:00
Oleksii Kliukin	2bb7e98268	update individual role secrets from infrastructure roles (#206 ) * Track origin of roles. * Propagate changes on infrastructure roles to corresponding secrets. When the password in the infrastructure role is updated, re-generate the secret for that role. Previously, the password for an infrastructure role was always fetched from the secret, making any updates to such role a no-op after the corresponding secret had been generated.	2018-02-23 17:24:04 +01:00
Sergey Dudoladov	00dc810544	Add 'team' label to pods, stateful sets, secrets and pod disruption budgets	2018-02-23 14:36:10 +01:00
Dmitrii Dolgov	ef50b147c5	Use list of checks instead of a map	2018-02-23 14:24:33 +01:00
Dmitrii Dolgov	95d86c7600	Move container comparison logic to a separate function	2018-02-23 11:58:37 +01:00
Oleksii Kliukin	c4aab502b3	Remove Patroni leftover objects on cluster deletion. (#244 ) * Remove all endpoints and configmaps from Patroni when Patroni is running with Kubernetes support on cluster deletion.	2018-02-23 09:52:22 +01:00
Dmitry Dolgov	bf4b0f0f33	Merge pull request #240 from zalando-incubator/feature/goreport-improvements Some improvements for golint, ineffassign and misspell	2018-02-22 11:31:08 +01:00
Oleksii Kliukin	cca73e30b7	Make code around recreating pods and creating objects in the database less brittle (#213 ) There used to be a masterLess flag that was supposed to indicate whether the cluster it belongs to runs without the acting master by design. At some point, as we didn't really have support for such clusters, the flag has been misused to indicate there is no master in the cluster. However, that was not done consistently (a cluster without all pods running would never be masterless, even when the master is not among the running pods) and it was based on the wrong assumption that the masterless cluster will remain masterless until the next attempt to change that flag, ignoring the possibility of master coming up or some node doing a successful promotion. Therefore, this PR gets rid of that flag completely. When the cluster is running with 0 instances, there is obviously no master and it makes no sense to create any database objects inside the non-existing master. Therefore, this PR introduces an additional check for that. recreatePods were assuming that the roles of the pods recorded when the function has started will not change; for instance, terminated replica pods should start as replicas. Revisit that assumption by looking at the actual role of the re-spawned pods; that avoids a failover if some replica has promoted to the master role while being re-spawned. In addition, if the failover from the old master was unsuccessful, we used to stop and leave the old master running on an old pod, without recording this fact anywhere. This PR makes the failover failure emit a warning, but not stop recreating the last master pod; in the worst case, the running master will be terminated, however, this case is rather unlikely one. As a side effect, make waitForPodLabel return the pod definition it waited for, avoiding extra API calls in recreatePods and movePodFromEndOfLifeNode	2018-02-22 10:42:05 +01:00
zerg-junior	b0549c3c9c	Merge pull request #225 from zalando-incubator/support-many-namespaces Support many namespaces	2018-02-20 17:39:42 +01:00
Oleksii Kliukin	99c090899f	Change the suffix delimiter to slash. (#242 ) This allows using S3 API in order to simplify finding all folders that are different only by a suffix, since the suffix delimiter will not occur in the suffix itself (currently being a UID).	2018-02-20 16:31:44 +01:00
Oleksii Kliukin	c597377617	Use cluster UID as a suffix to the WAL bucket. (#211 ) Avoid reusing WAL S3 buckets of the older cluster with the same name as the existing one. For the new cluster, the S3 bucket name will include a suffix that is equal to the UID of the PostgreSQL object describing the cluster. That way, the bucket name will stay the same for all members iff they correspond to the same PostgreSQL cluster object. When "clone: uid:" key is present in the cluster manifest and the cluster is cloned from an S3 bucket (currently that happens if the endTimestamp is present in the clone description) the S3 bucket to clone from is suffixed with the -uid value.	2018-02-20 15:36:43 +01:00
Dmitrii Dolgov	a7cd859919	Some improvements for golint, ineffassign and misspell	2018-02-19 17:46:31 +01:00
Sergey Dudoladov	f194a2ae5a	Introduce changes from the PR #200 by @alexeyklyukin	2018-02-07 14:02:32 +01:00
Sergey Dudoladov	ea84f9d577	Rename the configmap 'namespace' entry to avoid confusion with the map's own namespace	2018-02-06 15:09:00 +01:00
Oleksii Kliukin	b90a36c909	Set node_readiness_label default to an empty value. (#204 ) Previously, it was set to the lifecycle-status:ready, breaking a lot of minikube deployments. Also it was not possible before to run with this label set to an empty value. Document the effect of the label in the new section of the documentation.	2018-01-16 15:43:03 +01:00
Manuel Gómez	bf4406d2a4	Consider container names in Statefulset diffs (#210 ) This includes a comparison on container names being equal in the decision of whether a Statefulset has been updated.	2018-01-16 12:06:11 +01:00
Oleksii Kliukin	23011bdf9a	Migrate only master pods. Migrate single masters. (#199 ) Avoid migrating replica pods, since they will be handled by the node draining anyway (the PDB specifies that only masters are to be kept). Allow migration of the single-pod clusters.	2018-01-09 11:55:11 +01:00
zerg-junior	bb5ce6cbbe	Merge pull request #195 from zalando-incubator/databases-rest-endpoint Add a REST endpoint to list databases in all clusters	2018-01-09 11:53:32 +01:00
Oleksii Kliukin	8e99518eeb	Improve behavior on node decommissioning (#184 ) * Trigger the node migration on the lack of the readiness label. * Examine the node's readiness status on node add. Make sure we don't miss the not ready node, especially when the operator is killed during the migration.	2018-01-04 11:53:15 +01:00
Manuel Gómez	1109cfa7a1	Add PostgreSQL pod namespace Scalyr sidecar environment (#196 ) Another tiny bit of information that could be useful for log filters once we start deploying clusters into separate namespaces.	2017-12-22 17:12:50 +01:00
Oleksii Kliukin	9720ac1f7e	WIP: Hold the proper locks while examining the list of databases. Introduce a new lock called specMu lock to protect the cluster spec. This lock is held on update and sync, and when retrieving the spec in the API code. There is no need to acquire it for cluster creation and deletion: creation assigns the spec to the cluster before linking it to the controller, and deletion just removes the cluster from the list in the controller, both holding the global clustersMu Lock.	2017-12-22 13:06:11 +01:00
Manuel Gómez	cd9bc7bdc5	Add PostgreSQL pod name Scalyr sidecar environment (#194 ) This will allow the Scalyr image to add a custom attribute to shipped log entries that notes the name of the originating pod.	2017-12-21 16:52:27 +01:00
Manuel Gómez	15c278d4e8	Scalyr agent sidecar for log shipping (#190 ) * Scalyr agent sidecar for log shipping * Remove the default for the Scalyr image Now the image needs to be specified explicitly to enable log shipping to Scalyr. This removes the problem of having to generate the config file or publish our agent image repository. * Add configuration variable for Scalyr server URL Defaults to the EU address. * Alter style Newlines are cheap and make code easier to edit/refactor, but ok. * Fix StatefulSet comparison logic I broke it when I made the comparison consider all containers in the PostgreSQL pod.	2017-12-21 15:34:26 +01:00
Oleksii Kliukin	da0de8cff7	Make sure the statefulset that is deleted manually gets re-created. (#191 ) * Make sure the statefulset that is deleted manually gets re-created. Per report and analysis by Manuel Gomez. * Move the existence checks for other objects out of the Create functions. create{Object} for services, endpoints and PDBs refused to continue if there is a cached definition in the cluster, however, the only place where it makes sense is when creating a new cluster. Note that contrary to the statefulset this doesn't fix any issues, since those definitions were nullified correspondingly when the sync code detected there is no object present in the Kubernetes cluster.	2017-12-21 15:20:43 +01:00
zerg-junior	5d5fa680a3	Merge pull request #180 from zalando-incubator/container-name Make pod's single container name static	2017-12-15 16:13:33 +01:00
Oleksii Kliukin	bf80f5225e	Introduce higher and lower bounds for the number of instances (#178 ) * Introduce higher and lower bounds for the number of instances Reduce the number of instances to the min_instances if it is lower and to the max_instances if it is higher. -1 for either of those means there is no lower or upper bound. In addition, terminate the operator when there is a nonsense in the configuration (i.e. max_instances < min_instances). Reviewed by Jan Mußler and Sergey Dudoladov.	2017-12-15 16:02:50 +01:00
Sergey Dudoladov	52e358ba8f	Make pod's single container name static	2017-12-15 15:53:53 +01:00
Oleksii Kliukin	0e255f82c6	Provide more information about variable conflicts. They are mentioned in the documentation and the operator will emit a warning each time the variable from the pod environment configmap is ignored because the same variable is defined by the operator. Some minor changes in the variable names to make the code more readable. Per review from Sergey Dudoladov.	2017-12-14 14:39:33 +01:00
Oleksii Kliukin	da4b66210a	Expand variables from the PodEnvironmentConfigMap (#4 ) Inject PodEnvironmentConfigMap variables inline into the statefulset definition in order to be able to figure out changes to the statefulset when only PodEnvironmentConfigMap has changed.	2017-12-14 14:39:33 +01:00
Oleksii Kliukin	1c5451cd7d	Spelling fix.	2017-12-14 14:39:33 +01:00
Oleksii Kliukin	55dc12e512	Examine custom environment sources when syncing. When comparing statefulsets, make sure EnvFrom fields are compared as well.	2017-12-14 14:39:33 +01:00
Georg Kunz	e8d9c75949	Allow custom Postgres pod environment variables	2017-12-14 14:39:33 +01:00
Oleksii Kliukin	87bc47d8d0	Fixes for the case of re-creating the cluster after deletion. - make sure that the secrets for the system users (superuser, replication) are not deleted when the main cluster is. Therefore, we can re-create the cluster, potentially forcing Patroni to restore it from the backup and enable Patroni to connect, since it will use the old password, not the newly generated random one. - when syncing users, always check whether they are already in the DB. Previously, we did this only for the sync cluster case, but the new cluster could be actually the one restored from the backup by Patroni, having all or some of the users already in place. - delete endpoints last. Patroni uses the $clustername endpoint in order to store the leader related metadata. If we remove it before removing all pods, one of those pods running Patroni will re-create it and the next attempt to create the cluster with the same name will stumble on the existing endpoint. - Use db.Exec instead of db.Query for queries that expect no result. This also fixes the issue with the DB creation, since we didn't release an empty Row object it was not possible to create more than one database for a cluster.	2017-12-13 16:49:00 +01:00
Oleksii Kliukin	1fb8cf7ea0	Avoid overwriting critical users. (#172 ) * Avoid overwriting critical users. Disallow defining new users either in the cluster manifest, teams API or infrastructure roles with the names mentioned in the new protected_role_names parameter (list of comma-separated names) Additionally, forbid defining a user with the name matching either super_username or replication_username, so that we don't overwrite system roles required for correct working of the operator itself. Also, clear PostgreSQL roles on each sync first in order to avoid using the old definitions that are no longer present in the current manifest, infrastructure roles secret or the teams API.	2017-12-05 14:27:12 +01:00
Oleksii Kliukin	022ce29314	Make an error message more verbose.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	637921cdee	Tests for initHumanUsers and initRobotUsers. Change the Cluster class in the process to implement Teams API calls and Oauth token fetches as interfaces, so that we can mock them in the tests.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	611cfe96d6	Fix an issue when not assigning the merge result. Add some tests.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	831ebb1f32	Fix the error reporting.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	2e226dee26	Avoid overwriting infrastructure roles. When a role is defined in the infrastructure roles and the cluster manifest use the infrastructure role definition and add flags defined in the manifest. Previously the role has been overwritten by the definition from the manifest. Because a random password is generated for each role from the manifest the applications relying on the infrastructure role credentials from the infrastructure roles secret were unable to connect.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	dd0affc390	Tweak our reaction to the cluster upgrade process. Previously, the operator started to move the pods off the nodes to be decommissioned by watching the eol_node_label value. Every new postgres pod has been created with the anti-affinity to that label, making sure that the pods being moved won't land on another to be decommissioned node. The changes introduce another label that indicates the ready node. The new pod affinity will ensure that the pod is only scheduled to the node marked as ready, discarding the previous anti-affinity. That way the nodes can transition from the pending-decomission to the other statuses (drained, terminating) without having pods suddenly scaled to them. In addition, rename the label that triggers the start of the upgrade process to node_eol_label (for consistency with node_readiness_label) and set its default value to lifecycle-status:pending-decomission.	2017-11-30 14:11:49 +01:00
Oleksii Kliukin	1ffe98ba9f	Fix the connection leak and user options sync. - fix the lack of closing the cursor for the query that returned no rows. - fix syncing of the user options, as previously those were not fetched from the database.	2017-11-27 16:46:34 +01:00
Oleksii Kliukin	975b21f633	Rename api roles configuration parameter. Change api_roles_configuration to team_api_role_configuration	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	2352fc9a39	go fmt run	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	415a7fdc4d	Allow global configuration options for API roles. Add options to the PgUser structure, potentially allowing to set per-role options in the cluster definition as well. Introduce api_roles_configuration operator option with the default of log_statement=all	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	6dcd074ea0	Allow per-cluster setting of a docker image. Add dockerImage cluster configuration parameter that overrides global operator defaults when set to a non-empty value.	2017-11-14 11:53:04 +01:00
Oleksii Kliukin	c25e849fe4	Fix a failure to create new statefulset at sync. Also do a fmt run.	2017-11-08 18:24:17 +01:00
Murat Kabilov	86803406db	use sync methods while updating the cluster	2017-11-03 12:00:43 +01:00
Georg Kunz	47dd766fa7	Add node toleration config to PodSpec (#151 ) * Add node toleration config to PodSpec This allows to taint nodes dedicated to Postgres and prevents other pods from running on these nodes. * Document taint and toleration setup And remove setting from default operator ConfigMap * Allow to overwrite tolerations with Postgres manifest	2017-11-02 19:10:44 +01:00
Oleksii Kliukin	ce960e892a	Create new databases and change owners of existing ones during sync. (#153 ) * Create new databases and change owners of existing ones during sync.	2017-11-02 17:46:33 +01:00
Oleksii Kliukin	7a76be7d3e	Minor fixes around PDB (pod-disruption-budget) syncing: (#147 ) - Call comparison function in the case of the sync as well as for update - Include full cluster name in PDB name - Assign cluster labels to the PDB object	2017-10-23 12:26:59 +02:00
Murat Kabilov	c17aabb642	fix pod disruption budget labels (#146 )	2017-10-20 15:01:51 +02:00
Murat Kabilov	661b141849	Fix Pod Disruption Budget null pointer exception	2017-10-20 11:43:50 +02:00
Murat Kabilov	a1deae198b	add missing master matchLabel for the PDB (#144 )	2017-10-20 11:26:40 +02:00
Oleksii Kliukin	eba23279c8	Kube cluster upgrade	2017-10-19 10:49:42 +02:00
Oleksii Kliukin	1dbf259c76	Retry opening DB connections. (#140 ) Make sure DB connection retry also reopens a connection after closing it	2017-10-18 16:28:00 +02:00
Oleksii Kliukin	99870d8eac	Fix division by zero when connecting to the DB. Apparently the retry function's first parameter is the duration of a single attempt and it cannot be zero.	2017-10-18 10:44:49 +02:00
Murat Kabilov	202f2de988	Retry connecting to pg	2017-10-17 17:03:50 +02:00
Murat Kabilov	6c4cb4e9da	Perform manual failover during the scale down	2017-10-16 17:41:23 +02:00
Murat Kabilov	5b29576a8e	Remove redundant constants	2017-10-16 15:52:48 +02:00
Murat Kabilov	3b32265258	Set status of the cluster on sync fail/success	2017-10-12 15:10:42 +02:00
Jan Mussler	cec695d48e	Superuser toggle for team members Make superuser toggleable for team members. Add an "admin" role to team members if superuser is disabled.	2017-10-12 15:01:54 +02:00
Murat Kabilov	8d5faaa5a5	return idle status when worker has nothing to do	2017-10-11 15:42:20 +02:00
Oleksii Kliukin	793defef72	Fix pod wait timeouts. Previously, a timer had been reset on every message received through the pod channel.	2017-10-11 14:58:37 +02:00
Murat Kabilov	83c8d6c419	Extend diagnostic api with worker status info	2017-10-11 12:26:09 +02:00
Murat Kabilov	71a540ff48	Merge branch 'master' into crd	2017-10-09 11:55:18 +02:00
Murat Kabilov	a35e9c6119	move from tpr to crd	2017-10-06 15:12:08 +02:00
Murat Kabilov	3b8c06416e	skip manual failover for 1-pod clusters	2017-10-05 13:30:15 +03:00
Jan Mussler	c4af0ac6a6	Update cluster.go	2017-10-05 10:58:23 +02:00
Jan M	4a1170855a	Adding '_' to allowed chars.	2017-10-05 10:53:19 +02:00
Murat Kabilov	48ec6b35b9	perform manual failover on pg cluster rolling upgrade	2017-10-04 16:56:47 +03:00
Murat Kabilov	00194d0130	create dbs on cluster create	2017-10-04 16:24:27 +03:00
Murat Kabilov	5cfdabb63e	fix regexp for api endpoint urls	2017-09-28 12:00:40 +02:00
Murat Kabilov	be8bf22c00	add missing return	2017-09-28 11:23:56 +02:00

566 Commits