postgres-operator

Commit Graph

Author	SHA1	Message	Date
Felix Kunde	11c2e815f7	include status subresource in validation (#744 ) * include status subresource in validation	2019-12-02 15:27:47 +01:00
Felix Kunde	a3b34f146f	Add CRD validation (#599 ) * add CRD manifests with validation * update documentation * patroni slots is not an array but a nested hash map * make deps call tools * cover validation in docs and export it in crds.go * add toggle to disable creation of CRD validation and document it * use templated service account also for CRD-configured helm deployment	2019-11-28 12:02:05 +01:00
Armin Nesiren	5f87384d7f	Passing endpoint, access and secret key to logical-backup container (#628 ) * Added possibility to add custom annotations to LoadBalancer service. * Added parameters for custom endpoint, access and secret key for logical backup. * Modified dump.sh so it knows how to handle new features. Configurable S3 SSE	2019-11-26 10:40:49 +01:00
Felix Kunde	2ce602fcd7	fix errors when changing service type (#716 ) * fix errors when changing service type * nullify service and endpoint before recreation * improve wait for delete logic and reuse config parameters	2019-11-26 10:28:32 +01:00
Felix Kunde	f9487e41c1	inject cluster name label into logical backup pod (#725 ) * inject cluster name label into logical backup pod	2019-11-20 13:58:41 +01:00
Felix Kunde	0b544ae43f	pass additionalSecretMount to logical backup pod (#714 )	2019-11-19 18:06:55 +01:00
Thomas Runyon	535517cd1b	Custom annotations 329 (#657 ) * Add ability for custom annotations to database pods	2019-11-11 10:45:35 +01:00
Felix Kunde	340dc4aa3d	Fix operator_configuration_type.go (#703 ) fix typo	2019-10-31 12:19:41 +01:00
Eric	6e682fd6b5	Fixing spelling mistake in delete PVC function name (#691 )	2019-10-18 16:41:56 +02:00
Dmitry Dolgov	647a4d3023	Remove service accounts cache (#685 ) For optimization purposes the operator was creating a cache map to remember whether service accounts and role bindings were deployed to a namespace. This could lead to a problem when a namespace was deleted, since this cache was not synchronized. For the sake of correctness, remove the cache and check every time whether the required service account and RBAC are present. In the normal case this introduces an overhead of two API calls per event (one to get a service account, one to get a role binding), which should not be a problem unless proven otherwise.	2019-10-11 11:06:14 +02:00
Dmitry Dolgov	baae1887b3	Replace glide with Go modules (#544 ) * An attempt to build with modules and remove glide * new tools.go file to get code-generator dependency + updated codegen + remove Glide files and update docs	2019-10-02 16:18:55 +02:00
Felix Kunde	f0e29060b1	move StatefulSet to apps/v1 (#675 )	2019-09-30 16:42:04 +02:00
Weilu Jia	e00b37fc17	Handle IPv6 k8s pods in Patroni URLs (#671 ) * Handle IPv6 Patroni URLs	2019-09-30 10:14:27 +02:00
Sergey Dudoladov	cf97ebb2b8	fix e2e tests (#672 ) * fix e2e tests * change Spilo version everywhere	2019-09-23 17:48:53 +02:00
Felix Kunde	4a863d2280	Avoid orphaned objects on delete (#654 ) * Make setSpec function work correctly when updating cluster status fails	2019-08-27 12:54:35 +02:00
Felix Kunde	abdb003f40	additional printer columns for CRDs (#653 ) * additional printer columns for CRDs	2019-08-16 13:22:45 +02:00
Felix Kunde	cd350a4bc1	make run.sh executable from within e2e (#619 )	2019-07-24 15:07:32 +02:00
Felix Kunde	1d45a6aec3	change app label for logical backup pod (#621 ) * change app label for logical backup pod	2019-07-23 15:43:07 +02:00
Felix Kunde	2c3c7fd244	query namespaced K8s API in logical backup script (#623 )	2019-07-18 14:00:30 +02:00
Felix Kunde	7c19cf50db	align config map, operator config, helm chart values and templates (#595 ) * align config map, operator config, helm chart values and templates * follow helm chart conventions also in CRD templates * split up values files and add comments * avoid yaml confusion in postgres manifests * bump spilo version and use example for logical_backup_s3_bucket * add ConfigTarget switch to values	2019-07-08 17:49:25 +02:00
Felix Kunde	3a914f9a3c	camelCasing all manifest parameters (#602 ) * deprecate snake_case manifest parameters * move backward compatible check and update test	2019-07-05 18:14:03 +02:00
Felix Kunde	36003b8264	enable shmVolume setting in OperatorConfiguration (#605 ) * enable shmVolume setting in OperatorConfiguration	2019-07-05 16:48:37 +02:00
Rafia Sabih	540d58d5bd	Adding the support for standby cluster This will set up a continuous WAL streaming cluster by adding the corresponding section to the postgres manifest. Instead of having a full-fledged standby cluster as in Patroni, here we use only the WAL path of the source cluster and stream from there. Since the standby cluster streams from the master, it does not need to create or use databases of its own; hence, it bypasses the creation of users or databases. A separate sample manifest is added to set up a standby cluster.	2019-06-21 10:11:39 +02:00
Markus	93bfed3e75	Add secret mount to operator (#535 ) * add secret mount to operator	2019-06-19 12:40:49 +02:00
Felix Kunde	6918394562	Add PDB configuration toggle (#583 ) * Don't create an impossible disruption budget for smaller clusters. * sync PDB also on update	2019-06-18 10:48:21 +02:00
Maxim Ivanov	3553144cda	Support subPath in generated container (#452 ) * mounted volumes now provide a subPath	2019-06-17 15:49:01 +02:00
Erik Inge Bolsø	c65a9baedf	specify ReadOnlyRootFilesystem: false for pod security policies (#560 ) Explicitly specify ReadOnlyRootFilesystem: false so kubernetes can pick a less restrictive policy the operator has access to.	2019-06-17 14:03:33 +02:00
Maxim Ivanov	44acd7e4db	Not being able to register CRD is not a fatal error (#444 ) Operator proceeds to checking if CRD is present and ready, and if not, only then it is a fatal error.	2019-06-14 16:08:29 +02:00
Erik Inge Bolsø	6fbfee3903	decouple clusterrole name and serviceaccount name (#581 ) Decouple clusterrole name and service account name.	2019-06-14 14:24:23 +02:00
teuto.net Netzdienste GmbH	bbf28c4df7	Add additional S3 settings for cloning (#497 )	2019-06-14 12:28:00 +02:00
Rafia Sabih	2886027516	Some typos/spelling mistakes fix (#580 ) Harmless typos fix.	2019-06-06 14:20:15 +02:00
Aaron Miller	ec5b1d4d58	StatefulSet fsGroup config option to allow non-root spilo (#531 ) * StatefulSet fsGroup config option to allow non-root spilo * Allow Postgres CRD to override SpiloFSGroup of the Operator. * Document FSGroup of a Pod cannot be changed after creation.	2019-06-04 16:38:26 +02:00
Felix Kunde	5a0e95ac45	Add CRD configuration to Helm chart values.yaml (#559 ) * add templates for CRDs incl. crd-install hooks * support both config styles in values.yaml * fix ServiceAccount naming in values.yaml	2019-06-03 14:48:32 +02:00
Erik Inge Bolsø	ebda39368e	database.go: remove hardcoded .svc.cluster.local dns suffix (#561 ) * database.go: substitute hardcoded .svc.cluster.local dns suffix with config parameter Use the pod's configured dns search path, for clusters where .svc.cluster.local is not correct.	2019-05-31 16:32:00 +02:00
Felix Kunde	24d412a562	generate spilo config can return error (with test) (#570 ) * fix: raise explicit error when failing to generate spilo config Signed-off-by: Stephane Tang <hi@stang.sh>	2019-05-22 17:35:03 +02:00
Stephane T	1f4267eb05	fix: remove headless service config when deleting cluster (#567 ) see: https://github.com/zalando/postgres-operator/issues/566 Signed-off-by: Stephane Tang <hi@stang.sh>	2019-05-21 13:49:34 +02:00
Sergey Dudoladov	f3e1e80aaf	Add logical backup (#442 ) * Add k8s cron job to spawn logical backups * Minor doc updates	2019-05-16 15:52:01 +02:00
Sergey Dudoladov	2c02b371e2	fix statefulset sync (#563 )	2019-05-14 11:15:47 +02:00
Dmitry Dolgov	f29bdaf96a	Override clone s3 bucket path (#487 ) Override clone s3 bucket path Add possibility to use a custom s3 bucket path for cloning a cluster from an arbitrary bucket (e.g. from another k8s cluster). For that a new config option, `s3_wal_path`, is introduced; it should point to a location that spilo would understand.	2019-05-10 12:52:42 +02:00
Felix Kunde	ad0b250b5b	patch CRD on operator update (#558 ) * patch existing CRD each time there is an operator update	2019-05-09 12:35:15 +02:00
Felix Kunde	0fbfbb23bb	Use /status subresource instead of plain manifest field (#534 ) * turns PostgresStatus type into a struct with field PostgresClusterStatus * setStatus patch target is now /status subresource * unmarshalling PostgresStatus takes care of previous status field convention * new simple bool functions status.Running(), status.Creating()	2019-05-07 12:01:45 +02:00
Sergey Dudoladov	c1d108a832	Fix CRD-based operator configuration (#541 ) * Fix CRD-based operator configuration * add inherited labels, update docker image	2019-04-15 13:52:38 +02:00
Aaron Miller	15ec6a920d	Config option to allow Spilo container to run non-privileged. (#525 ) * Config option to allow Spilo container to run non-privileged. Runs non-privileged by default. Fixes #395 * add spilo_privileged to manifests/configmap.yaml * add spilo_privileged to helm chart's values.yaml	2019-04-03 17:13:39 +02:00
Felix Kunde	313db7d10b	set default name also for RoleBinding and roleRef (#529 )	2019-04-02 17:16:47 +02:00
Stephane T	edeb06d39c	fix: update init_containers (#518 ) * fix: PATH expansion in Makefile Signed-off-by: Stephane Tang <hi@stang.sh> * refact: pass list of containers to compareContainers() Signed-off-by: Stephane Tang <hi@stang.sh> * compare initContainers while comparing StatefulSet Fixes #517 Signed-off-by: Stephane Tang <hi@stang.sh> * refact: compareContainers() Signed-off-by: Stephane Tang <hi@stang.sh>	2019-03-19 17:46:12 +01:00
Sergey Dudoladov	0b53dbe5dc	Set statefulset update and management policy explicitly (#515 ) * fix logging in retry * explicitly set the stateful set update strategy to onDelete * add podManagementPolicy	2019-03-13 11:49:18 +01:00
Vineeth Reddy	db72d82f14	gofmt and golint fixes (#506 ) * fix gofmt and golint issues	2019-03-04 13:13:55 +01:00
Sergey Dudoladov	f400539b69	Retry moving master pods (#463 ) * Retry moving master pods * bump up master pod wait timeout	2019-02-28 16:19:27 +01:00
Sergey Dudoladov	587d9091e7	Set HUMAN_ROLE Spilo env var (#409 ) * Set HUMAN_ROLE Spilo env var	2019-02-27 13:40:42 +01:00
Sergey Dudoladov	74cc9a44f8	Post-graduation updates (#495 ) * update generated code * update glide.lock * Verify staleness of generated code during build and before running tests	2019-02-26 12:34:05 +01:00
Felix Kunde	31e568157b	reflect change in github url (#496 ) Project was moved from the incubator to the Zalando main org, hence the rename	2019-02-25 11:26:55 +01:00
teuto.net Netzdienste GmbH	26a7fdfa9f	Add Pod Anti Affinity (#489 ) * Add Pod Anti Affinity	2019-02-21 16:37:03 +01:00
Stephane T	d11b23bd71	Add inherited_labels (#459 ) * add support for inherited_labels Signed-off-by: Stephane Tang <hi@stang.sh> * update docs with inherited_labels Signed-off-by: Stephane Tang <hi@stang.sh>	2019-02-14 12:29:06 +01:00
Rafał Kupka	ba23de3d17	Pass PodEnvironmentConfigMap (#477 )	2019-02-04 12:24:49 +01:00
Maxim Ivanov	ed6acc1178	Correctly report success in .status on Update (#469 )	2019-01-31 13:09:17 +01:00
Maxim Ivanov	3544cc90fa	Allow specifying init_containers in Postgres CRD (#445 ) * Add support for init_containers	2019-01-29 11:08:44 +01:00
Armin Nesiren	6f6a599c90	Added possibility to add custom annotations to LoadBalancer service. (#461 ) * Added possibility to add custom annotations to LoadBalancer service.	2019-01-25 11:35:27 +01:00
Maxim Ivanov	8330905ce7	Don't panic if Service for the role was not found (#451 )	2019-01-18 13:38:47 +01:00
Maxim Ivanov	1109c861fb	Report new Postgres CR error when previously incorrect one is being updated (#449 )	2019-01-18 13:36:44 +01:00
Jan Mussler	c70905ae8b	Modifying some of the logging to be more descriptive. (#440 ) * Modifying some of the logging to be more descriptive.	2019-01-08 13:07:36 +01:00
zerg-junior	4b5d3cd121	Fix golint failures * Fix golint fails based on the original work from the user u5surf * Skip installing Docker as CDP now has one pre-installed (repairs builds on CDP)	2019-01-08 13:04:48 +01:00
Arve Knudsen	f7058c754d	Pass more variables to Spilo container (#437 ) Pass KUBERNETES_SCOPE_LABEL, KUBERNETES_ROLE_LABEL and KUBERNETES_LABELS to spilo container, so that they could be changed. Fix for #411	2019-01-04 13:42:52 +01:00
zerg-junior	5cfcc453a9	Update CRD configuration docs and fix the CDP build (#414 ) * Update CRD configuration docs * document resource consumption of the operator * Add talks by Oleksii	2019-01-02 12:01:47 +01:00
zerg-junior	c0b0b9a832	[WIP] Add 'admin' option to create role (#425 ) * Add 'admin' option to create role * Fix run_locally_script	2018-12-27 10:14:33 +01:00
zerg-junior	26670408c4	Revert "Unify warnings about unmovable pods (#389 )" (#430 ) This reverts commit `4fa09e0dcb`. Reason: the reverted commit bloats the logs	2018-12-21 17:39:34 +01:00
zerg-junior	4fa09e0dcb	Unify warnings about unmovable pods (#389 ) * Unify warnings about unmovable pods * Log conditions that prevent master pod migration	2018-12-21 16:44:31 +01:00
Dmitry Dolgov	d6e6b00770	Add shm_volume option (#427 ) Add possibility to mount a tmpfs volume to /dev/shm to avoid issues like [this](https://github.com/docker-library/postgres/issues/416). To achieve that two new options were introduced: * `enableShmVolume` to PostgreSQL manifest, to specify whether or not to mount this volume per database cluster * `enable_shm_volume` to operator configuration, to specify whether or not to mount it per operator. The first one, `enableShmVolume`, takes precedence to allow us to be more flexible.	2018-12-21 16:22:30 +01:00
zerg-junior	45c89b3da4	[WIP] Add set_memory_request_to_limit option (#406 ) * Add set_memory_request_to_limit option	2018-11-15 14:00:08 +01:00
zerg-junior	96e3ea9511	Properly overwrite empty allowed source ranges for load balancers (#392 ) * Properly overwrite empty allowed source ranges for load balancers	2018-11-06 11:08:45 +01:00
zerg-junior	86ba92ad02	Rename 'permanent_slots' field to 'slots' (#401 )	2018-10-31 16:11:28 +01:00
Dmitry Dolgov	78e83308fc	API url regexps (#400 ) * Make url regexp more flexible, to accept identifier with dashes * Add few simple tests * Check also numerics	2018-10-31 14:52:41 +01:00
zerg-junior	1b4181a724	[WIP] Add the ability to configure replication slots in Patroni (#398 ) * Add the ability to configure replication slots in Patroni * Add debugging to Makefile for CDP builds	2018-10-31 13:10:56 +01:00
zerg-junior	7907f95d2f	Improve reporting about rolling updates (#391 )	2018-09-24 11:57:43 +02:00
Noah Kantrowitz	688d252752	Some tweaks to ensure compat with newer Go. (#383 )	2018-09-17 10:13:07 +02:00
Noah Kantrowitz	0b75a89920	Fix the casing of github.com/Sirupsen/logrus to match what the project itself uses. (#380 ) Dep enforces this.	2018-09-06 10:26:48 +02:00
Noah Kantrowitz	a4224f6063	Move CRD definitions into a formal API to allow access from other controllers. (#378 )	2018-08-31 11:20:02 +02:00
zerg-junior	25fa45fd58	[WIP] Grant 'superuser' to the members of Postgres admin teams (#371 ) Added support for superuser team in addition to the admin team that owns the postgres cluster.	2018-08-30 10:51:37 +02:00
zerg-junior	1e53e22773	Improve error reporting for short cluster names (#377 ) * Improve error reporting for short cluster names * Revert to clusterName	2018-08-29 17:08:59 +02:00
zerg-junior	aeae0a6ef2	Use cluster's own namespace to patch the cluster manifest (#373 )	2018-08-22 11:07:12 +02:00
Oleksii Kliukin	e1ed4b847d	Use code-generation for CRD API and deepcopy methods (#369 ) Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on. Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API. All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1 Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator. Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it. Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes. Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files. Update client-go dependencies. Minor reformatting and renaming.	2018-08-15 17:22:25 +02:00
Oleksii Kliukin	e933908084	Configure pg_hba in the local postgresql configuration of Patroni. (#361 ) Previously, the operator put pg_hba into the bootstrap/pg_hba key of Patroni. That had 2 adverse effects: - pg_hba.conf was shadowed by Spilo default section in the local postgresql configuration - when updating pg_hba in the cluster manifest, the updated lines were not propagated to DCS, since the key was defined in the bootstrap section of Patroni. Include some minor refactoring, moving methods to unexported when possible and commenting out usage of md5, so that gosec won't complain. Per https://github.com/zalando-incubator/postgres-operator/issues/330 Review by @zerg-junior	2018-08-08 11:01:26 +02:00
Oleksii Kliukin	199aa6508c	Populate list of clusters in the controller at startup. (#364 ) Assign the list of clusters in the controller with the up-to-date list of Postgres manifests on Kubernetes during the startup. Node migration routines launched asynchronously to the cluster processing rely on an up-to-date list of clusters in the controller to detect clusters affected by the migration of the node and lock them when doing migration of master pods. Without the initial list the operator was subject to race conditions like the one described at https://github.com/zalando-incubator/postgres-operator/issues/363 Restructure the code to decouple list cluster function required by the postgresql informer from the one that emits cluster sync events. No extra work is introduced, since cluster sync already runs in a separate goroutine (clusterResync). Introduce explicit initial cluster sync at the end of acquireInitialListOfClusters instead of relying on an implicit one coming from list function of the PostgreSQL informer. Some minor refactoring. Review by @zerg-junior	2018-08-08 11:00:56 +02:00
Oleksii Kliukin	acf46bfa62	Include CREATEROLE to the list of allowed flags. (#365 ) Previously it has been supported by the operator, but the validity check excluded it for no reason.	2018-08-08 10:53:08 +02:00
Oleksii Kliukin	14050588ee	Move to client-go 8. (#362 ) Not many changes, except for one function that has been deprecated. However, unless we find a way to use semantic version comparisons like '^' on a branch name, we would have to update the apimachinery, apiextensions-apiserver and code-generator dependencies manually. Also, slash a linter warning about RoleOriginUnknown not being used.	2018-08-07 12:31:08 +02:00
Oleksii Kliukin	b06186eb41	Linter-induced code refactoring, run round 2. (#360 ) Run more linters in the gometalinter, i.e. deadcode, megacheck, nakedret, dup. More consistent code formatting, remove two dead functions, eliminate a bunch of naked returns, refactor a few functions to avoid code duplication.	2018-08-06 12:09:19 +02:00
Oleksii Kliukin	59f0c5551e	Allow configuring pod priority globally and per cluster. (#353 ) * Allow configuring pod priority globally and per cluster. Allow to specify pod priority class for all pods managed by the operator, as well as for those belonging to individual clusters. Controlled by the pod_priority_class_name operator configuration parameter and the podPriorityClassName manifest option. See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass for the explanation on how to define priority classes since Kubernetes 1.8. Some import order changes are due to go fmt. Removal of OrphanDependents deprecated field. Code review by @zerg-junior	2018-08-03 14:03:37 +02:00
Oleksii Kliukin	ac7b132314	Refactoring inspired by gometalinter. (#357 ) Among other things, fix a few issues with deepcopy implementation.	2018-08-03 11:09:45 +02:00
Oleksii Kliukin	d2d3f21dc2	Client go upgrade v6 (#352 ) There are shortcuts in this code, i.e. we created the deepcopy function by using the deepcopy package instead of the generated code, that will be addressed once migrated to client-go v8. Also, some objects, particularly statefulsets, are still taken from v1beta, this will also be addressed in further commits once the changes are stabilized.	2018-08-01 11:08:01 +02:00
Oleksii Kliukin	f27833b5eb	Fix disabling database access and teams API via command-line options. (#351 )	2018-07-27 10:24:05 +02:00
Oleksii Kliukin	0181a1b5b1	Introduce a repair scan to fix failing clusters (#304 ) A repair is a sync scan that acts only on those clusters that indicate that the last add, update or sync operation on them has failed. It is supposed to kick in more frequently than the sync scan. The sync scan still remains useful for fixing the consequences of external actions (i.e. someone deletes a postgres-related service by mistake) unbeknownst to the operator. The repair scan is controlled by the new repair_period parameter in the operator configuration. It has to be at least 2 times more frequent than a sync scan to have any effect (a normal sync scan will update both last synced and last repaired attributes of the controller, since repair is just a sync underneath). A repair scan could be queued for a cluster that is already being synced if the sync period exceeds the interval between repairs. In that case a repair event will be discarded once the corresponding worker finds out that the cluster is not failing anymore. Review by @zerg-junior	2018-07-24 11:21:45 +02:00
Oleksii Kliukin	1a0e5357dc	Improve generation of Scalyr container environment. (#346 ) * Improve generating of Scalyr container environment. Avoid duplicating POD_NAME and POD_NAMESPACE that are already bundled with every sidecar. Do not complain about the lack of SCALYR_SERVER_HOST, since it is set to https://upload.eu.scalyr.com in the container we use. Do not mention SCALYR_SERVER_HOST in the error messages, since it is derived from the cluster name automatically.	2018-07-24 11:16:24 +02:00
Oleksii Kliukin	12871aad1a	Avoid showing an extra error when resizing volume fails (#350 ) Do not show 'persistent volumes are not compatible' errors for the volumes that failed to be resized because of the other reasons (i.e. the new size is smaller than the existing one).	2018-07-20 14:12:25 +02:00
zerg-junior	417f13c0bd	Submit RBAC credentials during initial Event processing (#344 ) * During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.	2018-07-19 16:40:40 +02:00
Oleksii Kliukin	3a9378d3b8	Allow configuring the operator via the YAML manifest. (#326 ) * Up until now, the operator read its own configuration from the configmap. That has a number of limitations, i.e. when the configuration value is not a scalar, but a map or a list. We use custom code based on github.com/kelseyhightower/envconfig to decode non-scalar values out of plain text keys, but that breaks when the data inside the keys contains both YAML-special elements (i.e. commas) and complex quotes; one good example of that is search_path inside `team_api_role_configuration`. In addition, reliance on the configmap forced a flat structure on the configuration, making it hard to write and to read (see https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778). The changes allow supplying the operator configuration in a proper YAML file. That required registering a custom CRD to support the operator configuration and providing an example at manifests/postgresql-operator-default-configuration.yaml. At the moment, both the old configmap and the new CRD configuration are supported, so there are no compatibility issues; however, in the future I'd like to deprecate the configmap-based configuration altogether. Contrary to the configmap-based configuration, the CRD one doesn't embed defaults into the operator code; however, one can use the manifests/postgresql-operator-default-configuration.yaml as a starting point in order to build a custom configuration. Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters used to create the CRD were taken from the operator configuration, which is not possible if the configuration itself is stored in the CRD object, I've added the ability to specify them as environment variables `CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively. Per review by @zerg-junior and @Jan-M.	2018-07-16 16:20:46 +02:00
Oleksii Kliukin	e90a01050c	Switchover must wait for the inner goroutine before it returns. (#343 ) * Switchover must wait for the inner goroutine before it returns. Otherwise, two corner cases may happen: - waitForPodLabel writes to the podLabelErr channel that has been already closed by the outer routine - the outer routine exits and the caller subscribes to the pod the inner goroutine has already subscribed to, resulting in panic. The previous commit `fe47f9ebea` that touched that code added the cancellation channel, but didn't bother to actually wait for the goroutine to be cancelled. Per report and review from @valer-cara. Original issue: https://github.com/zalando-incubator/postgres-operator/issues/342	2018-07-16 11:50:35 +02:00
Oleksii Kliukin	b7b950eb28	Use the StorageClassName field of the volumeClaimTemplate. (#338 ) The old way of specifying it with the annotation is deprecated and not available in recent Kubernetes versions. We will keep it there anyway until upgrading to the new go-client that is incompatible with those versions. Per report from @schmitch	2018-07-16 11:49:58 +02:00
Oleksii Kliukin	25a306244f	Support for per-cluster and operator global sidecars (#331 ) * Define sidecars in the operator configuration. Right now only the name and the docker image can be defined, but with the help of the pod_environment_configmap parameter arbitrary environment variables can be passed to the sidecars. * Refactoring around generatePodTemplate. Original implementation of per-cluster sidecars by @theRealWardo Per review by @zerg-junior and @Jan-M	2018-07-02 16:25:27 +02:00
zerg-junior	7394c15d0a	Make AWS region configurable in the operator config map (#333 )	2018-06-27 17:29:02 +02:00
Oleksii Kliukin	9cb48e0889	Document operator configuration parameters. (#313 )	2018-06-08 13:21:57 +02:00
Oleksii Kliukin	04b660519a	Fix exec into pods to resize volumes for multi-container pods. The original code assumed only one container per pod.	2018-06-04 14:51:39 +02:00
Oleksii Kliukin	16a710a99a	Avoid possible skipping SYNC events. OB1 bug in the condition deciding whether to sync.	2018-05-31 18:29:15 +02:00
Oleksii Kliukin	48a5744314	Use Patroni API to set bootstrap-only options. (#299 ) Call Patroni API /config in order to set special options that are ignored when set in the configuration file, such as max_connections. Per https://github.com/zalando-incubator/postgres-operator/issues/297 * Some minor refactoring: Rename Cluster ManualFailover to Switchover Rename Patroni Failover to Switchover Add more details to error messages and comments introduced in this PR. Review by @zerg-junior	2018-05-29 12:35:25 +02:00
Sergey Dudoladov	2e041c50e6	Bump up default Spilo image	2018-05-28 16:54:27 +02:00
zerg-junior	1352c4a5e2	Merge pull request #302 from zalando-incubator/fix-etcd-host-default Fix etcd_host default	2018-05-24 17:17:20 +02:00
Manuel Gómez	32a1456a68	Update config.go	2018-05-24 16:58:46 +02:00
Sergey Dudoladov	749d723f55	Shorten the comment	2018-05-24 16:22:13 +02:00
Sergey Dudoladov	9824ddae5e	Fix etcd_host default	2018-05-24 16:05:45 +02:00
Oleksii Kliukin	76ea754fc3	Be lenient when asked to shrink a persistent volume. Do not hard error, emit a warning instead. The cluster is not going to be broken because of our refusal to shrink a volume.	2018-05-24 11:17:42 +02:00
Oleksii Kliukin	1ea8b3bbe6	Fix a crash on node migration. After an unsuccessful initial cluster sync it may happen that the cluster statefulset is empty. This has been made more likely since `88d6a7be3`, since it has introduced syncing volumes before statefulsets, and the volume sync may fail for different reasons (i.e. the volume has been shrunk, or too many calls to Amazon).	2018-05-24 11:05:19 +02:00
Oleksii Kliukin	e84ecb1d03	Address code review by @zerg-junior	2018-05-23 11:36:38 +02:00
Oleksii Kliukin	f5550c337b	Put special patroni parameters to the bootstrap. Some special patroni postgresql parameters, like max_connections, should reside in the bootstrap.dcs.postgresql.parameters section to come into effect.	2018-05-22 18:27:12 +02:00
zerg-junior	e6d12b3480	Merge pull request #295 from zalando-incubator/continue_on_delete_errors Avoid terminating delete on errors.	2018-05-22 10:44:43 +02:00
Oleksii Kliukin	27c7245fed	Avoid terminating delete on errors. When there is an error happening upon deletion of the Kubernetes object belonging to the cluster being removed, it makes no sense to abort the deletion: the manifest will be removed anyway, therefore all the objects after the one we aborted at will stay forever.	2018-05-18 18:10:37 +02:00
Oleksii Kliukin	a8fdd3f2db	Fix crash during sync. Do not use statefulset number of pods to figure out running ones for volume resizing, since the statefulset pointer could be nil. Instead, look at the actual running pods.	2018-05-18 14:42:20 +02:00
Oleksii Kliukin	88d6a7be3f	Sync persistent volumes before statefulsets. (#293 ) Avoid the condition of waiting for the pod that cannot start PostgreSQL because it ran out of disk space.	2018-05-18 12:01:43 +02:00
Oleksii Kliukin	52ddcd25cc	Sync persistent volumes before statefulsets. Avoid the condition of waiting for the pod that cannot start PostgreSQL because it ran out of disk space.	2018-05-18 11:43:45 +02:00
Oleksii Kliukin	da4cc2705b	Use deepcopy to propagate the spec to clusters. Avoid sharing pointers to the same spec data between the informer and the clusters. The only catch is that the error field is cleared during deepcopy, since it is an interface that may contain private fields that cannot be copied, however, the error is only used when the manifest is parsed and before it is queued, therefore, we never refer to that field in the cluster structure.	2018-05-17 16:05:12 +02:00
Oleksii Kliukin	ebe50abccb	Make sure we never modify informer cached manifest. (#290 ) `987b434` introduced a new function that modifies the cluster spec in memory before the cluster processes it. Unfortunately, the instance being modified appeared to be the one stored internally in the PostgresInformer, resulting in those modifications being propagated with further cluster events and producing update loops on some occasions. This commit makes sure we copy the spec before putting it into the clusterEventQueues.	2018-05-16 18:23:31 +02:00
Oleksii Kliukin	cf800aef90	Minor import fix	2018-05-15 16:53:12 +02:00
Oleksii Kliukin	11d568bf65	Address code review by @zerg-junior - new info messages, rename the annotation flag.	2018-05-15 16:50:03 +02:00
Oleksii Kliukin	0c616a802f	Merge branch 'master' into rolling_updates_with_statefulset_annotations # Conflicts: # pkg/cluster/k8sres.go	2018-05-15 15:33:34 +02:00
Oleksii Kliukin	987b43456b	Deprecate old LB options, fix endpoint sync. (#287 ) * Depreate old LB options, fix endpoint sync. - deprecate useLoadBalancer, replicaLoadBalancer from the manifest and enable_load_balancer from the operator configuration. The old operator configuration options become no-op with this commit. For the old manifest options, `useLoadBalancer` and `replicaLoadBalancer` are still consulted, but only in the absense of the new ones (enableMasterLoadBalancer and enableReplicaLoadBalancer). - Make sure the endpoint being created during the sync receives proper addresses subset. This is more critical for the replicas, as for the masters Patroni will normally re-create the endpoint before the operator. - Avoid creating the replica endpoint, since it will be created automatically by the corresponding service. - Update the README and unit tests. Code review by @mgomezch and @zerg-junior	2018-05-15 15:19:18 +02:00
Oleksii Kliukin	332dab5237	Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations	2018-05-08 14:51:10 +02:00
Oleksii Kliukin	f41a42f922	Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations	2018-05-07 10:16:30 +02:00
Oleksii Kliukin	ce0d4af91c	Initial implementation for the statefulset annotations indicating rolling updates.	2018-05-07 08:07:37 +02:00
Oleksii Kliukin	1a20362c5b	Initial implementation for the statefulset annotations indicating rolling updates.	2018-05-04 18:59:23 +02:00
Oleksii Kliukin	43a1db2128	Merge branch 'master' into pending_rolling_updates	2018-05-03 11:27:16 +02:00
Oleksii Kliukin	4c8dfd7e20	Remove the check for the clone cluster name. (#270 ) * Sanity checks for the cluster name, improve tests. - check that the normal and clone cluster name complies with the valid service name. For clone cluster, only do it if clone timestamp is not set; with a clone timestamp set, the clone name points to the S3 bucket - add tests and improve existing ones, making sure we don't call Error() method for an empty error, as well as that we don't miss cases where expected error is not empty, but actual call to be tested does not return an error. Code review by @zerg-junior and @Jan-M	2018-05-03 10:21:37 +02:00
Oleksii Kliukin	fe47f9ebea	Improve the pod moving behavior during the Kubernetes cluster upgrade. (#281 ) * Improve the pod moving behavior during the Kubernetes cluster upgrade. Fix an issue of not waiting for at least one replica to become ready (if the Statefulset indicates there are replicas) when moving the master pod off the decomissioned node. Resolves the first part of #279. Small fixes to error messages. * Eliminate a race condition during the swithover. When the operator initiates the failover (switchover) that fails and then retries it for a second time it may happen that the previous waitForPodChannel is still active. As a result, the operator subscribes to the former master pod two times, causing a panic. The problem was that the original code didn't bother to cancel the waitForPodLalbel for the new master pod in the case when the failover fails. This commit fixes it by adding a stop channel to that function. Code review by @zerg-junior	2018-05-03 10:20:24 +02:00
Sergey Dudoladov	59ded0c212	Shorten bucket name	2018-05-02 14:05:57 +02:00
Sergey Dudoladov	c45219bafa	Set up an S3 bucket for the postgres daily logs	2018-05-02 12:52:42 +02:00
Oleksii Kliukin	37caa3f60b	Fix a bug with syncing services Avoid showing "there is no service in the cluster" when syncing a service for the cluster if the operator has been restarted after the cluster had been created.	2018-04-27 12:35:25 +02:00
zerg-junior	8f08bef67c	Merge pull request #277 from zalando-incubator/automatically-deploy-service-account Deploy service account for pod creation on demand	2018-04-26 14:44:37 +02:00
Sergey Dudoladov	1b718fd4c2	Minor improvemets in reporting service account creation	2018-04-26 13:47:25 +02:00
Sergey Dudoladov	4255e702bc	Always empty account's namespace after parsing	2018-04-25 13:57:24 +02:00
Sergey Dudoladov	d99b553ec1	Convert default account definiton into JSON	2018-04-25 12:35:16 +02:00
Sergey Dudoladov	e3f7fac443	Comment on the default value for pod service account name	2018-04-24 15:41:28 +02:00
Sergey Dudoladov	3d0ab40d64	Explicitly warn on account name mismatch	2018-04-24 15:31:22 +02:00
Sergey Dudoladov	485ec4b8ea	Move service account to Controller	2018-04-24 15:13:08 +02:00
Sergey Dudoladov	bc8b950da4	Tolerate issues of the Teams API	2018-04-23 16:31:53 +02:00
Sergey Dudoladov	c31c76281c	Make operator unaware of its own service account	2018-04-23 14:38:20 +02:00
Sergey Dudoladov	5daf0a4172	Fix error reporting during pod service account creation	2018-04-20 14:20:38 +02:00
Sergey Dudoladov	bd51d2922b	Turn ServiceAccount into struct value to avoid race conditon during account creation	2018-04-20 13:05:05 +02:00
Sergey Dudoladov	a5a65e93f4	Name service account consistenly	2018-04-19 16:15:52 +02:00
Sergey Dudoladov	23f893647c	Remove sync of pod service accounts	2018-04-19 15:48:58 +02:00
Sergey Dudoladov	214ae04aa7	Deploy service account for pod creation on demand	2018-04-18 16:20:20 +02:00
Oleksii Kliukin	0618723a61	Check rolling updates using controller revisions. Compare pods controller revisions with the one for the statefulset to determine whether the pod is running the latest revision and, therefore, no rolling update is necessary. This is performed only during the operator start, afterwards the rolling update status that is stored locally in the cluster structure is used for all rolling update decisions.	2018-04-09 18:07:24 +02:00
Manuel Gómez	88c68712b6	Fix statefulset label selector diffing (#273 ) Otherwise, rolling updates are done unnecessarily.	2018-04-06 17:21:57 +02:00
Oleksii Kliukin	9bf80afa6b	Remove team from statefulset selector (#271 ) * Remove 'team' label from the statefulset selector. I was never supposed to be there, but implicitely statefulset creates a selector out of meta.labels field. That is the problem with recent Kubernetes, since statefulset cannot pick up pods with non-matching label selectors, and we rely on statefulset picking up old pods after statefulset replacement. Make sure selector changes trigger replacement of the statefulset. In the case new selector has more labels than the old one nothing should be done with a statefulset, otherwise the new statefulset won't see orphaned pods from the old one, as they won't match the selector. See https://github.com/kubernetes/kubernetes/issues/46901#issuecomment-356418393	2018-04-06 13:58:47 +02:00
Oleksii Kliukin	26db91c53e	Improve infrastructure role definitions (#208 ) Enhance definitions of infrastructure roles by allowing membership in multiple roles, role options and per-role configuration to be specified in the infrastructure role configmap, which must have the same name as the infrastructure role secret. See manifests/infrastructure-roles-configmap.yaml for the examples and updated README for the description of different types of database roles supposed by the operator and their purposes. Change the logic of merging infrastructure roles with the manifest roles when they have the same name, to return the infrastructure role unchanged instead of merging. Previously, we used to propagate flags from the manifest role to the resulting infrastructure one, as there were no way to define flags for the infrastructure role; however, this is not the case anymore. Code review and tests by @erthalion	2018-04-04 17:21:36 +02:00

565 Commits