postgres-operator

Commit Graph

Author	SHA1	Message	Date
Dmitry Dolgov	d6e6b00770	Add shm_volume option (#427 ) Add possibility to mount a tmpfs volume to /dev/shm to avoid issues like [this](https://github.com/docker-library/postgres/issues/416). To achieve that two new options were introduced: * `enableShmVolume` to PostgreSQL manifest, to specify whether or not mount this volume per database cluster * `enable_shm_volume` to operator configuration, to specify whether or not mount per operator. The first one, `enableShmVolume` takes precedence to allow us to be more flexible.	2018-12-21 16:22:30 +01:00
zerg-junior	45c89b3da4	[WIP] Add set_memory_request_to_limit option (#406 ) * Add set_memory_request_to_limit option	2018-11-15 14:00:08 +01:00
zerg-junior	25fa45fd58	[WIP] Grant 'superuser' to the members of Postgres admin teams (#371 ) Added support for superuser team in addition to the admin team that owns the postgres cluster.	2018-08-30 10:51:37 +02:00
Oleksii Kliukin	e1ed4b847d	Use code-generation for CRD API and deepcopy methods (#369 ) Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on. Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API. All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1 Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator. Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it. Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes. Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files. Update client-go dependencies. Minor reformatting and renaming.	2018-08-15 17:22:25 +02:00
Oleksii Kliukin	59f0c5551e	Allow configuring pod priority globally and per cluster. (#353 ) * Allow configuring pod priority globally and per cluster. Allow to specify pod priority class for all pods managed by the operator, as well as for those belonging to individual clusters. Controlled by the pod_priority_class_name operator configuration parameter and the podPriorityClassName manifest option. See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass for the explanation on how to define priority classes since Kubernetes 1.8. Some import order changes are due to go fmt. Removal of OrphanDependents deprecated field. Code review by @zerg-junior	2018-08-03 14:03:37 +02:00
Oleksii Kliukin	0181a1b5b1	Introduce a repair scan to fix failing clusters (#304 ) A repair is a sync scan that acts only on those clusters that indicate that the last add, update or sync operation on them has failed. It is supposed to kick in more frequently than the sync scan. The sync scan still remains useful to fix the consequences of external actions (e.g. someone deletes a postgres-related service by mistake) unbeknownst to the operator. The repair scan is controlled by the new repair_period parameter in the operator configuration. It has to be at least 2 times more frequent than a sync scan to have any effect (a normal sync scan will update both last synced and last repaired attributes of the controller, since repair is just a sync underneath). A repair scan could be queued for a cluster that is already being synced if the sync period exceeds the interval between repairs. In that case a repair event will be discarded once the corresponding worker finds out that the cluster is not failing anymore. Review by @zerg-junior	2018-07-24 11:21:45 +02:00
zerg-junior	417f13c0bd	Submit RBAC credentials during initial Event processing (#344 ) * During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.	2018-07-19 16:40:40 +02:00
Oleksii Kliukin	25a306244f	Support for per-cluster and operator global sidecars (#331 ) * Define sidecars in the operator configuration. Right now only the name and the docker image can be defined, but with the help of the pod_environment_configmap parameter arbitrary environment variables can be passed to the sidecars. * Refactoring around generatePodTemplate. Original implementation of per-cluster sidecars by @theRealWardo Per review by @zerg-junior and @Jan-M	2018-07-02 16:25:27 +02:00
zerg-junior	7394c15d0a	Make AWS region configurable in the operator config map (#333 )	2018-06-27 17:29:02 +02:00
Oleksii Kliukin	9cb48e0889	Document operator configuration parameters. (#313 )	2018-06-08 13:21:57 +02:00
Sergey Dudoladov	2e041c50e6	Bump up default Spilo image	2018-05-28 16:54:27 +02:00
Manuel Gómez	32a1456a68	Update config.go	2018-05-24 16:58:46 +02:00
Sergey Dudoladov	749d723f55	Shorten the comment	2018-05-24 16:22:13 +02:00
Sergey Dudoladov	9824ddae5e	Fix etcd_host default	2018-05-24 16:05:45 +02:00
Oleksii Kliukin	987b43456b	Deprecate old LB options, fix endpoint sync. (#287 ) * Deprecate old LB options, fix endpoint sync. - deprecate useLoadBalancer, replicaLoadBalancer from the manifest and enable_load_balancer from the operator configuration. The old operator configuration options become no-op with this commit. For the old manifest options, `useLoadBalancer` and `replicaLoadBalancer` are still consulted, but only in the absence of the new ones (enableMasterLoadBalancer and enableReplicaLoadBalancer). - Make sure the endpoint being created during the sync receives proper addresses subset. This is more critical for the replicas, as for the masters Patroni will normally re-create the endpoint before the operator. - Avoid creating the replica endpoint, since it will be created automatically by the corresponding service. - Update the README and unit tests. Code review by @mgomezch and @zerg-junior	2018-05-15 15:19:18 +02:00
Sergey Dudoladov	59ded0c212	Shorten bucket name	2018-05-02 14:05:57 +02:00
Sergey Dudoladov	c45219bafa	Set up an S3 bucket for the postgres daily logs	2018-05-02 12:52:42 +02:00
Sergey Dudoladov	d99b553ec1	Convert default account definition into JSON	2018-04-25 12:35:16 +02:00
Sergey Dudoladov	e3f7fac443	Comment on the default value for pod service account name	2018-04-24 15:41:28 +02:00
Sergey Dudoladov	485ec4b8ea	Move service account to Controller	2018-04-24 15:13:08 +02:00
Sergey Dudoladov	c31c76281c	Make operator unaware of its own service account	2018-04-23 14:38:20 +02:00
Sergey Dudoladov	bd51d2922b	Turn ServiceAccount into struct value to avoid race condition during account creation	2018-04-20 13:05:05 +02:00
Sergey Dudoladov	214ae04aa7	Deploy service account for pod creation on demand	2018-04-18 16:20:20 +02:00
Sergey Dudoladov	96d46252f5	Change the default values to closer match previous behaviour	2018-03-26 11:43:46 +02:00
Sergey Dudoladov	a8862aeee1	Enable backward compatibility for enable_load_balancer setting from operator configmap	2018-03-19 17:19:50 +01:00
Sergey Dudoladov	145689c950	Disable load balancer for master service by default (it may cost money)	2018-03-16 13:18:13 +01:00
Sergey Dudoladov	0986e56226	Add separate params for master and replica load balancers to operator configuration	2018-03-14 12:12:28 +01:00
Sergey Dudoladov	e048328d6a	Comment on special values for watched namespace	2018-02-20 17:26:17 +01:00
Sergey Dudoladov	dcfc9925f6	Respond to code review	2018-02-20 14:43:02 +01:00
Sergey Dudoladov	74fa7b9492	Restrict operator to single watched namespace via env var	2018-02-07 16:44:49 +01:00
Sergey Dudoladov	ea84f9d577	Rename the configmap 'namespace' entry to avoid confusion with the map's own namespace	2018-02-06 15:09:00 +01:00
Oleksii Kliukin	b90a36c909	Set node_readiness_label default to an empty value. (#204 ) Previously, it was set to the lifecycle-status:ready, breaking a lot of minikube deployments. Also it was not possible before to run with this label set to an empty value. Document the effect of the label in the new section of the documentation.	2018-01-16 15:43:03 +01:00
Oleksii Kliukin	8e99518eeb	Improve behavior on node decommissioning (#184 ) * Trigger the node migration on the lack of the readiness label. * Examine the node's readiness status on node add. Make sure we don't miss the not ready node, especially when the operator is killed during the migration.	2018-01-04 11:53:15 +01:00
Manuel Gómez	15c278d4e8	Scalyr agent sidecar for log shipping (#190 ) * Scalyr agent sidecar for log shipping * Remove the default for the Scalyr image Now the image needs to be specified explicitly to enable log shipping to Scalyr. This removes the problem of having to generate the config file or publish our agent image repository. * Add configuration variable for Scalyr server URL Defaults to the EU address. * Alter style Newlines are cheap and make code easier to edit/refactor, but ok. * Fix StatefulSet comparison logic I broke it when I made the comparison consider all containers in the PostgreSQL pod.	2017-12-21 15:34:26 +01:00
Oleksii Kliukin	bf80f5225e	Introduce higher and lower bounds for the number of instances (#178 ) * Introduce higher and lower bounds for the number of instances Reduce the number of instances to the min_instances if it is lower and to the max_instances if it is higher. -1 for either of those means there is no lower or upper bound. In addition, terminate the operator when there is a nonsense in the configuration (i.e. max_instances < min_instances). Reviewed by Jan Mußler and Sergey Dudoladov.	2017-12-15 16:02:50 +01:00
Georg Kunz	e8d9c75949	Allow custom Postgres pod environment variables	2017-12-14 14:39:33 +01:00
Oleksii Kliukin	1fb8cf7ea0	Avoid overwriting critical users. (#172 ) * Avoid overwriting critical users. Disallow defining new users either in the cluster manifest, teams API or infrastructure roles with the names mentioned in the new protected_role_names parameter (list of comma-separated names) Additionally, forbid defining a user with the name matching either super_username or replication_username, so that we don't overwrite system roles required for correct working of the operator itself. Also, clear PostgreSQL roles on each sync first in order to avoid using the old definitions that are no longer present in the current manifest, infrastructure roles secret or the teams API.	2017-12-05 14:27:12 +01:00
Oleksii Kliukin	dd0affc390	Tweak our reaction to the cluster upgrade process. Previously, the operator started to move the pods off the nodes to be decommissioned by watching the eol_node_label value. Every new postgres pod has been created with the anti-affinity to that label, making sure that the pods being moved won't land on another node that is about to be decommissioned. The changes introduce another label that indicates the ready node. The new pod affinity will ensure that the pod is only scheduled to the node marked as ready, discarding the previous anti-affinity. That way the nodes can transition from the pending-decomission to the other statuses (drained, terminating) without having pods suddenly scaled to them. In addition, rename the label that triggers the start of the upgrade process to node_eol_label (for consistency with node_readiness_label) and set its default value to lifecycle-status:pending-decomission.	2017-11-30 14:11:49 +01:00
Oleksii Kliukin	1ffe98ba9f	Fix the connection leak and user options sync. - fix the lack of closing the cursor for the query that returned no rows. - fix syncing of the user options, as previously those were not fetched from the database.	2017-11-27 16:46:34 +01:00
Oleksii Kliukin	086ead03f5	Warn about attempts to use escape quotes.	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	975b21f633	Rename api roles configuration parameter. Change api_roles_configuration to team_api_role_configuration	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	2352fc9a39	go fmt run	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	71f57c9fe3	Fix escaping of parameter values and extra spaces. - document the newly introduced option (for now in the main README) - make query error output more readable.	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	415a7fdc4d	Allow global configuration options for API roles. Add options to the PgUser structure, potentially allowing to set per-role options in the cluster definition as well. Introduce api_roles_configuration operator option with the default of log_statement=all	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	c25e849fe4	Fix a failure to create new statefulset at sync. Also do a fmt run.	2017-11-08 18:24:17 +01:00
Georg Kunz	47dd766fa7	Add node toleration config to PodSpec (#151 ) * Add node toleration config to PodSpec This allows to taint nodes dedicated to Postgres and prevents other pods from running on these nodes. * Document taint and toleration setup And remove setting from default operator ConfigMap * Allow to overwrite tolerations with Postgres manifest	2017-11-02 19:10:44 +01:00
Oleksii Kliukin	eba23279c8	Kube cluster upgrade	2017-10-19 10:49:42 +02:00
Jan Mussler	cec695d48e	Superuser toggle for team members Make superuser toggleable for team members. Add an "admin" role to team members if superuser is disabled.	2017-10-12 15:01:54 +02:00
Murat Kabilov	a35e9c6119	move from tpr to crd	2017-10-06 15:12:08 +02:00
Murat Kabilov	93d4bf2b55	Merge branch 'master' into api-improvements	2017-09-26 14:47:13 +02:00
Murat Kabilov	9a66e09b88	cluster history api endpoint	2017-09-26 14:30:45 +02:00
Murat Kabilov	d876f4d88e	set secret name template via config map	2017-09-18 14:25:09 +02:00
Oleksii Kliukin	8b85935a7a	Allow cloning clusters from the operator. (#90 ) Allow cloning clusters from the operator. The changes add a new JSON node `clone` with possible values `cluster` and `timestamp`. `cluster` is mandatory, and setting a non-empty `timestamp` triggers wal-e point in time recovery. Spilo and Patroni do the whole heavy-lifting, the operator just defines certain variables and gathers some data about how to connect to the host to clone or the target S3 bucket. As a minor change, set the image pull policy to IfNotPresent instead of Always to simplify local testing. Change the default replication username to standby.	2017-09-08 16:47:03 +02:00
Murat Kabilov	71dfb33b2b	make pod termination grace period configurable	2017-08-18 16:38:25 +02:00
Murat Kabilov	82d5583809	add diagnostic api http server	2017-08-15 12:20:09 +02:00
Murat Kabilov	51fdfb90f7	log cluster and controller events in the ringlog via logrus hook	2017-08-15 12:16:09 +02:00
Oleksii Kliukin	8b58782a4a	fix pam_role_name parameter name.	2017-08-02 17:55:06 +02:00
Murat Kabilov	cf663cb841	Fix golint warnings	2017-08-01 16:08:56 +02:00
Murat Kabilov	4f36e447c3	Skip config params with no values (#62 )	2017-07-14 17:22:25 +02:00
Oleksii Kliukin	00150711e4	Configure load balancer on a per-cluster and operator-wide level (#57 ) * Deny all requests to the load balancer by default. * Operator-wide toggle for the load-balancer. * Define per-cluster useLoadBalancer option. If useLoadBalancer is not set - then operator-wide defaults take place. If it is true - the load balancer is created, otherwise a service type clusterIP is created. Internally, we have to completely replace the service if the service type changes. We cannot patch, since some fields from the old service that will remain after patch are incompatible with the new one, and handling them explicitly when updating the service is ugly and error-prone. We cannot update the service because of the immutable fields, that leaves us the only option of deleting the old service and creating the new one. Unfortunately, there is still an issue of unnecessary removal of endpoints associated with the service, it will be addressed in future commits. * Revert the unintended effect of go fmt * Recreate endpoints on service update. When the service type is changed, the service is deleted and then the one with the new type is created. Unfortunately, endpoints are deleted as well. Re-create them afterwards, preserving the original addresses stored in them. * Improve error messages and comments. Use generate instead of gen in names.	2017-06-30 13:38:49 +02:00
Murat Kabilov	e104a67260	Fix resync of the clusters	2017-06-08 11:51:48 +02:00
Oleksii Kliukin	dc36c4ca12	Implement replicaLoadBalancer boolean flag. (#38 ) The flag adds a replica service with the name cluster_name-repl and a DNS name that defaults to {cluster}-repl.{team}.{hostedzone}. The implementation converted Service field of the cluster into a map with one or two elements and deals with the cases when the new flag is changed on a running cluster (the update and the sync should create or delete the replica service). In order to pick up master and replica service and master endpoint when listing cluster resources. * Update the spec when updating the cluster.	2017-06-07 13:54:17 +02:00
Murat Kabilov	d34273543e	Fix the golint, gosimple warnings	2017-05-18 17:38:54 +02:00
Murat Kabilov	95a57d1e4f	Use named arguments in the DNS name format	2017-05-18 17:23:59 +02:00
Oleksii Kliukin	5adceceb36	go fmt run	2017-05-12 17:48:25 +02:00
Oleksii Kliukin	abd04e6f5a	Avoid abbreviations in user-facing parameters.	2017-05-12 17:44:51 +02:00
Oleksii Kliukin	03064637f1	Allow disabling access to the DB and the Teams API. Command-line options --nodatabaseaccess and --noteamsapi disable all teams api interaction and access to the Postgres database. This is useful for debugging purposes when the operator runs out of cluster (with --outofcluster flag). The same effect can be achieved by setting enable_db_access and/or enable_teams_api to false.	2017-05-12 17:40:48 +02:00
Murat Kabilov	92d7fbf372	replace github.bus.zalan.do with github.cm/zalando-incubator	2017-05-12 11:50:16 +02:00
Murat Kabilov	2370659c69	Parallel cluster processing Run operations concerning multiple clusters in parallel. Each cluster gets its own worker in order to create, update, sync or delete clusters. Each worker acquires the lock on a cluster. Subsequent operations on the same cluster have to wait until the current one finishes. There is a pool of parallel workers, configurable with the `workers` parameter in the configmap and set by default to 4. The cluster-related tasks are assigned to the workers based on a cluster name: the tasks for the same cluster will be always assigned to the same worker. There is no blocking between workers, although there is a chance that a single worker will become a bottleneck if too many clusters are assigned to it; therefore, for large-scale deployments it might be necessary to bump up workers from the default value.	2017-05-12 11:41:35 +02:00
Oleksii Kliukin	1c4bce86df	Avoid "bulk-comparing" pod resources during sync. (#109 ) * Avoid "bulk-comparing" pod resources during sync. First attempt to fix bogus restarts due to the reported mismatch of container resources where one of the resources is an empty struct, while the other has all fields set to nil. In addition, add an ability to set limits and requests per pod, as well as the operator-level defaults.	2017-05-12 11:41:35 +02:00
Murat Kabilov	8026c69222	update default config param values	2017-05-12 11:41:34 +02:00
Murat Kabilov	da438aab3a	Use ConfigMap to store operator's config	2017-05-12 11:41:34 +02:00
Murat Kabilov	08c0e3b6dd	Use unified type for the namespaced object names	2017-05-12 11:41:34 +02:00
Oleksii Kliukin	71b93b4cc2	Feature/infrastructure roles (#91 ) * Add infrastructure roles configured globally. Those are the roles defined in the operator itself. The operator's configuration refers to the secret containing role names, passwords and membership information. While they are referred to as roles, in reality those are users. In addition, improve the regex to filter out invalid users and make sure user secret names are compatible with DNS name spec. Add an example manifest for the infrastructure roles.	2017-05-12 11:41:33 +02:00
Murat Kabilov	dd2ed5ff9d	Add team name to tpr object metadata name	2017-05-12 11:41:33 +02:00
Oleksii Kliukin	5b66d0adba	Correct go json tags (extra space).	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	a5f0ef10d0	go fmt run	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	0764505a10	correct the wal bucket parameter name.	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	7841b85892	Add configuration to support running WAL-E. - Set WAL_S3_BUCKET to point WAL-E where to fetch/store WAL files - Set annotations/iam.amazonaws.com/role to set the role to access AWS. The new env variables are PGOP_WAL_S3_BUCKET and PGOP_KUBE_IAM_ROLE.	2017-05-12 11:41:31 +02:00
Murat Kabilov	852c5beae5	Check etcd key availability for the new cluster	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	b69b6b26e5	go fmt run	2017-05-12 11:41:30 +02:00
Murat Kabilov	310c119dfa	Display config on operator start up	2017-05-12 11:41:30 +02:00
Murat Kabilov	a97dfb07de	fix struct tag delimiter	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	3a4c6268be	Increase log verbosity, namely for object updates. - add a new environment variable for triggering debug log level - show both new, old object and diff during syncs and updates - use pretty package to pretty-print go structures -	2017-05-12 11:41:29 +02:00
Murat Kabilov	c2d2a67ad5	Get config from environment variables; ignore pg major version change; get rid of resources package;	2017-05-12 11:41:29 +02:00

135 Commits