postgres-operator

Commit Graph

Author	SHA1	Message	Date
zerg-junior	86ba92ad02	Rename 'permanent_slots' field to 'slots' (#401 )	2018-10-31 16:11:28 +01:00
zerg-junior	1b4181a724	[WIP] Add the ability to configure replication slots in Patroni (#398 ) * Add the ability to configure replication slots in Patroni * Add debugging to Makefile for CDP builds	2018-10-31 13:10:56 +01:00
Oleksii Kliukin	e1ed4b847d	Use code-generation for CRD API and deepcopy methods (#369 ) Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on. Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API. All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1 Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator. Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it. Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes. Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files. Update client-go dependencies. Minor reformatting and renaming.	2018-08-15 17:22:25 +02:00
Jan Mussler	6e8dcabac7	Update postgres-operator.yaml Bump manifest to use v1.0.0 operator	2018-08-10 14:17:44 +02:00
Oleksii Kliukin	0181a1b5b1	Introduce a repair scan to fix failing clusters (#304 ) A repair is a sync scan that acts only on those clusters that indicate that the last add, update or sync operation on them has failed. It is supposed to kick in more frequently than the sync scan. The sync scan remains useful to fix the consequences of external actions (e.g. someone deletes a postgres-related service by mistake) unbeknownst to the operator. The repair scan is controlled by the new repair_period parameter in the operator configuration. It has to be at least 2 times more frequent than a sync scan to have any effect (a normal sync scan will update both last synced and last repaired attributes of the controller, since repair is just a sync underneath). A repair scan could be queued for a cluster that is already being synced if the sync period exceeds the interval between repairs. In that case a repair event will be discarded once the corresponding worker finds out that the cluster is not failing anymore. Review by @zerg-junior	2018-07-24 11:21:45 +02:00
zerg-junior	accbe20804	Upgrade version to enable RBAC in multiple namespace (#348 )	2018-07-19 18:22:30 +02:00
zerg-junior	417f13c0bd	Submit RBAC credentials during initial Event processing (#344 ) * During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.	2018-07-19 16:40:40 +02:00
Oleksii Kliukin	3a9378d3b8	Allow configuring the operator via the YAML manifest. (#326 ) * Up until now, the operator read its own configuration from the configmap. That has a number of limitations, e.g. when the configuration value is not a scalar, but a map or a list. We use a custom code based on github.com/kelseyhightower/envconfig to decode non-scalar values out of plain text keys, but that breaks when the data inside the keys contains both YAML-special elements (e.g. commas) and complex quotes, one good example for that is search_path inside `team_api_role_configuration`. In addition, reliance on the configmap forced a flat structure on the configuration, making it hard to write and to read (see https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778). The changes allow supplying the operator configuration in a proper YAML file. That required registering a custom CRD to support the operator configuration and providing an example at manifests/postgresql-operator-default-configuration.yaml. At the moment, both the old configmap and the new CRD configuration are supported, so there are no compatibility issues; however, in the future I'd like to deprecate the configmap-based configuration altogether. Contrary to the configmap-based configuration, the CRD one doesn't embed defaults into the operator code, however, one can use the manifests/postgresql-operator-default-configuration.yaml as a starting point in order to build a custom configuration. Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters used to create the CRD were taken from the operator configuration, which is not possible if the configuration itself is stored in the CRD object, I've added the ability to specify them as environment variables `CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively. Per review by @zerg-junior and @Jan-M.	2018-07-16 16:20:46 +02:00
zerg-junior	7394c15d0a	Make AWS region configurable in the operator config map (#333 )	2018-06-27 17:29:02 +02:00
erthalion	e661ea1ea7	Mention `uid` field	2018-06-01 16:44:57 +02:00
zerg-junior	69e4ae2d95	Update postgres-operator.yaml Tags are of fixed length (not arbitrary long prefixes of commit hashes)	2018-05-25 12:59:12 +02:00
zerg-junior	9c86f8bd96	Fix conf for minikube (#301 ) * Bump up a Spilo version to use Patroni >= v1.4.4 ; this fixes issues with k8s 1.10 API changes * Bump up an operator version to use the new 'etcd_host' default value * Re-use 'zalando-postgres-operator' as a pod service account and add extra RBAC permissions to make it work * Document in quickstart connecting to Postgres via psql	2018-05-25 12:25:42 +02:00
Sergey Dudoladov	83a26fb78b	Rename RBAC file	2018-05-17 12:05:31 +02:00
Sergey Dudoladov	a926515530	Employ RBAC when run on minikube	2018-05-16 15:28:45 +02:00
Sergey Dudoladov	ca8542185a	Add RBAC to Quickstart guide	2018-05-16 11:01:16 +02:00
Oleksii Kliukin	40163677c7	Remove Kubernetes upgrade-related labels The node_eol_label is obsolete and not used. The node_readiness_label, if set, will prevent scheduling pods on the node without that label, by default minikube doesn't set any label on the node.	2018-05-08 15:50:10 +02:00
Jan M	2bb3bdeeb4	Slimming down README and config map, targeting easy first-time deployments to minikube.	2018-05-04 12:20:54 +02:00
zerg-junior	8f08bef67c	Merge pull request #277 from zalando-incubator/automatically-deploy-service-account Deploy service account for pod creation on demand	2018-04-26 14:44:37 +02:00
Sergey Dudoladov	c31c76281c	Make operator unaware of its own service account	2018-04-23 14:38:20 +02:00
Manuel Gómez	5e1d86e31e	Fix clone timestamp key in example manifest (#276 ) It was set to `endTimestamp`, but it should be `timestamp`.	2018-04-16 18:23:41 +02:00
Oleksii Kliukin	c44cd9e4e6	Define the operator RBAC (#234 ) Note that the account here is named zalando-postgres-operator and not the 'operator' default that is created in the serviceaccount.yaml and also used by the operator configmap to create new postgres clusters. This is done intentionally, as to avoid breaking those setups that already work. Ideally, the operator should be run under the zalando-postgres-operator service account. However, the service account used to run Postgres clusters does not require all those privileges and is described at https://github.com/zalando/patroni/blob/master/kubernetes/patroni_k8s.yaml The service account defined here acquires some privileges not really used by the operator (i.e. we only need list and watch on configmaps), this is also done intentionally to avoid breaking things if someone decides to configure the same service account in the operator's configmap to run postgres clusters. Documentation and further testing by @zerg-junior	2018-04-05 11:24:24 +02:00
Oleksii Kliukin	26db91c53e	Improve infrastructure role definitions (#208 ) Enhance definitions of infrastructure roles by allowing membership in multiple roles, role options and per-role configuration to be specified in the infrastructure role configmap, which must have the same name as the infrastructure role secret. See manifests/infrastructure-roles-configmap.yaml for the examples and updated README for the description of different types of database roles supported by the operator and their purposes. Change the logic of merging infrastructure roles with the manifest roles when they have the same name, to return the infrastructure role unchanged instead of merging. Previously, we used to propagate flags from the manifest role to the resulting infrastructure one, as there was no way to define flags for the infrastructure role; however, this is not the case anymore. Code review and tests by @erthalion	2018-04-04 17:21:36 +02:00
zerg-junior	ff5793b584	Merge pull request #258 from zalando-incubator/always-create-replica-service [WIP] Always create replica service	2018-03-29 14:42:26 +02:00
Sergey Dudoladov	96d46252f5	Change the default values to closer match previous behaviour	2018-03-26 11:43:46 +02:00
Sergey Dudoladov	a8862aeee1	Enable backward compatibility for enable_load_balancer setting from operator configmap	2018-03-19 17:19:50 +01:00
Sergey Dudoladov	931b48fcbb	Respond to code reviews	2018-03-16 15:36:42 +01:00
Sergey Dudoladov	0986e56226	Add separate params for master and replica load balancers to operator configuration	2018-03-14 12:12:28 +01:00
Sergey Dudoladov	ac6c5bcf09	Explicitly name replica and master load balancer params in PostgresSpec	2018-03-14 12:03:27 +01:00
zerg-junior	cca50122a6	Delete config file added by mistake	2018-03-12 12:54:02 +01:00
Sergey Dudoladov	6839ce0170	Fix configuration of dns names	2018-03-12 12:45:52 +01:00
Jan Mussler	cb55749c1b	Update postgres-operator.yaml (#255 ) Bump operator image version.	2018-02-26 20:03:56 +01:00
Sergey Dudoladov	dcfc9925f6	Respond to code review	2018-02-20 14:43:02 +01:00
Sergey Dudoladov	4c23917d42	Watch all namespaces if the relevant param is empty string / 'default' if param is unset	2018-02-12 11:47:56 +01:00
Sergey Dudoladov	c0bc8eaa6d	Comment manifests	2018-02-08 15:15:47 +01:00
Sergey Dudoladov	8b7bbde06e	Make env var overwrite configmap setting for watching namespaces	2018-02-06 16:12:47 +01:00
Sergey Dudoladov	0ef801f4e0	Add example of the watched namespace to the operator config map	2018-02-06 15:16:21 +01:00
Oleksii Kliukin	b90a36c909	Set node_readiness_label default to an empty value. (#204 ) Previously, it was set to the lifecycle-status:ready, breaking a lot of minikube deployments. Also it was not possible before to run with this label set to an empty value. Document the effect of the label in the new section of the documentation.	2018-01-16 15:43:03 +01:00
zerg-junior	6c57334666	Add an example for cloning a backup from existing cluster (#189 ) Add an example for cloning a backup from existing cluster	2017-12-19 16:21:06 +01:00
Sergey Dudoladov	c1b3ce8028	Fix loadBalancerConfig	2017-12-18 17:32:22 +01:00
zerg-junior	3c178f68df	Warn on infrastructure-roles.yaml format violations (#177 ) Emit a warning if there are unprocessed entries in the infrastructure-roles secret.	2017-12-15 17:21:41 +01:00
Oleksii Kliukin	dd0affc390	Tweak our reaction to the cluster upgrade process. Previously, the operator started to move the pods off the nodes to be decommissioned by watching the eol_node_label value. Every new postgres pod has been created with the anti-affinity to that label, making sure that the pods being moved won't land on another to-be-decommissioned node. The changes introduce another label that indicates the ready node. The new pod affinity will ensure that the pod is only scheduled to the node marked as ready, discarding the previous anti-affinity. That way the nodes can transition from the pending-decomission to the other statuses (drained, terminating) without having pods suddenly scheduled to them. In addition, rename the label that triggers the start of the upgrade process to node_eol_label (for consistency with node_readiness_label) and set its default value to lifecycle-status:pending-decomission.	2017-11-30 14:11:49 +01:00
Oleksii Kliukin	975b21f633	Rename api roles configuration parameter. Change api_roles_configuration to team_api_role_configuration	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	415a7fdc4d	Allow global configuration options for API roles. Add options to the PgUser structure, potentially allowing to set per-role options in the cluster definition as well. Introduce api_roles_configuration operator option with the default of log_statement=all	2017-11-22 10:43:35 +01:00
Jan Mussler	a98a7c95c2	Reorganize Readme (#142 ) removing parts of config. * changing secret name pattern to make things shorter. * Move section on self building docker image. * Fix typo. * Bump image. * bump version for pdb fix. * Changes in regards to review. * Fix xhyve driver link. * Move to new api, remove service account, not needed for minikube. * Changed minimal manifest and example to use right file. * Added service account for operator again, it is needed in pods anyways later.	2017-10-24 20:42:22 +02:00
Alexander Kukushkin	39200ba8d4	Enable k8s leader election (#145 ) and bump docker image version	2017-10-20 13:58:15 +02:00
Alexander Kukushkin	a98c712a52	Change spilo docker image to demospilo (#141 ) Image size is slightly more than 24MB, it doesn't contain wal-e and not suitable for production, but it is very good for demo purposes.	2017-10-19 13:53:12 +02:00
Oleksii Kliukin	eba23279c8	Kube cluster upgrade	2017-10-19 10:49:42 +02:00
Jan Mussler	cec695d48e	Superuser toggle for team members Make superuser toggleable for team members. Add an "admin" role to team members if superuser is disabled.	2017-10-12 15:01:54 +02:00
Murat Kabilov	702d901bd9	use clear name for env var denoting namespace to watch (#129 )	2017-10-12 10:42:20 +02:00
Murat Kabilov	a35e9c6119	move from tpr to crd	2017-10-06 15:12:08 +02:00
Murat Kabilov	00194d0130	create dbs on cluster create	2017-10-04 16:24:27 +03:00
Murat Kabilov	93d4bf2b55	Merge branch 'master' into api-improvements	2017-09-26 14:47:13 +02:00
Murat Kabilov	9a66e09b88	cluster history api endpoint	2017-09-26 14:30:45 +02:00
Murat Kabilov	d876f4d88e	set secret name template via config map	2017-09-18 14:25:09 +02:00
Oleksii Kliukin	8b85935a7a	Allow cloning clusters from the operator. (#90 ) Allow cloning clusters from the operator. The changes add a new JSON node `clone` with possible values `cluster` and `timestamp`. `cluster` is mandatory, and setting a non-empty `timestamp` triggers wal-e point in time recovery. Spilo and Patroni do the whole heavy-lifting, the operator just defines certain variables and gathers some data about how to connect to the host to clone or the target S3 bucket. As a minor change, set the image pull policy to IfNotPresent instead of Always to simplify local testing. Change the default replication username to standby.	2017-09-08 16:47:03 +02:00
Murat Kabilov	71dfb33b2b	make pod termination grace period configurable	2017-08-18 16:38:25 +02:00
Murat Kabilov	228639b839	add api port and ring log size values to the config map	2017-08-15 12:37:58 +02:00
Oleksii Kliukin	00150711e4	Configure load balancer on a per-cluster and operator-wide level (#57 ) * Deny all requests to the load balancer by default. * Operator-wide toggle for the load-balancer. * Define per-cluster useLoadBalancer option. If useLoadBalancer is not set - then operator-wide defaults take place. If it is true - the load balancer is created, otherwise a service type clusterIP is created. Internally, we have to completely replace the service if the service type changes. We cannot patch, since some fields from the old service that will remain after patch are incompatible with the new one, and handling them explicitly when updating the service is ugly and error-prone. We cannot update the service because of the immutable fields, that leaves us the only option of deleting the old service and creating the new one. Unfortunately, there is still an issue of unnecessary removal of endpoints associated with the service, it will be addressed in future commits. * Revert the unintended effect of go fmt * Recreate endpoints on service update. When the service type is changed, the service is deleted and then the one with the new type is created. Unfortunately, endpoints are deleted as well. Re-create them afterwards, preserving the original addresses stored in them. * Improve error messages and comments. Use generate instead of gen in names.	2017-06-30 13:38:49 +02:00
Murat Kabilov	e104a67260	Fix resync of the clusters	2017-06-08 11:51:48 +02:00
Murat Kabilov	f7aaf8863d	Change maintenance window format	2017-05-30 09:56:10 +02:00
Murat Kabilov	95a57d1e4f	Use named arguments in the DNS name format	2017-05-18 17:23:59 +02:00
Murat Kabilov	0fd498d4d3	set image pull policy to ifnotpresent	2017-05-12 16:38:42 +02:00
Murat Kabilov	deef84e606	remove new line from the token; remove unnecessary data keys from the postgresq-operator secret	2017-05-12 16:38:42 +02:00
Murat Kabilov	9ee9e286ec	make use of the local fake teams api	2017-05-12 16:38:42 +02:00
Murat Kabilov	2370659c69	Parallel cluster processing Run operations concerning multiple clusters in parallel. Each cluster gets its own worker in order to create, update, sync or delete clusters. Each worker acquires the lock on a cluster. Subsequent operations on the same cluster have to wait until the current one finishes. There is a pool of parallel workers, configurable with the `workers` parameter in the configmap and set by default to 4. The cluster-related tasks are assigned to the workers based on a cluster name: the tasks for the same cluster will be always assigned to the same worker. There is no blocking between workers, although there is a chance that a single worker will become a bottleneck if too many clusters are assigned to it; therefore, for large-scale deployments it might be necessary to bump up workers from the default value.	2017-05-12 11:41:35 +02:00
Oleksii Kliukin	1c4bce86df	Avoid "bulk-comparing" pod resources during sync. (#109 ) * Avoid "bulk-comparing" pod resources during sync. First attempt to fix bogus restarts due to the reported mismatch of container resources where one of the resources is an empty struct, while the other has all fields set to nil. In addition, add an ability to set limits and requests per pod, as well as the operator-level defaults.	2017-05-12 11:41:35 +02:00
Murat Kabilov	8026c69222	update default config param values	2017-05-12 11:41:34 +02:00
Murat Kabilov	da438aab3a	Use ConfigMap to store operator's config	2017-05-12 11:41:34 +02:00
Oleksii Kliukin	71b93b4cc2	Feature/infrastructure roles (#91 ) * Add infrastructure roles configured globally. Those are the roles defined in the operator itself. The operator's configuration refers to the secret containing role names, passwords and membership information. While they are referred to as roles, in reality those are users. In addition, improve the regex to filter out invalid users and make sure user secret names are compatible with DNS name spec. Add an example manifest for the infrastructure roles.	2017-05-12 11:41:33 +02:00
Murat Kabilov	dd2ed5ff9d	Add team name to tpr object metadata name	2017-05-12 11:41:33 +02:00
Murat Kabilov	c2d2a67ad5	Get config from environment variables; ignore pg major version change; get rid of resources package;	2017-05-12 11:41:29 +02:00
Oleksii Kliukin	1817bf65a1	Make example manifests minikube-friendly. Remove fixed namespace from all manifests, reduce resource requests. Remove the storageclass default, since it is not present in minikube. Use the team name instead of integer id, remove unused robots. The manifests are still compatible with the non-local deployment, the only difference is that now a namespace is required (assuming that the operator can only be deployed in a specific namespace.)	2017-05-12 11:41:28 +02:00
Oleksii Kliukin	a2e78ac2ec	Feature/persistent volumes	2017-05-12 11:41:25 +02:00
Murat Kabilov	ae77fa15e8	Pod Rolling update introduce Pod events channel; add parsing of the MaintenanceWindows section; skip deleting Etcd key on cluster delete; use external etcd host; watch for tpr/pods in the namespace of the operator pod only;	2017-05-12 11:41:25 +02:00
Murat Kabilov	2b8956bd33	Add service account manifest	2017-05-12 11:41:19 +02:00
Murat Kabilov	dfde075c66	Use TPR object namespace while creating its objects	2017-05-12 11:37:09 +02:00
Murat Kabilov	6e2d64bd50	Create human users from teams api	2017-05-12 11:37:09 +02:00
Murat Kabilov	58506634c4	Create pg users	2017-05-12 11:37:09 +02:00
Murat Kabilov	abb1173035	Code refactor	2017-05-12 11:37:09 +02:00
Murat Kabilov	75e6bfa55c	makefile improvements	2017-05-12 11:37:07 +02:00


330 Commits