postgres-operator


Author	SHA1	Message	Date
Oleksii Kliukin	03064637f1	Allow disabling access to the DB and the Teams API. The command-line options --nodatabaseaccess and --noteamsapi disable, respectively, access to the Postgres database and all Teams API interaction. This is useful for debugging when the operator runs outside the cluster (with the --outofcluster flag). The same effect can be achieved by setting enable_db_access and/or enable_teams_api to false.	2017-05-12 17:40:48 +02:00
Murat Kabilov	92d7fbf372	replace github.bus.zalan.do with github.com/zalando-incubator	2017-05-12 11:50:16 +02:00
Murat Kabilov	1b82009151	Add a method to execute a command inside the Pod	2017-05-12 11:41:36 +02:00
Murat Kabilov	28a74622d7	Fix typo in the teams api json spec	2017-05-12 11:41:36 +02:00
Murat Kabilov	18700b9ef7	Optimize template constant	2017-05-12 11:41:36 +02:00
Murat Kabilov	fd449342e5	Use Kubernetes API instead of API group	2017-05-12 11:41:36 +02:00
Oleksii Kliukin	ec3f24c3ee	Honor the "spec-by-example" manifest we have. No longer ignore custom PostgreSQL and Patroni parameters and initdb options. Since all Patroni parameters that are not under initdb or pg_hba are specified as a plain map, there is no way to distinguish those that should go into the bootstrap section from those that should stay in the local configuration. As the example used only bootstrap parameters, currently all such options go into the bootstrap section. Also, the initdb options are represented as a map, while Patroni initdb options are a list of either maps or strings (e.g. "data-checksums" doesn't need an argument). For now, there is a work-around, but in the future we might consider changing the spec.	2017-05-12 11:41:36 +02:00
Oleksii Kliukin	6983f444ed	Periodically sync roles with the running clusters. (#102) The sync adds or alters database roles based on the roles defined in the cluster's TPR, the Teams API and the operator's infrastructure roles. At the moment, roles are not deleted, as that would be dangerous for the robot roles in case the TPR is misconfigured. In addition, ALTER ROLE does not remove role options, e.g. SUPERUSER or CREATEROLE, nor does it remove role membership: only new options are added and new role membership is granted. So far, options like NOSUPERUSER and NOCREATEROLE won't be handled correctly when mixed with their non-negative counterparts; NOLOGIN, however, should be processed correctly. The code assumes that only MD5 passwords are stored in the DB and will likely break with the new SCRAM auth in PostgreSQL 10. On the implementation side, create a new interface to abstract role merging and creation, move most of the role-based functionality from cluster/pg into the new 'users' module, strip the create-user code of special cases related to human users (moving them to init instead) and fix the password MD5 generator to avoid processing already encrypted passwords. In addition, move the system roles off the slice containing all other roles, so that no extra effort is needed to avoid creating them. Also, fix a leak in DB connections when a new connection is not considered healthy and is discarded without being closed. Initialize the database during the sync phase before syncing users.	2017-05-12 11:41:35 +02:00
Martin Linkhorst	411487e66d	update annotation for ExternalDNS (#115)	2017-05-12 11:41:35 +02:00
Oleksii Kliukin	9f9a89185f	Do a rolling update after creating a statefulset if pods were present. (#110) Make sure we always re-create pods if we had to create the statefulset, even if the pods from the old statefulset were already there.	2017-05-12 11:41:35 +02:00
Oleksii Kliukin	49cb395aed	Set ELB timeout annotation for the service. (#114) By default, the ELB terminates idle connections after 60 seconds. Increase this interval to a more reasonable 1 hour.	2017-05-12 11:41:35 +02:00
Murat Kabilov	2370659c69	Parallel cluster processing. Run operations concerning multiple clusters in parallel. Each cluster gets its own worker in order to create, update, sync or delete clusters. Each worker acquires the lock on a cluster. Subsequent operations on the same cluster have to wait until the current one finishes. There is a pool of parallel workers, configurable with the `workers` parameter in the configmap and set by default to 4. The cluster-related tasks are assigned to the workers based on the cluster name: the tasks for the same cluster will always be assigned to the same worker. There is no blocking between workers, although a single worker may become a bottleneck if too many clusters are assigned to it; therefore, large-scale deployments might need to raise the number of workers above the default.	2017-05-12 11:41:35 +02:00
Oleksii Kliukin	1c4bce86df	Avoid "bulk-comparing" pod resources during sync. (#109) First attempt to fix bogus restarts caused by a reported mismatch of container resources, where one of the resources is an empty struct while the other has all fields set to nil. In addition, add the ability to set limits and requests per pod, as well as operator-level defaults.	2017-05-12 11:41:35 +02:00
Murat Kabilov	9b0d0d487c	Use PATCH while updating Services and StatefulSets	2017-05-12 11:41:34 +02:00
Murat Kabilov	8026c69222	update default config param values	2017-05-12 11:41:34 +02:00
Murat Kabilov	a7c57874d5	Do not create roles if cluster is masterless; fix pod deletion	2017-05-12 11:41:34 +02:00
Murat Kabilov	da438aab3a	Use ConfigMap to store operator's config	2017-05-12 11:41:34 +02:00
Oleksii Kliukin	47e3e29a56	Add version label to the cluster. (#96) According to the STUPS team, the daemon that exports logs to Scalyr stops the export if the version label is missing. * Move label names to constants. * Run go fmt	2017-05-12 11:41:34 +02:00
Murat Kabilov	08c0e3b6dd	Use unified type for the namespaced object names	2017-05-12 11:41:34 +02:00
Murat Kabilov	79fdba4ac7	make sure the cluster name matches the format {teamname}-{clustername}	2017-05-12 11:41:34 +02:00
Oleksii Kliukin	71b93b4cc2	Feature/infrastructure roles (#91) * Add infrastructure roles configured globally. Those are the roles defined in the operator itself. The operator's configuration refers to the secret containing role names, passwords and membership information. While they are referred to as roles, in reality those are users. In addition, improve the regex to filter out invalid users and make sure user secret names are compatible with the DNS name spec. Add an example manifest for the infrastructure roles.	2017-05-12 11:41:33 +02:00
Murat Kabilov	b8fba429df	typo in service name	2017-05-12 11:41:33 +02:00
Murat Kabilov	3bd9b3b42f	typo in config name	2017-05-12 11:41:33 +02:00
Murat Kabilov	16cc517106	Add name for the service port	2017-05-12 11:41:33 +02:00
Murat Kabilov	dd2ed5ff9d	Add team name to tpr object metadata name	2017-05-12 11:41:33 +02:00
Murat Kabilov	db53134cbd	Skip syncing Pods	2017-05-12 11:41:33 +02:00
Murat Kabilov	655f6dcadb	make cluster resources private	2017-05-12 11:41:33 +02:00
Murat Kabilov	101dc06acb	Better logging for teams api calls	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	5b66d0adba	Correct go json tags (extra space).	2017-05-12 11:41:32 +02:00
Murat Kabilov	322676a6b9	Skip deleting Pods and PVCs if failed to delete StatefulSet	2017-05-12 11:41:32 +02:00
Murat Kabilov	bb4fec25ae	Fix deletion of the failed cluster; more debug messages	2017-05-12 11:41:32 +02:00
Murat Kabilov	ce90a54cf9	create key in the cluster map on cluster creation failure	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	3b99ce3d2e	Improve the diff in cluster resources. - Use the branch of pretty with this feature fixed: https://github.com/kr/pretty/pull/42 - Add the Limit to the resources declaration to avoid dummy differences between statefulsets (where both Resource structures are empty, but in one case the fields are not mentioned, while in another they are assigned to empty values).	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	455f91128f	Move master/replica role names into the constants.	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	176c6e8b19	Avoid passing the role into recreatePod. Conceptually, the operator's task is just to change the pod. As it has no influence over the role the pod will take (either master or replica), it shouldn't wait for a specific role. This fixes at least one issue where the operator waited forever for the pod of a single-pod cluster to take a wrong role (the Patroni callback assigning it the original replica role had been killed by the next callback after a quick promote).	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	8e658174e8	Fix the issue with calling a non-existent function.	2017-05-12 11:41:31 +02:00
Murat Kabilov	d4bb72989a	Warn if etcd key for the new cluster already exists	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	a5f0ef10d0	go fmt run	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	0764505a10	correct the wal bucket parameter name.	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	7841b85892	Add configuration to support running WAL-E. - Set WAL_S3_BUCKET to tell WAL-E where to fetch/store WAL files - Set annotations/iam.amazonaws.com/role to specify the role used to access AWS. The new environment variables are PGOP_WAL_S3_BUCKET and PGOP_KUBE_IAM_ROLE.	2017-05-12 11:41:31 +02:00
Murat Kabilov	852c5beae5	Check etcd key availability for the new cluster	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	04ed22f73f	Remove unnecessary initializations.	2017-05-12 11:41:31 +02:00
Murat Kabilov	ee83e196a9	Fix secrets sync * log if secret already exists	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	8268b07ad2	Set logger level per package instead of doing this globally	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	8db44d6f18	Avoid unnecessary marshaling.	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	b69b6b26e5	go fmt run	2017-05-12 11:41:30 +02:00
Murat Kabilov	310c119dfa	Display config on operator start up	2017-05-12 11:41:30 +02:00
Murat Kabilov	a97dfb07de	fix struct tag delimiter	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	ba8e8d1857	Avoid showing objects alongside diffs. That reduces the amount of clutter in the debug output. Run go fmt on the sources.	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	3a4c6268be	Increase log verbosity, namely for object updates. - Add a new environment variable for triggering the debug log level - Show the new object, the old object and the diff during syncs and updates - Use the pretty package to pretty-print Go structures	2017-05-12 11:41:29 +02:00
Oleksii Kliukin	19dfa0c2b0	Run pod in a privileged securityContext. This is necessary in order to access devices from the pod. The target is to be able to run resize2fs on a pod itself.	2017-05-12 11:41:29 +02:00
Murat Kabilov	c2d2a67ad5	Get config from environment variables; ignore pg major version change; get rid of resources package;	2017-05-12 11:41:29 +02:00
Murat Kabilov	79a6726d4d	Increase logging verbosity, restructure code	2017-05-12 11:41:28 +02:00
Murat Kabilov	3aaa05fb96	Use encrypted passwords while creating robot users	2017-05-12 11:41:28 +02:00
Oleksii Kliukin	48ba6adf8a	Avoid calling the Teams API with an expired token. Previously, the controller fetched the OAuth token once at start, so eventually the token would expire and the operator could not create new users. This commit makes the operator fetch the token before each call to the Teams API.	2017-05-12 11:41:28 +02:00
Murat Kabilov	b6e6308bdc	wait for the pods from the previous rolling update	2017-05-12 11:41:28 +02:00
Murat Kabilov	bbdc2f52a9	fix resource load and list	2017-05-12 11:41:28 +02:00
Murat Kabilov	6f7399b36f	Sync cluster states * move statefulset creation from cluster spec to a separate function * sync cluster state with desired state; * move away from arrays for cluster resources; * recreate pods instead of deleting them in case of statefulset change * check for master while creating cluster/updating pods * simplify retryutil * list PVCs while listing resources * name Kubernetes resources with a capital letter * do rolling update in case of env variables change	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	1377724b2e	Fix a compilation error.	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	31d7426327	ClusterTeamName -> ClusterName. Add a TODO item.	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	814f75f7c1	Formatting changes	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	7529b84b93	Move all operator-related constants together.	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	55dbacdfa6	Assign DNS name to the cluster. DNS name is generated from the team name and cluster name. Use "zalando.org/dnsname" service annotation that makes 'mate' service assign a CNAME to the load balancer name.	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	45fcb2adc9	Assign SUPERUSER to human users by default.	2017-05-12 11:41:27 +02:00
Murat Kabilov	34ac47aed9	Expose container 8080 port	2017-05-12 11:41:26 +02:00
Murat Kabilov	486c8ecb07	use neutral name for set cluster status function	2017-05-12 11:41:26 +02:00
Murat Kabilov	1c6e7ac2e7	loadBalancerSourceRanges update	2017-05-12 11:41:26 +02:00
Murat Kabilov	fc127069ab	remove unnecessary ControllerNamespace	2017-05-12 11:41:26 +02:00
Murat Kabilov	416dace289	get rid of arrays in the kuberesources; use shorter form of checking for errors	2017-05-12 11:41:26 +02:00
Oleksii Kliukin	abd313f2d9	Fix a missing colon.	2017-05-12 11:41:26 +02:00
Oleksii Kliukin	f65fab00dd	Fix a typo	2017-05-12 11:41:26 +02:00
Oleksii Kliukin	033c28f03a	Delete persistent volumes on deletion of the cluster.	2017-05-12 11:41:26 +02:00
Murat Kabilov	caa0eab19b	Move statefulset creation from cluster spec to the separate function	2017-05-12 11:41:25 +02:00
Oleksii Kliukin	776ed3fa0f	Simplify getting configuration.	2017-05-12 11:41:25 +02:00
Murat Kabilov	021eedb226	Fix resource already exists log messages	2017-05-12 11:41:25 +02:00
Oleksii Kliukin	a2e78ac2ec	Feature/persistent volumes	2017-05-12 11:41:25 +02:00
Murat Kabilov	ae77fa15e8	Pod rolling update: introduce Pod events channel; add parsing of the MaintenanceWindows section; skip deleting Etcd key on cluster delete; use external etcd host; watch for tpr/pods in the namespace of the operator pod only;	2017-05-12 11:41:25 +02:00
Murat Kabilov	dfde075c66	Use TPR object namespace while creating its objects	2017-05-12 11:37:09 +02:00
Murat Kabilov	6e2d64bd50	Create human users from teams api	2017-05-12 11:37:09 +02:00
Murat Kabilov	58506634c4	Create pg users	2017-05-12 11:37:09 +02:00
Murat Kabilov	7e4d0410c2	Use one secret per user	2017-05-12 11:37:09 +02:00
Murat Kabilov	abb1173035	Code refactor	2017-05-12 11:37:09 +02:00
Murat Kabilov	75e6bfa55c	makefile improvements	2017-05-12 11:37:07 +02:00
Oleksii Kliukin	e96f8a80ee	Option to run the operator out of cluster.	2017-05-08 12:10:27 +02:00
Oleksii Kliukin	b3a9516bae	Add a missing file.	2017-05-08 12:10:26 +02:00
Oleksii Kliukin	e5e0e3a148	Use camelCase.	2017-05-08 12:10:26 +02:00
Oleksii Kliukin	38bc9da25a	WIP: allow operator to run both in- and out- of cluster.	2017-05-08 12:10:26 +02:00
Murat Kabilov	5b5a64e55d	Check if etcd service has its port exposed	2017-05-08 12:10:26 +02:00
Murat Kabilov	d5a7683a38	some refactoring	2017-05-08 12:10:26 +02:00
Murat Kabilov	256ff37c19	refactor file tree structure	2017-05-08 12:10:25 +02:00
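Commit 2370659c69 describes a fixed pool of workers (the `workers` setting, default 4) where every task for a given cluster is always routed to the same worker. The Go sketch below shows one way to get that behavior. Several details are assumptions, not taken from the operator's code: the FNV-1a hash of the cluster name, the hypothetical names `task` and `workerFor`, and the per-worker channels. The sketch serializes work per cluster through each worker's queue. It does not use the per-cluster lock the commit mentions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// task is a hypothetical unit of work (create, update, sync or delete) for one cluster.
type task struct {
	cluster string
	op      string
}

// workerFor maps a cluster name to a worker index, so all tasks for the same
// cluster land on the same worker. Using FNV-1a here is an assumption.
func workerFor(cluster string, workers uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(cluster))
	return h.Sum32() % workers
}

func main() {
	const workers = 4 // default pool size named in the commit message
	queues := make([]chan task, workers)
	var wg sync.WaitGroup
	var mu sync.Mutex
	processed := make(map[uint32][]string)

	for i := range queues {
		queues[i] = make(chan task, 16)
		wg.Add(1)
		go func(id uint32, q <-chan task) {
			defer wg.Done()
			// Each worker drains its own queue sequentially, so two operations
			// on the same cluster never run at the same time.
			for t := range q {
				mu.Lock()
				processed[id] = append(processed[id], t.cluster+":"+t.op)
				mu.Unlock()
			}
		}(uint32(i), queues[i])
	}

	tasks := []task{{"acid-a", "create"}, {"acid-b", "create"}, {"acid-a", "sync"}}
	for _, t := range tasks {
		queues[workerFor(t.cluster, workers)] <- t
	}
	for _, q := range queues {
		close(q)
	}
	wg.Wait()
	fmt.Println(processed)
}
```

Workers never block each other. If many busy clusters hash to the same index, however, that one worker becomes the bottleneck the commit warns about.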


890 Commits