postgres-operator

Commit Graph

Author	SHA1	Message	Date
Oleksii Kliukin	da4b66210a	Expand variables from the PodEnvironmentConfigMap (#4 ) Inject PodEnvironmentConfigMap variables inline into the statefulset definition in order to be able to figure out changes to the statefulset when only PodEnvironmentConfigMap has changed.	2017-12-14 14:39:33 +01:00
Oleksii Kliukin	1c5451cd7d	Spelling fix.	2017-12-14 14:39:33 +01:00
Oleksii Kliukin	55dc12e512	Examine custom environment sources when syncing. When comparing statefulsets, make sure EnvFrom fields are compared as well.	2017-12-14 14:39:33 +01:00
Georg Kunz	e8d9c75949	Allow custom Postgres pod environment variables	2017-12-14 14:39:33 +01:00
Oleksii Kliukin	87bc47d8d0	Fixes for the case of re-creating the cluster after deletion. - make sure that the secrets for the system users (superuser, replication) are not deleted when the main cluster is. Therefore, we can re-create the cluster, potentially forcing Patroni to restore it from the backup and enable Patroni to connect, since it will use the old password, not the newly generated random one. - when syncing users, always check whether they are already in the DB. Previously, we did this only for the sync cluster case, but the new cluster could be actually the one restored from the backup by Patroni, having all or some of the users already in place. - delete endpoints last. Patroni uses the $clustername endpoint in order to store the leader related metadata. If we remove it before removing all pods, one of those pods running Patroni will re-create it and the next attempt to create the cluster with the same name will stumble on the existing endpoint. - Use db.Exec instead of db.Query for queries that expect no result. This also fixes the issue with the DB creation, since we didn't release an empty Row object it was not possible to create more than one database for a cluster.	2017-12-13 16:49:00 +01:00
Oleksii Kliukin	1fb8cf7ea0	Avoid overwriting critical users. (#172 ) * Avoid overwriting critical users. Disallow defining new users either in the cluster manifest, teams API or infrastructure roles with the names mentioned in the new protected_role_names parameter (list of comma-separated names) Additionally, forbid defining a user with the name matching either super_username or replication_username, so that we don't overwrite system roles required for correct working of the operator itself. Also, clear PostgreSQL roles on each sync first in order to avoid using the old definitions that are no longer present in the current manifest, infrastructure roles secret or the teams API.	2017-12-05 14:27:12 +01:00
Oleksii Kliukin	022ce29314	Make an error message more verbose.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	637921cdee	Tests for initHumanUsers and initRobotUsers. Change the Cluster class in the process to implement Teams API calls and Oauth token fetches as interfaces, so that we can mock them in the tests.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	611cfe96d6	Fix an issue when not assigning the merge result. Add some tests.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	831ebb1f32	Fix the error reporting.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	2e226dee26	Avoid overwriting infrastructure roles. When a role is defined in the infrastructure roles and the cluster manifest use the infrastructure role definition and add flags defined in the manifest. Previously the role has been overwritten by the definition from the manifest. Because a random password is generated for each role from the manifest the applications relying on the infrastructure role credentials from the infrastructure roles secret were unable to connect.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	dd0affc390	Tweak our reaction to the cluster upgrade process. Previously, the operator started to move the pods off the nodes to be decommissioned by watching the eol_node_label value. Every new postgres pod has been created with the anti-affinity to that label, making sure that the pods being moved won't land on another to be decommissioned node. The changes introduce another label that indicates the ready node. The new pod affinity will ensure that the pod is only scheduled to the node marked as ready, discarding the previous anti-affinity. That way the nodes can transition from the pending-decomission to the other statuses (drained, terminating) without having pods suddenly scaled to them. In addition, rename the label that triggers the start of the upgrade process to node_eol_label (for consistency with node_readiness_label) and set its default value to lifecycle-status:pending-decomission.	2017-11-30 14:11:49 +01:00
Oleksii Kliukin	1ffe98ba9f	Fix the connection leak and user options sync. - fix the lack of closing the cursor for the query that returned no rows. - fix syncing of the user options, as previously those were not fetched from the database.	2017-11-27 16:46:34 +01:00
Oleksii Kliukin	975b21f633	Rename api roles configuration parameter. Change api_roles_configuration to team_api_role_configuration	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	2352fc9a39	go fmt run	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	415a7fdc4d	Allow global configuration options for API roles. Add options to the PgUser structure, potentially allowing to set per-role options in the cluster definition as well. Introduce api_roles_configuration operator option with the default of log_statement=all	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	6dcd074ea0	Allow per-cluster setting of a docker image. Add dockerImage cluster configuration parameter that overrides global operator defaults when set to a non-empty value.	2017-11-14 11:53:04 +01:00
Oleksii Kliukin	c25e849fe4	Fix a failure to create new statefulset at sync. Also do a fmt run.	2017-11-08 18:24:17 +01:00
Murat Kabilov	86803406db	use sync methods while updating the cluster	2017-11-03 12:00:43 +01:00
Georg Kunz	47dd766fa7	Add node toleration config to PodSpec (#151 ) * Add node toleration config to PodSpec This allows to taint nodes dedicated to Postgres and prevents other pods from running on these nodes. * Document taint and toleration setup And remove setting from default operator ConfigMap * Allow to overwrite tolerations with Postgres manifest	2017-11-02 19:10:44 +01:00
Oleksii Kliukin	ce960e892a	Create new databases and change owners of existing ones during sync. (#153 ) * Create new databases and change owners of existing ones during sync.	2017-11-02 17:46:33 +01:00
Oleksii Kliukin	7a76be7d3e	Minor fixes around PDB (pod-disruption-budget) syncing: (#147 ) - Call comparison function in the case of the sync as well as for update - Include full cluster name in PDB name - Assign cluster labels to the PDB object	2017-10-23 12:26:59 +02:00
Murat Kabilov	c17aabb642	fix pod disruption budget labels (#146 )	2017-10-20 15:01:51 +02:00
Murat Kabilov	661b141849	Fix Pod Disruption Budget null pointer exception	2017-10-20 11:43:50 +02:00
Murat Kabilov	a1deae198b	add missing master matchLabel for the PDB (#144 )	2017-10-20 11:26:40 +02:00
Oleksii Kliukin	eba23279c8	Kube cluster upgrade	2017-10-19 10:49:42 +02:00
Oleksii Kliukin	1dbf259c76	Retry opening DB connections. (#140 ) Make sure DB connection retry also reopens a connection after closing it	2017-10-18 16:28:00 +02:00
Oleksii Kliukin	99870d8eac	Fix division by zero when connecting to the DB. Apparently the retry function's first parameter is the duration of a single attempt and it cannot be zero.	2017-10-18 10:44:49 +02:00
Murat Kabilov	202f2de988	Retry connecting to pg	2017-10-17 17:03:50 +02:00
Murat Kabilov	6c4cb4e9da	Perform manual failover during the scale down	2017-10-16 17:41:23 +02:00
Murat Kabilov	5b29576a8e	Remove redundant constants	2017-10-16 15:52:48 +02:00
Murat Kabilov	3b32265258	Set status of the cluster on sync fail/success	2017-10-12 15:10:42 +02:00
Jan Mussler	cec695d48e	Superuser toggle for team members Make superuser toggleable for team members. Add an "admin" role to team members if superuser is disabled.	2017-10-12 15:01:54 +02:00
Murat Kabilov	8d5faaa5a5	return idle status when worker has nothing to do	2017-10-11 15:42:20 +02:00
Oleksii Kliukin	793defef72	Fix pod wait timeouts. Previously, a timer had been reset on every message received through the pod channel.	2017-10-11 14:58:37 +02:00
Murat Kabilov	83c8d6c419	Extend diagnostic api with worker status info	2017-10-11 12:26:09 +02:00
Murat Kabilov	71a540ff48	Merge branch 'master' into crd	2017-10-09 11:55:18 +02:00
Murat Kabilov	a35e9c6119	move from tpr to crd	2017-10-06 15:12:08 +02:00
Murat Kabilov	3b8c06416e	skip manual failover for 1-pod clusters	2017-10-05 13:30:15 +03:00
Jan Mussler	c4af0ac6a6	Update cluster.go	2017-10-05 10:58:23 +02:00
Jan M	4a1170855a	Adding '_' to allowed chars.	2017-10-05 10:53:19 +02:00
Murat Kabilov	48ec6b35b9	perform manual failover on pg cluster rolling upgrade	2017-10-04 16:56:47 +03:00
Murat Kabilov	00194d0130	create dbs on cluster create	2017-10-04 16:24:27 +03:00
Murat Kabilov	5cfdabb63e	fix regexp for api endpoint urls	2017-09-28 12:00:40 +02:00
Murat Kabilov	be8bf22c00	add missing return	2017-09-28 11:23:56 +02:00
Murat Kabilov	93d4bf2b55	Merge branch 'master' into api-improvements	2017-09-26 14:47:13 +02:00
Murat Kabilov	19de2a24b7	go lint	2017-09-26 13:44:30 +02:00
Murat Kabilov	d876f4d88e	set secret name template via config map	2017-09-18 14:25:09 +02:00
Oleksii Kliukin	7667847bfe	Feature/validate role options (#101 ) Be more rigorous about validating user flags. Only accept CREATE ROLE flags that don't have any params (i.e. not ADMIN or CONNECTION LIMIT). Check that both flag and NOflag are not used at the same time.	2017-09-15 13:57:48 +02:00
Murat Kabilov	969a06f521	Use DCS_ENABLE_KUBERNETES_API=true environment to enable kubernetes native deployment	2017-09-14 11:39:49 +02:00
Murat Kabilov	8430ee86c9	add comments on roles	2017-09-11 17:44:32 +02:00
Murat Kabilov	90b49a24ba	make postgresql roles public	2017-09-11 17:44:32 +02:00
Oleksii Kliukin	8b85935a7a	Allow cloning clusters from the operator. (#90 ) Allow cloning clusters from the operator. The changes add a new JSON node `clone` with possible values `cluster` and `timestamp`. `cluster` is mandatory, and setting a non-empty `timestamp` triggers wal-e point in time recovery. Spilo and Patroni do the whole heavy-lifting, the operator just defines certain variables and gathers some data about how to connect to the host to clone or the target S3 bucket. As a minor change, set the image pull policy to IfNotPresent instead of Always to simplify local testing. Change the default replication username to standby.	2017-09-08 16:47:03 +02:00
Oleksii Kliukin	a0a9e8f849	Feature/configure replication role (#97 ) Configure superuser and replication usernames	2017-09-07 10:12:34 +02:00
Murat Kabilov	39c123e96a	fetch cluster resources by name, not by label selectors	2017-09-04 18:03:54 +02:00
Murat Kabilov	8aa11ecee2	Add patroni api client	2017-08-30 16:01:18 +02:00
Murat Kabilov	899c0bef45	Use warningf instead of warnf	2017-08-30 14:35:56 +02:00
Murat Kabilov	f44c8e1206	Make pod termination grace period configurable	2017-08-18 16:52:19 +02:00
Murat Kabilov	71dfb33b2b	make pod termination grace period configurable	2017-08-18 16:38:25 +02:00
Murat Kabilov	5967837875	pass the name of the status in the log message on set cluster status failure	2017-08-17 12:18:53 +02:00
Murat Kabilov	d2828e5ece	remove var shadowing; fix imports	2017-08-15 15:59:10 +02:00
Murat Kabilov	272d7e1bcf	rename service field to services as it contains service per role	2017-08-15 15:55:56 +02:00
Murat Kabilov	82f58b57d8	add cluster and controller methods for getting status	2017-08-15 12:11:06 +02:00
Murat Kabilov	5470f20be4	always pass a cluster name as a logger field	2017-08-15 10:29:18 +02:00
Murat Kabilov	e26db66cb5	start all the log messages with lowercase letters	2017-08-15 10:12:36 +02:00
Oleksii Kliukin	87a379f663	Avoid reusing closed DB connection. (#79 ) Set DB connection to nil upon closing it.	2017-08-10 18:19:35 +02:00
Oleksii Kliukin	f15f93f479	Bugfix/close db connections (#78 ) Open and close DB connections on-demand. Previously, we used to leave the DB connection open while the cluster was registered with the operator, potentially resulting in dangling connections if the operator terminates abnormally. Small refactoring around the role syncing code.	2017-08-10 10:10:00 +02:00
Oleksii Kliukin	a8b5b77cc4	Fix missing labels in the replica service selector.	2017-08-02 17:46:24 +02:00
Murat Kabilov	cf663cb841	Fix golint warnings	2017-08-01 16:08:56 +02:00
Murat Kabilov	1f8b37f33d	Make use of kubernetes client-go v4 * client-go v4.0.0-beta0 * remove unnecessary methods for tpr object * rest client: use interface instead of structure pointer * proper names for constants; some clean up for log messages * remove teams api client from controller and make it per cluster	2017-07-25 15:25:17 +02:00
Oleksii Kliukin	4455f1b639	Feature/unit tests (#53 ) - Avoid relying on Clientset structure to call Kubernetes API functions. While Clientset is a convenient "catch-all" abstraction for calling REST API related to different Kubernetes objects, it's impossible to mock. Replacing it with the kubernetes.Interface would be quite straightforward, but would require an extra level of mocked interfaces, because of the versioning. Instead, a new interface is defined, which contains only the objects we need of the pre-defined versions. - Move KubernetesClient to k8sutil package. - Add more tests.	2017-07-24 16:56:46 +02:00
Oleksii Kliukin	a8ed1e25b4	Avoid re-creating master pod if it is empty during sync. (#58 ) Fixes #59	2017-07-12 10:57:20 +02:00
Oleksii Kliukin	00150711e4	Configure load balancer on a per-cluster and operator-wide level (#57 ) * Deny all requests to the load balancer by default. * Operator-wide toggle for the load-balancer. * Define per-cluster useLoadBalancer option. If useLoadBalancer is not set - then operator-wide defaults take place. If it is true - the load balancer is created, otherwise a service type clusterIP is created. Internally, we have to completely replace the service if the service type changes. We cannot patch, since some fields from the old service that will remain after patch are incompatible with the new one, and handling them explicitly when updating the service is ugly and error-prone. We cannot update the service because of the immutable fields, that leaves us the only option of deleting the old service and creating the new one. Unfortunately, there is still an issue of unnecessary removal of endpoints associated with the service, it will be addressed in future commits. * Revert the unintended effect of go fmt * Recreate endpoints on service update. When the service type is changed, the service is deleted and then the one with the new type is created. Unfortunately, endpoints are deleted as well. Re-create them afterwards, preserving the original addresses stored in them. * Improve error messages and comments. Use generate instead of gen in names.	2017-06-30 13:38:49 +02:00
Oleksii Kliukin	987990fb0e	Move service annotation patch template into the constants.	2017-06-12 10:24:23 +02:00
Oleksii Kliukin	17826ee434	Go fmt run.	2017-06-12 10:24:23 +02:00
Oleksii Kliukin	51d73fb172	Replace service annotations when updating services. In case the whole annotation changes (like the external DNS) we don't want to keep the old one hanging around. Unlike specs, we don't expect anyone except the operator to change the annotations. Use StrategicMergePatchType in order to replace the annotations map completely.	2017-06-12 10:24:23 +02:00
Murat Kabilov	1540a2ba65	fix typos; remove unnecessary tests; go fmt -s	2017-06-08 15:52:01 +02:00
Oleksii Kliukin	bc0e9ab4bc	Add error checks per report from errcheck-ng	2017-06-08 10:41:44 +02:00
Murat Kabilov	292a9bda05	Check for dns annotation of the service	2017-06-07 16:41:39 +02:00
Oleksii Kliukin	dc36c4ca12	Implement replicaLoadBalancer boolean flag. (#38 ) The flag adds a replica service with the name cluster_name-repl and a DNS name that defaults to {cluster}-repl.{team}.{hostedzone}. The implementation converted Service field of the cluster into a map with one or two elements and deals with the cases when the new flag is changed on a running cluster (the update and the sync should create or delete the replica service). In order to pick up master and replica service and master endpoint when listing cluster resources. * Update the spec when updating the cluster.	2017-06-07 13:54:17 +02:00
Oleksii Kliukin	7b0ca31bfb	Implements EBS volume resizing #35 . In order to support volumes different from EBS and filesystems other than EXT2/3/4 the respective code parts were implemented as interfaces. Adding the new resize for the volume or the filesystem will require implementing the interface, but no other changes in the cluster code itself. Volume resizing first changes the EBS and the filesystem, and only afterwards is reflected in the Kubernetes "PersistentVolume" object. This is done deliberately to be able to check if the volume needs resizing by peeking at the Size of the PersistentVolume structure. We recheck, nevertheless, in the EBSVolumeResizer, whether the actual EBS volume size doesn't match the spec, since call to the AWS ModifyVolume is counted against the resize limit of once every 6 hours, even for those calls that shouldn't result in an actual resize (i.e. when the size matches the one for the running volume). As a collateral, split the constants into multiple files, move the volume code into a separate file and fix minor issues related to the error reporting.	2017-06-06 13:53:27 +02:00
Murat Kabilov	1fb05212a9	Refactor teams API package	2017-05-30 10:14:30 +02:00
Murat Kabilov	009db16c7c	Use queues for the pod events (#30 )	2017-05-23 15:24:14 +02:00
Oleksii Kliukin	afce38f6f0	Fix error messages (#27 ) Use lowercase for kubernetes objects Use %v instead of %s for errors Start error messages with a lowercase letter.	2017-05-22 14:12:06 +02:00
Oleksii Kliukin	8beb5936b1	Don't error out at sync on existence of the object. (#26 )	2017-05-22 12:58:47 +02:00
Murat Kabilov	4acaf27a5d	Remove etcd requests (#25 ) update glide	2017-05-19 17:18:37 +02:00
Murat Kabilov	d34273543e	Fix the golint, gosimple warnings	2017-05-18 17:38:54 +02:00
Murat Kabilov	233e8529c1	Return error instead of logging it	2017-05-18 17:24:44 +02:00
Murat Kabilov	95a57d1e4f	Use named arguments in the DNS name format	2017-05-18 17:23:59 +02:00
Murat Kabilov	3b6454c2dc	add missed return (#20 )	2017-05-17 11:54:50 +02:00
Oleksii Kliukin	c2826b10e2	Merge branch 'master' into fix/go-vet-fixes	2017-05-17 11:30:07 +02:00
Oleksii Kliukin	4457ce4e47	Replace the statefulset if it cannot be updated. (#18 ) Updates to statefulset spec for fields other than 'replicas' and 'containers' are forbidden. However, it is possible to delete the old statefulset without deleting its pods and create the new one, using the changed specs. The new statefulset shall pick up the orphaned pods. Change the statefulset's comparison to return the combined effect of all checks, not just the first non-matching field.	2017-05-17 11:28:21 +02:00
Murat Kabilov	6e5d7abcc5	pass cluster by reference	2017-05-17 11:05:15 +02:00
Murat Kabilov	356be8f0f1	skip clusters with invalid spec	2017-05-16 16:46:37 +02:00
Oleksii Kliukin	5adceceb36	go fmt run	2017-05-12 17:48:25 +02:00
Oleksii Kliukin	03064637f1	Allow disabling access to the DB and the Teams API. Command-line options --nodatabaseaccess and --noteamsapi disable all teams api interaction and access to the Postgres database. This is useful for debugging purposes when the operator runs out of cluster (with --outofcluster flag). The same effect can be achieved by setting enable_db_access and/or enable_teams_api to false.	2017-05-12 17:40:48 +02:00
Murat Kabilov	92d7fbf372	replace github.bus.zalan.do with github.com/zalando-incubator	2017-05-12 11:50:16 +02:00
Murat Kabilov	18700b9ef7	Optimize template constant	2017-05-12 11:41:36 +02:00
Oleksii Kliukin	ec3f24c3ee	Honor the "spec-by-example" manifest we have. No longer ignore custom PostgreSQL and Patroni parameters and initdb options. Since all Patroni parameters that are not under initdb or pg_hba are specified as a plain map, there is no way to distinguish those that should go into the bootstrap section from those that should stay in the local configuration. As the example used only bootstrap parameters, currently all such options go into the bootstrap section. Also the initdb options are represented as a map, while Patroni initdb options are a list of either maps or strings (i.e. "data-checksums" doesn't need an argument). For now, there is a work-around, but in the future we might consider changing the spec.	2017-05-12 11:41:36 +02:00
Oleksii Kliukin	6983f444ed	Periodically sync roles with the running clusters. (#102 ) The sync adds or alters database roles based on the roles defined in the cluster's TPR, Team API and operator's infrastructure roles. At the moment, roles are not deleted, as it would be dangerous for the robot roles in case TPR is misconfigured. In addition, ALTER ROLE does not remove role options, i.e. SUPERUSER or CREATEROLE, nor does it remove role membership: only new options are added and new role membership is granted. So far, options like NOSUPERUSER and NOCREATEROLE won't be handled correctly, when mixed with the non-negative counterparts, also NOLOGIN should be processed correctly. The code assumes that only MD5 passwords are stored in the DB and will likely break with the new SCRAM auth in PostgreSQL 10. On the implementation side, create the new interface to abstract roles merge and creation, move most of the role-based functionality from cluster/pg into the new 'users' module, strip create user code of special cases related to human-based users (moving them to init instead) and fixed the password md5 generator to avoid processing already encrypted passwords. In addition, moved the system roles off the slice containing all other roles in order to avoid extra efforts to avoid creating them. Also, fix a leak in DB connections when the new connection is not considered healthy and discarded without being closed. Initialize the database during the sync phase before syncing users.	2017-05-12 11:41:35 +02:00
Oleksii Kliukin	9f9a89185f	Do rolling update after creating of a statefulset if pods were present. (#110 ) Make sure we always re-create pods if we had to create the statefulset, even if the pods from the old statefulset were already there.	2017-05-12 11:41:35 +02:00
Oleksii Kliukin	49cb395aed	Set ELB timeout annotation for the service. (#114 ) By default the ELB terminates the idle connection after 60 seconds. Increase this interval to a more reasonable one of 1 h.	2017-05-12 11:41:35 +02:00
Murat Kabilov	2370659c69	Parallel cluster processing Run operations concerning multiple clusters in parallel. Each cluster gets its own worker in order to create, update, sync or delete clusters. Each worker acquires the lock on a cluster. Subsequent operations on the same cluster have to wait until the current one finishes. There is a pool of parallel workers, configurable with the `workers` parameter in the configmap and set by default to 4. The cluster-related tasks are assigned to the workers based on a cluster name: the tasks for the same cluster will be always assigned to the same worker. There is no blocking between workers, although there is a chance that a single worker will become a bottleneck if too many clusters are assigned to it; therefore, for large-scale deployments it might be necessary to bump up workers from the default value.	2017-05-12 11:41:35 +02:00
Oleksii Kliukin	1c4bce86df	Avoid "bulk-comparing" pod resources during sync. (#109 ) * Avoid "bulk-comparing" pod resources during sync. First attempt to fix bogus restarts due to the reported mismatch of container resources where one of the resources is an empty struct, while the other has all fields set to nil. In addition, add an ability to set limits and requests per pod, as well as the operator-level defaults.	2017-05-12 11:41:35 +02:00
Murat Kabilov	9b0d0d487c	Use PATCH while updating Services and StatefulSets	2017-05-12 11:41:34 +02:00
Murat Kabilov	a7c57874d5	Do not create roles if cluster is masterless fix pod deletion	2017-05-12 11:41:34 +02:00
Murat Kabilov	da438aab3a	Use ConfigMap to store operator's config	2017-05-12 11:41:34 +02:00
Oleksii Kliukin	47e3e29a56	Add version label to the cluster. (#96 ) * Add version label to the cluster. According to the STUPS team the daemon that exports logs to scalyr stops the export if the version label is missing. * Move label names to constants. * Run go fmt	2017-05-12 11:41:34 +02:00
Murat Kabilov	08c0e3b6dd	Use unified type for the namespaced object names	2017-05-12 11:41:34 +02:00
Oleksii Kliukin	71b93b4cc2	Feature/infrastructure roles (#91 ) * Add infrastructure roles configured globally. Those are the roles defined in the operator itself. The operator's configuration refers to the secret containing role names, passwords and membership information. While they are referred to as roles, in reality those are users. In addition, improve the regex to filter out invalid users and make sure user secret names are compatible with DNS name spec. Add an example manifest for the infrastructure roles.	2017-05-12 11:41:33 +02:00
Murat Kabilov	b8fba429df	typo in service name	2017-05-12 11:41:33 +02:00
Murat Kabilov	3bd9b3b42f	typo in config name	2017-05-12 11:41:33 +02:00
Murat Kabilov	16cc517106	Add name for the service port	2017-05-12 11:41:33 +02:00
Murat Kabilov	dd2ed5ff9d	Add team name to tpr object metadata name	2017-05-12 11:41:33 +02:00
Murat Kabilov	db53134cbd	Skip syncing Pods	2017-05-12 11:41:33 +02:00
Murat Kabilov	655f6dcadb	make cluster resources private	2017-05-12 11:41:33 +02:00
Murat Kabilov	101dc06acb	Better logging for teams api calls	2017-05-12 11:41:32 +02:00
Murat Kabilov	322676a6b9	Skip deleting Pods and PVCs if failed to delete StatefulSet	2017-05-12 11:41:32 +02:00
Murat Kabilov	bb4fec25ae	Fix deletion of the failed cluster; more debug messages	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	3b99ce3d2e	Improve the diff in cluster resources. - Use the branch of pretty with this feature fixed: https://github.com/kr/pretty/pull/42 - Add the Limit to the resources declaration to avoid dummy differences between statefulsets (where both Resource structures are empty, but in one case the fields are not mentioned, while in another they are assigned to empty values).	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	455f91128f	Move master/replica role names into the constants.	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	176c6e8b19	Avoid passing the role into the recreatePod. Conceptually, the operator's task is just to change the pod. As it has no influence over the role the pod will take (either the master or a replica), it shouldn't wait for the specific role. This fixes at least one issue, where the pod running in a single-pod cluster has been waited for forever by the operator expecting it to have a wrong role (since Patroni callback assigning it the original replica role has been killed after a quick promote by the next callback.)	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	8e658174e8	Fix the issue with calling a non-existent function.	2017-05-12 11:41:31 +02:00
Murat Kabilov	d4bb72989a	Warn if etcd key for the new cluster already exists	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	a5f0ef10d0	go fmt run	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	7841b85892	Add configuration to support running WAL-E. - Set WAL_S3_BUCKET to point WAL-E where to fetch/store WAL files - Set annotations/iam.amazonaws.com/role to set the role to access AWS. The new env variables are PGOP_WAL_S3_BUCKET and PGOP_KUBE_IAM_ROLE.	2017-05-12 11:41:31 +02:00
Murat Kabilov	852c5beae5	Check etcd key availability for the new cluster	2017-05-12 11:41:31 +02:00
Oleksii Kliukin	04ed22f73f	Remove unnecessary initializations.	2017-05-12 11:41:31 +02:00
Murat Kabilov	ee83e196a9	Fix secrets sync * log if secret already exists	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	8268b07ad2	Set logger level per package instead of doing this globally	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	8db44d6f18	Avoid unnecessary marshaling.	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	ba8e8d1857	Avoid showing objects alongside diffs. That reduces the amount of clutter in the debug output. Run go fmt on the sources.	2017-05-12 11:41:30 +02:00
Oleksii Kliukin	3a4c6268be	Increase log verbosity, namely for object updates. - add a new environment variable for triggering debug log level - show both new, old object and diff during syncs and updates - use pretty package to pretty-print go structures	2017-05-12 11:41:29 +02:00
Oleksii Kliukin	19dfa0c2b0	Run pod in a privileged securityContext. This is necessary in order to access devices from the pod. The target is to be able to run resize2fs on a pod itself.	2017-05-12 11:41:29 +02:00
Murat Kabilov	c2d2a67ad5	Get config from environment variables; ignore pg major version change; get rid of resources package;	2017-05-12 11:41:29 +02:00
Murat Kabilov	79a6726d4d	Increase logging verbosity, restructure code	2017-05-12 11:41:28 +02:00
Murat Kabilov	3aaa05fb96	Use encrypted passwords while creating robot users	2017-05-12 11:41:28 +02:00
Oleksii Kliukin	48ba6adf8a	Avoid calling Team API with an expired token. Previously, the controller fetched the Oauth token once at start, so eventually the token would expire and the operator could not create new users. This commit makes the operator fetch the token before each call to the Teams API.	2017-05-12 11:41:28 +02:00
Murat Kabilov	b6e6308bdc	wait for the pods from the previous rolling update	2017-05-12 11:41:28 +02:00
Murat Kabilov	bbdc2f52a9	fix resource load and list	2017-05-12 11:41:28 +02:00
Murat Kabilov	6f7399b36f	Sync clusters states * move statefulset creation from cluster spec to the separate function * sync cluster state with desired state; * move out from arrays for cluster resources; * recreate pods instead of deleting them in case of statefulset change * check for master while creating cluster/updating pods * simplify retryutil * list pvc while listing resources * name kubernetes resources with capital letter * do rolling update in case of env variables change	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	1377724b2e	Fix a compilation error.	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	31d7426327	ClusterTeamName -> ClusterName. Add a TODO item.	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	55dbacdfa6	Assign DNS name to the cluster. DNS name is generated from the team name and cluster name. Use "zalando.org/dnsname" service annotation that makes 'mate' service assign a CNAME to the load balancer name.	2017-05-12 11:41:27 +02:00
Oleksii Kliukin	45fcb2adc9	Assign SUPERUSER to human users by default.	2017-05-12 11:41:27 +02:00
Murat Kabilov	486c8ecb07	use neutral name for set cluster status function	2017-05-12 11:41:26 +02:00
Murat Kabilov	1c6e7ac2e7	loadBalancerSourceRanges update	2017-05-12 11:41:26 +02:00
Murat Kabilov	fc127069ab	remove unnecessary ControllerNamespace	2017-05-12 11:41:26 +02:00
Murat Kabilov	416dace289	get rid of arrays in the kuberesources; use shorter form of checking for errors	2017-05-12 11:41:26 +02:00
Oleksii Kliukin	abd313f2d9	Fix a missing colon.	2017-05-12 11:41:26 +02:00
Oleksii Kliukin	f65fab00dd	Fix a typo	2017-05-12 11:41:26 +02:00
Oleksii Kliukin	033c28f03a	Delete persistent volumes on deletion of the cluster.	2017-05-12 11:41:26 +02:00
Murat Kabilov	caa0eab19b	Move statefulset creation from cluster spec to the separate function	2017-05-12 11:41:25 +02:00
Murat Kabilov	021eedb226	Fix resource already exists log messages	2017-05-12 11:41:25 +02:00
Oleksii Kliukin	a2e78ac2ec	Feature/persistent volumes	2017-05-12 11:41:25 +02:00
Murat Kabilov	ae77fa15e8	Pod Rolling update introduce Pod events channel; add parsing of the MaintenanceWindows section; skip deleting Etcd key on cluster delete; use external etcd host; watch for tpr/pods in the namespace of the operator pod only;	2017-05-12 11:41:25 +02:00
Murat Kabilov	dfde075c66	Use TPR object namespace while creating its objects	2017-05-12 11:37:09 +02:00
Murat Kabilov	6e2d64bd50	Create human users from teams api	2017-05-12 11:37:09 +02:00
Murat Kabilov	58506634c4	Create pg users	2017-05-12 11:37:09 +02:00
Murat Kabilov	7e4d0410c2	Use one secret per user	2017-05-12 11:37:09 +02:00
Murat Kabilov	abb1173035	Code refactor	2017-05-12 11:37:09 +02:00

311 Commits