postgres-operator

Commit Graph

Author	SHA1	Message	Date
Oleksii Kliukin	cca73e30b7	Make code around recreating pods and creating objects in the database less brittle (#213 ) There used to be a masterLess flag that was supposed to indicate whether the cluster it belongs to runs without the acting master by design. At some point, as we didn't really have support for such clusters, the flag has been misused to indicate there is no master in the cluster. However, that was not done consistently (a cluster without all pods running would never be masterless, even when the master is not among the running pods) and it was based on the wrong assumption that the masterless cluster will remain masterless until the next attempt to change that flag, ignoring the possibility of master coming up or some node doing a successful promotion. Therefore, this PR gets rid of that flag completely. When the cluster is running with 0 instances, there is obviously no master and it makes no sense to create any database objects inside the non-existing master. Therefore, this PR introduces an additional check for that. recreatePods were assuming that the roles of the pods recorded when the function has stared will not change; for instance, terminated replica pods should start as replicas. Revisit that assumption by looking at the actual role of the re-spawned pods; that avoids a failover if some replica has promoted to the master role while being re-spawned. In addition, if the failover from the old master was unsuccessful, we used to stop and leave the old master running on an old pod, without recording this fact anywhere. This PR makes the failover failure emit a warning, but not stop recreating the last master pod; in the worst case, the running master will be terminated, however, this case is rather unlikely one. As a side effect, make waitForPodLabel return the pod definition it waited for, avoiding extra API calls in recreatePods and movePodFromEndOfLifeNode	2018-02-22 10:42:05 +01:00
Oleksii Kliukin	23011bdf9a	Migrate only master pods. Migrate single masters. (#199 ) Avoid migrating replica pods, since they will be handled by the node draining anyway (the PDB specifies that only masters are to be kept). Allow migration of the single-pod clusters.	2018-01-09 11:55:11 +01:00
Oleksii Kliukin	8e99518eeb	Improve behavior on node decomissionining (#184 ) * Trigger the node migration on the lack of the readiness label. * Examine the node's readiness status on node add. Make sure we don't miss the not ready node, especially when the operator is killed during the migration.	2018-01-04 11:53:15 +01:00
Oleksii Kliukin	dd0affc390	Tweak our reaction to the cluster upgrade process. Previously, the operator started to move the pods off the nodes to be decomissioned by watching the eol_node_label value. Every new postgres pod has been created with the anti-affinity to that label, making sure that the pods being moved won't land on another to be decomissioned node. The changes introduce another label that indicates the ready node. The new pod affinity will esnure that the pod is only scheduled to the node marked as ready, discarding the previous anti-affinity. That way the nodes can transition from the pending-decomission to the other statuses (drained, terminating) without having pods suddently scaled to them. In addition, rename the label that triggers the start of the upgrade process to node_eol_label (for consistency with node_readiness_label) and set its default vvalue to lifecycle-status:pending-decomission.	2017-11-30 14:11:49 +01:00
Oleksii Kliukin	eba23279c8	Kube cluster upgrade	2017-10-19 10:49:42 +02:00
Murat Kabilov	6c4cb4e9da	Perform manual failover during the scale down	2017-10-16 17:41:23 +02:00
Murat Kabilov	5b29576a8e	Remove redundant constants	2017-10-16 15:52:48 +02:00
Murat Kabilov	83c8d6c419	Extend diagnostic api with worker status info	2017-10-11 12:26:09 +02:00
Murat Kabilov	3b8c06416e	skip manual failover for 1-pod clusters	2017-10-05 13:30:15 +03:00
Murat Kabilov	48ec6b35b9	perform manual failover on pg cluster rolling upgrade	2017-10-04 16:56:47 +03:00
Murat Kabilov	e26db66cb5	start all the log messages with lowercase letters	2017-08-15 10:12:36 +02:00
Murat Kabilov	1f8b37f33d	Make use of kubernetes client-go v4 * client-go v4.0.0-beta0 * remove unnecessary methods for tpr object * rest client: use interface instead of structure pointer * proper names for constants; some clean up for log messages * remove teams api client from controller and make it per cluster	2017-07-25 15:25:17 +02:00
Oleksii Kliukin	a8ed1e25b4	Avoid re-creating master pod if it is empty during sync. (#58 ) Fixes #59	2017-07-12 10:57:20 +02:00
Oleksii Kliukin	7b0ca31bfb	Implements EBS volume resizing #35 . In order to support volumes different from EBS and filesystems other than EXT2/3/4 the respective code parts were implemented as interfaces. Adding the new resize for the volume or the filesystem will require implementing the interface, but no other changes in the cluster code itself. Volume resizing first changes the EBS and the filesystem, and only afterwards is reflected in the Kubernetes "PersistentVolume" object. This is done deliberately to be able to check if the volume needs resizing by peeking at the Size of the PersistentVolume structure. We recheck, nevertheless, in the EBSVolumeResizer, whether the actual EBS volume size doesn't match the spec, since call to the AWS ModifyVolume is counted against the resize limit of once every 6 hours, even for those calls that shouldn't result in an actual resize (i.e. when the size matches the one for the running volume). As a collateral, split the constants into multiple files, move the volume code into a separate file and fix minor issues related to the error reporting.	2017-06-06 13:53:27 +02:00
Murat Kabilov	009db16c7c	Use queues for the pod events (#30 )	2017-05-23 15:24:14 +02:00
Oleksii Kliukin	afce38f6f0	Fix error messages (#27 ) Use lowercase for kubernetes objects Use %v instead of %s for errors Start error messages with a lowercase letter.	2017-05-22 14:12:06 +02:00
Murat Kabilov	d34273543e	Fix the golint, gosimple warnings	2017-05-18 17:38:54 +02:00
Murat Kabilov	92d7fbf372	replace github.bus.zalan.do with github.cm/zalando-incubator	2017-05-12 11:50:16 +02:00
Murat Kabilov	2370659c69	Parallel cluster processing Run operations concerning multiple clusters in parallel. Each cluster gets its own worker in order to create, update, sync or delete clusters. Each worker acquires the lock on a cluster. Subsequent operations on the same cluster have to wait until the current one finishes. There is a pool of parallel workers, configurable with the `workers` parameter in the configmap and set by default to 4. The cluster-related tasks are assigned to the workers based on a cluster name: the tasks for the same cluster will be always assigned to the same worker. There is no blocking between workers, although there is a chance that a single worker will become a bottleneck if too many clusters are assigned to it; therefore, for large-scale deployments it might be necessary to bump up workers from the default value.	2017-05-12 11:41:35 +02:00
Murat Kabilov	a7c57874d5	Do not create roles if cluster is masterless fix pod deletion	2017-05-12 11:41:34 +02:00
Murat Kabilov	da438aab3a	Use ConfigMap to store operator's config	2017-05-12 11:41:34 +02:00
Murat Kabilov	08c0e3b6dd	Use unified type for the namespaced object names	2017-05-12 11:41:34 +02:00
Murat Kabilov	db53134cbd	Skip syncing Pods	2017-05-12 11:41:33 +02:00
Murat Kabilov	322676a6b9	Skip deleting Pods and PVCs if failed to delete StatefulSet	2017-05-12 11:41:32 +02:00
Murat Kabilov	bb4fec25ae	Fix deletion of the failed cluster; more debug messages	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	455f91128f	Move master/replica role names into the constants.	2017-05-12 11:41:32 +02:00
Oleksii Kliukin	176c6e8b19	Avoid passing the role into the recreatePod. Conceptually, the operator's task is just to change the pod. As it has no influence over the role the pod will take (either the master or a replica), it shouldn't wait for the specific role. This fixes at least one issue, where the pod running in a single-pod cluster has been waited for forever by the operator expecting it to have a wrong role (since Patroni callback assiging it the original replica role has been killed after a quick promote by the next callback.)	2017-05-12 11:41:32 +02:00
Murat Kabilov	c2d2a67ad5	Get config from environment variables; ignore pg major version change; get rid of resources package;	2017-05-12 11:41:29 +02:00
Murat Kabilov	b6e6308bdc	wait for the pods from the previous rolling update	2017-05-12 11:41:28 +02:00
Murat Kabilov	6f7399b36f	Sync clusters states * move statefulset creation from cluster spec to the separate function * sync cluster state with desired state; * move out from arrays for cluster resources; * recreate pods instead of deleting them in case of statefulset change * check for master while creating cluster/updating pods * simplify retryutil * list pvc while listing resources * name kubernetes resources with capital letter * do rolling update in case of env variables change	2017-05-12 11:41:27 +02:00
Murat Kabilov	416dace289	get rid of arrays in the kuberesources; use shorter form of checking for errors	2017-05-12 11:41:26 +02:00
Oleksii Kliukin	033c28f03a	Delete persistent volumes on deletion of the cluster.	2017-05-12 11:41:26 +02:00
Murat Kabilov	ae77fa15e8	Pod Rolling update introduce Pod events channel; add parsing of the MaintenanceWindows section; skip deleting Etcd key on cluster delete; use external etcd host; watch for tpr/pods in the namespace of the operator pod only;	2017-05-12 11:41:25 +02:00

33 Commits