Commit Graph

72 Commits

Author SHA1 Message Date
Oleksii Kliukin 332dab5237 Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations 2018-05-08 14:51:10 +02:00
Oleksii Kliukin f41a42f922 Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations 2018-05-07 10:16:30 +02:00
Oleksii Kliukin ce0d4af91c Initial implementation for the statefulset annotations indicating rolling updates. 2018-05-07 08:07:37 +02:00
Oleksii Kliukin 1a20362c5b Initial implementation for the statefulset annotations indicating rolling updates. 2018-05-04 18:59:23 +02:00
Oleksii Kliukin 43a1db2128 Merge branch 'master' into pending_rolling_updates 2018-05-03 11:27:16 +02:00
Oleksii Kliukin fe47f9ebea
Improve the pod moving behavior during the Kubernetes cluster upgrade. (#281)
* Improve the pod moving behavior during the Kubernetes cluster upgrade.

Fix an issue of not waiting for at least one replica to become ready
(if the Statefulset indicates there are replicas) when moving the master
pod off the decommissioned node. Resolves the first part of #279.

Small fixes to error messages.

* Eliminate a race condition during the switchover.

When the operator initiates a failover (switchover) that fails and
then retries it a second time, it may happen that the previous
waitForPodChannel is still active. As a result, the operator subscribes
to the former master pod twice, causing a panic.

The problem was that the original code didn't cancel the
waitForPodLabel call for the new master pod in the case when the failover
fails. This commit fixes it by adding a stop channel to that function.

Code review by @zerg-junior
2018-05-03 10:20:24 +02:00
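A minimal Go sketch of the stop-channel fix described above; `waitForPodLabel` here is a simplified stand-in for the operator's function, and the channel types are illustrative:

```go
package main

import (
	"fmt"
	"time"
)

// waitForPodLabel waits for a single pod event, a caller-initiated stop, or a
// timeout, whichever comes first. Closing stopCh after a failed failover
// cancels the wait, so a retry never leaves a stale subscriber behind.
func waitForPodLabel(podEvents <-chan string, stopCh <-chan struct{}, timeout time.Duration) (string, error) {
	select {
	case ev := <-podEvents:
		return ev, nil
	case <-stopCh:
		return "", fmt.Errorf("wait canceled by the caller")
	case <-time.After(timeout):
		return "", fmt.Errorf("timed out waiting for the pod label")
	}
}

func main() {
	events := make(chan string, 1)
	stop := make(chan struct{})
	close(stop) // simulate a failed failover canceling the wait
	_, err := waitForPodLabel(events, stop, time.Second)
	fmt.Println(err) // wait canceled by the caller
}
```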
Sergey Dudoladov bc8b950da4 Tolerate issues of the Teams API 2018-04-23 16:31:53 +02:00
Oleksii Kliukin 0618723a61 Check rolling updates using controller revisions.
Compare the pods' controller revisions with the statefulset's revision
to determine whether each pod is running the latest revision and,
therefore, whether a rolling update is necessary. This check is performed
only during operator start; afterwards, the rolling update status
stored locally in the cluster structure is used for all
rolling update decisions.
2018-04-09 18:07:24 +02:00
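A sketch of that revision comparison, assuming the standard `controller-revision-hash` pod label and the statefulset's status revision; the helper name is illustrative, not the operator's exact code:

```go
package revisions

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// podsPendingRollingUpdate returns the pods whose controller revision differs
// from the statefulset's update revision, i.e. the pods that still need to be
// replaced before the cluster runs the latest revision everywhere.
func podsPendingRollingUpdate(sset *appsv1.StatefulSet, pods []corev1.Pod) []corev1.Pod {
	var pending []corev1.Pod
	for _, pod := range pods {
		// "controller-revision-hash" is the label the statefulset controller
		// stamps on every pod it manages.
		if pod.Labels["controller-revision-hash"] != sset.Status.UpdateRevision {
			pending = append(pending, pod)
		}
	}
	return pending
}
```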
Oleksii Kliukin 9bf80afa6b
Remove team from statefulset selector (#271)
* Remove 'team' label from the statefulset selector.

It was never supposed to be there, but a statefulset implicitly
creates a selector out of the meta.labels field. That is a problem
with recent Kubernetes versions, since a statefulset cannot pick up
pods with non-matching label selectors, and we rely on the statefulset
picking up old pods after the statefulset is replaced.

Make sure selector changes trigger replacement of the statefulset.

If the new selector has more labels than the old one, nothing
should be done with the statefulset; otherwise, the new statefulset
won't see the orphaned pods from the old one, as they won't match
its selector.

See https://github.com/kubernetes/kubernetes/issues/46901#issuecomment-356418393
2018-04-06 13:58:47 +02:00
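A minimal sketch of that rule: replacing the statefulset is only safe when the new selector is a subset of the old labels, so the orphaned pods still match it. Names are illustrative:

```go
package main

import "fmt"

// selectorMatchesOldPods reports whether pods labeled by the old selector
// would still be picked up by the new one, i.e. newSel ⊆ oldSel. If the new
// selector carries extra labels, the statefulset must be left alone.
func selectorMatchesOldPods(oldSel, newSel map[string]string) bool {
	for k, v := range newSel {
		if oldSel[k] != v {
			return false
		}
	}
	return true
}

func main() {
	oldSel := map[string]string{"application": "spilo", "team": "acid"}
	newSel := map[string]string{"application": "spilo"} // 'team' removed
	fmt.Println(selectorMatchesOldPods(oldSel, newSel))  // true: safe to replace
}
```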
Sergey Dudoladov 071547e5bf Modify to add extra labels only during resource creation 2018-02-26 11:11:50 +01:00
Sergey Dudoladov 00dc810544 Add 'team' label to pods, stateful sets, secrets and pod disruption budgets 2018-02-23 14:36:10 +01:00
Oleksii Kliukin c4aab502b3
Remove Patroni leftover objects on cluster deletion. (#244)
* On cluster deletion, remove all endpoints and configmaps created by Patroni when it runs with Kubernetes support.
2018-02-23 09:52:22 +01:00
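A hedged client-go sketch of that cleanup; the `-config` suffix matches Patroni's naming for its config endpoint/configmap, but the function and its scope are illustrative:

```go
package cleanup

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletePatroniObjects removes the leader and config endpoints/configmaps
// that Patroni creates when running with Kubernetes support, ignoring
// objects that are already gone.
func deletePatroniObjects(ctx context.Context, client kubernetes.Interface, namespace, clusterName string) error {
	for _, name := range []string{clusterName, clusterName + "-config"} {
		if err := client.CoreV1().Endpoints(namespace).Delete(ctx, name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
			return err
		}
		if err := client.CoreV1().ConfigMaps(namespace).Delete(ctx, name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}
```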
Dmitry Dolgov bf4b0f0f33
Merge pull request #240 from zalando-incubator/feature/goreport-improvements
Some improvements for golint, ineffassign and misspell
2018-02-22 11:31:08 +01:00
Oleksii Kliukin cca73e30b7
Make code around recreating pods and creating objects in the database less brittle (#213)
There used to be a masterLess flag that was supposed to indicate whether the cluster it belongs to runs without an acting master by design. At some point, as we didn't really have support for such clusters, the flag was misused to indicate that there is no master in the cluster. However, that was not done consistently (a cluster without all pods running would never be marked masterless, even when the master was not among the running pods), and it was based on the wrong assumption that a masterless cluster remains masterless until the next attempt to change that flag, ignoring the possibility of the master coming up or some node doing a successful promotion. Therefore, this PR gets rid of that flag completely.

When the cluster is running with 0 instances, there is obviously no master and it makes no sense to create any database objects inside the non-existing master. Therefore, this PR introduces an additional check for that.

recreatePods assumed that the roles of the pods recorded when the function started would not change; for instance, that terminated replica pods would start again as replicas. This PR revisits that assumption by looking at the actual role of the re-spawned pods, which avoids a failover if some replica has been promoted to the master role while being re-spawned. In addition, if the failover from the old master was unsuccessful, we used to stop and leave the old master running on an old pod, without recording this fact anywhere. This PR makes a failover failure emit a warning, but not stop the recreation of the last master pod; in the worst case the running master will be terminated, but that case is rather unlikely.

As a side effect, make waitForPodLabel return the pod definition it waited for, avoiding extra API calls in recreatePods and movePodFromEndOfLifeNode
2018-02-22 10:42:05 +01:00
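A minimal sketch of the revisited assumption: read the actual role from the re-spawned pod's label instead of trusting the role recorded before recreation. The `spilo-role` label name is the operator's default; the helper itself is illustrative:

```go
package roles

import corev1 "k8s.io/api/core/v1"

const podRoleLabel = "spilo-role" // set by Patroni/Spilo on each pod

// masterAmongRespawned reports whether one of the re-spawned pods has already
// been promoted to master, in which case no extra failover is needed.
func masterAmongRespawned(pods []corev1.Pod) bool {
	for _, pod := range pods {
		if pod.Labels[podRoleLabel] == "master" {
			return true
		}
	}
	return false
}
```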
Dmitrii Dolgov a7cd859919 Some improvements for golint, ineffassign and misspell 2018-02-19 17:46:31 +01:00
Oleksii Kliukin 9720ac1f7e WIP: Hold the proper locks while examining the list of databases.
Introduce a new lock called specMu to protect the cluster spec.
This lock is held on update and sync, and when retrieving the spec in
the API code. There is no need to acquire it for cluster creation and
deletion: creation assigns the spec to the cluster before linking it to
the controller, and deletion just removes the cluster from the list in
the controller, both while holding the global clustersMu lock.
2017-12-22 13:06:11 +01:00
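A minimal sketch of that locking scheme, with illustrative field names:

```go
package cluster

import "sync"

type Spec struct{ Version string }

type Cluster struct {
	specMu sync.RWMutex // guards spec for update/sync writers and API readers
	spec   Spec
}

// setSpec is called from update and sync while holding specMu exclusively.
func (c *Cluster) setSpec(s Spec) {
	c.specMu.Lock()
	defer c.specMu.Unlock()
	c.spec = s
}

// GetSpec is what the API code calls; it only needs the read lock. Creation
// and deletion don't touch specMu: they run under the controller's global
// clustersMu lock instead.
func (c *Cluster) GetSpec() Spec {
	c.specMu.RLock()
	defer c.specMu.RUnlock()
	return c.spec
}
```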
Oleksii Kliukin 1fb8cf7ea0
Avoid overwriting critical users. (#172)
* Avoid overwriting critical users.

Disallow defining new users in the cluster manifest, the Teams
API or the infrastructure roles with names mentioned in the new
protected_role_names parameter (a comma-separated list of names).

Additionally, forbid defining a user with a name matching either
super_username or replication_username, so that we don't overwrite
the system roles required for the correct operation of the operator itself.

Also, clear PostgreSQL roles on each sync first in order to avoid using
the old definitions that are no longer present in the current manifest,
infrastructure roles secret or the teams API.
2017-12-05 14:27:12 +01:00
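A sketch of the validation this describes; parameter names are illustrative, not the operator's exact configuration fields:

```go
package main

import "fmt"

// validateRoleName rejects role names that collide with protected roles or
// with the system roles (superuser, replication) the operator depends on.
func validateRoleName(name string, protected []string, superUsername, replicationUsername string) error {
	if name == superUsername || name == replicationUsername {
		return fmt.Errorf("role %q conflicts with a system role", name)
	}
	for _, p := range protected {
		if name == p {
			return fmt.Errorf("role %q is listed in protected_role_names and cannot be redefined", name)
		}
	}
	return nil
}

func main() {
	protected := []string{"admin"} // e.g. from protected_role_names
	fmt.Println(validateRoleName("admin", protected, "postgres", "standby"))
}
```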
Oleksii Kliukin 637921cdee Tests for initHumanUsers and initRobotUsers.
Change the Cluster class in the process to implement Teams API
calls and OAuth token fetches as interfaces, so that we can mock
them in the tests.
2017-12-04 10:49:25 +01:00
Oleksii Kliukin 611cfe96d6 Fix an issue where the merge result was not assigned.
Add some tests.
2017-12-04 10:49:25 +01:00
Murat Kabilov 86803406db
use sync methods while updating the cluster 2017-11-03 12:00:43 +01:00
Oleksii Kliukin eba23279c8 Kube cluster upgrade 2017-10-19 10:49:42 +02:00
Murat Kabilov 5b29576a8e Remove redundant constants 2017-10-16 15:52:48 +02:00
Oleksii Kliukin 793defef72 Fix pod wait timeouts.
Previously, the timer was reset on every message received through
the pod channel, so frequent events could postpone the timeout indefinitely.
2017-10-11 14:58:37 +02:00
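A minimal Go sketch of the fix: arm the deadline once before the loop rather than resetting it on every message. Names are illustrative:

```go
package main

import (
	"fmt"
	"time"
)

// waitForReady waits for a "ready" event against a fixed deadline; unrelated
// events no longer push the timeout further into the future.
func waitForReady(events <-chan string, timeout time.Duration) error {
	deadline := time.After(timeout) // created once, never reset per message
	for {
		select {
		case ev := <-events:
			if ev == "ready" {
				return nil
			}
			// any other event: keep waiting against the same deadline
		case <-deadline:
			return fmt.Errorf("pod did not become ready in %v", timeout)
		}
	}
}

func main() {
	events := make(chan string)
	fmt.Println(waitForReady(events, 100*time.Millisecond))
}
```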
Murat Kabilov 83c8d6c419 Extend diagnostic api with worker status info 2017-10-11 12:26:09 +02:00
Murat Kabilov a35e9c6119 move from tpr to crd 2017-10-06 15:12:08 +02:00
Murat Kabilov 48ec6b35b9 perform manual failover on pg cluster rolling upgrade 2017-10-04 16:56:47 +03:00
Murat Kabilov d876f4d88e set secret name template via config map 2017-09-18 14:25:09 +02:00
Oleksii Kliukin 7667847bfe Feature/validate role options (#101)
Be more rigorous about validating user flags.

Only accept CREATE ROLE flags that don't take any parameters (i.e.
not ADMIN or CONNECTION LIMIT). Check that a flag and its NOflag
counterpart are not used at the same time.
2017-09-15 13:57:48 +02:00
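A sketch of that validation; the whitelist below is illustrative rather than the operator's exact list of parameterless flags:

```go
package main

import (
	"fmt"
	"strings"
)

// validFlags lists CREATE ROLE flags that take no parameters (ADMIN and
// CONNECTION LIMIT, which do, are deliberately absent).
var validFlags = map[string]bool{
	"SUPERUSER": true, "LOGIN": true, "CREATEDB": true,
	"CREATEROLE": true, "REPLICATION": true, "INHERIT": true,
}

// validateFlags rejects unknown flags and a flag combined with its NO variant.
func validateFlags(flags []string) error {
	seen := make(map[string]string) // base flag -> spelling first seen
	for _, f := range flags {
		f = strings.ToUpper(f)
		base := strings.TrimPrefix(f, "NO")
		if !validFlags[base] {
			return fmt.Errorf("%q is not a valid parameterless role flag", f)
		}
		if prev, ok := seen[base]; ok && prev != f {
			return fmt.Errorf("conflicting flags %q and %q", prev, f)
		}
		seen[base] = f
	}
	return nil
}

func main() {
	fmt.Println(validateFlags([]string{"LOGIN", "NOLOGIN"})) // conflict detected
}
```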
Murat Kabilov 90b49a24ba make postgresql roles public 2017-09-11 17:44:32 +02:00
Oleksii Kliukin 8b85935a7a Allow cloning clusters from the operator. (#90)
Allow cloning clusters from the operator.

The changes add a new JSON node `clone` with the keys `cluster`
and `timestamp`. `cluster` is mandatory, and setting a non-empty
`timestamp` triggers WAL-E point-in-time recovery. Spilo and Patroni do
the heavy lifting; the operator just defines certain variables and
gathers some data about how to connect to the host to clone from or the
target S3 bucket.

As a minor change, set the image pull policy to IfNotPresent instead
of Always to simplify local testing.

Change the default replication username to standby.
2017-09-08 16:47:03 +02:00
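A sketch of how the `clone` node could map onto a Go type; the struct and field names here are illustrative, not necessarily the operator's exact manifest types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CloneDescription mirrors the new `clone` manifest node: a mandatory source
// cluster and an optional timestamp enabling point-in-time recovery.
type CloneDescription struct {
	Cluster   string `json:"cluster"`             // mandatory: cluster to clone from
	Timestamp string `json:"timestamp,omitempty"` // non-empty: WAL-E point-in-time recovery
}

func main() {
	manifest := []byte(`{"cluster": "acid-batman", "timestamp": "2017-09-01T12:00:00+00:00"}`)
	var c CloneDescription
	if err := json.Unmarshal(manifest, &c); err != nil {
		panic(err)
	}
	fmt.Printf("clone from %s at %s\n", c.Cluster, c.Timestamp)
}
```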
Murat Kabilov d2828e5ece remove var shading; fix imports 2017-08-15 15:59:10 +02:00
Murat Kabilov e26db66cb5 start all the log messages with lowercase letters 2017-08-15 10:12:36 +02:00
Murat Kabilov cf663cb841 Fix golint warnings 2017-08-01 16:08:56 +02:00
Murat Kabilov 1f8b37f33d Make use of kubernetes client-go v4
* client-go v4.0.0-beta0
* remove unnecessary methods for tpr object
* rest client: use interface instead of structure pointer
* proper names for constants; some clean up for log messages
* remove teams api client from controller and make it per cluster
2017-07-25 15:25:17 +02:00
Oleksii Kliukin 00150711e4 Configure load balancer on a per-cluster and operator-wide level (#57)
* Deny all requests to the load balancer by default.
* Operator-wide toggle for the load-balancer.
* Define per-cluster useLoadBalancer option.

If useLoadBalancer is not set, the operator-wide default takes effect. If it
is true, the load balancer is created; otherwise a service of type ClusterIP is
created.

Internally, we have to completely replace the service if the service type
changes. We cannot patch, since some fields from the old service that would
remain after the patch are incompatible with the new one, and handling them
explicitly when updating the service is ugly and error-prone. We cannot
update the service because of its immutable fields, which leaves us the only
option of deleting the old service and creating a new one. Unfortunately,
there is still an issue of unnecessary removal of the endpoints associated with
the service; it will be addressed in future commits.

* Revert the unintended effect of go fmt

* Recreate endpoints on service update.

When the service type is changed, the service is deleted and then
one with the new type is created. Unfortunately, the endpoints are
deleted as well. Re-create them afterwards, preserving the original
addresses stored in them.

* Improve error messages and comments. Use generate instead of gen in names.
2017-06-30 13:38:49 +02:00
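A hedged client-go sketch of the delete-and-create approach described above; names and error handling are simplified:

```go
package services

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// replaceService swaps a service whose type changed: patching or updating is
// ruled out by immutable fields, so delete the old object and create the
// desired one from scratch.
func replaceService(ctx context.Context, client kubernetes.Interface, old, desired *corev1.Service) (*corev1.Service, error) {
	services := client.CoreV1().Services(old.Namespace)
	if err := services.Delete(ctx, old.Name, metav1.DeleteOptions{}); err != nil {
		return nil, err
	}
	// Strip fields tied to the old object; the API server assigns new ones.
	desired.ResourceVersion = ""
	desired.Spec.ClusterIP = ""
	return services.Create(ctx, desired, metav1.CreateOptions{})
}
```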
Oleksii Kliukin 987990fb0e Move service annotation patch template into the constants. 2017-06-12 10:24:23 +02:00
Oleksii Kliukin 17826ee434 Go fmt run. 2017-06-12 10:24:23 +02:00
Oleksii Kliukin 51d73fb172 Replace service annotations when updating services.
In case the whole annotation changes (like the external DNS name) we
don't want to keep the old one hanging around. Unlike specs, we
don't expect anyone except the operator to change the annotations.

Use StrategicMergePatchType in order to replace the annotations
map completely.
2017-06-12 10:24:23 +02:00
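A hedged client-go sketch mirroring that approach: send the desired annotations map with StrategicMergePatchType (the commit relies on this replacing the old map wholesale). Names are illustrative:

```go
package services

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// setServiceAnnotations patches the service metadata with the desired
// annotations map as a strategic merge patch.
func setServiceAnnotations(ctx context.Context, client kubernetes.Interface, namespace, name string, annotations map[string]string) error {
	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{"annotations": annotations},
	})
	if err != nil {
		return err
	}
	_, err = client.CoreV1().Services(namespace).Patch(ctx, name, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```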
Oleksii Kliukin dc36c4ca12 Implement replicaLoadBalancer boolean flag. (#38)
The flag adds a replica service with the name cluster_name-repl and
a DNS name that defaults to {cluster}-repl.{team}.{hostedzone}.

The implementation converts the Service field of the cluster into a map
with one or two elements and deals with the cases when the new flag
is changed on a running cluster
(the update and the sync should create or delete the replica service).
It also picks up the master and replica services and the master endpoint
when listing cluster resources.

* Update the spec when updating the cluster.
2017-06-07 13:54:17 +02:00
Oleksii Kliukin 7b0ca31bfb Implements EBS volume resizing #35.
In order to support volumes other than EBS and filesystems other than EXT2/3/4, the respective code parts were implemented as interfaces. Adding a new resizer for a volume or a filesystem will require implementing the interface, but no other changes to the cluster code itself.

Volume resizing first changes the EBS volume and the filesystem, and only afterwards is reflected in the Kubernetes "PersistentVolume" object. This is done deliberately to be able to check whether the volume needs resizing by peeking at the Size of the PersistentVolume structure. We recheck, nevertheless, in the EBSVolumeResizer whether the actual EBS volume size matches the spec, since a call to the AWS ModifyVolume is counted against the resize limit of once every 6 hours, even for calls that shouldn't result in an actual resize (i.e. when the size matches the one of the running volume).

As a side effect, split the constants into multiple files, move the volume code into a separate file and fix minor issues related to the error reporting.
2017-06-06 13:53:27 +02:00
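A sketch of the interface split this describes; method names are illustrative. Supporting a new volume or filesystem type then means adding an implementation, with no changes to the cluster code:

```go
package volumes

// VolumeResizer abstracts the cloud-provider side of a resize (EBS today);
// an implementation for another provider plugs in without cluster changes.
type VolumeResizer interface {
	VolumeBelongsToProvider(pvVolumeID string) bool
	GetProviderVolumeID(pvVolumeID string) (string, error)
	ResizeVolume(providerVolumeID string, newSizeGiB int64) error
}

// FilesystemResizer abstracts growing the filesystem (EXT2/3/4 today) after
// the underlying volume has been enlarged.
type FilesystemResizer interface {
	CanResizeFilesystem(fstype string) bool
	ResizeFilesystem(deviceName string, runCommand func(cmd string) (string, error)) error
}
```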
Murat Kabilov 1fb05212a9 Refactor teams API package 2017-05-30 10:14:30 +02:00
Oleksii Kliukin afce38f6f0 Fix error messages (#27)
Use lowercase for kubernetes objects
Use %v instead of %s for errors
Start error messages with a lowercase letter.
2017-05-22 14:12:06 +02:00
Murat Kabilov 4acaf27a5d Remove etcd requests (#25)
update glide
2017-05-19 17:18:37 +02:00
Murat Kabilov d34273543e Fix the golint, gosimple warnings 2017-05-18 17:38:54 +02:00
Murat Kabilov 95a57d1e4f Use named arguments in the DNS name format 2017-05-18 17:23:59 +02:00
Murat Kabilov 92d7fbf372 replace github.bus.zalan.do with github.com/zalando-incubator 2017-05-12 11:50:16 +02:00
Murat Kabilov 18700b9ef7 Optimize template constant 2017-05-12 11:41:36 +02:00
Oleksii Kliukin 6983f444ed Periodically sync roles with the running clusters. (#102)
The sync adds or alters database roles based on the roles defined
in the cluster's TPR, the Teams API and the operator's infrastructure roles.
At the moment, roles are not deleted, as that would be dangerous for
the robot roles in case the TPR is misconfigured. In addition, ALTER
ROLE does not remove role options, i.e. SUPERUSER or CREATEROLE,
nor does it remove role membership: only new options are added and
new role membership is granted. So far, options like NOSUPERUSER
and NOCREATEROLE won't be handled correctly when mixed with their
non-negative counterparts; NOLOGIN, however, should be processed correctly.
The code assumes that only MD5 passwords are stored in the DB and
will likely break with the new SCRAM auth in PostgreSQL 10.

On the implementation side, create a new interface to abstract
role merging and creation, move most of the role-based functionality
from cluster/pg into the new 'users' module, strip the create-user code
of special cases related to human-based users (moving them to init
instead) and fix the MD5 password generator to avoid processing
already encrypted passwords. In addition, move the system roles
off the slice containing all other roles in order to avoid the extra
effort of skipping them during creation.

Also, fix a leak in DB connections when a new connection is not
considered healthy and is discarded without being closed. Initialize
the database during the sync phase before syncing users.
2017-05-12 11:41:35 +02:00
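A sketch of the password handling mentioned above, based on PostgreSQL's MD5 scheme ("md5" + md5hex(password + username)); the guard keeps already-encrypted values from being hashed twice:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// encryptMD5 produces a PostgreSQL md5-style password hash, passing through
// values that are already encrypted ("md5" followed by 32 hex characters).
func encryptMD5(password, username string) string {
	if strings.HasPrefix(password, "md5") && len(password) == 35 {
		return password
	}
	sum := md5.Sum([]byte(password + username))
	return "md5" + hex.EncodeToString(sum[:])
}

func main() {
	hash := encryptMD5("secret", "zalando")
	fmt.Println(hash, encryptMD5(hash, "zalando") == hash) // second call is a no-op
}
```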
Murat Kabilov 2370659c69 Parallel cluster processing
Run operations concerning multiple clusters in parallel. Each cluster gets its
own worker in order to create, update, sync or delete clusters. Each worker
acquires the lock on a cluster; subsequent operations on the same cluster
have to wait until the current one finishes. There is a pool of parallel
workers, configurable with the `workers` parameter in the configmap and set by
default to 4. The cluster-related tasks are assigned to the workers based on
the cluster name: tasks for the same cluster will always be assigned to the
same worker. There is no blocking between workers, although there is a chance
that a single worker becomes a bottleneck if too many clusters are
assigned to it; therefore, for large-scale deployments it might be necessary
to bump `workers` up from the default value.
2017-05-12 11:41:35 +02:00
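A minimal sketch of that name-based assignment: hash the cluster name and take it modulo the pool size, so tasks for one cluster always land on the same worker. The FNV hash here is an illustrative choice:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// workerID maps a cluster name onto one of numWorkers queues deterministically.
func workerID(clusterName string, numWorkers uint32) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(clusterName)) // fnv's Write never returns an error
	return h.Sum32() % numWorkers
}

func main() {
	fmt.Println(workerID("acid-minimal-cluster", 4)) // stable across calls
}
```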
Murat Kabilov 9b0d0d487c Use PATCH while updating Services and StatefulSets 2017-05-12 11:41:34 +02:00