Commit Graph

366 Commits

Author SHA1 Message Date
Oleksii Kliukin 04b660519a Fix exec into pods to resize volumes for multi-container pods.
The original code assumed only one container per pod.
2018-06-04 14:51:39 +02:00
Oleksii Kliukin 16a710a99a Avoid possible skipping SYNC events.
OB1 bug in the condition deciding whether to sync.
2018-05-31 18:29:15 +02:00
Oleksii Kliukin 48a5744314
Use Patroni API to set bootstrap-only options. (#299)
Call Patroni API /config in order to set special options that are
ignored when set in the configuration file, such as max_connections.
Per https://github.com/zalando-incubator/postgres-operator/issues/297

* Some minor refacoring:

Rename Cluster ManualFailover to Swithover
Rename Patroni Failover to Switchover
Add more details to error messages and comments introduced in this PR.

Review by @zerg-junior
2018-05-29 12:35:25 +02:00
Sergey Dudoladov 2e041c50e6 Bump up default Spilo image 2018-05-28 16:54:27 +02:00
zerg-junior 1352c4a5e2
Merge pull request #302 from zalando-incubator/fix-etcd-host-default
Fix etcd_host default
2018-05-24 17:17:20 +02:00
Manuel Gómez 32a1456a68
Update config.go 2018-05-24 16:58:46 +02:00
Sergey Dudoladov 749d723f55 Shorten the commen 2018-05-24 16:22:13 +02:00
Sergey Dudoladov 9824ddae5e Fix etcd_host default 2018-05-24 16:05:45 +02:00
Oleksii Kliukin 76ea754fc3 Be lenient when asked to shrink a persisten volume.
Do not hard error, emit a warning instead. The cluster is not going
to be broken because of our refusal to shrink a volume.
2018-05-24 11:17:42 +02:00
Oleksii Kliukin 1ea8b3bbe6 Fix a crash on node migration.
After an unsuccessful initial cluster sync it may happen that the
cluster statefulset is empty. This has been made more likely since
88d6a7be3, since it has introduced syncing volumes before statefulsets,
and the volume sync mail fail for different reasons (i.e. the volume has
been shrinked, or too many calls to Amazon).
2018-05-24 11:05:19 +02:00
Oleksii Kliukin e84ecb1d03 Address code review by @zerg-junior 2018-05-23 11:36:38 +02:00
Oleksii Kliukin f5550c337b Put special patroni parameters to the bootstrap.
Some special patroni postgresql parameters, like max_connections,
should reside in the bootstrap.dcs.postgresql.parameters section
to come into effect.
2018-05-22 18:27:12 +02:00
zerg-junior e6d12b3480
Merge pull request #295 from zalando-incubator/continue_on_delete_errors
Avoid terminating delete on errors.
2018-05-22 10:44:43 +02:00
Oleksii Kliukin 27c7245fed Avoid terminating delete on errors.
When there is an error happening upon deletion of the Kubernetes object
belonging to the cluster being removed, it makes no sense to abort the
deletion: the manifest will be removed anyway, therefore all the objects
after the one we aborted at will stay forever.
2018-05-18 18:10:37 +02:00
Oleksii Kliukin a8fdd3f2db Fix crash during sync.
Do not use statefulset number of pods to figure out running ones
for volume resizing, since the statefulset pointer could be nil.
Instead, look at the actual running pods.
2018-05-18 14:42:20 +02:00
Oleksii Kliukin 88d6a7be3f
Sync persistent volumes before statefulsets. (#293)
Avoid the condition of waiting for the pod that cannot start
PostgreSQL because it ran out of disk space.
2018-05-18 12:01:43 +02:00
Oleksii Kliukin 52ddcd25cc Sync persistent volumes before statefulsets.
Avoid the condition of waiting for the pod that cannot start
PostgreSQL because it ran out of disk space.
2018-05-18 11:43:45 +02:00
Oleksii Kliukin da4cc2705b Use deepcopy to propagate the spec to clusters.
Avoid sharing pointers to the same spec data between the informer
and the clusters. The only catch is that the error field is cleared
during deepcopy, since it is an interface that may contain private
fields that cannot be copied, however, the error is only used when
the manifest is parsed and before it is queued, therefore, we never
refer to that field in the cluster structure.
2018-05-17 16:05:12 +02:00
Oleksii Kliukin ebe50abccb
Make sure we never modify informer cached manifest. (#290)
987b434 introduced a new function that modifies the cluster spec in
memory before the cluster processes it. Unfortunately, the instance
being modified appeared to be the one stored internally in the
PostgresInformer, resulting in those modifications to be propagated with
futher cluster events and producing update loops in some occasions.

This commit makes sure we copy the spec before putting it into the
clusterEventQueues.
2018-05-16 18:23:31 +02:00
Oleksii Kliukin cf800aef90 Minor import fix 2018-05-15 16:53:12 +02:00
Oleksii Kliukin 11d568bf65 Address code review by @zerg-junior
- new info messages, rename the annotation flag.
2018-05-15 16:50:03 +02:00
Oleksii Kliukin 0c616a802f Merge branch 'master' into rolling_updates_with_statefulset_annotations
# Conflicts:
#	pkg/cluster/k8sres.go
2018-05-15 15:33:34 +02:00
Oleksii Kliukin 987b43456b
Deprecate old LB options, fix endpoint sync. (#287)
* Depreate old LB options, fix endpoint sync.

- deprecate useLoadBalancer, replicaLoadBalancer from the manifest
  and enable_load_balancer from the operator configuration. The old
  operator configuration options become no-op with this commit. For
  the old manifest options, `useLoadBalancer` and `replicaLoadBalancer`
  are still consulted,  but only in the absense of the new ones
  (enableMasterLoadBalancer and enableReplicaLoadBalancer).

- Make sure the endpoint being created during the sync receives proper
  addresses subset. This is more critical for the replicas, as for the
  masters Patroni will normally re-create the endpoint before the
  operator.

- Avoid creating the replica endpoint, since it will be created automatically
  by the corresponding service.
- Update the README and unit tests.

Code review by @mgomezch and @zerg-junior
2018-05-15 15:19:18 +02:00
Oleksii Kliukin 332dab5237 Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations 2018-05-08 14:51:10 +02:00
Oleksii Kliukin f41a42f922 Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations 2018-05-07 10:16:30 +02:00
Oleksii Kliukin ce0d4af91c Initial implementation for the statefulset annotations indicating rolling updates. 2018-05-07 08:07:37 +02:00
Oleksii Kliukin 1a20362c5b Initial implementation for the statefulset annotations indicating rolling updates. 2018-05-04 18:59:23 +02:00
Oleksii Kliukin 43a1db2128 Merge branch 'master' into pending_rolling_updates 2018-05-03 11:27:16 +02:00
Oleksii Kliukin 4c8dfd7e20
Remove the check for the clone cluster name. (#270)
* Sanity checks for the cluster name, improve tests.

- check that the normal and clone cluster name complies with the valid
  service name. For clone cluster, only do it if clone timestamp is not
  set; with a clone timestamp set, the clone name points to the S3 bucket

 - add tests and improve existing ones, making sure we don't call Error()
   method for an empty error, as well as that we don't miss cases where
   expected error is not empty, but actual call to be tested does not
   return an error.

Code review by @zerg-junior and @Jan-M
2018-05-03 10:21:37 +02:00
Oleksii Kliukin fe47f9ebea
Improve the pod moving behavior during the Kubernetes cluster upgrade. (#281)
* Improve the pod moving behavior during the Kubernetes cluster upgrade.

Fix an issue of not waiting for at least one replica to become ready
(if the Statefulset indicates there are replicas) when moving the master
pod off the decomissioned node. Resolves the first part of #279.

Small fixes to error messages.

* Eliminate a race condition during the swithover.

When the operator initiates the failover (switchover) that fails and
then retries it for a second time it may happen that the previous
waitForPodChannel is still active. As a result, the operator subscribes
to the former master pod two times, causing a panic.

The problem was that the original code didn't bother to cancel the
waitForPodLalbel for the new master pod in the case when the failover
fails. This commit fixes it by adding a stop channel to that function.

Code review by @zerg-junior
2018-05-03 10:20:24 +02:00
Sergey Dudoladov 59ded0c212 Shorten bucket name 2018-05-02 14:05:57 +02:00
Sergey Dudoladov c45219bafa Set up an S3 bucket for the postgres daily logs 2018-05-02 12:52:42 +02:00
Oleksii Kliukin 37caa3f60b Fix a bug with syncing services
Avoid showing "there is no service in the cluster" when syncing a
service for the cluster if the operator has been restarted after
the cluster had been created.
2018-04-27 12:35:25 +02:00
zerg-junior 8f08bef67c
Merge pull request #277 from zalando-incubator/automatically-deploy-service-account
Deploy service account for pod creation on demand
2018-04-26 14:44:37 +02:00
Sergey Dudoladov 1b718fd4c2 Minor improvemets in reporting service account creation 2018-04-26 13:47:25 +02:00
Sergey Dudoladov 4255e702bc Always empty account's namespace after parsing 2018-04-25 13:57:24 +02:00
Sergey Dudoladov d99b553ec1 Convert default account definiton into JSON 2018-04-25 12:35:16 +02:00
Sergey Dudoladov e3f7fac443 Comment on the default value for pod service account name 2018-04-24 15:41:28 +02:00
Sergey Dudoladov 3d0ab40d64 Explicitly warn on account name mismatch 2018-04-24 15:31:22 +02:00
Sergey Dudoladov 485ec4b8ea Move service account to Controller 2018-04-24 15:13:08 +02:00
Sergey Dudoladov bc8b950da4 Tolerate issues of the Teams API 2018-04-23 16:31:53 +02:00
Sergey Dudoladov c31c76281c Make operator unaware of its own service account 2018-04-23 14:38:20 +02:00
Sergey Dudoladov 5daf0a4172 Fix error reporting during pod service account creation 2018-04-20 14:20:38 +02:00
Sergey Dudoladov bd51d2922b Turn ServiceAccount into struct value to avoid race conditon during account creation 2018-04-20 13:05:05 +02:00
Sergey Dudoladov a5a65e93f4 Name service account consistenly 2018-04-19 16:15:52 +02:00
Sergey Dudoladov 23f893647c Remove sync of pod service accounts 2018-04-19 15:48:58 +02:00
Sergey Dudoladov 214ae04aa7 Deploy service account for pod creation on demand 2018-04-18 16:20:20 +02:00
Oleksii Kliukin 0618723a61 Check rolling updates using controller revisions.
Compare pods controller revisions with the one for the statefulset
to determine whether the pod is running the latest revision and,
therefore, no rolling update is necessary. This is performed only
during the operator start, afterwards the rolling update status
that is stored locally in the cluster structure is used for all
rolling update decisions.
2018-04-09 18:07:24 +02:00
Manuel Gómez 88c68712b6
Fix statefulset label selector diffing (#273)
Otherwise, rolling updates are done unnecessarily.
2018-04-06 17:21:57 +02:00
Oleksii Kliukin 9bf80afa6b
Remove team from statefulset selector (#271)
* Remove 'team' label from the statefulset selector.

I was never supposed to be there, but implicitely statefulset
creates a selector out of meta.labels field. That is the problem
with recent Kubernetes, since statefulset cannot pick up pods
with non-matching label selectors, and we rely on statefulset
picking up old pods after statefulset replacement.

Make sure selector changes trigger replacement of the statefulset.

In the case new selector has more labels than the old one nothing
should be done with a statefulset, otherwise the new statefulset
won't see orphaned pods from the old one, as they won't match the
selector. 

See https://github.com/kubernetes/kubernetes/issues/46901#issuecomment-356418393
2018-04-06 13:58:47 +02:00