Commit Graph

663 Commits

Author SHA1 Message Date
Sergey Dudoladov 60a2c2e810 Log conditions that prevent master pod migration 2018-12-21 16:31:03 +01:00
Sergey Dudoladov dc381c29e0 Merge branch 'master' into fix-reporting-masters-under-migration 2018-12-21 14:35:04 +01:00
Sergey Dudoladov 8c598a87da Make run operator locally ignore smoke-tested image 2018-12-21 14:34:31 +01:00
Isaev Denis ff5c63ddf1 add .golangci.yml (#422) 2018-11-27 12:00:15 +01:00
Dmitry Dolgov a0e09cd6a6 Add golangci badge (#423)
Since we've connected it, it makes sense also to display the results.
2018-11-27 11:59:59 +01:00
zerg-junior 45c89b3da4
[WIP] Add set_memory_request_to_limit option (#406)
* Add set_memory_request_to_limit option
2018-11-15 14:00:08 +01:00
jens-totemic f25351c36a Make OperatorConfiguration work (#410)
* Fixes # 404
2018-11-13 11:22:07 +01:00
zerg-junior e39915c968
Restore .zappr.yaml (#405)
* Restore .zappr.yaml

* remove approvals as no longer used
2018-11-07 13:06:53 +01:00
zerg-junior 96e3ea9511
Properly overwrite empty allowed source ranges for load balancers (#392)
* Properly overwrite empty allowed source ranges for load balancers
2018-11-06 11:08:45 +01:00
zerg-junior ccaee94a35
Minor improvements (#381)
* Minor improvements

* Document empty list vs null for users without privileges

* Change the wording for null values

* Add talk by Oleksii in Atmosphere
2018-11-06 11:08:13 +01:00
zerg-junior 86ba92ad02
Rename 'permanent_slots' field to 'slots' (#401) 2018-10-31 16:11:28 +01:00
Dmitry Dolgov 78e83308fc API url regexps (#400)
* Make url regexp more flexible, to accept identifier with dashes

* Add few simple tests

* Check also numerics
2018-10-31 14:52:41 +01:00
zerg-junior 1b4181a724
[WIP] Add the ability to configure replications slots in Patroni (#398)
* Add the ability to configure replication slots in Patroni

* Add debugging  to Makefile for CDP builds
2018-10-31 13:10:56 +01:00
Dmitry Dolgov 83dfae2a6d
Editing documentation for cloning
Clear a bit the section about timestamp (from @zalandoAlex)
2018-10-31 11:08:49 +01:00
Sergey Dudoladov 3fb58e9077 refactor error reporting 2018-09-25 16:06:52 +02:00
zerg-junior 9f4a73afb7
Update operator_parameters.md (#379) 2018-09-24 15:43:22 +02:00
zerg-junior 7907f95d2f
Improve reporting about rolling updates (#391) 2018-09-24 11:57:43 +02:00
Sergey Dudoladov 85e8dea086 Mention node name in error messages to ease log filtering 2018-09-18 15:12:17 +02:00
Sergey Dudoladov 2df8d522af Unify warnings about unmovable pods 2018-09-17 17:45:49 +02:00
Noah Kantrowitz 688d252752 Some tweaks to ensure compat with newer Go. (#383) 2018-09-17 10:13:07 +02:00
zerg-junior f9cbed9be9
Clarify what a default values is (#368)
Clarify what a default values is
2018-09-17 10:12:58 +02:00
Noah Kantrowitz 0b75a89920 Fix the casing of github.com/Sirupsen/logrus to match what the project itself uses. (#380)
Dep enforces this.
2018-09-06 10:26:48 +02:00
Noah Kantrowitz a4224f6063 Move CRD definitions into a formal API to allow access from other controllers. (#378) 2018-08-31 11:20:02 +02:00
zerg-junior 25fa45fd58 [WIP] Grant 'superuser' to the members of Postgres admin teams (#371)
Added support for superuser team in addition to the admin team that owns the postgres cluster.
2018-08-30 10:51:37 +02:00
zerg-junior 1e53e22773 Improve error reporting for short cluster names (#377)
* Improve error reporting for short cluster names

* Revert to clusterName
2018-08-29 17:08:59 +02:00
zerg-junior 75a1d782b0
Update CODEOWNERS (#376)
Add @Jan-M @CyberDem0n @avaczi as codeowners
2018-08-29 14:00:00 +02:00
zerg-junior 4543bfde96
Document code generation (#370)
* Document code generation
2018-08-22 11:34:15 +02:00
Valer Cara 5ed109678c Add GoDoc badge to readme (#372) 2018-08-22 11:33:17 +02:00
zerg-junior aeae0a6ef2
Use cluster's own namespace to patch the cluster manifest (#373) 2018-08-22 11:07:12 +02:00
Oleksii Kliukin e1ed4b847d
Use code-generation for CRD API and deepcopy methods (#369)
Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on.

Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API.

All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1

Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator.

Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into

Change the type of  the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it.

Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with
the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes.

Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files.

Update client-go dependencies.

Minor reformatting and renaming.
2018-08-15 17:22:25 +02:00
Jan Mussler 6e8dcabac7
Update postgres-operator.yaml
Bump manifest to use v1.0.0 operator
2018-08-10 14:17:44 +02:00
Oleksii Kliukin e933908084
Configure pg_hba in the local postgresql configuration of Patroni. (#361)
Previously, the operator put pg_hba into the bootstrap/pg_hba key of
Patroni. That had 2 adverse effects:
 - pg_hba.conf was shadowed by Spilo default section in the local
   postgresql configuration
 - when updating pg_hba in the cluster manifest, the updated lines were
   not propagated to DCS, since the key was defined in the boostrap
   section of Patroni.

Include some minor refactoring, moving methods to unexported when
possible and commenting out usage of md5, so that gosec won't complain.

Per https://github.com/zalando-incubator/postgres-operator/issues/330

Review by @zerg-junior
2018-08-08 11:01:26 +02:00
Oleksii Kliukin 199aa6508c
Populate list of clusters in the controller at startup. (#364)
Assign the list of clusters in the controller with the up-to-date list
of Postgres manifests on Kubernetes during the startup.

Node migration routines launched asynchronously to the cluster
processing rely on an up-to-date list of clusters in the controller to
detect clusters affected by the migration of the node and lock them
when doing migration of master pods. Without the initial list the
operator was subject to race conditions like the one described at
https://github.com/zalando-incubator/postgres-operator/issues/363

Restructure the code to decouple list cluster function required by the
postgresql informer from the one that emits cluster sync events. No
extra work is introduced, since cluster sync already runs in a separate
goroutine (clusterResync).

Introduce explicit initial cluster sync at the end of
acquireInitialListOfClusters instead of relying on an implicit one
coming from list function of the PostgreSQL informer.

Some minor refactoring.

Review by @zerg-junior
2018-08-08 11:00:56 +02:00
Oleksii Kliukin acf46bfa62
Include CREATEROLE to the list of allowed flags. (#365)
Previously it has been supported by the operator, but the validity check
excluded it for no reason.
2018-08-08 10:53:08 +02:00
Oleksii Kliukin 14050588ee
Move to client-go 8. (#362)
Not much changes, except for one function that has been deprecated.
However, unless we find a way to use semantic version comparisons like
'^' on a branch name, we would have to update the apimachinery,
apiextensions-apiserver and code-generator dependencies manually.

Also, slash a linter warning about RoleOriginUnknown being not used.
2018-08-07 12:31:08 +02:00
Oleksii Kliukin b06186eb41
Linter-induced code refactoring, run round 2. (#360)
Run more linters in the gometalinter, i.e. deadcode, megacheck,
nakedret, dup.

More consistent code formatting, remove two dead functions, eliminate
naked a bunch of naked returns, refactor a few functions to avoid code
duplication.
2018-08-06 12:09:19 +02:00
zerg-junior 50f079c633 [WIP] Draft codeowners, update maintainers (#358)
* Draft codeowners, update maintainers

* Minor reformatting
2018-08-06 08:59:00 +02:00
Oleksii Kliukin 59f0c5551e
Allow configuring pod priority globally and per cluster. (#353)
* Allow configuring pod priority globally and per cluster.

Allow to specify pod priority class for all pods managed by the operator,
as well as for those belonging to individual clusters.

Controlled by the pod_priority_class_name operator configuration
parameter and the podPriorityClassName manifest option.

See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
for the explanation on how to define priority classes since Kubernetes 1.8.

Some import order changes are due to go fmt.
Removal of OrphanDependents deprecated field.

Code review by @zerg-junior
2018-08-03 14:03:37 +02:00
Oleksii Kliukin ac7b132314
Refactoring inspired by gometalinter. (#357)
Among other things, fix a few issues with deepcopy implementation.
2018-08-03 11:09:45 +02:00
Oleksii Kliukin d0f4148cd3
Fix a link to the CRD manifest. (#356)
Per a gripe from @angapov: https://github.com/zalando-incubator/postgres-operator/issues/355
2018-08-02 11:13:23 +02:00
Oleksii Kliukin d2d3f21dc2 Client go upgrade v6 (#352)
There are shortcuts in this code, i.e. we created the deepcopy function
by using the deepcopy package instead of the generated code, that will
be addressed once migrated to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta, this will also
be addressed in further commits once the changes are stabilized.
2018-08-01 11:08:01 +02:00
Oleksii Kliukin f27833b5eb Fix disabling database access and teams API via command-line options. (#351) 2018-07-27 10:24:05 +02:00
Oleksii Kliukin 0181a1b5b1
Introduce a repair scan to fix failing clusters (#304)
A repair is a sync scan that acts only on those clusters that indicate
that the last add, update or sync operation on them has failed. It is
supposed to kick in more frequently than the repair scan. The repair
scan still remains to be useful to fix the consequences of external
actions (i.e. someone deletes a postgres-related service by mistake)
unbeknownst to the operator.

The repair scan is controlled by the new repair_period parameter in the
operator configuration. It has to be at least 2 times more frequent than
a sync scan to have any effect (a normal sync scan will update both last
synced and last repaired attributes of the controller, since repair is
just a sync underneath).

A repair scan could be queued for a cluster that is already being synced
if the sync period exceeds the interval between repairs. In that case a
repair event will be discarded once the corresponding worker finds out
that the cluster is not failing anymore.

Review by @zerg-junior
2018-07-24 11:21:45 +02:00
Oleksii Kliukin 1a0e5357dc
Improve generation of Scalyr container environment. (#346)
* Improve generting of Scalyr container environment.

Avoid duplicating POD_NAME and POD_NAMESPACE that already bundled
every sidecar.

Do not complain on the lack of SCLALYR_SERVER_HOST, since it is set to
https://upload.eu.scalyr.com in the container we use.

Do not mentioned SCALYR_SERVER_HOST in the error messages, since it is
derived from the cluster name automatically.
2018-07-24 11:16:24 +02:00
Oleksii Kliukin 12871aad1a
Avoid showing an extra error when resizing volume fails (#350)
Do not show 'persistent volumes are not compatible' errors for the
volumes that failed to be resized because of the other reasons (i.e.
the new size is smaller than the existing one).
2018-07-20 14:12:25 +02:00
zerg-junior accbe20804
Upgrade version to enable RBAC in multiple namespace (#348) 2018-07-19 18:22:30 +02:00
zerg-junior 417f13c0bd
Submit RBAC credentials during initial Event processing (#344)
* During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.
2018-07-19 16:40:40 +02:00
Oleksii Kliukin 3a9378d3b8
Allow configuring the operator via the YAML manifest. (#326)
* Up until now, the operator read its own configuration from the
configmap.  That has a number of limitations, i.e. when the
configuration value is not a scalar, but a map or a list. We use a
custom code based on github.com/kelseyhightower/envconfig to decode
non-scalar values out of plain text keys, but that breaks when the data
inside the keys contains both YAML-special elememtns (i.e. commas) and
complex quotes, one good example for that is search_path inside
`team_api_role_configuration`. In addition, reliance on the configmap
forced a flag structure on the configuration, making it hard to write
and to read (see
https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778).

The changes allow to supply the operator configuration in a proper YAML
file. That required registering a custom CRD to support the operator
configuration and provide an example at
manifests/postgresql-operator-default-configuration.yaml. At the moment,
both old configmap and the new CRD configuration is supported, so no
compatibility issues, however, in the future I'd like to deprecate the
configmap-based configuration altogether. Contrary to the
configmap-based configuration, the CRD one doesn't embed defaults into
the operator code, however, one can use the
manifests/postgresql-operator-default-configuration.yaml as a starting
point in order to build a custom configuration.

Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters
used to create the CRD were taken from the operator configuration, which
is not possible if the configuration itself is stored in the CRD object,
I've added the ability to specify them as environment variables
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively.

Per review by @zerg-junior  and  @Jan-M.
2018-07-16 16:20:46 +02:00
Oleksii Kliukin e90a01050c
Switchover must wait for the inner goroutine before it returns. (#343)
* Switchover must wait for the inner goroutine before it returns.

Otherwise, two corner cases may happen:

 - waitForPodLabel writes to the podLabelErr channel that has been
   already closed by the outer routine

 - the outer routine exists and the caller subscribes to the pod
   the inner goroutine has already subscribed to, resulting in panic.

 The previous commit fe47f9ebea
 that touched that code added the cancellation channel, but didn't bother
 to actually wait for the goroutine to be cancelled.

 Per report and review from @valer-cara.
 Original issue: https://github.com/zalando-incubator/postgres-operator/issues/342
2018-07-16 11:50:35 +02:00
Oleksii Kliukin b7b950eb28
Use the StorageClassName field of the volumeClaimTemplate. (#338)
The old way of specifying it with the annotation is deprecated and not
available in recent Kubernetes versions. We will keep it there anyway
until upgrading to the new go-client that is incompatible with those
versions.

Per report from @schmitch
2018-07-16 11:49:58 +02:00