Commit Graph

634 Commits

Author SHA1 Message Date
Oleksii Kliukin e1ed4b847d
Use code-generation for CRD API and deepcopy methods (#369)
Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on.

Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API.

All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1

Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator.

Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into

Change the type of  the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it.

Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with
the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes.

Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files.

Update client-go dependencies.

Minor reformatting and renaming.
2018-08-15 17:22:25 +02:00
Jan Mussler 6e8dcabac7
Update postgres-operator.yaml
Bump manifest to use v1.0.0 operator
2018-08-10 14:17:44 +02:00
Oleksii Kliukin e933908084
Configure pg_hba in the local postgresql configuration of Patroni. (#361)
Previously, the operator put pg_hba into the bootstrap/pg_hba key of
Patroni. That had 2 adverse effects:
 - pg_hba.conf was shadowed by Spilo default section in the local
   postgresql configuration
 - when updating pg_hba in the cluster manifest, the updated lines were
   not propagated to DCS, since the key was defined in the boostrap
   section of Patroni.

Include some minor refactoring, moving methods to unexported when
possible and commenting out usage of md5, so that gosec won't complain.

Per https://github.com/zalando-incubator/postgres-operator/issues/330

Review by @zerg-junior
2018-08-08 11:01:26 +02:00
Oleksii Kliukin 199aa6508c
Populate list of clusters in the controller at startup. (#364)
Assign the list of clusters in the controller with the up-to-date list
of Postgres manifests on Kubernetes during the startup.

Node migration routines launched asynchronously to the cluster
processing rely on an up-to-date list of clusters in the controller to
detect clusters affected by the migration of the node and lock them
when doing migration of master pods. Without the initial list the
operator was subject to race conditions like the one described at
https://github.com/zalando-incubator/postgres-operator/issues/363

Restructure the code to decouple list cluster function required by the
postgresql informer from the one that emits cluster sync events. No
extra work is introduced, since cluster sync already runs in a separate
goroutine (clusterResync).

Introduce explicit initial cluster sync at the end of
acquireInitialListOfClusters instead of relying on an implicit one
coming from list function of the PostgreSQL informer.

Some minor refactoring.

Review by @zerg-junior
2018-08-08 11:00:56 +02:00
Oleksii Kliukin acf46bfa62
Include CREATEROLE to the list of allowed flags. (#365)
Previously it has been supported by the operator, but the validity check
excluded it for no reason.
2018-08-08 10:53:08 +02:00
Oleksii Kliukin 14050588ee
Move to client-go 8. (#362)
Not much changes, except for one function that has been deprecated.
However, unless we find a way to use semantic version comparisons like
'^' on a branch name, we would have to update the apimachinery,
apiextensions-apiserver and code-generator dependencies manually.

Also, slash a linter warning about RoleOriginUnknown being not used.
2018-08-07 12:31:08 +02:00
Oleksii Kliukin b06186eb41
Linter-induced code refactoring, run round 2. (#360)
Run more linters in the gometalinter, i.e. deadcode, megacheck,
nakedret, dup.

More consistent code formatting, remove two dead functions, eliminate
naked a bunch of naked returns, refactor a few functions to avoid code
duplication.
2018-08-06 12:09:19 +02:00
zerg-junior 50f079c633 [WIP] Draft codeowners, update maintainers (#358)
* Draft codeowners, update maintainers

* Minor reformatting
2018-08-06 08:59:00 +02:00
Oleksii Kliukin 59f0c5551e
Allow configuring pod priority globally and per cluster. (#353)
* Allow configuring pod priority globally and per cluster.

Allow to specify pod priority class for all pods managed by the operator,
as well as for those belonging to individual clusters.

Controlled by the pod_priority_class_name operator configuration
parameter and the podPriorityClassName manifest option.

See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
for the explanation on how to define priority classes since Kubernetes 1.8.

Some import order changes are due to go fmt.
Removal of OrphanDependents deprecated field.

Code review by @zerg-junior
2018-08-03 14:03:37 +02:00
Oleksii Kliukin ac7b132314
Refactoring inspired by gometalinter. (#357)
Among other things, fix a few issues with deepcopy implementation.
2018-08-03 11:09:45 +02:00
Oleksii Kliukin d0f4148cd3
Fix a link to the CRD manifest. (#356)
Per a gripe from @angapov: https://github.com/zalando-incubator/postgres-operator/issues/355
2018-08-02 11:13:23 +02:00
Oleksii Kliukin d2d3f21dc2 Client go upgrade v6 (#352)
There are shortcuts in this code, i.e. we created the deepcopy function
by using the deepcopy package instead of the generated code, that will
be addressed once migrated to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta, this will also
be addressed in further commits once the changes are stabilized.
2018-08-01 11:08:01 +02:00
Oleksii Kliukin f27833b5eb Fix disabling database access and teams API via command-line options. (#351) 2018-07-27 10:24:05 +02:00
Oleksii Kliukin 0181a1b5b1
Introduce a repair scan to fix failing clusters (#304)
A repair is a sync scan that acts only on those clusters that indicate
that the last add, update or sync operation on them has failed. It is
supposed to kick in more frequently than the repair scan. The repair
scan still remains to be useful to fix the consequences of external
actions (i.e. someone deletes a postgres-related service by mistake)
unbeknownst to the operator.

The repair scan is controlled by the new repair_period parameter in the
operator configuration. It has to be at least 2 times more frequent than
a sync scan to have any effect (a normal sync scan will update both last
synced and last repaired attributes of the controller, since repair is
just a sync underneath).

A repair scan could be queued for a cluster that is already being synced
if the sync period exceeds the interval between repairs. In that case a
repair event will be discarded once the corresponding worker finds out
that the cluster is not failing anymore.

Review by @zerg-junior
2018-07-24 11:21:45 +02:00
Oleksii Kliukin 1a0e5357dc
Improve generation of Scalyr container environment. (#346)
* Improve generting of Scalyr container environment.

Avoid duplicating POD_NAME and POD_NAMESPACE that already bundled
every sidecar.

Do not complain on the lack of SCLALYR_SERVER_HOST, since it is set to
https://upload.eu.scalyr.com in the container we use.

Do not mentioned SCALYR_SERVER_HOST in the error messages, since it is
derived from the cluster name automatically.
2018-07-24 11:16:24 +02:00
Oleksii Kliukin 12871aad1a
Avoid showing an extra error when resizing volume fails (#350)
Do not show 'persistent volumes are not compatible' errors for the
volumes that failed to be resized because of the other reasons (i.e.
the new size is smaller than the existing one).
2018-07-20 14:12:25 +02:00
zerg-junior accbe20804
Upgrade version to enable RBAC in multiple namespace (#348) 2018-07-19 18:22:30 +02:00
zerg-junior 417f13c0bd
Submit RBAC credentials during initial Event processing (#344)
* During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.
2018-07-19 16:40:40 +02:00
Oleksii Kliukin 3a9378d3b8
Allow configuring the operator via the YAML manifest. (#326)
* Up until now, the operator read its own configuration from the
configmap.  That has a number of limitations, i.e. when the
configuration value is not a scalar, but a map or a list. We use a
custom code based on github.com/kelseyhightower/envconfig to decode
non-scalar values out of plain text keys, but that breaks when the data
inside the keys contains both YAML-special elememtns (i.e. commas) and
complex quotes, one good example for that is search_path inside
`team_api_role_configuration`. In addition, reliance on the configmap
forced a flag structure on the configuration, making it hard to write
and to read (see
https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778).

The changes allow to supply the operator configuration in a proper YAML
file. That required registering a custom CRD to support the operator
configuration and provide an example at
manifests/postgresql-operator-default-configuration.yaml. At the moment,
both old configmap and the new CRD configuration is supported, so no
compatibility issues, however, in the future I'd like to deprecate the
configmap-based configuration altogether. Contrary to the
configmap-based configuration, the CRD one doesn't embed defaults into
the operator code, however, one can use the
manifests/postgresql-operator-default-configuration.yaml as a starting
point in order to build a custom configuration.

Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters
used to create the CRD were taken from the operator configuration, which
is not possible if the configuration itself is stored in the CRD object,
I've added the ability to specify them as environment variables
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively.

Per review by @zerg-junior  and  @Jan-M.
2018-07-16 16:20:46 +02:00
Oleksii Kliukin e90a01050c
Switchover must wait for the inner goroutine before it returns. (#343)
* Switchover must wait for the inner goroutine before it returns.

Otherwise, two corner cases may happen:

 - waitForPodLabel writes to the podLabelErr channel that has been
   already closed by the outer routine

 - the outer routine exists and the caller subscribes to the pod
   the inner goroutine has already subscribed to, resulting in panic.

 The previous commit fe47f9ebea
 that touched that code added the cancellation channel, but didn't bother
 to actually wait for the goroutine to be cancelled.

 Per report and review from @valer-cara.
 Original issue: https://github.com/zalando-incubator/postgres-operator/issues/342
2018-07-16 11:50:35 +02:00
Oleksii Kliukin b7b950eb28
Use the StorageClassName field of the volumeClaimTemplate. (#338)
The old way of specifying it with the annotation is deprecated and not
available in recent Kubernetes versions. We will keep it there anyway
until upgrading to the new go-client that is incompatible with those
versions.

Per report from @schmitch
2018-07-16 11:49:58 +02:00
Oleksii Kliukin 25a306244f
Support for per-cluster and operator global sidecars (#331)
* Define sidecars in the operator configuration.

Right now only the name and the docker image can be defined, but with
the help of the pod_environment_configmap parameter arbitrary
environment variables can be passed to the sidecars.

* Refactoring around generatePodTemplate.

Original implementation of per-cluster sidecars by @theRealWardo 

Per review by @zerg-junior and @Jan-M
2018-07-02 16:25:27 +02:00
zerg-junior 7394c15d0a
Make AWS region configurable in the operator cofig map (#333) 2018-06-27 17:29:02 +02:00
Oleksii Kliukin 74b19b449e
Update travis configuration. (#332)
- explicitely set sudo to false, since we don't need it and it
  slows-down builds.
- use the newest go toolchain.
2018-06-27 12:30:24 +02:00
Oleksii Kliukin d9d2c5cbe5
Minor formatting fix 2018-06-13 12:32:56 +02:00
Oleksii Kliukin 5d02c57e04 Docs/reference (#323)
Document operator command-line options and environment variables.
2018-06-12 19:12:11 +02:00
Oleksii Kliukin b518a31d0c
Document cluster manifests. (#320)
Document cluster manifests options.

Review by  @erthalion and @zerg-junior.
2018-06-12 11:57:00 +02:00
Oleksii Kliukin 9cb48e0889
Document operator configuration parameters. (#313) 2018-06-08 13:21:57 +02:00
Dmitry Dolgov c26962ba62
Merge pull request #315 from zalando-incubator/feature/doc-volume-increase
Add section about volume increase
2018-06-07 13:46:08 +02:00
erthalion df40cd831d Adjust the wording 2018-06-07 10:25:50 +02:00
erthalion dab6c01cc7 Change and clarify wording 2018-06-06 17:36:21 +02:00
erthalion 4d20a38106 Add section about volume increase 2018-06-05 11:42:01 +02:00
Dmitry Dolgov 681656cbb7
Merge pull request #312 from zalando-incubator/feature/doc-clone
Add an example of clone feature
2018-06-04 16:26:44 +02:00
erthalion 2a05179f63 Adjust article for title 2018-06-04 16:17:48 +02:00
erthalion d0e6932641 Adjustments for clone section 2018-06-04 15:36:17 +02:00
Oleksii Kliukin 59795d48d2
Merge pull request #314 from zalando-incubator/volume_resize_with_multiple_containers
Fix exec into pods to resize volumes for multi-container pods.
2018-06-04 15:34:05 +02:00
Oleksii Kliukin 04b660519a Fix exec into pods to resize volumes for multi-container pods.
The original code assumed only one container per pod.
2018-06-04 14:51:39 +02:00
erthalion 5151b43c82 Split into two parts and reformulate a bit 2018-06-04 13:41:18 +02:00
erthalion e661ea1ea7 Mention `uid` field 2018-06-01 16:44:57 +02:00
erthalion b82faf66bb Unrelated chaotic good adjustments of snippets 2018-06-01 16:21:37 +02:00
erthalion 07d9dff847 Add an example of clone feature 2018-06-01 16:15:42 +02:00
Dmitry Dolgov 6ee0349536
Merge pull request #309 from zalando-incubator/feature/split-documentation
Split already existing documentation into parts
2018-06-01 15:49:21 +02:00
erthalion 69c8d3784a Use PostgreSQL specifically 2018-06-01 13:29:09 +02:00
erthalion f356225c04 Change to user/admin/developer 2018-06-01 11:32:50 +02:00
Oleksii Kliukin 16a710a99a Avoid possible skipping SYNC events.
OB1 bug in the condition deciding whether to sync.
2018-05-31 18:29:15 +02:00
erthalion 86f87ac31e Add links to subdocuments 2018-05-31 14:03:01 +02:00
erthalion 749085b29a Split already existing documentation into parts
To improve the documentation we need to split it into smaller parts:

* quickstart (in the readme)
* general concepts
* tutorials
* how to
* references

And then add the missing information. So far I just split the existing
documentation and left references almost empty. I assume that references
may duplicate the rest of the documentation in a way that the doc will
have references to this section, that contains all the formal details.
2018-05-31 11:23:29 +02:00
Oleksii Kliukin 48a5744314
Use Patroni API to set bootstrap-only options. (#299)
Call Patroni API /config in order to set special options that are
ignored when set in the configuration file, such as max_connections.
Per https://github.com/zalando-incubator/postgres-operator/issues/297

* Some minor refacoring:

Rename Cluster ManualFailover to Swithover
Rename Patroni Failover to Switchover
Add more details to error messages and comments introduced in this PR.

Review by @zerg-junior
2018-05-29 12:35:25 +02:00
zerg-junior 24df918dda
Merge pull request #306 from zalando-incubator/update-default-spilo
Bump up default Spilo image
2018-05-28 17:02:38 +02:00
Sergey Dudoladov 2e041c50e6 Bump up default Spilo image 2018-05-28 16:54:27 +02:00