* delete secrets the right way
* make a one function
* continue deleting secrets even if one delete fails
Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
* try to emit error for missing team name in cluster name
* skip creation after new cluster object
* move SetStatus to k8sclient and emit event when skipping creation and rename to SetPostgresCRDStatus
Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
* do not block rolling updates with lazy spilo update enabled
* treat initContainers like Spilo image
Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
deleteConnectionPooler function incorrectly checks that the delete api response is ResourceNotFound. Looks like the only consequence is a confusing log message, but obviously it's wrong. Remove negation, since having ResourceNotFound as error is the good case.
Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
* Initial commit
* Corrections
- set the type of the new configuration parameter to be array of
strings
- propagate the annotations to statefulset at sync
* Enable regular expression matching
* Improvements
-handle rollingUpdate flag
-modularize code
-rename config parameter name
* fix merge error
* Pass annotations to connection pooler deployment
* update code-gen
* Add documentation and update manifests
* add e2e test and introduce option in configmap
* fix service annotations test
* Add unit test
* fix e2e tests
* better key lookup of annotations tests
* add debug message for annotation tests
* Fix typos
* minor fix for looping
* Handle update path and renaming
- handle the update path to update sts and connection pooler deployment.
This way no need to wait for sync
- rename the parameter to downscaler_annotations
- handle other review comments
* another try to fix python loops
* Avoid unneccessary update events
* Update manifests
* some final polishing
* fix cluster_test after polishing
Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* PreparedDatabases with default role setup
* merge changes from master
* include preparedDatabases spec check when syncing databases
* create a default preparedDB if not specified
* add more default privileges for schemas
* use empty brackets block for undefined objects
* cover more default privilege scenarios and always define admin role
* add DefaultUsers flag
* support extensions and defaultUsers for preparedDatabases
* remove exact version in deployment manifest
* enable CRD validation for new field
* update generated code
* reflect code review
* fix typo in SQL command
* add documentation for preparedDatabases feature + minor changes
* some datname should stay
* add unit tests
* reflect some feedback
* init users for preparedDatabases also on update
* only change DB default privileges on creation
* add one more section in user docs
* one more sentence
* initial implementation
* describe forcing the rolling upgrade
* make parameter name more descriptive
* add missing pieces
* address review
* address review
* fix bug in e2e tests
* fix cluster name label in e2e test
* raise test timeout
* load spilo test image
* use available spilo image
* delete replica pod for lazy update test
* fix e2e
* fix e2e with a vengeance
* lets wait for another 30m
* print pod name in error msg
* print pod name in error msg 2
* raise timeout, comment other tests
* subsequent updates of config
* add comma
* fix e2e test
* run unit tests before e2e
* remove conflicting dependency
* Revert "remove conflicting dependency"
This reverts commit 65fc09054b.
* improve cdp build
* dont run unit before e2e tests
* Revert "improve cdp build"
This reverts commit e2a8fa12aa.
Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* Add EventsGetter to KubeClient to enable to sending K8S events
* Add eventRecorder to the controller, initialize it and hand it down to cluster via its constructor to enable it to emit events this way
* Add first set of events which then go to the postgresql custom resource the user interacts with to provide some feedback
* Add right to "create" events to operator cluster role
* Adapt cluster tests to new function sigurature with eventRecord (via NewFakeRecorder)
* Get a proper reference before sending events to a resource
Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
* further compatibility with k8sUseConfigMaps - skip further endpoints related actions
* Update pkg/cluster/cluster.go
thanks!
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Update pkg/cluster/cluster.go
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Update pkg/cluster/cluster.go
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
There is a possibility to pass nil as one of the specs and an empty spec
into syncConnectionPooler. In this case it will perfom a syncronization
because nil != empty struct. Avoid such cases and make it testable by
returning list of syncronization reasons on top together with the final
error.
* Protected and system users can't be a connection pool user
It's not supported, neither it's a best practice. Also fix potential null
pointer access. For protected users it makes sense by intent of protecting this
users (e.g. from being overriden or used as something else than supposed). For
system users the reason is the same as for superuser, it's about replicastion
user and it's under patroni control.
This is implemented on both levels, operator config and postgresql manifest.
For the latter we just use default name in this case, assuming that operator
config is always correct. For the former, since it's a serious
misconfiguration, operator panics.
Connection pooler support
Add support for a connection pooler. The idea is to make it generic enough to
be able to switch between different implementations (e.g. pgbouncer or
odyssey). Operator needs to create a deployment with pooler and a service for
it to access.
For connection pool to work properly, a database needs to be prepared by
operator, namely a separate user have to be created with an access to an
installed lookup function (to fetch credential for other users).
This setups is supposed to be used only by robot/application users. Usually a
connection pool implementation is more CPU bounded, so it makes sense to create
several pods for connection pool with more emphasize on cpu resources. At the
moment there are no special affinity or tolerations assigned to bring those
pods closer to the database. For availability purposes minimal number of
connection pool pods is 2, ideally they have to be distributed between
different nodes/AZ, but it's not enforced in the operator itself. Available
configuration supposed to be ergonomic and in the normal case require minimum
changes to a manifest to enable connection pool. To have more control over the
configuration and functionality on the pool side one can customize the
corresponding docker image.
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* add validation for PG resources and volume size
* check resource requests also on UPDATE and SYNC + update docs
* if cluster was running don't error on sync
This will set up a continuous wal streaming cluster, by adding the corresponding section in postgres manifest. Instead of having a full-fledged standby cluster as in Patroni, here we use only the wal path of the source cluster and stream from there.
Since, standby cluster is streaming from the master and does not require to create or use databases of it's own. Hence, it bypasses the creation of users or databases.
There is a separate sample manifest added to set up a standby-cluster.
* turns PostgresStatus type into a struct with field PostgresClusterStatus
* setStatus patch target is now /status subresource
* unmarshalling PostgresStatus takes care of previous status field convention
* new simple bool functions status.Running(), status.Creating()
Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on.
Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API.
All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1
Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator.
Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into
Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it.
Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with
the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes.
Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files.
Update client-go dependencies.
Minor reformatting and renaming.
Previously, the operator put pg_hba into the bootstrap/pg_hba key of
Patroni. That had 2 adverse effects:
- pg_hba.conf was shadowed by Spilo default section in the local
postgresql configuration
- when updating pg_hba in the cluster manifest, the updated lines were
not propagated to DCS, since the key was defined in the boostrap
section of Patroni.
Include some minor refactoring, moving methods to unexported when
possible and commenting out usage of md5, so that gosec won't complain.
Per https://github.com/zalando-incubator/postgres-operator/issues/330
Review by @zerg-junior
Run more linters in the gometalinter, i.e. deadcode, megacheck,
nakedret, dup.
More consistent code formatting, remove two dead functions, eliminate
naked a bunch of naked returns, refactor a few functions to avoid code
duplication.
* Allow configuring pod priority globally and per cluster.
Allow to specify pod priority class for all pods managed by the operator,
as well as for those belonging to individual clusters.
Controlled by the pod_priority_class_name operator configuration
parameter and the podPriorityClassName manifest option.
See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
for the explanation on how to define priority classes since Kubernetes 1.8.
Some import order changes are due to go fmt.
Removal of OrphanDependents deprecated field.
Code review by @zerg-junior
There are shortcuts in this code, i.e. we created the deepcopy function
by using the deepcopy package instead of the generated code, that will
be addressed once migrated to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta, this will also
be addressed in further commits once the changes are stabilized.
A repair is a sync scan that acts only on those clusters that indicate
that the last add, update or sync operation on them has failed. It is
supposed to kick in more frequently than the repair scan. The repair
scan still remains to be useful to fix the consequences of external
actions (i.e. someone deletes a postgres-related service by mistake)
unbeknownst to the operator.
The repair scan is controlled by the new repair_period parameter in the
operator configuration. It has to be at least 2 times more frequent than
a sync scan to have any effect (a normal sync scan will update both last
synced and last repaired attributes of the controller, since repair is
just a sync underneath).
A repair scan could be queued for a cluster that is already being synced
if the sync period exceeds the interval between repairs. In that case a
repair event will be discarded once the corresponding worker finds out
that the cluster is not failing anymore.
Review by @zerg-junior
* During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.
* Up until now, the operator read its own configuration from the
configmap. That has a number of limitations, i.e. when the
configuration value is not a scalar, but a map or a list. We use a
custom code based on github.com/kelseyhightower/envconfig to decode
non-scalar values out of plain text keys, but that breaks when the data
inside the keys contains both YAML-special elememtns (i.e. commas) and
complex quotes, one good example for that is search_path inside
`team_api_role_configuration`. In addition, reliance on the configmap
forced a flag structure on the configuration, making it hard to write
and to read (see
https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778).
The changes allow to supply the operator configuration in a proper YAML
file. That required registering a custom CRD to support the operator
configuration and provide an example at
manifests/postgresql-operator-default-configuration.yaml. At the moment,
both old configmap and the new CRD configuration is supported, so no
compatibility issues, however, in the future I'd like to deprecate the
configmap-based configuration altogether. Contrary to the
configmap-based configuration, the CRD one doesn't embed defaults into
the operator code, however, one can use the
manifests/postgresql-operator-default-configuration.yaml as a starting
point in order to build a custom configuration.
Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters
used to create the CRD were taken from the operator configuration, which
is not possible if the configuration itself is stored in the CRD object,
I've added the ability to specify them as environment variables
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively.
Per review by @zerg-junior and @Jan-M.
* Switchover must wait for the inner goroutine before it returns.
Otherwise, two corner cases may happen:
- waitForPodLabel writes to the podLabelErr channel that has been
already closed by the outer routine
- the outer routine exists and the caller subscribes to the pod
the inner goroutine has already subscribed to, resulting in panic.
The previous commit fe47f9ebea
that touched that code added the cancellation channel, but didn't bother
to actually wait for the goroutine to be cancelled.
Per report and review from @valer-cara.
Original issue: https://github.com/zalando-incubator/postgres-operator/issues/342
Call Patroni API /config in order to set special options that are
ignored when set in the configuration file, such as max_connections.
Per https://github.com/zalando-incubator/postgres-operator/issues/297
* Some minor refacoring:
Rename Cluster ManualFailover to Swithover
Rename Patroni Failover to Switchover
Add more details to error messages and comments introduced in this PR.
Review by @zerg-junior
When there is an error happening upon deletion of the Kubernetes object
belonging to the cluster being removed, it makes no sense to abort the
deletion: the manifest will be removed anyway, therefore all the objects
after the one we aborted at will stay forever.