The code added on #818 depends on map sorting to return a static reason
for service annotation changes. To avoid tests flakiness and map sorting
the tests include a `strings.HasPrefix` instead of comparing the whole
string. One of the test cases,
`service_removes_a_custom_annotation,_adds_a_new_one_and_change_another`,
is trying to test the whole reason string.
This commit replaces the test case reason, for only the reason prefix.
It removes the flakiness from the tests. As all the cases (annotation
adding, removing and value changing) are tested before, it's safe to
test only prefixes.
Also, it renames the test name from `TestServiceAnnotations` to
`TestSameService` and introduces a better description in case of test
failure, describing that only prefixes are tested.
The current implementations for `pkg.util.k8sutil.SameService` considers
only service annotations change on the default annotations created by the
operator. Custom annotations are not compared and consequently not
applied after the first service creation.
This commit introduces a complete annotations comparison between the
current service created by the operator and the new one generated based on
the configs. Also, it adds tests on the above-mentioned function.
* add validation for PG resources and volume size
* check resource requests also on UPDATE and SYNC + update docs
* if cluster was running don't error on sync
* add CRD manifests with validation
* update documentation
* patroni slots is not an array but a nested hash map
* make deps call tools
* cover validation in docs and export it in crds.go
* add toggle to disable creation of CRD validation and document it
* use templated service account also for CRD-configured helm deployment
* Added possibility to add custom annotations to LoadBalancer service.
* Added parameters for custom endpoint, access and secret key for logical backup.
* Modified dump.sh so it knows how to handle new features. Configurable S3 SSE
* align config map, operator config, helm chart values and templates
* follow helm chart conventions also in CRD templates
* split up values files and add comments
* avoid yaml confusion in postgres manifests
* bump spilo version and use example for logical_backup_s3_bucket
* add ConfigTarget switch to values
* StatefulSet fsGroup config option to allow non-root spilo
* Allow Postgres CRD to overide SpiloFSGroup of the Operator.
* Document FSGroup of a Pod cannot be changed after creation.
* database.go: substitute hardcoded .svc.cluster.local dns suffix with config parameter
Use the pod's configured dns search path, for clusters where .svc.cluster.local is not correct.
* turns PostgresStatus type into a struct with field PostgresClusterStatus
* setStatus patch target is now /status subresource
* unmarshalling PostgresStatus takes care of previous status field convention
* new simple bool functions status.Running(), status.Creating()
* Config option to allow Spilo container to run non-privileged.
Runs non-privileged by default.
Fixes#395
* add spilo_privileged to manifests/configmap.yaml
* add spilo_privileged to helm chart's values.yaml
Add possibility to mount a tmpfs volume to /dev/shm to avoid issues like
[this](https://github.com/docker-library/postgres/issues/416). To achieve that
two new options were introduced:
* `enableShmVolume` to PostgreSQL manifest, to specify whether or not mount
this volume per database cluster
* `enable_shm_volume` to operator configuration, to specify whether or not mount
per operator.
The first one, `enableShmVolume` takes precedence to allow us to be more flexible.
Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on.
Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API.
All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1
Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator.
Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into
Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it.
Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with
the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes.
Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files.
Update client-go dependencies.
Minor reformatting and renaming.
Previously, the operator put pg_hba into the bootstrap/pg_hba key of
Patroni. That had 2 adverse effects:
- pg_hba.conf was shadowed by Spilo default section in the local
postgresql configuration
- when updating pg_hba in the cluster manifest, the updated lines were
not propagated to DCS, since the key was defined in the boostrap
section of Patroni.
Include some minor refactoring, moving methods to unexported when
possible and commenting out usage of md5, so that gosec won't complain.
Per https://github.com/zalando-incubator/postgres-operator/issues/330
Review by @zerg-junior
Assign the list of clusters in the controller with the up-to-date list
of Postgres manifests on Kubernetes during the startup.
Node migration routines launched asynchronously to the cluster
processing rely on an up-to-date list of clusters in the controller to
detect clusters affected by the migration of the node and lock them
when doing migration of master pods. Without the initial list the
operator was subject to race conditions like the one described at
https://github.com/zalando-incubator/postgres-operator/issues/363
Restructure the code to decouple list cluster function required by the
postgresql informer from the one that emits cluster sync events. No
extra work is introduced, since cluster sync already runs in a separate
goroutine (clusterResync).
Introduce explicit initial cluster sync at the end of
acquireInitialListOfClusters instead of relying on an implicit one
coming from list function of the PostgreSQL informer.
Some minor refactoring.
Review by @zerg-junior
Run more linters in the gometalinter, i.e. deadcode, megacheck,
nakedret, dup.
More consistent code formatting, remove two dead functions, eliminate
naked a bunch of naked returns, refactor a few functions to avoid code
duplication.
* Allow configuring pod priority globally and per cluster.
Allow to specify pod priority class for all pods managed by the operator,
as well as for those belonging to individual clusters.
Controlled by the pod_priority_class_name operator configuration
parameter and the podPriorityClassName manifest option.
See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
for the explanation on how to define priority classes since Kubernetes 1.8.
Some import order changes are due to go fmt.
Removal of OrphanDependents deprecated field.
Code review by @zerg-junior
There are shortcuts in this code, i.e. we created the deepcopy function
by using the deepcopy package instead of the generated code, that will
be addressed once migrated to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta, this will also
be addressed in further commits once the changes are stabilized.
A repair is a sync scan that acts only on those clusters that indicate
that the last add, update or sync operation on them has failed. It is
supposed to kick in more frequently than the repair scan. The repair
scan still remains to be useful to fix the consequences of external
actions (i.e. someone deletes a postgres-related service by mistake)
unbeknownst to the operator.
The repair scan is controlled by the new repair_period parameter in the
operator configuration. It has to be at least 2 times more frequent than
a sync scan to have any effect (a normal sync scan will update both last
synced and last repaired attributes of the controller, since repair is
just a sync underneath).
A repair scan could be queued for a cluster that is already being synced
if the sync period exceeds the interval between repairs. In that case a
repair event will be discarded once the corresponding worker finds out
that the cluster is not failing anymore.
Review by @zerg-junior
* During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.
* Up until now, the operator read its own configuration from the
configmap. That has a number of limitations, i.e. when the
configuration value is not a scalar, but a map or a list. We use a
custom code based on github.com/kelseyhightower/envconfig to decode
non-scalar values out of plain text keys, but that breaks when the data
inside the keys contains both YAML-special elememtns (i.e. commas) and
complex quotes, one good example for that is search_path inside
`team_api_role_configuration`. In addition, reliance on the configmap
forced a flag structure on the configuration, making it hard to write
and to read (see
https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778).
The changes allow to supply the operator configuration in a proper YAML
file. That required registering a custom CRD to support the operator
configuration and provide an example at
manifests/postgresql-operator-default-configuration.yaml. At the moment,
both old configmap and the new CRD configuration is supported, so no
compatibility issues, however, in the future I'd like to deprecate the
configmap-based configuration altogether. Contrary to the
configmap-based configuration, the CRD one doesn't embed defaults into
the operator code, however, one can use the
manifests/postgresql-operator-default-configuration.yaml as a starting
point in order to build a custom configuration.
Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters
used to create the CRD were taken from the operator configuration, which
is not possible if the configuration itself is stored in the CRD object,
I've added the ability to specify them as environment variables
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively.
Per review by @zerg-junior and @Jan-M.
* Define sidecars in the operator configuration.
Right now only the name and the docker image can be defined, but with
the help of the pod_environment_configmap parameter arbitrary
environment variables can be passed to the sidecars.
* Refactoring around generatePodTemplate.
Original implementation of per-cluster sidecars by @theRealWardo
Per review by @zerg-junior and @Jan-M
Call Patroni API /config in order to set special options that are
ignored when set in the configuration file, such as max_connections.
Per https://github.com/zalando-incubator/postgres-operator/issues/297
* Some minor refacoring:
Rename Cluster ManualFailover to Swithover
Rename Patroni Failover to Switchover
Add more details to error messages and comments introduced in this PR.
Review by @zerg-junior
* Depreate old LB options, fix endpoint sync.
- deprecate useLoadBalancer, replicaLoadBalancer from the manifest
and enable_load_balancer from the operator configuration. The old
operator configuration options become no-op with this commit. For
the old manifest options, `useLoadBalancer` and `replicaLoadBalancer`
are still consulted, but only in the absense of the new ones
(enableMasterLoadBalancer and enableReplicaLoadBalancer).
- Make sure the endpoint being created during the sync receives proper
addresses subset. This is more critical for the replicas, as for the
masters Patroni will normally re-create the endpoint before the
operator.
- Avoid creating the replica endpoint, since it will be created automatically
by the corresponding service.
- Update the README and unit tests.
Code review by @mgomezch and @zerg-junior
There used to be a masterLess flag that was supposed to indicate whether the cluster it belongs to runs without the acting master by design. At some point, as we didn't really have support for such clusters, the flag has been misused to indicate there is no master in the cluster. However, that was not done consistently (a cluster without all pods running would never be masterless, even when the master is not among the running pods) and it was based on the wrong assumption that the masterless cluster will remain masterless until the next attempt to change that flag, ignoring the possibility of master coming up or some node doing a successful promotion. Therefore, this PR gets rid of that flag completely.
When the cluster is running with 0 instances, there is obviously no master and it makes no sense to create any database objects inside the non-existing master. Therefore, this PR introduces an additional check for that.
recreatePods were assuming that the roles of the pods recorded when the function has stared will not change; for instance, terminated replica pods should start as replicas. Revisit that assumption by looking at the actual role of the re-spawned pods; that avoids a failover if some replica has promoted to the master role while being re-spawned. In addition, if the failover from the old master was unsuccessful, we used to stop and leave the old master running on an old pod, without recording this fact anywhere. This PR makes the failover failure emit a warning, but not stop recreating the last master pod; in the worst case, the running master will be terminated, however, this case is rather unlikely one.
As a side effect, make waitForPodLabel return the pod definition it waited for, avoiding extra API calls in recreatePods and movePodFromEndOfLifeNode
Previously, it was set to the lifecycle-status:ready, breaking a
lot of minikube deployments. Also it was not possible befor to run
with this label set to an empty value.
Document the effect of the label in the new section of the
documentation.
* Trigger the node migration on the lack of the readiness label.
* Examine the node's readiness status on node add.
Make sure we don't miss the not ready node, especially when the
operator is killed during the migration.
* Scalyr agent sidecar for log shipping
* Remove the default for the Scalyr image
Now the image needs to be specified explicitly to enable log shipping to
Scalyr. This removes the problem of having to generate the config file
or publish our agent image repository.
* Add configuration variable for Scalyr server URL
Defaults to the EU address.
* Alter style
Newlines are cheap and make code easier to edit/refactor, but ok.
* Fix StatefulSet comparison logic
I broke it when I made the comparison consider all containers in the
PostgreSQL pod.
* Introduce higher and lower bounds for the number of instances
Reduce the number of instances to the min_instances if it is lower and
to the max_instances if it is higher. -1 for either of those means there
is no lower or upper bound.
In addition, terminate the operator when there is a nonsense in the
configuration (i.e. max_instances < min_instances).
Reviewed by Jan Mußler and Sergey Dudoladov.
- make sure that the secrets for the system users (superuser, replication)
are not deleted when the main cluster is. Therefore, we can re-create
the cluster, potentially forcing Patroni to restore it from the backup
and enable Patroni to connect, since it will use the old password, not
the newly generated random one.
- when syncing users, always check whether they are already in the DB.
Previously, we did this only for the sync cluster case, but the new
cluster could be actually the one restored from the backup by Patroni,
having all or some of the users already in place.
- delete endponts last. Patroni uses the $clustername endpoint in order
to store the leader related metadata. If we remove it before removing
all pods, one of those pods running Patroni will re-create it and the
next attempt to create the cluster with the same name will stuble on
the existing endpoint.
- Use db.Exec instead of db.Query for queries that expect no result.
This also fixes the issue with the DB creation, since we didn't
release an empty Row object it was not possible to create more than
one database for a cluster.
* Avoid overwriting critical users.
Disallow defining new users either in the cluster manifest, teams
API or infrastructure roles with the names mentioned in the new
protected_role_names parameter (list of comma-separated names)
Additionally, forbid defining a user with the name matching either
super_username or replication_username, so that we don't overwrite
system roles required for correct working of the operator itself.
Also, clear PostgreSQL roles on each sync first in order to avoid using
the old definitions that are no longer present in the current manifest,
infrastructure roles secret or the teams API.
Previously, the operator started to move the pods off the nodes to be
decomissioned by watching the eol_node_label value. Every new postgres
pod has been created with the anti-affinity to that label, making sure
that the pods being moved won't land on another to be decomissioned
node.
The changes introduce another label that indicates the ready node. The
new pod affinity will esnure that the pod is only scheduled to the node
marked as ready, discarding the previous anti-affinity. That way the
nodes can transition from the pending-decomission to the other statuses
(drained, terminating) without having pods suddently scaled to them.
In addition, rename the label that triggers the start of the upgrade
process to node_eol_label (for consistency with node_readiness_label)
and set its default vvalue to lifecycle-status:pending-decomission.
- fix the lack of closing the cursor for the query that returned no
rows.
- fix syncing of the user options, as previously those were not
fetched from the database.