Previously, it was set to lifecycle-status:ready, breaking a
lot of minikube deployments. Also, it was not possible before to run
with this label set to an empty value.
Document the effect of the label in the new section of the
documentation.
* Trigger the node migration on the lack of the readiness label.
* Examine the node's readiness status on node add.
Make sure we don't miss a node that is not ready, especially when the
operator is killed during the migration.
* Scalyr agent sidecar for log shipping
* Remove the default for the Scalyr image
Now the image needs to be specified explicitly to enable log shipping to
Scalyr. This removes the problem of having to generate the config file
or publish our agent image repository.
* Add configuration variable for Scalyr server URL
Defaults to the EU address.
* Alter style
Newlines are cheap and make code easier to edit/refactor, but ok.
* Fix StatefulSet comparison logic
I broke it when I made the comparison consider all containers in the
PostgreSQL pod.
* Introduce higher and lower bounds for the number of instances
Raise the number of instances to min_instances if it is lower and
reduce it to max_instances if it is higher. A value of -1 for either
option means there is no lower or upper bound, respectively.
In addition, terminate the operator when the configuration is
contradictory (i.e. max_instances < min_instances).
Reviewed by Jan Mußler and Sergey Dudoladov.
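
A minimal sketch of the bounds logic described above, not the operator's
actual code. The instanceLimits type and its method names are hypothetical;
only min_instances, max_instances and the meaning of -1 come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// instanceLimits mirrors the min_instances and max_instances options
// (hypothetical type); -1 disables the corresponding bound.
type instanceLimits struct {
	Min int32
	Max int32
}

// validate rejects contradictory bounds such as max_instances < min_instances.
func (l instanceLimits) validate() error {
	if l.Min >= 0 && l.Max >= 0 && l.Max < l.Min {
		return errors.New("max_instances is lower than min_instances")
	}
	return nil
}

// clamp returns the requested number of instances adjusted to the bounds.
func (l instanceLimits) clamp(requested int32) int32 {
	if l.Min >= 0 && requested < l.Min {
		return l.Min
	}
	if l.Max >= 0 && requested > l.Max {
		return l.Max
	}
	return requested
}

func main() {
	limits := instanceLimits{Min: 2, Max: 5}
	if err := limits.validate(); err != nil {
		// The operator terminates in this case; the sketch only reports it.
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println(limits.clamp(1), limits.clamp(3), limits.clamp(9))
}
```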
- make sure that the secrets for the system users (superuser, replication)
are not deleted when the main cluster is. Therefore, we can re-create
the cluster, potentially forcing Patroni to restore it from the backup
and enable Patroni to connect, since it will use the old password, not
the newly generated random one.
- when syncing users, always check whether they are already in the DB.
Previously, we did this only for the sync cluster case, but the new
cluster could be actually the one restored from the backup by Patroni,
having all or some of the users already in place.
- delete endpoints last. Patroni uses the $clustername endpoint in order
to store the leader related metadata. If we remove it before removing
all pods, one of those pods running Patroni will re-create it and the
next attempt to create the cluster with the same name will stumble on
the existing endpoint.
- Use db.Exec instead of db.Query for queries that expect no result.
This also fixes the issue with the DB creation: since we didn't
release an empty Row object, it was not possible to create more than
one database for a cluster.
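
The db.Exec point above could look roughly like the sketch below. It
assumes Go's standard database/sql package; the execer interface, the
createDatabase and quoteIdentifier helpers and the recorder stand-in are
hypothetical names used only for illustration, and no real database is
contacted.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// execer is the subset of *sql.DB this sketch needs; *sql.DB and *sql.Tx
// both satisfy it.
type execer interface {
	Exec(query string, args ...interface{}) (sql.Result, error)
}

// quoteIdentifier wraps a name in double quotes, doubling embedded quotes.
func quoteIdentifier(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// createDatabase issues a statement that returns no rows. Exec hands back
// no result set, so there is nothing left to close before the next call.
func createDatabase(db execer, name, owner string) error {
	query := fmt.Sprintf("CREATE DATABASE %s OWNER %s",
		quoteIdentifier(name), quoteIdentifier(owner))
	if _, err := db.Exec(query); err != nil {
		return fmt.Errorf("could not create database %s: %v", name, err)
	}
	return nil
}

// recorder is a stand-in for a real connection; it only stores statements.
type recorder struct{ statements []string }

func (r *recorder) Exec(query string, args ...interface{}) (sql.Result, error) {
	r.statements = append(r.statements, query)
	return nil, nil
}

func main() {
	db := &recorder{}
	for _, name := range []string{"foo", "bar"} {
		if err := createDatabase(db, name, "app_owner"); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println(strings.Join(db.statements, "\n"))
}
```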
* Avoid overwriting critical users.
Disallow defining new users either in the cluster manifest, teams
API or infrastructure roles with the names mentioned in the new
protected_role_names parameter (list of comma-separated names)
Additionally, forbid defining a user with the name matching either
super_username or replication_username, so that we don't overwrite
system roles required for correct working of the operator itself.
Also, clear PostgreSQL roles on each sync first in order to avoid using
the old definitions that are no longer present in the current manifest,
infrastructure roles secret or the teams API.
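
As an illustration of the protected-name check described above, a minimal
sketch follows. The roleGuard type and its functions are hypothetical; the
only details taken from the text are the comma-separated
protected_role_names list and the super_username and replication_username
options.

```go
package main

import (
	"fmt"
	"strings"
)

// roleGuard holds the role names that user definitions must not reuse
// (hypothetical type).
type roleGuard struct {
	protected map[string]struct{}
}

// newRoleGuard combines the comma-separated protected_role_names value
// with the two system role names.
func newRoleGuard(protectedRoleNames, superUsername, replicationUsername string) roleGuard {
	g := roleGuard{protected: make(map[string]struct{})}
	names := strings.Split(protectedRoleNames, ",")
	names = append(names, superUsername, replicationUsername)
	for _, name := range names {
		if name = strings.TrimSpace(name); name != "" {
			g.protected[name] = struct{}{}
		}
	}
	return g
}

// allowed reports whether a role with this name may be defined in the
// manifest, the teams API or the infrastructure roles.
func (g roleGuard) allowed(name string) bool {
	_, found := g.protected[name]
	return !found
}

func main() {
	// Example values only; the real defaults are not stated here.
	guard := newRoleGuard("admin, monitoring", "postgres", "standby")
	for _, name := range []string{"app_user", "admin", "standby"} {
		fmt.Printf("%s allowed: %v\n", name, guard.allowed(name))
	}
}
```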
Previously, the operator started to move the pods off the nodes to be
decommissioned by watching the eol_node_label value. Every new postgres
pod was created with the anti-affinity to that label, making sure
that the pods being moved won't land on another node that is about to
be decommissioned.
The changes introduce another label that indicates the ready node. The
new pod affinity will ensure that the pod is only scheduled to the node
marked as ready, discarding the previous anti-affinity. That way the
nodes can transition from the pending-decomission to the other statuses
(drained, terminating) without having pods suddenly scheduled to them.
In addition, rename the label that triggers the start of the upgrade
process to node_eol_label (for consistency with node_readiness_label)
and set its default value to lifecycle-status:pending-decomission.
- fix the lack of closing the cursor for the query that returned no
rows.
- fix syncing of the user options, as previously those were not
fetched from the database.
- search_path accepts a list of values that cannot be quoted, as
quoting would make PostgreSQL interpret the result as a single
value. Since we require quoting of values with commas in the
operator's configMap in order to avoid confusing them with the
separate map entities, we need to strip those quotes before
passing the value to PostgreSQL.
- make fmt run
A value in a configMap that is a map itself
(a key:value string separated by commas) may include commas inside
quotes (e.g. search_path:"public,"$user"). The changes make the marshaling
code process such cases correctly.
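
The two steps described above (splitting a map value on commas that are
outside quotes, then stripping the quotes before handing the value to
PostgreSQL) might be sketched as follows. All function names are
hypothetical, and the sketch assumes a value is wrapped in a single pair
of double quotes; it is not the operator's marshaling code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOutsideQuotes splits s on commas that are not inside double quotes.
func splitOutsideQuotes(s string) []string {
	var parts []string
	var current strings.Builder
	inQuotes := false
	for _, r := range s {
		switch {
		case r == '"':
			inQuotes = !inQuotes
			current.WriteRune(r)
		case r == ',' && !inQuotes:
			parts = append(parts, current.String())
			current.Reset()
		default:
			current.WriteRune(r)
		}
	}
	return append(parts, current.String())
}

// parseMap turns "k1:v1,k2:v2" into a map, keeping quotes in the values.
func parseMap(s string) (map[string]string, error) {
	result := make(map[string]string)
	for _, entry := range splitOutsideQuotes(s) {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		kv := strings.SplitN(entry, ":", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("entry %q is not a key:value pair", entry)
		}
		result[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
	}
	return result, nil
}

// stripQuotes removes one pair of surrounding double quotes, if present.
func stripQuotes(value string) string {
	if len(value) >= 2 && strings.HasPrefix(value, `"`) && strings.HasSuffix(value, `"`) {
		return value[1 : len(value)-1]
	}
	return value
}

func main() {
	options, err := parseMap(`log_statement:all,search_path:"public,data"`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(options["log_statement"], stripQuotes(options["search_path"]))
}
```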
Add options to the PgUser structure, potentially allowing per-role
options to be set in the cluster definition as well.
Introduce api_roles_configuration operator option with the default
of log_statement=all
* Add node toleration config to PodSpec
This allows tainting nodes dedicated to Postgres and prevents other pods from running on these nodes.
* Document taint and toleration setup
And remove setting from default operator ConfigMap
* Allow overriding tolerations in the Postgres manifest
Be more rigorous about validating user flags.
Only accept CREATE ROLE flags that don't take any parameters (i.e.
not ADMIN or CONNECTION LIMIT). Check that a flag and its NO
counterpart are not used at the same time.
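
A rough sketch of such a validation is shown below. The normalizeFlags
function and the parameterless table are hypothetical; the flag names are
the parameterless CREATE ROLE options from PostgreSQL, and the exact set
the operator accepts is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parameterless lists CREATE ROLE options that take no arguments; each
// also has a NO-prefixed counterpart. The exact set is an assumption.
var parameterless = map[string]bool{
	"SUPERUSER": true, "CREATEDB": true, "CREATEROLE": true,
	"INHERIT": true, "LOGIN": true, "REPLICATION": true, "BYPASSRLS": true,
}

// normalizeFlags upper-cases the flags, drops duplicates and rejects both
// unsupported flags and a flag used together with its NO counterpart.
func normalizeFlags(flags []string) ([]string, error) {
	negatedByBase := make(map[string]bool)
	var result []string
	for _, raw := range flags {
		flag := strings.ToUpper(strings.TrimSpace(raw))
		base := strings.TrimPrefix(flag, "NO")
		negated := base != flag
		if !parameterless[base] {
			return nil, fmt.Errorf("unsupported user flag %q", raw)
		}
		if previous, seen := negatedByBase[base]; seen {
			if previous != negated {
				return nil, fmt.Errorf("conflicting user flags %s and NO%s", base, base)
			}
			continue // exact duplicate
		}
		negatedByBase[base] = negated
		result = append(result, flag)
	}
	return result, nil
}

func main() {
	for _, flags := range [][]string{
		{"login", "NOSUPERUSER"},
		{"LOGIN", "NOLOGIN"},
		{"CONNECTION LIMIT"},
	} {
		normalized, err := normalizeFlags(flags)
		fmt.Println(normalized, err)
	}
}
```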
Allow cloning clusters from the operator.
The changes add a new JSON node `clone` with possible values `cluster`
and `timestamp`. `cluster` is mandatory, and setting a non-empty
`timestamp` triggers WAL-E point-in-time recovery. Spilo and Patroni do
all the heavy lifting; the operator just defines certain variables and
gathers some data about how to connect to the host to clone or the
target S3 bucket.
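
For illustration, the `clone` node could be decoded and checked along these
lines. The cloneDescription type, its functions and the sample cluster name
and timestamp are hypothetical; only the `cluster` and `timestamp` keys and
their meaning come from the description above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// cloneDescription models the `clone` node of the manifest (hypothetical
// type; only the two JSON keys come from the commit message).
type cloneDescription struct {
	Cluster   string `json:"cluster"`
	Timestamp string `json:"timestamp,omitempty"`
}

// parseClone decodes the node and enforces the mandatory `cluster` value.
func parseClone(raw []byte) (cloneDescription, error) {
	var c cloneDescription
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, err
	}
	if c.Cluster == "" {
		return c, errors.New("clone: cluster is mandatory")
	}
	return c, nil
}

// pointInTimeRecovery reports whether a non-empty timestamp was given,
// which selects recovery from the backup instead of a plain clone.
func (c cloneDescription) pointInTimeRecovery() bool {
	return c.Timestamp != ""
}

func main() {
	// Sample values are made up for the example.
	raw := []byte(`{"cluster": "source-cluster", "timestamp": "2017-12-19T12:40:33+01:00"}`)
	c, err := parseClone(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(c.Cluster, c.pointInTimeRecovery())
}
```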
As a minor change, set the image pull policy to IfNotPresent instead
of Always to simplify local testing.
Change the default replication username to standby.