Avoid sharing pointers to the same spec data between the informer
and the clusters. The only catch is that the error field is cleared
during the deep copy: it is an interface that may contain private
fields that cannot be copied. However, the error is only used while
the manifest is parsed and before it is queued, so we never refer to
that field in the cluster structure.
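A minimal sketch of the copy semantics described above, assuming a spec struct with an `Error` field of interface type; the type and field names are illustrative, not the operator's actual ones:

```go
// Hypothetical spec type; only the Error field matters for this sketch.
type postgresSpec struct {
	ClusterName string
	Error       error // set while parsing the manifest, never read after queueing
}

// deepCopy duplicates the spec so that the informer and the cluster do
// not share pointers. The error interface may wrap private, uncopyable
// state, so it is deliberately dropped.
func (s *postgresSpec) deepCopy() *postgresSpec {
	c := *s
	c.Error = nil
	return &c
}
```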
* Sanity checks for the cluster name, improve tests.
- check that both the normal and the clone cluster names comply with the
valid service name format (see the sketch after this list). For a clone
cluster, only do this check if the clone timestamp is not set; with a
clone timestamp set, the clone name points to the S3 bucket
- add tests and improve existing ones, making sure we don't call the
Error() method on an empty error, and that we don't miss cases where
the expected error is non-empty but the call under test does not
return an error.
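A minimal sketch of the service-name check, assuming a DNS-1035-style label pattern; the exact regex and length limit used by the operator may differ:

```go
import (
	"fmt"
	"regexp"
)

// serviceNameRe follows the DNS-1035 label rules that Kubernetes
// applies to service names (assumed here).
var serviceNameRe = regexp.MustCompile(`^[a-z]([-a-z0-9]*[a-z0-9])?$`)

func validateClusterName(name string) error {
	if len(name) > 63 || !serviceNameRe.MatchString(name) {
		return fmt.Errorf("cluster name %q is not a valid service name", name)
	}
	return nil
}
```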
Code review by @zerg-junior and @Jan-M
Avoid reusing the WAL S3 bucket of an older cluster that had the same name as the existing one.
For the new cluster, the S3 bucket name will include a suffix equal to the UID of the PostgreSQL object describing the cluster. That way, the bucket name stays the same for all members if and only if they correspond to the same PostgreSQL cluster object.
When the "clone: uid:" key is present in the cluster manifest and the cluster is cloned from an S3 bucket (currently, that happens if endTimestamp is present in the clone description), the S3 bucket to clone from is suffixed with the -uid value.
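A sketch of the suffix scheme; the helper and parameter names are assumptions:

```go
import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
)

// walBucketName appends the UID of the PostgreSQL object to the
// configured bucket name, so all members of the same cluster object
// resolve to the same bucket.
func walBucketName(configuredBucket string, uid types.UID) string {
	return fmt.Sprintf("%s-%s", configuredBucket, uid)
}
```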
* Add node toleration config to PodSpec
This makes it possible to taint nodes dedicated to Postgres and prevents other pods from running on these nodes (a sketch follows after this list).
* Document taint and toleration setup
And remove setting from default operator ConfigMap
* Allow overwriting tolerations via the Postgres manifest
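A sketch of what the resulting toleration entry could look like, using client-go types; the taint key, value, and effect are assumptions:

```go
import v1 "k8s.io/api/core/v1"

// Tolerate a hypothetical "postgres=enabled:NoSchedule" taint placed on
// the dedicated nodes; pods without this toleration are kept away.
var postgresTolerations = []v1.Toleration{
	{
		Key:      "postgres",
		Operator: v1.TolerationOpEqual,
		Value:    "enabled",
		Effect:   v1.TaintEffectNoSchedule,
	},
}
```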
Allow cloning clusters from the operator.
The changes add a new JSON node `clone` with the keys `cluster`
and `timestamp`. `cluster` is mandatory, and setting a non-empty
`timestamp` triggers WAL-E point-in-time recovery. Spilo and Patroni do
the heavy lifting; the operator just defines certain variables and
gathers some data about how to connect to the host to clone from or the
target S3 bucket.
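A sketch of the clone node as a Go type; the JSON keys match the description above, while the Go names and tags are assumptions:

```go
// CloneDescription mirrors the `clone` node of the manifest.
type CloneDescription struct {
	// ClusterName is mandatory and names the cluster to clone from.
	ClusterName string `json:"cluster"`
	// EndTimestamp, when non-empty, triggers WAL-E point-in-time
	// recovery up to the given moment.
	EndTimestamp string `json:"timestamp,omitempty"`
}
```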
As a minor change, set the image pull policy to IfNotPresent instead
of Always to simplify local testing.
Change the default replication username to standby.
* client-go v4.0.0-beta0
* remove unnecessary methods for tpr object
* rest client: use interface instead of structure pointer
* proper names for constants; some clean up for log messages
* remove teams api client from controller and make it per cluster
* Deny all requests to the load balancer by default.
* Operator-wide toggle for the load-balancer.
* Define per-cluster useLoadBalancer option.
If useLoadBalancer is not set, the operator-wide default takes effect. If it
is true, the load balancer is created; otherwise, a service of type ClusterIP
is created (a sketch of the rule follows).
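A minimal sketch of the precedence rule, assuming the per-cluster option is a `*bool` so that "not set" can be told apart from false:

```go
// shouldCreateLoadBalancer resolves the per-cluster flag against the
// operator-wide default.
func shouldCreateLoadBalancer(useLoadBalancer *bool, operatorDefault bool) bool {
	if useLoadBalancer != nil {
		return *useLoadBalancer
	}
	return operatorDefault
}
```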
Internally, we have to completely replace the service if the service type
changes. We cannot patch, since some fields of the old service that would
remain after the patch are incompatible with the new one, and handling them
explicitly when updating the service is ugly and error-prone. We cannot
update the service because of the immutable fields; that leaves us with the
only option of deleting the old service and creating a new one. Unfortunately,
there is still an issue of unnecessary removal of the endpoints associated
with the service; it will be addressed in future commits.
* Revert the unintended effect of go fmt
* Recreate endpoints on service update.
When the service type is changed, the service is deleted and then
one with the new type is created. Unfortunately, the endpoints are
deleted as well. Re-create them afterwards, preserving the original
addresses stored in them (a sketch of the flow follows).
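A sketch of the whole delete-and-recreate flow, including the endpoint preservation, written against a client-go v4-era clientset (the version this log mentions); the function name and import paths are illustrative, not the operator's actual helpers:

```go
import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func replaceService(client kubernetes.Interface, ns string, newSvc *v1.Service) error {
	// Remember the endpoints first: deleting the service removes them.
	ep, err := client.CoreV1().Endpoints(ns).Get(newSvc.Name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if err := client.CoreV1().Services(ns).Delete(newSvc.Name, &metav1.DeleteOptions{}); err != nil {
		return err
	}
	if _, err := client.CoreV1().Services(ns).Create(newSvc); err != nil {
		return err
	}
	// Re-create the endpoints with the original addresses.
	ep.ResourceVersion = "" // a fresh object must not carry the old version
	_, err = client.CoreV1().Endpoints(ns).Create(ep)
	return err
}
```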
* Improve error messages and comments. Use generate instead of gen in names.
The flag adds a replica service with the name cluster_name-repl and
a DNS name that defaults to {cluster}-repl.{team}.{hostedzone}.
The implementation converts the Service field of the cluster into a map
with one or two elements and handles the cases when the new flag
is changed on a running cluster
(the update and the sync should create or delete the replica service).
Listing cluster resources now picks up the master and replica services
and the master endpoint (a sketch of the map follows).
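A sketch of the role-keyed map that replaces the single Service field; the role type and constant names are assumptions:

```go
import v1 "k8s.io/api/core/v1"

type PostgresRole string

const (
	Master  PostgresRole = "master"
	Replica PostgresRole = "replica"
)

// The cluster keeps one service per role; the replica entry exists only
// when the replica service flag is enabled.
type Cluster struct {
	Services map[PostgresRole]*v1.Service
}
```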
* Update the spec when updating the cluster.
* Avoid "bulk-comparing" pod resources during sync.
This is a first attempt to fix bogus restarts caused by a reported mismatch
of container resources where one of the resources is an empty struct
while the other has all fields set to nil.
In addition, add the ability to set limits and requests per pod, as well as operator-level defaults (a comparison sketch follows).
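A minimal sketch of a field-wise comparison that treats a nil ResourceList and an empty one as equal, avoiding the bogus mismatch; the helper names are assumptions:

```go
import v1 "k8s.io/api/core/v1"

func sameResources(a, b v1.ResourceRequirements) bool {
	return sameResourceList(a.Requests, b.Requests) &&
		sameResourceList(a.Limits, b.Limits)
}

// sameResourceList compares entry by entry; len() is 0 for both nil and
// empty maps, so the two representations compare as equal.
func sameResourceList(a, b v1.ResourceList) bool {
	if len(a) != len(b) {
		return false
	}
	for name, qty := range a {
		other, ok := b[name]
		if !ok || qty.Cmp(other) != 0 {
			return false
		}
	}
	return true
}
```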
* Add version label to the cluster.
According to the STUPS team, the daemon that exports logs to Scalyr
stops the export if the version label is missing.
* Move label names to constants.
* Run go fmt
* Add infrastructure roles configured globally.
Those are the roles defined in the operator itself. The operator's
configuration refers to the secret containing role names, passwords,
and membership information. While they are referred to as roles, in
reality they are users.
In addition, improve the regex to filter out invalid users and
make sure user secret names are compatible with the DNS name spec
(see the sketch below).
Add an example manifest for the infrastructure roles.
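A sketch of the username filter; the pattern below enforces DNS-1123-compatible secret names and is an assumption, not the operator's exact regex:

```go
import "regexp"

// userRe rejects names that could not appear in a Kubernetes secret
// name (DNS-1123 label rules, assumed here).
var userRe = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

func isValidUsername(name string) bool {
	return len(name) <= 63 && userRe.MatchString(name)
}
```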
- introduce a Pod events channel (sketched below);
- add parsing of the MaintenanceWindows section;
- skip deleting the Etcd key on cluster delete;
- use an external Etcd host;
- watch for tpr/pods in the namespace of the operator pod only.
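A sketch of the pod events channel mentioned above; the event type and its fields are assumptions:

```go
// PodEvent describes a change observed on a pod that belongs to a cluster.
type PodEvent struct {
	ClusterName string
	PodName     string
	EventType   string // "add", "update" or "delete"
}

// Each cluster worker receives pod changes through a channel like this one.
var podEvents = make(chan PodEvent)
```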