* During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.
* Up until now, the operator read its own configuration from the
configmap. That has a number of limitations, i.e. when the
configuration value is not a scalar, but a map or a list. We use a
custom code based on github.com/kelseyhightower/envconfig to decode
non-scalar values out of plain text keys, but that breaks when the data
inside the keys contains both YAML-special elememtns (i.e. commas) and
complex quotes, one good example for that is search_path inside
`team_api_role_configuration`. In addition, reliance on the configmap
forced a flag structure on the configuration, making it hard to write
and to read (see
https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778).
The changes allow to supply the operator configuration in a proper YAML
file. That required registering a custom CRD to support the operator
configuration and provide an example at
manifests/postgresql-operator-default-configuration.yaml. At the moment,
both old configmap and the new CRD configuration is supported, so no
compatibility issues, however, in the future I'd like to deprecate the
configmap-based configuration altogether. Contrary to the
configmap-based configuration, the CRD one doesn't embed defaults into
the operator code, however, one can use the
manifests/postgresql-operator-default-configuration.yaml as a starting
point in order to build a custom configuration.
Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters
used to create the CRD were taken from the operator configuration, which
is not possible if the configuration itself is stored in the CRD object,
I've added the ability to specify them as environment variables
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively.
Per review by @zerg-junior and @Jan-M.
When there is an error happening upon deletion of the Kubernetes object
belonging to the cluster being removed, it makes no sense to abort the
deletion: the manifest will be removed anyway, therefore all the objects
after the one we aborted at will stay forever.
Avoid sharing pointers to the same spec data between the informer
and the clusters. The only catch is that the error field is cleared
during deepcopy, since it is an interface that may contain private
fields that cannot be copied, however, the error is only used when
the manifest is parsed and before it is queued, therefore, we never
refer to that field in the cluster structure.
987b434 introduced a new function that modifies the cluster spec in
memory before the cluster processes it. Unfortunately, the instance
being modified appeared to be the one stored internally in the
PostgresInformer, resulting in those modifications to be propagated with
futher cluster events and producing update loops in some occasions.
This commit makes sure we copy the spec before putting it into the
clusterEventQueues.
* Depreate old LB options, fix endpoint sync.
- deprecate useLoadBalancer, replicaLoadBalancer from the manifest
and enable_load_balancer from the operator configuration. The old
operator configuration options become no-op with this commit. For
the old manifest options, `useLoadBalancer` and `replicaLoadBalancer`
are still consulted, but only in the absense of the new ones
(enableMasterLoadBalancer and enableReplicaLoadBalancer).
- Make sure the endpoint being created during the sync receives proper
addresses subset. This is more critical for the replicas, as for the
masters Patroni will normally re-create the endpoint before the
operator.
- Avoid creating the replica endpoint, since it will be created automatically
by the corresponding service.
- Update the README and unit tests.
Code review by @mgomezch and @zerg-junior
* client-go v4.0.0-beta0
* remove unnecessary methods for tpr object
* rest client: use interface instead of structure pointer
* proper names for constants; some clean up for log messages
* remove teams api client from controller and make it per cluster
In order to support volumes different from EBS and filesystems other than EXT2/3/4 the respective code parts were implemented as interfaces. Adding the new resize for the volume or the filesystem will require implementing the interface, but no other changes in the cluster code itself.
Volume resizing first changes the EBS and the filesystem, and only afterwards is reflected in the Kubernetes "PersistentVolume" object. This is done deliberately to be able to check if the volume needs resizing by peeking at the Size of the PersistentVolume structure. We recheck, nevertheless, in the EBSVolumeResizer, whether the actual EBS volume size doesn't match the spec, since call to the AWS ModifyVolume is counted against the resize limit of once every 6 hours, even for those calls that shouldn't result in an actual resize (i.e. when the size matches the one for the running volume).
As a collateral, split the constants into multiple files, move the volume code into a separate file and fix minor issues related to the error reporting.
The sync adds or alters database roles based on the roles defined
in the cluster's TPR, Team API and operator's infrastructure roles.
At the moment, roles are not deleted, as it would be dangerous for
the robot roles in case TPR is misconfigured. In addition, ALTER
ROLE does not remove role options, i.e. SUPERUSER or CREATEROLE,
neither it removes role membership: only new options are added and
new role membership is granted. So far, options like NOSUPERUSER
and NOCREATEROLE won't be handed correctly, when mixed with the
non-negative counterparts, also NOLOGIN should be processed correctly.
The code assumes that only MD5 passwords are stored in the DB and
will likely break with the new SCRAM auth in PostgreSQL 10.
On the implementation side, create the new interface to abstract
roles merge and creation, move most of the role-based functionality
from cluster/pg into the new 'users' module, strip create user code
of special cases related to human-based users (moving them to init
instead) and fixed the password md5 generator to avoid processing
already encrypted passwords. In addition, moved the system roles
off the slice containing all other roles in order to avoid extra
efforts to avoid creating them.
Also, fix a leak in DB connections when the new connection is not
considered healthy and discarded without being closed. Initialize
the database during the sync phase before syncing users.
Run operations concerning multiple clusters in parallel. Each cluster gets its
own worker in order to create, update, sync or delete clusters. Each worker
acquires the lock on a cluster. Subsequent operations on the same cluster
have to wait until the current one finishes. There is a pool of parallel
workers, configurable with the `workers` parameter in the configmap and set by
default to 4. The cluster-related tasks are assigned to the workers based on
a cluster name: the tasks for the same cluster will be always assigned to the
same worker. There is no blocking between workers, although there is a chance
that a single worker will become a bottleneck if too many clusters are
assigned to it; therefore, for large-scale deployments it might be necessary
to bump up workers from the default value.
* move statefulset creation from cluster spec to the separate function
* sync cluster state with desired state;
* move out from arrays for cluster resources;
* recreate pods instead of deleting them in case of statefulset change
* check for master while creating cluster/updating pods
* simplify retryutil
* list pvc while listing resources
* name kubernetes resources with capital letter
* do rolling update in case of env variables change