* Protected and system users can't be a connection pool user
It's not supported, neither it's a best practice. Also fix potential null
pointer access. For protected users it makes sense by intent of protecting this
users (e.g. from being overriden or used as something else than supposed). For
system users the reason is the same as for superuser, it's about replicastion
user and it's under patroni control.
This is implemented on both levels, operator config and postgresql manifest.
For the latter we just use default name in this case, assuming that operator
config is always correct. For the former, since it's a serious
misconfiguration, operator panics.
* kubernetes_use_configmap
* Update manifests/postgresql-operator-default-configuration.yaml
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Update manifests/configmap.yaml
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Update charts/postgres-operator/values.yaml
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* go.fmt
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
Connection pooler support
Add support for a connection pooler. The idea is to make it generic enough to
be able to switch between different implementations (e.g. pgbouncer or
odyssey). Operator needs to create a deployment with pooler and a service for
it to access.
For connection pool to work properly, a database needs to be prepared by
operator, namely a separate user have to be created with an access to an
installed lookup function (to fetch credential for other users).
This setups is supposed to be used only by robot/application users. Usually a
connection pool implementation is more CPU bounded, so it makes sense to create
several pods for connection pool with more emphasize on cpu resources. At the
moment there are no special affinity or tolerations assigned to bring those
pods closer to the database. For availability purposes minimal number of
connection pool pods is 2, ideally they have to be distributed between
different nodes/AZ, but it's not enforced in the operator itself. Available
configuration supposed to be ergonomic and in the normal case require minimum
changes to a manifest to enable connection pool. To have more control over the
configuration and functionality on the pool side one can customize the
corresponding docker image.
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* bump version to 1.4.0 + some polishing
* align version for UI chart
* update user docs to warn for standby replicas
* minor log message changes for RBAC resources
* define postgres-pod clusterrole and align rbac in chart
* align UI chart rbac with operator and update doc
* operator RBAC needs podsecuritypolicy to grant it to postgres-pod
* add CRD manifests with validation
* update documentation
* patroni slots is not an array but a nested hash map
* make deps call tools
* cover validation in docs and export it in crds.go
* add toggle to disable creation of CRD validation and document it
* use templated service account also for CRD-configured helm deployment
* Added possibility to add custom annotations to LoadBalancer service.
* Added parameters for custom endpoint, access and secret key for logical backup.
* Modified dump.sh so it knows how to handle new features. Configurable S3 SSE
* align config map, operator config, helm chart values and templates
* follow helm chart conventions also in CRD templates
* split up values files and add comments
* avoid yaml confusion in postgres manifests
* bump spilo version and use example for logical_backup_s3_bucket
* add ConfigTarget switch to values
* StatefulSet fsGroup config option to allow non-root spilo
* Allow Postgres CRD to overide SpiloFSGroup of the Operator.
* Document FSGroup of a Pod cannot be changed after creation.
* database.go: substitute hardcoded .svc.cluster.local dns suffix with config parameter
Use the pod's configured dns search path, for clusters where .svc.cluster.local is not correct.
* Config option to allow Spilo container to run non-privileged.
Runs non-privileged by default.
Fixes#395
* add spilo_privileged to manifests/configmap.yaml
* add spilo_privileged to helm chart's values.yaml
Add possibility to mount a tmpfs volume to /dev/shm to avoid issues like
[this](https://github.com/docker-library/postgres/issues/416). To achieve that
two new options were introduced:
* `enableShmVolume` to PostgreSQL manifest, to specify whether or not mount
this volume per database cluster
* `enable_shm_volume` to operator configuration, to specify whether or not mount
per operator.
The first one, `enableShmVolume` takes precedence to allow us to be more flexible.
Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on.
Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API.
All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1
Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator.
Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into
Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it.
Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with
the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes.
Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files.
Update client-go dependencies.
Minor reformatting and renaming.
* Allow configuring pod priority globally and per cluster.
Allow to specify pod priority class for all pods managed by the operator,
as well as for those belonging to individual clusters.
Controlled by the pod_priority_class_name operator configuration
parameter and the podPriorityClassName manifest option.
See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
for the explanation on how to define priority classes since Kubernetes 1.8.
Some import order changes are due to go fmt.
Removal of OrphanDependents deprecated field.
Code review by @zerg-junior
A repair is a sync scan that acts only on those clusters that indicate
that the last add, update or sync operation on them has failed. It is
supposed to kick in more frequently than the repair scan. The repair
scan still remains to be useful to fix the consequences of external
actions (i.e. someone deletes a postgres-related service by mistake)
unbeknownst to the operator.
The repair scan is controlled by the new repair_period parameter in the
operator configuration. It has to be at least 2 times more frequent than
a sync scan to have any effect (a normal sync scan will update both last
synced and last repaired attributes of the controller, since repair is
just a sync underneath).
A repair scan could be queued for a cluster that is already being synced
if the sync period exceeds the interval between repairs. In that case a
repair event will be discarded once the corresponding worker finds out
that the cluster is not failing anymore.
Review by @zerg-junior
* During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.
* Define sidecars in the operator configuration.
Right now only the name and the docker image can be defined, but with
the help of the pod_environment_configmap parameter arbitrary
environment variables can be passed to the sidecars.
* Refactoring around generatePodTemplate.
Original implementation of per-cluster sidecars by @theRealWardo
Per review by @zerg-junior and @Jan-M
* Depreate old LB options, fix endpoint sync.
- deprecate useLoadBalancer, replicaLoadBalancer from the manifest
and enable_load_balancer from the operator configuration. The old
operator configuration options become no-op with this commit. For
the old manifest options, `useLoadBalancer` and `replicaLoadBalancer`
are still consulted, but only in the absense of the new ones
(enableMasterLoadBalancer and enableReplicaLoadBalancer).
- Make sure the endpoint being created during the sync receives proper
addresses subset. This is more critical for the replicas, as for the
masters Patroni will normally re-create the endpoint before the
operator.
- Avoid creating the replica endpoint, since it will be created automatically
by the corresponding service.
- Update the README and unit tests.
Code review by @mgomezch and @zerg-junior
Previously, it was set to the lifecycle-status:ready, breaking a
lot of minikube deployments. Also it was not possible befor to run
with this label set to an empty value.
Document the effect of the label in the new section of the
documentation.
* Trigger the node migration on the lack of the readiness label.
* Examine the node's readiness status on node add.
Make sure we don't miss the not ready node, especially when the
operator is killed during the migration.
* Scalyr agent sidecar for log shipping
* Remove the default for the Scalyr image
Now the image needs to be specified explicitly to enable log shipping to
Scalyr. This removes the problem of having to generate the config file
or publish our agent image repository.
* Add configuration variable for Scalyr server URL
Defaults to the EU address.
* Alter style
Newlines are cheap and make code easier to edit/refactor, but ok.
* Fix StatefulSet comparison logic
I broke it when I made the comparison consider all containers in the
PostgreSQL pod.
* Introduce higher and lower bounds for the number of instances
Reduce the number of instances to the min_instances if it is lower and
to the max_instances if it is higher. -1 for either of those means there
is no lower or upper bound.
In addition, terminate the operator when there is a nonsense in the
configuration (i.e. max_instances < min_instances).
Reviewed by Jan Mußler and Sergey Dudoladov.
* Avoid overwriting critical users.
Disallow defining new users either in the cluster manifest, teams
API or infrastructure roles with the names mentioned in the new
protected_role_names parameter (list of comma-separated names)
Additionally, forbid defining a user with the name matching either
super_username or replication_username, so that we don't overwrite
system roles required for correct working of the operator itself.
Also, clear PostgreSQL roles on each sync first in order to avoid using
the old definitions that are no longer present in the current manifest,
infrastructure roles secret or the teams API.
Previously, the operator started to move the pods off the nodes to be
decomissioned by watching the eol_node_label value. Every new postgres
pod has been created with the anti-affinity to that label, making sure
that the pods being moved won't land on another to be decomissioned
node.
The changes introduce another label that indicates the ready node. The
new pod affinity will esnure that the pod is only scheduled to the node
marked as ready, discarding the previous anti-affinity. That way the
nodes can transition from the pending-decomission to the other statuses
(drained, terminating) without having pods suddently scaled to them.
In addition, rename the label that triggers the start of the upgrade
process to node_eol_label (for consistency with node_readiness_label)
and set its default vvalue to lifecycle-status:pending-decomission.
- fix the lack of closing the cursor for the query that returned no
rows.
- fix syncing of the user options, as previously those were not
fetched from the database.
Add options to the PgUser structure, potentially allowing to set
per-role options in the cluster definition as well.
Introduce api_roles_configuration operator option with the default
of log_statement=all
* Add node toleration config to PodSpec
This allows to taint nodes dedicated to Postgres and prevents other pods from running on these nodes.
* Document taint and toleration setup
And remove setting from default operator ConfigMap
* Allow to overwrite tolerations with Postgres manifest
Allow cloning clusters from the operator.
The changes add a new JSON node `clone` with possible values `cluster`
and `timestamp`. `cluster` is mandatory, and setting a non-empty
`timestamp` triggers wal-e point in time recovery. Spilo and Patroni do
the whole heavy-lifting, the operator just defines certain variables and
gathers some data about how to connect to the host to clone or the
target S3 bucket.
As a minor change, set the image pull policy to IfNotPresent instead
of Always to simplify local testing.
Change the default replication username to standby.
* Deny all requests to the load balancer by default.
* Operator-wide toggle for the load-balancer.
* Define per-cluster useLoadBalancer option.
If useLoadBalancer is not set - then operator-wide defaults take place. If it
is true - the load balancer is created, otherwise a service type clusterIP is
created.
Internally, we have to completely replace the service if the service type
changes. We cannot patch, since some fields from the old service that will
remain after patch are incompatible with the new one, and handling them
explicitly when updating the service is ugly and error-prone. We cannot
update the service because of the immutable fields, that leaves us the only
option of deleting the old service and creating the new one. Unfortunately,
there is still an issue of unnecessary removal of endpoints associated with
the service, it will be addressed in future commits.
* Revert the unintended effect of go fmt
* Recreate endpoints on service update.
When the service type is changed, the service is deleted and then
the one with the new type is created. Unfortnately, endpoints are
deleted as well. Re-create them afterwards, preserving the original
addresses stored in them.
* Improve error messages and comments. Use generate instead of gen in names.
The flag adds a replica service with the name cluster_name-repl and
a DNS name that defaults to {cluster}-repl.{team}.{hostedzone}.
The implementation converted Service field of the cluster into a map
with one or two elements and deals with the cases when the new flag
is changed on a running cluster
(the update and the sync should create or delete the replica service).
In order to pick up master and replica service and master endpoint
when listing cluster resources.
* Update the spec when updating the cluster.
Command-line options --nodatabaseaccess and --noteamsapi disable all
teams api interaction and access to the Postgres database. This is
useful for debugging purposes when the operator runs out of cluster
(with --outofcluster flag).
The same effect can be achieved by setting enable_db_access and/or
enable_teams_api to false.
Run operations concerning multiple clusters in parallel. Each cluster gets its
own worker in order to create, update, sync or delete clusters. Each worker
acquires the lock on a cluster. Subsequent operations on the same cluster
have to wait until the current one finishes. There is a pool of parallel
workers, configurable with the `workers` parameter in the configmap and set by
default to 4. The cluster-related tasks are assigned to the workers based on
a cluster name: the tasks for the same cluster will be always assigned to the
same worker. There is no blocking between workers, although there is a chance
that a single worker will become a bottleneck if too many clusters are
assigned to it; therefore, for large-scale deployments it might be necessary
to bump up workers from the default value.
* Avoid "bulk-comparing" pod resources during sync.
First attempt to fix bogus restarts due to the reported mismatch
of container resources where one of the resources is an empty struct,
while the other has all fields set to nil.
In addition, add an ability to set limits and requests per pod, as well as the operator-level defaults.
* Add infrastructure roles configured globally.
Those are the roles defined in the operator itself. The operator's
configuration refers to the secret containing role names, passwords
and membership information. While they are referred to as roles, in
reality those are users.
In addition, improve the regex to filter out invalid users and
make sure user secret names are compatible with DNS name spec.
Add an example manifest for the infrastructure roles.
- Set WAL_S3_BUCKET to point WAL-E where to fetch/store WAL files
- Set annotations/iam.amazonaws.com/role to set the role to access AWS"
The new env vairables are PGOP_WAL_S3_BUCKET and PGOP_KUBE_IAM_ROLE.
- add a new environment variable for triggering debug log level
- show both new, old object and diff during syncs and updates
- use pretty package to pretty-print go structures
-