Commit Graph

102 Commits

Author SHA1 Message Date
Manuel Gómez 15c278d4e8
Scalyr agent sidecar for log shipping (#190)
* Scalyr agent sidecar for log shipping

* Remove the default for the Scalyr image

Now the image needs to be specified explicitly to enable log shipping to
Scalyr.  This removes the problem of having to generate the config file
or publish our agent image repository.

* Add configuration variable for Scalyr server URL

Defaults to the EU address.

* Alter style

Newlines are cheap and make code easier to edit/refactor, but ok.

* Fix StatefulSet comparison logic

I broke it when I made the comparison consider all containers in the
PostgreSQL pod.
2017-12-21 15:34:26 +01:00
Oleksii Kliukin bf80f5225e
Introduce higher and lower bounds for the number of instances (#178)
* Introduce higher and lower bounds for the number of instances

Reduce the number of instances to the min_instances if it is lower and
to the max_instances if it is higher. -1 for either of those means there
is no lower or upper bound.

In addition, terminate the operator when there is a nonsense in the
configuration (i.e. max_instances < min_instances).

Reviewed by Jan Mußler and Sergey Dudoladov.
2017-12-15 16:02:50 +01:00
Georg Kunz e8d9c75949 Allow custom Postgres pod environment variables 2017-12-14 14:39:33 +01:00
Oleksii Kliukin 1fb8cf7ea0
Avoid overwriting critical users. (#172)
* Avoid overwriting critical users.

Disallow defining new users either in the cluster manifest, teams
API or infrastructure roles with the names mentioned in the new
protected_role_names parameter (list of comma-separated names)

Additionally, forbid defining a user with the name matching either
super_username or replication_username, so that we don't overwrite
system roles required for correct working of the operator itself.

Also, clear PostgreSQL roles on each sync first in order to avoid using
the old definitions that are no longer present in the current manifest,
infrastructure roles secret or the teams API.
2017-12-05 14:27:12 +01:00
Oleksii Kliukin dd0affc390 Tweak our reaction to the cluster upgrade process.
Previously, the operator started to move the pods off the nodes to be
decomissioned by watching the eol_node_label value. Every new postgres
pod has been created with the anti-affinity to that label, making sure
that the pods being moved won't land on another to be decomissioned
node.

The changes introduce another label that indicates the ready node.  The
new pod affinity will esnure that the pod is only scheduled to the node
marked as ready, discarding the previous anti-affinity.  That way the
nodes can transition from the pending-decomission to the other statuses
(drained, terminating) without having pods suddently scaled to them.

In addition, rename the label that triggers the start of the upgrade
process to node_eol_label (for consistency with node_readiness_label)
and set its default vvalue to lifecycle-status:pending-decomission.
2017-11-30 14:11:49 +01:00
Oleksii Kliukin 1ffe98ba9f Fix the connection leak and user options sync.
- fix the lack of closing the cursor for the query that returned no
rows.
- fix syncing of the user options, as previously those were not
  fetched from the database.
2017-11-27 16:46:34 +01:00
Oleksii Kliukin 086ead03f5 Warn about attempts to use escape quotes. 2017-11-22 10:43:35 +01:00
Oleksii Kliukin 975b21f633 Rename api roles configuration parameter.
Change api_roles_configuration to team_api_role_configuration
2017-11-22 10:43:35 +01:00
Oleksii Kliukin 2352fc9a39 go fmt run 2017-11-22 10:43:35 +01:00
Oleksii Kliukin 71f57c9fe3 Fix escaping of parameter values and extra spaces.
- document the newly introduced option (for now in the main README)
- make query error output more readable.
2017-11-22 10:43:35 +01:00
Oleksii Kliukin 415a7fdc4d Allow global configuration options for API roles.
Add options to the PgUser structure, potentially allowing to set
per-role options in the cluster definition as well.

Introduce api_roles_configuration operator option with the default
of log_statement=all
2017-11-22 10:43:35 +01:00
Oleksii Kliukin c25e849fe4 Fix a failure to create new statefulset at sync.
Also do a fmt run.
2017-11-08 18:24:17 +01:00
Georg Kunz 47dd766fa7 Add node toleration config to PodSpec (#151)
* Add node toleration config to PodSpec

This allows to taint nodes dedicated to Postgres and prevents other pods from running on these nodes.

* Document taint and toleration setup

And remove setting from default operator ConfigMap

* Allow to overwrite tolerations with Postgres manifest
2017-11-02 19:10:44 +01:00
Oleksii Kliukin eba23279c8 Kube cluster upgrade 2017-10-19 10:49:42 +02:00
Jan Mussler cec695d48e Superuser toggle for team members
Make superuser toggleable for team members. Add and "admin" role to team members if superuser is disabled.
2017-10-12 15:01:54 +02:00
Murat Kabilov a35e9c6119 move from tpr to crd 2017-10-06 15:12:08 +02:00
Murat Kabilov 93d4bf2b55 Merge branch 'master' into api-improvements 2017-09-26 14:47:13 +02:00
Murat Kabilov 9a66e09b88 cluster history api endpoint 2017-09-26 14:30:45 +02:00
Murat Kabilov d876f4d88e set secret name template via config map 2017-09-18 14:25:09 +02:00
Oleksii Kliukin 8b85935a7a Allow cloning clusters from the operator. (#90)
Allow cloning clusters from the operator.

The changes add a new JSON node `clone` with possible values `cluster`
and `timestamp`. `cluster` is mandatory, and setting a non-empty
`timestamp` triggers wal-e point in time recovery. Spilo and Patroni do
the whole heavy-lifting, the operator just defines certain variables and
gathers some data about how to connect to the host to clone or the
target S3 bucket.

As a minor change, set the image pull policy to IfNotPresent instead
of Always to simplify local testing.

Change the default replication username to standby.
2017-09-08 16:47:03 +02:00
Murat Kabilov 71dfb33b2b make pod termination grace period configurable 2017-08-18 16:38:25 +02:00
Murat Kabilov 82d5583809 add diagnostic api http server 2017-08-15 12:20:09 +02:00
Murat Kabilov 51fdfb90f7 log cluster and controller events in the ringlog via logrus hook 2017-08-15 12:16:09 +02:00
Oleksii Kliukin 8b58782a4a fix pam_role_name parameter name. 2017-08-02 17:55:06 +02:00
Murat Kabilov cf663cb841 Fix golint warnings 2017-08-01 16:08:56 +02:00
Murat Kabilov 4f36e447c3 Skip config params with no values (#62) 2017-07-14 17:22:25 +02:00
Oleksii Kliukin 00150711e4 Configure load balancer on a per-cluster and operator-wide level (#57)
* Deny all requests to the load balancer by default.
* Operator-wide toggle for the load-balancer.
* Define per-cluster useLoadBalancer option.

If useLoadBalancer is not set - then operator-wide defaults take place. If it
is true - the load balancer is created, otherwise a service type clusterIP is
created.

Internally, we have to completely replace the service if the service type
changes. We cannot patch, since some fields from the old service that will
remain after patch are incompatible with the new one, and handling them
explicitly when updating the service is ugly and error-prone. We cannot
update the service because of the immutable fields, that leaves us the only
option of deleting the old service and creating the new one. Unfortunately,
there is still an issue of unnecessary removal of endpoints associated with
the service, it will be addressed in future commits.

* Revert the unintended effect of go fmt

* Recreate endpoints on service update.

When the service type is changed, the service is deleted and then
the one with the new type is created. Unfortnately, endpoints are
deleted as well. Re-create them afterwards, preserving the original
addresses stored in them.

* Improve error messages and comments. Use generate instead of gen in names.
2017-06-30 13:38:49 +02:00
Murat Kabilov e104a67260 Fix resync of the clusters 2017-06-08 11:51:48 +02:00
Oleksii Kliukin dc36c4ca12 Implement replicaLoadBalancer boolean flag. (#38)
The flag adds a replica service with the name cluster_name-repl and
a DNS name that defaults to {cluster}-repl.{team}.{hostedzone}.

The implementation converted Service field of the cluster into a map
with one or two elements and deals with the cases when the new flag
is changed on a running cluster
(the update and the sync should create or delete the replica service).
In order to pick up master and replica service and master endpoint
when listing cluster resources.

* Update the spec when updating the cluster.
2017-06-07 13:54:17 +02:00
Murat Kabilov d34273543e Fix the golint, gosimple warnings 2017-05-18 17:38:54 +02:00
Murat Kabilov 95a57d1e4f Use named arguments in the DNS name format 2017-05-18 17:23:59 +02:00
Oleksii Kliukin 5adceceb36 go fmt run 2017-05-12 17:48:25 +02:00
Oleksii Kliukin abd04e6f5a Avoid abbreviations in user-facing parameters. 2017-05-12 17:44:51 +02:00
Oleksii Kliukin 03064637f1 Allow disabling access to the DB and the Teams API.
Command-line options --nodatabaseaccess and --noteamsapi disable all
teams api interaction and access to the Postgres database. This is
useful for debugging purposes when the operator runs out of cluster
(with --outofcluster flag).

The same effect can be achieved by setting enable_db_access and/or
enable_teams_api to false.
2017-05-12 17:40:48 +02:00
Murat Kabilov 92d7fbf372 replace github.bus.zalan.do with github.cm/zalando-incubator 2017-05-12 11:50:16 +02:00
Murat Kabilov 2370659c69 Parallel cluster processing
Run operations concerning multiple clusters in parallel. Each cluster gets its
own worker in order to create, update, sync or delete clusters.  Each worker
acquires the lock on a cluster.  Subsequent operations on the same cluster
have to wait until the current one finishes.  There is a pool of parallel
workers, configurable with the `workers` parameter in the configmap and set by
default to 4. The cluster-related tasks  are assigned to the workers based on
a cluster name: the tasks for the same cluster will be always assigned to the
same worker. There is no blocking between workers, although there is a chance
that a single worker will become a bottleneck if too many clusters are
assigned to it; therefore, for large-scale deployments it might be necessary
to bump up workers from the default value.
2017-05-12 11:41:35 +02:00
Oleksii Kliukin 1c4bce86df Avoid "bulk-comparing" pod resources during sync. (#109)
* Avoid "bulk-comparing" pod resources during sync.

First attempt to fix bogus restarts due to the reported mismatch
of container resources where one of the resources is an empty struct,
while the other has all fields set to nil.

In addition, add an ability to set limits and requests per pod, as well as the operator-level defaults.
2017-05-12 11:41:35 +02:00
Murat Kabilov 8026c69222 update default config param values 2017-05-12 11:41:34 +02:00
Murat Kabilov da438aab3a Use ConfigMap to store operator's config 2017-05-12 11:41:34 +02:00
Murat Kabilov 08c0e3b6dd Use unified type for the namespaced object names 2017-05-12 11:41:34 +02:00
Oleksii Kliukin 71b93b4cc2 Feature/infrastructure roles (#91)
* Add infrastructure roles configured globally.

Those are the roles defined in the operator itself. The operator's
configuration refers to the secret containing role names, passwords
and membership information. While they are referred to as roles, in
reality those are users.

In addition, improve the regex to filter out invalid users and
make sure user secret names are compatible with DNS name spec.

Add an example manifest for the infrastructure roles.
2017-05-12 11:41:33 +02:00
Murat Kabilov dd2ed5ff9d Add team name to tpr object metadata name 2017-05-12 11:41:33 +02:00
Oleksii Kliukin 5b66d0adba Correct go json tags (extra space). 2017-05-12 11:41:32 +02:00
Oleksii Kliukin a5f0ef10d0 go fmt run 2017-05-12 11:41:31 +02:00
Oleksii Kliukin 0764505a10 correct the wal bucket parameter name. 2017-05-12 11:41:31 +02:00
Oleksii Kliukin 7841b85892 Add configuration to support running WAL-E.
- Set WAL_S3_BUCKET to point WAL-E where to fetch/store WAL files
- Set annotations/iam.amazonaws.com/role to set the role to access AWS"

The new env vairables are PGOP_WAL_S3_BUCKET and PGOP_KUBE_IAM_ROLE.
2017-05-12 11:41:31 +02:00
Murat Kabilov 852c5beae5 Check etcd key availability for the new cluster 2017-05-12 11:41:31 +02:00
Oleksii Kliukin b69b6b26e5 git fmt run 2017-05-12 11:41:30 +02:00
Murat Kabilov 310c119dfa Display config on operator start up 2017-05-12 11:41:30 +02:00
Murat Kabilov a97dfb07de fix struct tag delimiter 2017-05-12 11:41:30 +02:00
Oleksii Kliukin 3a4c6268be Increase log verbosity, namely for object updates.
- add a new environment variable for triggering debug log level
- show both new, old object and diff during syncs and updates
- use pretty package to pretty-print go structures
-
2017-05-12 11:41:29 +02:00
Murat Kabilov c2d2a67ad5 Get config from environment variables;
ignore pg major version change;
get rid of resources package;
2017-05-12 11:41:29 +02:00