Commit Graph

91 Commits

Author SHA1 Message Date
Murat Kabilov d2828e5ece remove var shading; fix imports 2017-08-15 15:59:10 +02:00
Murat Kabilov 272d7e1bcf rename service field to services as it contains service per role 2017-08-15 15:55:56 +02:00
Murat Kabilov 82f58b57d8 add cluster and controller methods for getting status 2017-08-15 12:11:06 +02:00
Murat Kabilov e26db66cb5 start all the log messages with lowercase letters 2017-08-15 10:12:36 +02:00
Oleksii Kliukin f15f93f479 Bugfix/close db connections (#78)
Open and close DB connections on-demand.

Previously, we used to leave the DB connection open while the
cluster was registered with the operator, potentially resutling
in dangled connections if the operator terminates abnormally.

Small refactoring around the role syncing code.
2017-08-10 10:10:00 +02:00
Murat Kabilov cf663cb841 Fix golint warnings 2017-08-01 16:08:56 +02:00
Murat Kabilov 1f8b37f33d Make use of kubernetes client-go v4
* client-go v4.0.0-beta0
* remove unnecessary methods for tpr object
* rest client: use interface instead of structure pointer
* proper names for constants; some clean up for log messages
* remove teams api client from controller and make it per cluster
2017-07-25 15:25:17 +02:00
Oleksii Kliukin 00150711e4 Configure load balancer on a per-cluster and operator-wide level (#57)
* Deny all requests to the load balancer by default.
* Operator-wide toggle for the load-balancer.
* Define per-cluster useLoadBalancer option.

If useLoadBalancer is not set - then operator-wide defaults take place. If it
is true - the load balancer is created, otherwise a service type clusterIP is
created.

Internally, we have to completely replace the service if the service type
changes. We cannot patch, since some fields from the old service that will
remain after patch are incompatible with the new one, and handling them
explicitly when updating the service is ugly and error-prone. We cannot
update the service because of the immutable fields, that leaves us the only
option of deleting the old service and creating the new one. Unfortunately,
there is still an issue of unnecessary removal of endpoints associated with
the service, it will be addressed in future commits.

* Revert the unintended effect of go fmt

* Recreate endpoints on service update.

When the service type is changed, the service is deleted and then
the one with the new type is created. Unfortnately, endpoints are
deleted as well. Re-create them afterwards, preserving the original
addresses stored in them.

* Improve error messages and comments. Use generate instead of gen in names.
2017-06-30 13:38:49 +02:00
Oleksii Kliukin 17826ee434 Go fmt run. 2017-06-12 10:24:23 +02:00
Oleksii Kliukin 51d73fb172 Replace service annotations when updating services.
In case the whole annotation changes (like the external DNS) we
don't want to keep the old one hanging around. Unline specs, we
don't expect anyone except the operator to change the annotations.

Use StrategicMergePatchType in order to replace the annotations
map completely.
2017-06-12 10:24:23 +02:00
Oleksii Kliukin dc36c4ca12 Implement replicaLoadBalancer boolean flag. (#38)
The flag adds a replica service with the name cluster_name-repl and
a DNS name that defaults to {cluster}-repl.{team}.{hostedzone}.

The implementation converted Service field of the cluster into a map
with one or two elements and deals with the cases when the new flag
is changed on a running cluster
(the update and the sync should create or delete the replica service).
In order to pick up master and replica service and master endpoint
when listing cluster resources.

* Update the spec when updating the cluster.
2017-06-07 13:54:17 +02:00
Oleksii Kliukin afce38f6f0 Fix error messages (#27)
Use lowercase for kubernetes objects
Use %v instead of %s for errors
Start error messages with a lowercase letter.
2017-05-22 14:12:06 +02:00
Murat Kabilov d34273543e Fix the golint, gosimple warnings 2017-05-18 17:38:54 +02:00
Murat Kabilov 3b6454c2dc add missed return (#20) 2017-05-17 11:54:50 +02:00
Oleksii Kliukin 4457ce4e47 Replace the statefulset if it cannot be updated. (#18)
Updates to statefulset spec for fields other than 'replicas' and
containers' are forbidden. However, it is possible to delete the old
statefulset without deleting its pods and create the new one, using the
changed specs. The new statefulset shall pick up the orphaned pods.

Change the statefulset's comparison to return the combined effect of
all checks, not just the first non-matching field.
2017-05-17 11:28:21 +02:00
Murat Kabilov 356be8f0f1 skip clusters with invalid spec 2017-05-16 16:46:37 +02:00
Murat Kabilov 92d7fbf372 replace github.bus.zalan.do with github.cm/zalando-incubator 2017-05-12 11:50:16 +02:00
Oleksii Kliukin ec3f24c3ee Honor the "spec-by-example" manifest we have.
No longer ignore custom PostgreSQL and Patroni parameters and initdb
options. Since all Patroni parameters that are not under initdb or
pg_hba are specified as a plain map, there is no way to distinguish
those that should go into the bootstrap section from those that should
stay in the local configuration. As the example used only bootstrap
parameters, currently all such options go into the bootstrap section.

Also the initdb options are repsented as a map, while Patroni initdb
options are a list of either maps or strings (i.e. "data-checksums"
doesn't need an argument). For now, there is a work-around, but in the
future we might consider changing the spec.
2017-05-12 11:41:36 +02:00
Oleksii Kliukin 6983f444ed Periodically sync roles with the running clusters. (#102)
The sync adds or alters database roles based on the roles defined
in the cluster's TPR, Team API and operator's infrastructure roles.
At the moment, roles are not deleted, as it would be dangerous for
the robot roles in case TPR is misconfigured. In addition, ALTER
ROLE does not remove role options, i.e. SUPERUSER or CREATEROLE,
neither it removes role membership: only new options are added and
new role membership is granted. So far, options like NOSUPERUSER
and NOCREATEROLE won't be handed correctly, when mixed with the
non-negative counterparts, also NOLOGIN should be processed correctly.
The code assumes that only MD5 passwords are stored in the DB and
will likely break with the new SCRAM auth in PostgreSQL 10.

On the implementation side, create the new interface to abstract
roles merge and creation, move most of the role-based functionality
from cluster/pg into the new 'users' module, strip create user code
of special cases related to human-based users (moving them to init
instead) and fixed the password md5 generator to avoid processing
already encrypted passwords. In addition, moved the system roles
off the slice containing all other roles in order to avoid extra
efforts to avoid creating them.

Also, fix a leak in DB connections when the new connection is not
considered healthy and discarded without being closed. Initialize
the database during the sync phase before syncing users.
2017-05-12 11:41:35 +02:00
Murat Kabilov 2370659c69 Parallel cluster processing
Run operations concerning multiple clusters in parallel. Each cluster gets its
own worker in order to create, update, sync or delete clusters.  Each worker
acquires the lock on a cluster.  Subsequent operations on the same cluster
have to wait until the current one finishes.  There is a pool of parallel
workers, configurable with the `workers` parameter in the configmap and set by
default to 4. The cluster-related tasks  are assigned to the workers based on
a cluster name: the tasks for the same cluster will be always assigned to the
same worker. There is no blocking between workers, although there is a chance
that a single worker will become a bottleneck if too many clusters are
assigned to it; therefore, for large-scale deployments it might be necessary
to bump up workers from the default value.
2017-05-12 11:41:35 +02:00
Murat Kabilov 9b0d0d487c Use PATCH while updating Services and StatefulSets 2017-05-12 11:41:34 +02:00
Murat Kabilov db53134cbd Skip syncing Pods 2017-05-12 11:41:33 +02:00
Murat Kabilov 322676a6b9 Skip deleting Pods and PVCs if failed to delete StatefulSet 2017-05-12 11:41:32 +02:00
Murat Kabilov bb4fec25ae Fix deletion of the failed cluster; more debug messages 2017-05-12 11:41:32 +02:00
Murat Kabilov ee83e196a9 Fix secrets sync
* log if secret already exists
2017-05-12 11:41:30 +02:00
Oleksii Kliukin 3a4c6268be Increase log verbosity, namely for object updates.
- add a new environment variable for triggering debug log level
- show both new, old object and diff during syncs and updates
- use pretty package to pretty-print go structures
-
2017-05-12 11:41:29 +02:00
Murat Kabilov c2d2a67ad5 Get config from environment variables;
ignore pg major version change;
get rid of resources package;
2017-05-12 11:41:29 +02:00
Murat Kabilov 79a6726d4d Increase logging verbosity, restructure code 2017-05-12 11:41:28 +02:00
Murat Kabilov bbdc2f52a9 fix resource load and list 2017-05-12 11:41:28 +02:00
Murat Kabilov 6f7399b36f Sync clusters states
* move statefulset creation from cluster spec to the separate function
* sync cluster state with desired state;
* move out from arrays for cluster resources;
* recreate pods instead of deleting them in case of statefulset change
* check for master while creating cluster/updating pods
* simplify retryutil
* list pvc while listing resources
* name kubernetes resources with capital letter
* do rolling update in case of env variables change
2017-05-12 11:41:27 +02:00
Oleksii Kliukin 31d7426327 ClusterTeamName -> ClusterName. Add a TODO item. 2017-05-12 11:41:27 +02:00
Oleksii Kliukin 55dbacdfa6 Assign DNS name to the cluster.
DNS name is generated from the team name and cluster name.
Use "zalando.org/dnsname" service annotation that makes 'mate' service assign a CNAME to the load balancer name.
2017-05-12 11:41:27 +02:00
Oleksii Kliukin 45fcb2adc9 Assign SUPERUSER to human users by default. 2017-05-12 11:41:27 +02:00
Murat Kabilov 416dace289 get rid of arrays in the kuberesources;
use shorter form of checking for errors
2017-05-12 11:41:26 +02:00
Murat Kabilov caa0eab19b Move statefulset creation from cluster spec to the separate function 2017-05-12 11:41:25 +02:00
Murat Kabilov 021eedb226 Fix resource already exists log messages 2017-05-12 11:41:25 +02:00
Oleksii Kliukin a2e78ac2ec Feature/persistent volumes 2017-05-12 11:41:25 +02:00
Murat Kabilov ae77fa15e8 Pod Rolling update
introduce Pod events channel;
add parsing of the MaintenanceWindows section;
skip deleting Etcd key on cluster delete;
use external etcd host;
watch for tpr/pods in the namespace of the operator pod only;
2017-05-12 11:41:25 +02:00
Murat Kabilov dfde075c66 Use TPR object namespace while creating its objects 2017-05-12 11:37:09 +02:00
Murat Kabilov 6e2d64bd50 Create human users from teams api 2017-05-12 11:37:09 +02:00
Murat Kabilov 58506634c4 Create pg users 2017-05-12 11:37:09 +02:00