Commit Graph

373 Commits

Author SHA1 Message Date
Dmitrii Dolgov bf7df5d5b5 Add debug mode
You can specify an environment variable DEBUG 1/0 to enable/disable
debug mode. Debug mode assumes delve dependency for remote debugging,
and debugging symbols for go build.
2018-02-02 10:39:39 +01:00
zerg-junior 4cbb86ea13
Merge pull request #215 from erthalion/feature/debugger-documentation-section
Add a section about debugger
2018-02-01 18:10:47 +01:00
Dmitrii Dolgov 4e629ef7ca Update section to be more specific 2018-02-01 17:59:05 +01:00
Dmitrii Dolgov 53014ca28b Add a section about debugger
So far only delve is described in details
2018-02-01 17:21:37 +01:00
Oleksii Kliukin b90a36c909
Set node_readiness_label default to an empty value. (#204)
Previously, it was set to the lifecycle-status:ready, breaking a
lot of minikube deployments. Also it was not possible befor to run
with this label set to an empty value.

Document the effect of the label in the new section of the
documentation.
2018-01-16 15:43:03 +01:00
Manuel Gómez bf4406d2a4 Consider container names in Statefulset diffs (#210)
This includes a comparison on container names being equal in the
decision of whether a Statefulset has been updated.
2018-01-16 12:06:11 +01:00
zerg-junior 8c9033df28
Merge pull request #205 from zalando-incubator/scalyr-api-key
Overwrite scalyr api key if the relevant env var is present in the operator pod
2018-01-12 15:17:05 +01:00
Sergey Dudoladov ec6799f34a Overwrite scalyr api key if the relevant env variable is present in the operator pod 2018-01-12 14:56:14 +01:00
Jan Mussler 56359d23c9
Update README.md (#201) 2018-01-10 18:27:32 +01:00
Oleksii Kliukin 23011bdf9a
Migrate only master pods. Migrate single masters. (#199)
Avoid migrating replica pods, since they will be handled by the
node draining anyway (the PDB specifies that only masters are to
be kept).

Allow migration of the single-pod clusters.
2018-01-09 11:55:11 +01:00
zerg-junior bb5ce6cbbe
Merge pull request #195 from zalando-incubator/databases-rest-endpoint
Add a REST endpoint to list databases in all clusters
2018-01-09 11:53:32 +01:00
Oleksii Kliukin 8e99518eeb
Improve behavior on node decomissionining (#184)
* Trigger the node migration on the lack of  the readiness label.

* Examine the node's readiness status on node add.

Make sure we don't miss the not ready node, especially when the
operator is killed during the migration.
2018-01-04 11:53:15 +01:00
Manuel Gómez 1109cfa7a1
Add PostgreSQL pod namespace Scalyr sidecar environment (#196)
Another tiny bit of information that could be useful for log filters
once we start deploying clusters into separate namespaces.
2017-12-22 17:12:50 +01:00
Oleksii Kliukin 5c8bd04169 Sort database by name. 2017-12-22 15:48:13 +01:00
Oleksii Kliukin 6102b0368c Merge remote-tracking branch 'origin/databases-rest-endpoint' into databases-rest-endpoint
# Conflicts:
#	pkg/apiserver/apiserver.go
#	pkg/controller/status.go
2017-12-22 13:08:50 +01:00
Oleksii Kliukin 9720ac1f7e WIP: Hold the proper locks while examining the list of databases.
Introduce a new lock called specMu lock to protect the cluster spec.
This lock is held on update and sync, and when retrieving the spec in
the API code. There is no need to acquire it for cluster creation and
deletion: creation assigns the spec to the cluster before linking it to
the controller, and deletion just removes the cluster from the list in
the controller, both holding the global clustersMu Lock.
2017-12-22 13:06:11 +01:00
Sergey Dudoladov b8bf97ab76 Integrate comments from code reviews 2017-12-22 12:53:57 +01:00
Sergey Dudoladov 011458fb05 Add a REST endpoint to list databases in all clusters 2017-12-21 17:28:55 +01:00
Manuel Gómez cd9bc7bdc5
Add PostgreSQL pod name Scalyr sidecar environment (#194)
This will allow the Scalyr image to add a custom attribute to shipped
log entries that notes the name of the originating pod.
2017-12-21 16:52:27 +01:00
Manuel Gómez 15c278d4e8
Scalyr agent sidecar for log shipping (#190)
* Scalyr agent sidecar for log shipping

* Remove the default for the Scalyr image

Now the image needs to be specified explicitly to enable log shipping to
Scalyr.  This removes the problem of having to generate the config file
or publish our agent image repository.

* Add configuration variable for Scalyr server URL

Defaults to the EU address.

* Alter style

Newlines are cheap and make code easier to edit/refactor, but ok.

* Fix StatefulSet comparison logic

I broke it when I made the comparison consider all containers in the
PostgreSQL pod.
2017-12-21 15:34:26 +01:00
Oleksii Kliukin da0de8cff7
Make sure the statefulset that is deleted manually gets re-created. (#191)
* Make sure the statefulset that is deleted manually gets re-created.

Per report and analysis by Manuel Gomez.

* Move the existence checks for other objects out of the Create functions.

create{Object} for services, endpoints and PDBs refused to continue if
there is a cached definition in the cluster, however, the only place
where it makes sense is when creating a new cluster. Note that contrary
to the statefulset this doesn't fix any issues, since those definitions
were nullified correspondingly when the sync code detected there is no
object present in the Kubernetes cluster.
2017-12-21 15:20:43 +01:00
zerg-junior 6c57334666 Add an example for cloning a backup from existing cluster (#189)
Add an example for cloning a backup from existing cluster
2017-12-19 16:21:06 +01:00
zerg-junior 60aaa78dfc
Merge pull request #187 from zalando-incubator/fix-loadbalancer-config
Fix loadBalancerConfig
2017-12-18 17:53:21 +01:00
Sergey Dudoladov c1b3ce8028 Fix loadBalancerConfig 2017-12-18 17:32:22 +01:00
zerg-junior a8e1c00e65
Merge pull request #185 from zalando-incubator/update_documentation
Mention the talk by Josh Berkus and the blog post by Jan Mußler.
2017-12-18 17:22:46 +01:00
Oleksii Kliukin 80d55c1072 Mention the talk by Josh Berkus and the blog post by Jan Mußler. 2017-12-18 16:25:50 +01:00
Jan Mussler 04024f91e4 Update README.md
Removed a bit about the staging only use case and describe our setup.
2017-12-17 16:19:22 +01:00
zerg-junior 3c178f68df Warn on infrastructure-roles.yaml format violations (#177)
Emit a warning if there are unprocessed entries in the infrastructure-roles secret.
2017-12-15 17:21:41 +01:00
zerg-junior 5d5fa680a3
Merge pull request #180 from zalando-incubator/container-name
Make pod's single container name static
2017-12-15 16:13:33 +01:00
Oleksii Kliukin bf80f5225e
Introduce higher and lower bounds for the number of instances (#178)
* Introduce higher and lower bounds for the number of instances

Reduce the number of instances to the min_instances if it is lower and
to the max_instances if it is higher. -1 for either of those means there
is no lower or upper bound.

In addition, terminate the operator when there is a nonsense in the
configuration (i.e. max_instances < min_instances).

Reviewed by Jan Mußler and Sergey Dudoladov.
2017-12-15 16:02:50 +01:00
Sergey Dudoladov 52e358ba8f Make pod's single container name static 2017-12-15 15:53:53 +01:00
Oleksii Kliukin 0e255f82c6 Provide more information about variable conflicts.
They are mentioned in the documentation and the operator will emit a
warning each time the variable from the pod environment configmap is
ignored because the same variable is defined by the operator.

Some minor changes in the variable names to make the code more readable.

Per review from Sergey Dudoladov.
2017-12-14 14:39:33 +01:00
Oleksii Kliukin da4b66210a Expand variables from the PodEnvironmentConfigMap (#4)
Inject PodEnvironmentConfigMap variables inline into the
statefulset definition in order to be able to figure out
changes to the statefulset when only PodEnvironmentConfigMap
has changed.
2017-12-14 14:39:33 +01:00
Oleksii Kliukin 1c5451cd7d Spelling fix. 2017-12-14 14:39:33 +01:00
Oleksii Kliukin 55dc12e512 Examine custom environment sources when syncing.
When comparing statefulsets, make sure EnvFrom fields are compared
as well.
2017-12-14 14:39:33 +01:00
Georg Kunz e8d9c75949 Allow custom Postgres pod environment variables 2017-12-14 14:39:33 +01:00
Oleksii Kliukin 87bc47d8d0 Fixes for the case of re-creating the cluster after deletion.
- make sure that the secrets for the system users (superuser, replication)
  are not deleted when the main cluster is. Therefore, we can re-create
  the cluster, potentially forcing Patroni to restore it from the backup
  and enable Patroni to connect, since it will use the old password, not
  the newly generated random one.

- when syncing users, always check whether they are already in the DB.
  Previously, we did this only for the sync cluster case, but the new
  cluster could be actually the one restored from the backup by Patroni,
  having all or some of the users already in place.

 - delete endponts last. Patroni uses the $clustername endpoint in order
   to store the leader related metadata. If we remove it before removing
   all pods, one of those pods running Patroni will re-create it and the
   next attempt to create the cluster with the same name will stuble on
   the existing endpoint.

 - Use db.Exec instead of db.Query for queries that expect no result.
   This also fixes the issue with the DB creation, since we didn't
   release an empty Row object it was not possible to create more than
   one database for a cluster.
2017-12-13 16:49:00 +01:00
Oleksii Kliukin 1fb8cf7ea0
Avoid overwriting critical users. (#172)
* Avoid overwriting critical users.

Disallow defining new users either in the cluster manifest, teams
API or infrastructure roles with the names mentioned in the new
protected_role_names parameter (list of comma-separated names)

Additionally, forbid defining a user with the name matching either
super_username or replication_username, so that we don't overwrite
system roles required for correct working of the operator itself.

Also, clear PostgreSQL roles on each sync first in order to avoid using
the old definitions that are no longer present in the current manifest,
infrastructure roles secret or the teams API.
2017-12-05 14:27:12 +01:00
Oleksii Kliukin 022ce29314 Make an error message more verbose. 2017-12-04 10:49:25 +01:00
Oleksii Kliukin 637921cdee Tests for initHumanUsers and initinitRobotUsers.
Change the Cluster class in the process to implelement Teams API
calls and Oauth token fetches as interfaces, so that we can mock
them in the tests.
2017-12-04 10:49:25 +01:00
Oleksii Kliukin 611cfe96d6 Fix an issue when not assigning the merge result.
Add some tests.
2017-12-04 10:49:25 +01:00
Oleksii Kliukin 831ebb1f32 Fix the error reporting. 2017-12-04 10:49:25 +01:00
Oleksii Kliukin 2e226dee26 Avoid overwriting infrastrure roles.
When a role is defined in the infrastructure roles and the cluster
manifest use the infrastructure role definition and add flags
defined in the manifest.

Previously the role has been overwritten by the definition from the
manifest.  Because a random password is generated for each role from the
manifest the applications relying on the infrastructure role credentials
from the infrastructure roles secret were unable to connect.
2017-12-04 10:49:25 +01:00
Oleksii Kliukin dd0affc390 Tweak our reaction to the cluster upgrade process.
Previously, the operator started to move the pods off the nodes to be
decomissioned by watching the eol_node_label value. Every new postgres
pod has been created with the anti-affinity to that label, making sure
that the pods being moved won't land on another to be decomissioned
node.

The changes introduce another label that indicates the ready node.  The
new pod affinity will esnure that the pod is only scheduled to the node
marked as ready, discarding the previous anti-affinity.  That way the
nodes can transition from the pending-decomission to the other statuses
(drained, terminating) without having pods suddently scaled to them.

In addition, rename the label that triggers the start of the upgrade
process to node_eol_label (for consistency with node_readiness_label)
and set its default vvalue to lifecycle-status:pending-decomission.
2017-11-30 14:11:49 +01:00
Manuel Gómez a39e89d155
Merge pull request #170 from zalando-incubator/bugfix/connection_leak_and_sync_role_options
Fix the connection leak and user options sync.
2017-11-28 11:10:16 +01:00
Oleksii Kliukin 1ffe98ba9f Fix the connection leak and user options sync.
- fix the lack of closing the cursor for the query that returned no
rows.
- fix syncing of the user options, as previously those were not
  fetched from the database.
2017-11-27 16:46:34 +01:00
Oleksii Kliukin 68bb3cd52d Add an example of the new parameter with multiple values. 2017-11-22 10:43:35 +01:00
Oleksii Kliukin 086ead03f5 Warn about attempts to use escape quotes. 2017-11-22 10:43:35 +01:00
Oleksii Kliukin f6a2225c38 rename the parameter in the README. 2017-11-22 10:43:35 +01:00
Oleksii Kliukin 975b21f633 Rename api roles configuration parameter.
Change api_roles_configuration to team_api_role_configuration
2017-11-22 10:43:35 +01:00