extend docs and polish manifest examples (#762)

This commit is contained in:
Felix Kunde 2019-12-12 17:55:41 +01:00 committed by GitHub
parent bfe2e709a1
commit 97e0d6d388
12 changed files with 111 additions and 109 deletions


@ -3,6 +3,26 @@
Learn how to configure and manage the Postgres Operator in your Kubernetes (K8s)
environment.
## Minor and major version upgrade
Minor version upgrades for PostgreSQL are handled via updating the Spilo Docker
image. The operator will carry out a rolling update of Pods which includes a
switchover (planned failover) of the master to a Pod with the new minor version.
The switchover should usually take less than 5 seconds; still, clients have to
reconnect.
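
As a sketch, a minor upgrade then boils down to bumping the Spilo image in the
operator configuration, e.g. via the `docker_image` key of the operator
ConfigMap. The image tag below is only an example:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  # changing this value triggers the rolling update described above;
  # the tag is an example, not a recommendation
  docker_image: registry.opensource.zalan.do/acid/spilo-11:1.6-p1
```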
Major version upgrades are supported via [cloning](user.md#clone-directly). The
new cluster manifest must have a higher `version` string than the source cluster
and will be created from a basebackup. Depending on the cluster size, downtime
in this case can be significant, as writes to the database must be stopped and
all WAL files must be archived before cloning starts.
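
For illustration, a clone-based major upgrade could look like the following
manifest fragment. The cluster name and version are hypothetical, and the
source cluster `acid-batman` is borrowed from the cloning example in the user
docs:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-batman-v11      # hypothetical name for the new cluster
spec:
  teamId: "acid"
  postgresql:
    version: "11"            # must be higher than in the source cluster
  clone:
    cluster: "acid-batman"   # existing cluster to take the basebackup from
```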
Note that simply changing the version string in the `postgresql` manifest does
not work at present and leads to errors. Neither Patroni nor Postgres Operator
can do an in-place `pg_upgrade`. Still, it can be executed manually in the
Postgres container, which is tricky (i.e. systems need to be stopped and
replicas have to be synced) but of course faster than cloning.
## CRD Validation
[CustomResourceDefinitions](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions)
@ -95,8 +115,6 @@ is used by the operator to connect to the clusters after creation.
## Role-based access control for the operator
### Service account and cluster roles
The manifest [`operator-service-account-rbac.yaml`](../manifests/operator-service-account-rbac.yaml)
defines the service account, cluster roles and bindings needed for the operator
to function under access control restrictions. To deploy the operator with this
@ -109,6 +127,8 @@ kubectl create -f manifests/postgres-operator.yaml
kubectl create -f manifests/minimal-postgres-manifest.yaml
```
### Service account and cluster roles
Note that the service account is named `zalando-postgres-operator`. You may have
to change the `service_account_name` in the operator ConfigMap and
`serviceAccountName` in the `postgres-operator` deployment appropriately. This
@ -116,12 +136,6 @@ is done intentionally to avoid breaking those setups that already work with the
default `operator` account. In the future the operator should ideally be run
under the `zalando-postgres-operator` service account.
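
A minimal sketch of the two settings that have to match, with only the
relevant fields shown:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  service_account_name: zalando-postgres-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-operator
spec:
  template:
    spec:
      # must reference the account from operator-service-account-rbac.yaml
      serviceAccountName: zalando-postgres-operator
```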
The service account defined in `operator-service-account-rbac.yaml` acquires
some privileges not used by the operator (i.e. we only need `list` and `watch`
on `configmaps` resources). This is also done intentionally to avoid breaking
things if someone decides to configure the same service account in the
operator's ConfigMap to run Postgres clusters.
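
For illustration, the `configmaps` permissions the operator itself needs would
be covered by this single rule, shown here as an excerpt of a ClusterRole, not
the full shipped role:

```yaml
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - list
  - watch
```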
### Give K8s users access to create/list `postgresqls`
By default `postgresql` custom resources can only be listed and changed by
@ -157,7 +171,6 @@ metadata:
  name: postgres-operator
data:
  toleration: "key:postgres,operator:Exists,effect:NoSchedule"
...
```
For an OperatorConfiguration resource the toleration should be defined like
@ -172,7 +185,6 @@ configuration:
  kubernetes:
    toleration:
      postgres: "key:postgres,operator:Exists,effect:NoSchedule"
...
```
Note that the K8s version 1.13 brings [taint-based eviction](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-based-evictions)
@ -250,7 +262,6 @@ metadata:
  name: postgres-operator
data:
  inherited_labels: application,environment
...
```
**OperatorConfiguration**
@ -265,7 +276,6 @@ configuration:
  inherited_labels:
  - application
  - environment
...
```
**cluster manifest**
@ -294,7 +304,6 @@ spec:
  matchLabels:
    application: my-app
    environment: demo
...
```
@ -317,7 +326,6 @@ metadata:
data:
  # referencing config map with custom settings
  pod_environment_configmap: postgres-pod-config
...
```
**OperatorConfiguration**
@ -331,7 +339,6 @@ configuration:
  kubernetes:
    # referencing config map with custom settings
    pod_environment_configmap: postgres-pod-config
...
```
**referenced ConfigMap `postgres-pod-config`**
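
The referenced ConfigMap is a plain K8s ConfigMap. A minimal sketch, where the
variable name is just a placeholder:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config
  namespace: default
data:
  # every key/value pair is exposed as an environment variable in the
  # Spilo pods; the name below is a placeholder
  MY_CUSTOM_VAR: dummy
```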
@ -412,12 +419,12 @@ external systems but defined for an individual Postgres cluster in its manifest.
A typical example is a role for connections from an application that uses the
database.
* **Human users** originate from the Teams API that returns a list of the team
members given a team id. The operator differentiates between (a) product teams
that own a particular Postgres cluster and are granted admin rights to maintain
it, and (b) Postgres superuser teams that get the superuser access to all
Postgres databases running in a K8s cluster for the purposes of maintaining and
troubleshooting.
* **Human users** originate from the [Teams API](user.md#teams-api-roles) that
returns a list of the team members given a team id. The operator differentiates
between (a) product teams that own a particular Postgres cluster and are granted
admin rights to maintain it, and (b) Postgres superuser teams that get superuser
access to all Postgres databases running in a K8s cluster for the purposes of
maintenance and troubleshooting.
## Understanding rolling update of Spilo pods
@ -481,7 +488,7 @@ A secret can be pre-provisioned in different ways:
With the v1.2 release the Postgres Operator is shipped with a browser-based
configuration user interface (UI) that simplifies managing Postgres clusters
with the operator. The UI runs with Node.js and comes with it's own docker
with the operator. The UI runs with Node.js and comes with its own Docker
image.
Run NPM to continuously compile `tags/js` code. Basically, it creates an
@ -493,14 +500,14 @@ Run NPM to continuously compile `tags/js` code. Basically, it creates an
To build the Docker image open a shell and change to the `ui` folder. Then run:
```
```bash
docker build -t registry.opensource.zalan.do/acid/postgres-operator-ui:v1.2.0 .
```
Apply all manifests from the `ui/manifests` folder to deploy the Postgres
Operator UI on K8s. For local tests you don't need the Ingress resource.
```
```bash
kubectl apply -f ui/manifests
```
@ -510,6 +517,6 @@ to the K8s and Postgres Operator REST API. You can use the provided
`run_local.sh` script for this. Make sure it uses the correct URL to your K8s
API server, e.g. for minikube it would be `https://192.168.99.100:8443`.
```
```bash
./run_local.sh
```


@ -40,7 +40,7 @@ This would take a while to complete. You have to redo `make deps` every time
your dependencies list changes, i.e. after adding a new library dependency.
Build the operator with the `make docker` command. You may define the TAG
variable to assign an explicit tag to your docker image and the IMAGE to set
variable to assign an explicit tag to your Docker image and the IMAGE to set
the image name. By default, the tag is computed with
`git describe --tags --always --dirty` and the image is
`registry.opensource.zalan.do/acid/postgres-operator`
@ -60,10 +60,10 @@ The binary will be placed into the build directory.
## Deploying self-built image
The fastest way to run and test your docker image locally is to reuse the docker
from [minikube](https://github.com/kubernetes/minikube/releases) or use the
`load docker-image` from [kind](https://kind.sigs.k8s.io/). The following steps
will get you the docker image built and deployed.
The fastest way to run and test your Docker image locally is to reuse the Docker
environment from [minikube](https://github.com/kubernetes/minikube/releases)
or to use the `load docker-image` command from [kind](https://kind.sigs.k8s.io/).
The following steps will get you the Docker image built and deployed.
```bash
# minikube
@ -162,7 +162,7 @@ The operator also supports pprof endpoints listed at the
* /debug/pprof/trace
It's possible to attach a debugger to troubleshoot postgres-operator inside a
docker container. It's possible with [gdb](https://www.gnu.org/software/gdb/)
Docker container. This works with [gdb](https://www.gnu.org/software/gdb/)
and [delve](https://github.com/derekparker/delve). Since the latter is a
specialized debugger for Go, we will use it as an example. To use it you need:


@ -13,7 +13,7 @@ manages PostgreSQL clusters on Kubernetes (K8s):
2. The operator also watches updates to [its own configuration](../manifests/configmap.yaml)
and alters running Postgres clusters if necessary. For instance, if the
docker image in a pod is changed, the operator carries out the rolling
Docker image in a pod is changed, the operator carries out the rolling
update, which means it re-spawns pods of each managed StatefulSet one-by-one
with the new Docker image.


@ -155,9 +155,12 @@ export PGPORT=$(echo $HOST_PORT | cut -d: -f 2)
```
Retrieve the password from the K8s Secret that is created in your cluster.
Non-encrypted connections are rejected by default, so set the SSL mode to
require:
```bash
export PGPASSWORD=$(kubectl get secret postgres.acid-minimal-cluster.credentials -o 'jsonpath={.data.password}' | base64 -d)
export PGSSLMODE=require
psql -U postgres
```


@ -62,7 +62,7 @@ These parameters are grouped directly under the `spec` key in the manifest.
field.
* **dockerImage**
custom docker image that overrides the **docker_image** operator parameter.
custom Docker image that overrides the **docker_image** operator parameter.
It should be a [Spilo](https://github.com/zalando/spilo) image. Optional.
* **spiloFSGroup**
@ -124,7 +124,7 @@ These parameters are grouped directly under the `spec` key in the manifest.
* **enableShmVolume**
Start a database pod without limitations on shm memory. By default docker
Start a database pod without limitations on shm memory. By default Docker
limits `/dev/shm` to `64M` (see e.g. the [docker
issue](https://github.com/docker-library/postgres/issues/416)), which might
not be enough if PostgreSQL uses parallel workers heavily. If this option is
@ -185,19 +185,19 @@ explanation of `ttl` and `loop_wait` parameters.
* **ttl**
Patroni `ttl` parameter value. The default is set by the Spilo
docker image. Optional.
Docker image. Optional.
* **loop_wait**
Patroni `loop_wait` parameter value. The default is set by the
Spilo docker image. Optional.
Spilo Docker image. Optional.
* **retry_timeout**
Patroni `retry_timeout` parameter value. The default is set by the
Spilo docker image. Optional.
Spilo Docker image. Optional.
* **maximum_lag_on_failover**
Patroni `maximum_lag_on_failover` parameter value. The default is
set by the Spilo docker image. Optional.
set by the Spilo Docker image. Optional.
* **slots**
permanent replication slots that Patroni preserves after failover by
@ -320,7 +320,7 @@ defined in the sidecar dictionary:
name of the sidecar. Required.
* **image**
docker image of the sidecar. Required.
Docker image of the sidecar. Required.
* **env**
a dictionary of environment variables. Use the usual Kubernetes definition


@ -81,13 +81,13 @@ Those are top-level keys, containing both leaf keys and groups.
Kubernetes-native DCS).
* **docker_image**
Spilo docker image for Postgres instances. For production, don't rely on the
Spilo Docker image for Postgres instances. For production, don't rely on the
default image, as it might not be the most up-to-date one. Instead, build
your own Spilo image from the [GitHub
repository](https://github.com/zalando/spilo).
* **sidecar_docker_images**
a map of sidecar names to docker images to run with Spilo. In case of the name
a map of sidecar names to Docker images to run with Spilo. In case of a name
conflict with a definition in the cluster manifest, the cluster-specific one
is preferred.


@ -13,7 +13,7 @@ kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "ACID"
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 2
@ -40,8 +40,16 @@ you can find this example also in the manifests folder:
kubectl create -f manifests/minimal-postgres-manifest.yaml
```
Note, that the minimum volume size to run the `postgresql` resource on Elastic
Block Storage (EBS) is `1Gi`.
Make sure the `spec` section of the manifest contains at least a `teamId`, the
`numberOfInstances` and the `postgresql` object with the `version` specified.
The minimum volume size to run the `postgresql` resource on Elastic Block
Storage (EBS) is `1Gi`.
Note that the name of the cluster must start with the `teamId` and a `-`. At
Zalando we use team IDs (nicknames) to lower the chance of duplicate cluster
names and colliding entities. The team ID is also used to query an API to
get all members of a team and create [database roles](#teams-api-roles) for
them.
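
A quick sketch of the naming rule, reusing the example cluster from above:

```yaml
metadata:
  # starts with the teamId ("acid") followed by a dash
  name: acid-minimal-cluster
spec:
  teamId: "acid"
```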
## Watch pods being created
@ -65,10 +73,12 @@ kubectl port-forward $PGMASTER 6432:5432
Open another CLI and connect to the database. Use the generated secret of the
`postgres` robot user to connect to our `acid-minimal-cluster` master running
in Minikube:
in Minikube. As non-encrypted connections are rejected by default, set the
SSL mode to require:
```bash
export PGPASSWORD=$(kubectl get secret postgres.acid-minimal-cluster.credentials -o 'jsonpath={.data.password}' | base64 -d)
export PGSSLMODE=require
psql -U postgres -p 6432
```
@ -80,8 +90,7 @@ cluster. It covers three use-cases:
* `manifest roles`: create application roles specific to the cluster described
in the manifest.
* `infrastructure roles`: create application roles that should be automatically
created on every
cluster managed by the operator.
created on every cluster managed by the operator.
* `teams API roles`: automatically create users for every member of the team
owning the database cluster.
@ -185,16 +194,34 @@ See [infrastructure roles secret](../manifests/infrastructure-roles.yaml)
and [infrastructure roles configmap](../manifests/infrastructure-roles-configmap.yaml)
for the examples.
### Teams API roles
These roles are meant for database activity of human users. It's possible to
configure the operator to automatically create database roles for, let's say,
all employees of one team. They are not listed in the manifest and there are
no K8s secrets created for them. Instead they would use an OAuth2 token to
connect. To get all members of the team the operator queries a defined API
endpoint that returns usernames. A minimal Teams API should work like this:
```
/.../<teamname> -> ["name","anothername"]
```
A ["fake" Teams API](../manifests/fake-teams-api.yaml) deployment is provided
in the manifests folder to set up a basic API around whatever service is used
for user management. The Teams API's URL is set in the operator's
[configuration](reference/operator_parameters.md#automatic-creation-of-human-users-in-the-database)
and `enable_teams_api` must be set to `true`. There are more settings available
to choose superusers, group roles, [PAM configuration](https://github.com/CyberDem0n/pam-oauth2)
etc. An OAuth2 token can be passed to the Teams API via a secret. The name for
this secret is configurable with the `oauth_token_secret_name` parameter.
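
As a sketch, enabling this in the operator ConfigMap could look as follows.
The `teams_api_url` key and the fake API's service address are assumptions
based on the linked reference:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  enable_teams_api: "true"
  # assumed key name and address; point it at the fake Teams API for tests
  teams_api_url: http://fake-teams-api.default.svc.cluster.local
  # secret holding the OAuth2 token (name is an example)
  oauth_token_secret_name: postgresql-operator
```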
## Resource definition
The compute resources to be used for the Postgres containers in the pods can be
specified in the postgresql cluster manifest.
```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  resources:
    requests:
@ -218,12 +245,7 @@ you can use [taints and tolerations](https://kubernetes.io/docs/concepts/configu
and configure the required toleration in the manifest.
```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "ACID"
  tolerations:
  - key: postgres
    operator: Exists
@ -241,11 +263,6 @@ section in the spec. There are two options here:
### Clone directly
```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-test-cluster
spec:
  clone:
    cluster: "acid-batman"
@ -261,11 +278,6 @@ means that you can clone only from clusters within the same namespace.
### Clone from S3
```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-test-cluster
spec:
  clone:
    uid: "efd12e58-5786-11e8-b5a7-06148230260c"
@ -294,10 +306,6 @@ For non AWS S3 following settings can be set to support cloning from other S3
implementations:
```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-test-cluster
spec:
  clone:
    uid: "efd12e58-5786-11e8-b5a7-06148230260c"
@ -346,13 +354,7 @@ used for log aggregation, monitoring, backups or other tasks. A sidecar can be
specified like this:
```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  ...
  sidecars:
  - name: "container-name"
    image: "company/image:tag"
@ -390,13 +392,7 @@ be used to run custom actions before any normal and sidecar containers start.
An init container can be specified like this:
```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  ...
  initContainers:
  - name: "container-name"
    image: "company/image:tag"
@ -417,12 +413,7 @@ Postgres operator supports statefulset volume resize if you're using the
operator on top of AWS. For that you need to change the size field of the
volume description in the cluster manifest and apply the change:
```
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-test-cluster
```yaml
spec:
  volume:
    size: 5Gi # new volume size
@ -451,7 +442,8 @@ size of volumes that correspond to the previously running pods is not changed.
You can enable logical backups from the cluster manifest by adding the following
parameter in the spec section:
```
```yaml
spec:
  enableLogicalBackup: true
```


@ -10,7 +10,7 @@ spec:
  - name: date
    image: busybox
    command: [ "/bin/date" ]
  teamId: "ACID"
  teamId: "acid"
  volume:
    size: 1Gi
    # storageClass: my-sc


@ -4,7 +4,7 @@ metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "ACID"
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 2


@ -4,7 +4,7 @@ metadata:
  name: acid-standby-cluster
  namespace: default
spec:
  teamId: "ACID"
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 1


@ -180,7 +180,7 @@ var unmarshalCluster = []struct {
"name": "acid-testcluster1"
},
"spec": {
"teamId": "ACID",
"teamId": "acid",
"pod_priority_class_name": "spilo-pod-priority",
"volume": {
"size": "5Gi",
@ -290,7 +290,7 @@ var unmarshalCluster = []struct {
	ResourceLimits: ResourceDescription{CPU: "300m", Memory: "3000Mi"},
},
TeamID: "ACID",
TeamID: "acid",
AllowedSourceRanges: []string{"127.0.0.1/32"},
NumberOfInstances: 2,
Users: map[string]UserFlags{"zalando": {"superuser", "createdb"}},
@ -319,7 +319,7 @@ var unmarshalCluster = []struct {
},
Error: "",
},
marshal: []byte(`{"kind":"Postgresql","apiVersion":"acid.zalan.do/v1","metadata":{"name":"acid-testcluster1","creationTimestamp":null},"spec":{"postgresql":{"version":"9.6","parameters":{"log_statement":"all","max_connections":"10","shared_buffers":"32MB"}},"pod_priority_class_name":"spilo-pod-priority","volume":{"size":"5Gi","storageClass":"SSD", "subPath": "subdir"},"enableShmVolume":false,"patroni":{"initdb":{"data-checksums":"true","encoding":"UTF8","locale":"en_US.UTF-8"},"pg_hba":["hostssl all all 0.0.0.0/0 md5","host all all 0.0.0.0/0 md5"],"ttl":30,"loop_wait":10,"retry_timeout":10,"maximum_lag_on_failover":33554432,"slots":{"permanent_logical_1":{"database":"foo","plugin":"pgoutput","type":"logical"}}},"resources":{"requests":{"cpu":"10m","memory":"50Mi"},"limits":{"cpu":"300m","memory":"3000Mi"}},"teamId":"ACID","allowedSourceRanges":["127.0.0.1/32"],"numberOfInstances":2,"users":{"zalando":["superuser","createdb"]},"maintenanceWindows":["Mon:01:00-06:00","Sat:00:00-04:00","05:00-05:15"],"clone":{"cluster":"acid-batman"}},"status":{"PostgresClusterStatus":""}}`),
marshal: []byte(`{"kind":"Postgresql","apiVersion":"acid.zalan.do/v1","metadata":{"name":"acid-testcluster1","creationTimestamp":null},"spec":{"postgresql":{"version":"9.6","parameters":{"log_statement":"all","max_connections":"10","shared_buffers":"32MB"}},"pod_priority_class_name":"spilo-pod-priority","volume":{"size":"5Gi","storageClass":"SSD", "subPath": "subdir"},"enableShmVolume":false,"patroni":{"initdb":{"data-checksums":"true","encoding":"UTF8","locale":"en_US.UTF-8"},"pg_hba":["hostssl all all 0.0.0.0/0 md5","host all all 0.0.0.0/0 md5"],"ttl":30,"loop_wait":10,"retry_timeout":10,"maximum_lag_on_failover":33554432,"slots":{"permanent_logical_1":{"database":"foo","plugin":"pgoutput","type":"logical"}}},"resources":{"requests":{"cpu":"10m","memory":"50Mi"},"limits":{"cpu":"300m","memory":"3000Mi"}},"teamId":"acid","allowedSourceRanges":["127.0.0.1/32"],"numberOfInstances":2,"users":{"zalando":["superuser","createdb"]},"maintenanceWindows":["Mon:01:00-06:00","Sat:00:00-04:00","05:00-05:15"],"clone":{"cluster":"acid-batman"}},"status":{"PostgresClusterStatus":""}}`),
err: nil},
// example with teamId set in input
{


@ -24,7 +24,7 @@ var teamsAPItc = []struct {
{`{
	"dn": "cn=100100,ou=official,ou=foobar,dc=zalando,dc=net",
	"id": "acid",
	"id_name": "ACID",
	"id_name": "acid",
	"team_id": "111222",
	"type": "official",
	"name": "Acid team name",
@ -70,7 +70,7 @@ var teamsAPItc = []struct {
&Team{
	Dn: "cn=100100,ou=official,ou=foobar,dc=zalando,dc=net",
	ID: "acid",
	TeamName: "ACID",
	TeamName: "acid",
	TeamID: "111222",
	Type: "official",
	FullName: "Acid team name",