Commit Graph

66 Commits

Author SHA1 Message Date
erthalion b44bf1dd3f Merge branch 'master' into feature/actions-for-secret-syncing 2019-01-23 17:30:15 +01:00
Maxim Ivanov 1109c861fb Report new Postgres CR error when previously incorrect one is being updated (#449) 2019-01-18 13:36:44 +01:00
erthalion dc8f169a00 Merge branch 'master' into feature/actions-for-secret-syncing 2018-09-24 10:28:19 +02:00
erthalion b0756a1f75 Add a simple tests for CreateSecret 2018-09-21 11:36:01 +02:00
Noah Kantrowitz 688d252752 Some tweaks to ensure compat with newer Go. (#383) 2018-09-17 10:13:07 +02:00
erthalion 43a0449c8f Improvements per review 2018-09-14 15:24:01 +02:00
erthalion 5563078a12 Add action-based version of secret sync 2018-09-12 15:40:40 +02:00
Noah Kantrowitz 0b75a89920 Fix the casing of github.com/Sirupsen/logrus to match what the project itself uses. (#380)
Dep enforces this.
2018-09-06 10:26:48 +02:00
Oleksii Kliukin e1ed4b847d
Use code-generation for CRD API and deepcopy methods (#369)
Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on.

Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API.

All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1

Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator.

Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into

Change the type of  the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it.

Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with
the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes.

Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files.

Update client-go dependencies.

Minor reformatting and renaming.
2018-08-15 17:22:25 +02:00
Oleksii Kliukin 199aa6508c
Populate list of clusters in the controller at startup. (#364)
Assign the list of clusters in the controller with the up-to-date list
of Postgres manifests on Kubernetes during the startup.

Node migration routines launched asynchronously to the cluster
processing rely on an up-to-date list of clusters in the controller to
detect clusters affected by the migration of the node and lock them
when doing migration of master pods. Without the initial list the
operator was subject to race conditions like the one described at
https://github.com/zalando-incubator/postgres-operator/issues/363

Restructure the code to decouple list cluster function required by the
postgresql informer from the one that emits cluster sync events. No
extra work is introduced, since cluster sync already runs in a separate
goroutine (clusterResync).

Introduce explicit initial cluster sync at the end of
acquireInitialListOfClusters instead of relying on an implicit one
coming from list function of the PostgreSQL informer.

Some minor refactoring.

Review by @zerg-junior
2018-08-08 11:00:56 +02:00
Oleksii Kliukin b06186eb41
Linter-induced code refactoring, run round 2. (#360)
Run more linters in the gometalinter, i.e. deadcode, megacheck,
nakedret, dup.

More consistent code formatting, remove two dead functions, eliminate
naked a bunch of naked returns, refactor a few functions to avoid code
duplication.
2018-08-06 12:09:19 +02:00
Oleksii Kliukin d2d3f21dc2 Client go upgrade v6 (#352)
There are shortcuts in this code, i.e. we created the deepcopy function
by using the deepcopy package instead of the generated code, that will
be addressed once migrated to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta, this will also
be addressed in further commits once the changes are stabilized.
2018-08-01 11:08:01 +02:00
Oleksii Kliukin 0181a1b5b1
Introduce a repair scan to fix failing clusters (#304)
A repair is a sync scan that acts only on those clusters that indicate
that the last add, update or sync operation on them has failed. It is
supposed to kick in more frequently than the repair scan. The repair
scan still remains to be useful to fix the consequences of external
actions (i.e. someone deletes a postgres-related service by mistake)
unbeknownst to the operator.

The repair scan is controlled by the new repair_period parameter in the
operator configuration. It has to be at least 2 times more frequent than
a sync scan to have any effect (a normal sync scan will update both last
synced and last repaired attributes of the controller, since repair is
just a sync underneath).

A repair scan could be queued for a cluster that is already being synced
if the sync period exceeds the interval between repairs. In that case a
repair event will be discarded once the corresponding worker finds out
that the cluster is not failing anymore.

Review by @zerg-junior
2018-07-24 11:21:45 +02:00
zerg-junior 417f13c0bd
Submit RBAC credentials during initial Event processing (#344)
* During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.
2018-07-19 16:40:40 +02:00
Oleksii Kliukin 3a9378d3b8
Allow configuring the operator via the YAML manifest. (#326)
* Up until now, the operator read its own configuration from the
configmap.  That has a number of limitations, i.e. when the
configuration value is not a scalar, but a map or a list. We use a
custom code based on github.com/kelseyhightower/envconfig to decode
non-scalar values out of plain text keys, but that breaks when the data
inside the keys contains both YAML-special elememtns (i.e. commas) and
complex quotes, one good example for that is search_path inside
`team_api_role_configuration`. In addition, reliance on the configmap
forced a flag structure on the configuration, making it hard to write
and to read (see
https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778).

The changes allow to supply the operator configuration in a proper YAML
file. That required registering a custom CRD to support the operator
configuration and provide an example at
manifests/postgresql-operator-default-configuration.yaml. At the moment,
both old configmap and the new CRD configuration is supported, so no
compatibility issues, however, in the future I'd like to deprecate the
configmap-based configuration altogether. Contrary to the
configmap-based configuration, the CRD one doesn't embed defaults into
the operator code, however, one can use the
manifests/postgresql-operator-default-configuration.yaml as a starting
point in order to build a custom configuration.

Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters
used to create the CRD were taken from the operator configuration, which
is not possible if the configuration itself is stored in the CRD object,
I've added the ability to specify them as environment variables
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively.

Per review by @zerg-junior  and  @Jan-M.
2018-07-16 16:20:46 +02:00
Oleksii Kliukin 16a710a99a Avoid possible skipping SYNC events.
OB1 bug in the condition deciding whether to sync.
2018-05-31 18:29:15 +02:00
Oleksii Kliukin 27c7245fed Avoid terminating delete on errors.
When there is an error happening upon deletion of the Kubernetes object
belonging to the cluster being removed, it makes no sense to abort the
deletion: the manifest will be removed anyway, therefore all the objects
after the one we aborted at will stay forever.
2018-05-18 18:10:37 +02:00
Oleksii Kliukin da4cc2705b Use deepcopy to propagate the spec to clusters.
Avoid sharing pointers to the same spec data between the informer
and the clusters. The only catch is that the error field is cleared
during deepcopy, since it is an interface that may contain private
fields that cannot be copied, however, the error is only used when
the manifest is parsed and before it is queued, therefore, we never
refer to that field in the cluster structure.
2018-05-17 16:05:12 +02:00
Oleksii Kliukin ebe50abccb
Make sure we never modify informer cached manifest. (#290)
987b434 introduced a new function that modifies the cluster spec in
memory before the cluster processes it. Unfortunately, the instance
being modified appeared to be the one stored internally in the
PostgresInformer, resulting in those modifications to be propagated with
futher cluster events and producing update loops in some occasions.

This commit makes sure we copy the spec before putting it into the
clusterEventQueues.
2018-05-16 18:23:31 +02:00
Oleksii Kliukin 987b43456b
Deprecate old LB options, fix endpoint sync. (#287)
* Depreate old LB options, fix endpoint sync.

- deprecate useLoadBalancer, replicaLoadBalancer from the manifest
  and enable_load_balancer from the operator configuration. The old
  operator configuration options become no-op with this commit. For
  the old manifest options, `useLoadBalancer` and `replicaLoadBalancer`
  are still consulted,  but only in the absense of the new ones
  (enableMasterLoadBalancer and enableReplicaLoadBalancer).

- Make sure the endpoint being created during the sync receives proper
  addresses subset. This is more critical for the replicas, as for the
  masters Patroni will normally re-create the endpoint before the
  operator.

- Avoid creating the replica endpoint, since it will be created automatically
  by the corresponding service.
- Update the README and unit tests.

Code review by @mgomezch and @zerg-junior
2018-05-15 15:19:18 +02:00
Oleksii Kliukin f18bb6eaaa Make errors in the cluster list function visible.
Sometimes the operator does not pick up clusters right away when
they are created. The change attempts to shed light on the
reason behind that.
2018-02-22 16:45:10 +01:00
Sergey Dudoladov ea84f9d577 Rename the configmap 'namespace' entry to avoid confusion with the map's owm namespace 2018-02-06 15:09:00 +01:00
Murat Kabilov 86803406db
use sync methods while updating the cluster 2017-11-03 12:00:43 +01:00
Oleksii Kliukin eba23279c8 Kube cluster upgrade 2017-10-19 10:49:42 +02:00
Murat Kabilov 3b32265258 Set status of the cluster on sync fail/success 2017-10-12 15:10:42 +02:00
Murat Kabilov 83c8d6c419 Extend diagnostic api with worker status info 2017-10-11 12:26:09 +02:00
Murat Kabilov a35e9c6119 move from tpr to crd 2017-10-06 15:12:08 +02:00
Murat Kabilov 9a66e09b88 cluster history api endpoint 2017-09-26 14:30:45 +02:00
Murat Kabilov f77852a152 store time of the cluster event 2017-09-26 13:17:23 +02:00
Murat Kabilov 4db5bd13d1 delete cluster key from the clusters list only when delete procedure is finished 2017-09-04 18:48:03 +02:00
Murat Kabilov 83760ebbef discard cluster events from the queue on cluster delete;
delete cluster from the clusters map before deleting cluster itself
2017-08-17 12:24:23 +02:00
Murat Kabilov dad8e2f49f make cluster event queue consumption non-blocking 2017-08-15 16:03:19 +02:00
Murat Kabilov 51fdfb90f7 log cluster and controller events in the ringlog via logrus hook 2017-08-15 12:16:09 +02:00
Murat Kabilov 5470f20be4 always pass a cluster name as a logger field 2017-08-15 10:29:18 +02:00
Murat Kabilov e26db66cb5 start all the log messages with lowercase letters 2017-08-15 10:12:36 +02:00
Murat Kabilov cf663cb841 Fix golint warnings 2017-08-01 16:08:56 +02:00
Murat Kabilov 6183203f4d fix cluster event queue processing 2017-07-31 10:30:49 +02:00
Murat Kabilov 3ad4b127c4 Fix graceful shutdown
graceful shutdown of goroutines on operator exit
2017-07-27 12:54:22 +02:00
Murat Kabilov 1f8b37f33d Make use of kubernetes client-go v4
* client-go v4.0.0-beta0
* remove unnecessary methods for tpr object
* rest client: use interface instead of structure pointer
* proper names for constants; some clean up for log messages
* remove teams api client from controller and make it per cluster
2017-07-25 15:25:17 +02:00
Murat Kabilov e104a67260 Fix resync of the clusters 2017-06-08 11:51:48 +02:00
Oleksii Kliukin bc0e9ab4bc Add error checks per report from errcheck-ng 2017-06-08 10:41:44 +02:00
Oleksii Kliukin 7b0ca31bfb Implements EBS volume resizing #35.
In order to support volumes different from EBS and filesystems other than EXT2/3/4 the respective code parts were implemented as interfaces. Adding the new resize for the volume or the filesystem will require implementing the interface, but no other changes in the cluster code itself.

Volume resizing first changes the EBS and the filesystem, and only afterwards is reflected in the Kubernetes "PersistentVolume" object. This is done deliberately to be able to check if the volume needs resizing by peeking at the Size of the PersistentVolume structure. We recheck, nevertheless, in the EBSVolumeResizer, whether the actual EBS volume size doesn't match the spec, since call to the AWS ModifyVolume is counted against the resize limit of once every 6 hours, even for those calls that shouldn't result in an actual resize (i.e. when the size matches the one for the running volume).

As a collateral, split the constants into multiple files, move the volume code into a separate file and fix minor issues related to the error reporting.
2017-06-06 13:53:27 +02:00
Murat Kabilov 009db16c7c Use queues for the pod events (#30) 2017-05-23 15:24:14 +02:00
Murat Kabilov c470bd6646 reset cluster error on successful update or sync (#29) 2017-05-22 15:45:38 +02:00
Oleksii Kliukin bc17897478 Run sync cluster when previous add failed. (#28) 2017-05-22 15:27:26 +02:00
Oleksii Kliukin afce38f6f0 Fix error messages (#27)
Use lowercase for kubernetes objects
Use %v instead of %s for errors
Start error messages with a lowercase letter.
2017-05-22 14:12:06 +02:00
Murat Kabilov d34273543e Fix the golint, gosimple warnings 2017-05-18 17:38:54 +02:00
Murat Kabilov 233e8529c1 Return error instead of logging it 2017-05-18 17:24:44 +02:00
Murat Kabilov 356be8f0f1 skip clusters with invalid spec 2017-05-16 16:46:37 +02:00
Murat Kabilov 92d7fbf372 replace github.bus.zalan.do with github.cm/zalando-incubator 2017-05-12 11:50:16 +02:00