postgres-operator

Commit Graph

Author	SHA1	Message	Date
Christian Rohmann	743aade45f	Use finalizers to avoid losing delete events and to ensure full resource cleanup (#941 ) * Add Finalizer functions to Cluster; add/remove finalizer on Create/Delete events * Check if clusters have a deletion timestamp and we missed that event. Run Delete() and remove finalizer when done. * Fix nil handling when using Service from map; Remove Service, Endpoint entries from their maps - just like with Secrets * Add handling of ResourceNotFound to all delete functions (Service, Endpoint, LogicalBackup CronJob, PDB and Secret) - this is not a real error when deleting things * Emit events when there are issues deleting resources to the user is informed * Depend the removal of the Finalizer on all resources being deleted successfully first. Otherwise the next sync run should let us try again * Add config option to enable finalizers * Removed dangling whitespace at EOL * config.EnableFinalizers is a bool pointer --------- Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2024-01-04 16:22:53 +01:00
Felix Kunde	4c07494ac7	deprecate ClusterName field of Postgresql type and remove team from REST endpoints (#2015 ) * deprecate ClusterName field of Postgresql type * remove for teamId from operator API endpints /status /logs /history * update dns_format_string and yaml template in UI	2022-08-29 15:00:25 +02:00
Felix Kunde	89375186b3	use old LB DNS format when teamId prefix is disabled (#2011 ) * use old LB DNS format when teamId prefix is disabled * support both old and new format in external-dns * switch dns template from team to namespace	2022-08-25 18:29:54 +02:00
Felix Kunde	3bfd63cbe6	Make teamId in cluster name optional (#2001 ) * making teamId in clustername optional * move teamId check to addCluster function	2022-08-24 10:12:50 +02:00
mujx	5dc963105c	Fix typo (#1897 )	2022-05-18 17:47:05 +02:00
Felix Kunde	60e0685c32	define readinessProbe on statefulSet (#1825 ) * define readinessProbe on statefulSet * do not error out on deleting Patroni cluster objects * change delete order for patroni objects	2022-03-30 18:19:34 +02:00
Dmitry Volodin	da83982313	inherited_labels and inherited_annotations not passed to PVC (#1784 ) * inherited_labels and inherited_annotations not passed to PVC * Fix developer.md related to the local operator deployment	2022-03-01 17:07:37 +01:00
Jan Mussler	636a9a8191	Support major version upgrade via manifest and global upgrades via min version (#1372 ) Support major version upgrade trigger via manifest. There is `off` `manual` and `full`. Manual is what you expect, and full will auto upgrade clusters below a certain threshold.	2021-02-25 11:42:43 +01:00
Felix Kunde	6a97316a69	Support inherited annotations for all major objects (#1236 ) * add comments where inherited annotations could be added * add inheritedAnnotations feature * return nil if no annotations are set * minor changes * first downscaler then inherited annotations * add unit test for inherited annotations * add pvc to test + minor changes * missing comma * fix nil map assignment * set annotations in the same order it is done in other places * replace acidClientSet with acid getters in K8s client * more fixes on clientSet vs getters * minor changes * remove endpoints from annotation test * refine unit test - but deployment and sts are still empty * fix checkinng sts and deployment * make annotations setter one liners * no need for len check anymore Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>	2020-12-11 16:34:01 +01:00
Rafia Sabih	49158ecb68	Connection pooler for replica (#1127 ) * Enable connection pooler for replica * Refactor code for connection pooler - Move all the relevant code to a separate file - Move all the related tests to a separate file - Avoid using cluster where not required - Simplify the logic in sync and other methods - Cleanup of duplicated or unused code * Fix labels for the replica pods * Update deleteConnectionPooler to include role * Adding test cases and other changes - Fix unit test and delete secret when required only - Make sure we use empty fresh cluster for every test case. * enhance e2e test * Disable pooler in complete manifest as this is source for e2e too an creates unnecessary pooler setups. Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de> Co-authored-by: Jan Mussler <janm81@gmail.com>	2020-11-13 14:52:21 +01:00
Jan Mussler	c694a72352	Make failure in retry a warning not an error. (#1188 )	2020-10-29 13:12:25 +01:00
Jan Mussler	3a86dfc8bb	End 2 End tests speedup (#1180 ) * Improving end 2 end tests, especially speed of execution and error, by implementing proper eventual asserts and timeouts. * Add documentation for running individual tests * Fixed String encoding in Patorni state check and error case * Printing config as multi log line entity, makes it readable and grepable on startup * Cosmetic changes to logs. Removed quotes from diff. Move all object diffs to text diff. Enabled padding for log level. * Mount script with tools for easy logaccess and watching objects. * Set proper update strategy for Postgres operator deployment. * Move long running test to end. Move pooler test to new functions. * Remove quote from valid K8s identifiers.	2020-10-28 10:04:33 +01:00
Felix Kunde	3ddc56e5b9	allow delete only if annotations meet configured criteria (#1069 ) * define annotations for delete protection * change log level and reduce log lines for e2e tests * reduce wait_for_pod_start even further	2020-08-13 16:36:22 +02:00
Felix Kunde	0c6655a22d	skip creation later to improve visibility of errors (#1013 ) * try to emit error for missing team name in cluster name * skip creation after new cluster object * move SetStatus to k8sclient and emit event when skipping creation and rename to SetPostgresCRDStatus Co-authored-by: Felix Kunde <felix.kunde@zalando.de>	2020-06-17 13:32:16 +02:00
Rafia Sabih	d52296c323	Propagate annotations to the StatefulSet (#932 ) * Initial commit * Corrections - set the type of the new configuration parameter to be array of strings - propagate the annotations to statefulset at sync * Enable regular expression matching * Improvements -handle rollingUpdate flag -modularize code -rename config parameter name * fix merge error * Pass annotations to connection pooler deployment * update code-gen * Add documentation and update manifests * add e2e test and introduce option in configmap * fix service annotations test * Add unit test * fix e2e tests * better key lookup of annotations tests * add debug message for annotation tests * Fix typos * minor fix for looping * Handle update path and renaming - handle the update path to update sts and connection pooler deployment. This way no need to wait for sync - rename the parameter to downscaler_annotations - handle other review comments * another try to fix python loops * Avoid unneccessary update events * Update manifests * some final polishing * fix cluster_test after polishing Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de> Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-05-04 14:46:56 +02:00
Christian Rohmann	21b9b6fcbe	Emit K8S events to the postgresql CR as feedback to the requestor / user (#896 ) * Add EventsGetter to KubeClient to enable to sending K8S events * Add eventRecorder to the controller, initialize it and hand it down to cluster via its constructor to enable it to emit events this way * Add first set of events which then go to the postgresql custom resource the user interacts with to provide some feedback * Add right to "create" events to operator cluster role * Adapt cluster tests to new function sigurature with eventRecord (via NewFakeRecorder) * Get a proper reference before sending events to a resource Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>	2020-04-27 08:22:07 +02:00
Felix Kunde	66f2cda87f	Move operator to go 1.14 (#882 ) * update go modules march 2020 * update to GO 1.14 * reflect k8s client API changes	2020-03-30 15:50:17 +02:00
Felix Kunde	cf829df1a4	define ownership between operator and clusters via annotation (#802 ) * define ownership between operator and postgres clusters * add documentation * add unit test	2020-03-17 16:34:31 +01:00
Felix Kunde	b24da3201c	bump version to 1.4.0 + some polishing (#839 ) * bump version to 1.4.0 + some polishing * align version for UI chart * update user docs to warn for standby replicas * minor log message changes for RBAC resources	2020-02-25 09:50:54 +01:00
Dmitry Dolgov	647a4d3023	Remove service accounts cache (#685 ) For optimization purposes operator was creating a cache map to remember if service accounts and role binding was deployed to a namespace. This could lead to a problem, when a namespace was deleted, since this cache was not synchronized. For the sake of correctness remove the cache, and check every time if required service account and rbac is present. In the normal case this introduces an overhead of two API calls per an event (one to get a service accounts, one to get a role binding), which should not be a problem, unless proven otherwise.	2019-10-11 11:06:14 +02:00
Dmitry Dolgov	baae1887b3	Replace glide with Go modules (#544 ) * And attempt to build with modules and remove glide * new tools.go file to get code-generator dependency + updated codegen + remove Glide files and update docs	2019-10-02 16:18:55 +02:00
Felix Kunde	31e568157b	reflect change in github url (#496 ) Project was moved from the incubator to the Zalando main org, hence the rename	2019-02-25 11:26:55 +01:00
Maxim Ivanov	1109c861fb	Report new Postgres CR error when previously incorrect one is being updated (#449 )	2019-01-18 13:36:44 +01:00
Noah Kantrowitz	688d252752	Some tweaks to ensure compat with newer Go. (#383 )	2018-09-17 10:13:07 +02:00
Noah Kantrowitz	0b75a89920	Fix the casing of github.com/Sirupsen/logrus to match what the project itself uses. (#380 ) Dep enforces this.	2018-09-06 10:26:48 +02:00
Oleksii Kliukin	e1ed4b847d	Use code-generation for CRD API and deepcopy methods (#369 ) Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on. Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API. All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1 Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator. Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it. Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes. Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files. Update client-go dependencies. Minor reformatting and renaming.	2018-08-15 17:22:25 +02:00
Oleksii Kliukin	199aa6508c	Populate list of clusters in the controller at startup. (#364 ) Assign the list of clusters in the controller with the up-to-date list of Postgres manifests on Kubernetes during the startup. Node migration routines launched asynchronously to the cluster processing rely on an up-to-date list of clusters in the controller to detect clusters affected by the migration of the node and lock them when doing migration of master pods. Without the initial list the operator was subject to race conditions like the one described at https://github.com/zalando-incubator/postgres-operator/issues/363 Restructure the code to decouple list cluster function required by the postgresql informer from the one that emits cluster sync events. No extra work is introduced, since cluster sync already runs in a separate goroutine (clusterResync). Introduce explicit initial cluster sync at the end of acquireInitialListOfClusters instead of relying on an implicit one coming from list function of the PostgreSQL informer. Some minor refactoring. Review by @zerg-junior	2018-08-08 11:00:56 +02:00
Oleksii Kliukin	b06186eb41	Linter-induced code refactoring, run round 2. (#360 ) Run more linters in the gometalinter, i.e. deadcode, megacheck, nakedret, dup. More consistent code formatting, remove two dead functions, eliminate naked a bunch of naked returns, refactor a few functions to avoid code duplication.	2018-08-06 12:09:19 +02:00
Oleksii Kliukin	d2d3f21dc2	Client go upgrade v6 (#352 ) There are shortcuts in this code, i.e. we created the deepcopy function by using the deepcopy package instead of the generated code, that will be addressed once migrated to client-go v8. Also, some objects, particularly statefulsets, are still taken from v1beta, this will also be addressed in further commits once the changes are stabilized.	2018-08-01 11:08:01 +02:00
Oleksii Kliukin	0181a1b5b1	Introduce a repair scan to fix failing clusters (#304 ) A repair is a sync scan that acts only on those clusters that indicate that the last add, update or sync operation on them has failed. It is supposed to kick in more frequently than the repair scan. The repair scan still remains to be useful to fix the consequences of external actions (i.e. someone deletes a postgres-related service by mistake) unbeknownst to the operator. The repair scan is controlled by the new repair_period parameter in the operator configuration. It has to be at least 2 times more frequent than a sync scan to have any effect (a normal sync scan will update both last synced and last repaired attributes of the controller, since repair is just a sync underneath). A repair scan could be queued for a cluster that is already being synced if the sync period exceeds the interval between repairs. In that case a repair event will be discarded once the corresponding worker finds out that the cluster is not failing anymore. Review by @zerg-junior	2018-07-24 11:21:45 +02:00
zerg-junior	417f13c0bd	Submit RBAC credentials during initial Event processing (#344 ) * During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.	2018-07-19 16:40:40 +02:00
Oleksii Kliukin	3a9378d3b8	Allow configuring the operator via the YAML manifest. (#326 ) * Up until now, the operator read its own configuration from the configmap. That has a number of limitations, i.e. when the configuration value is not a scalar, but a map or a list. We use a custom code based on github.com/kelseyhightower/envconfig to decode non-scalar values out of plain text keys, but that breaks when the data inside the keys contains both YAML-special elememtns (i.e. commas) and complex quotes, one good example for that is search_path inside `team_api_role_configuration`. In addition, reliance on the configmap forced a flag structure on the configuration, making it hard to write and to read (see https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778). The changes allow to supply the operator configuration in a proper YAML file. That required registering a custom CRD to support the operator configuration and provide an example at manifests/postgresql-operator-default-configuration.yaml. At the moment, both old configmap and the new CRD configuration is supported, so no compatibility issues, however, in the future I'd like to deprecate the configmap-based configuration altogether. Contrary to the configmap-based configuration, the CRD one doesn't embed defaults into the operator code, however, one can use the manifests/postgresql-operator-default-configuration.yaml as a starting point in order to build a custom configuration. Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters used to create the CRD were taken from the operator configuration, which is not possible if the configuration itself is stored in the CRD object, I've added the ability to specify them as environment variables `CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively. Per review by @zerg-junior and @Jan-M.	2018-07-16 16:20:46 +02:00
Oleksii Kliukin	16a710a99a	Avoid possible skipping SYNC events. OB1 bug in the condition deciding whether to sync.	2018-05-31 18:29:15 +02:00
Oleksii Kliukin	27c7245fed	Avoid terminating delete on errors. When there is an error happening upon deletion of the Kubernetes object belonging to the cluster being removed, it makes no sense to abort the deletion: the manifest will be removed anyway, therefore all the objects after the one we aborted at will stay forever.	2018-05-18 18:10:37 +02:00
Oleksii Kliukin	da4cc2705b	Use deepcopy to propagate the spec to clusters. Avoid sharing pointers to the same spec data between the informer and the clusters. The only catch is that the error field is cleared during deepcopy, since it is an interface that may contain private fields that cannot be copied, however, the error is only used when the manifest is parsed and before it is queued, therefore, we never refer to that field in the cluster structure.	2018-05-17 16:05:12 +02:00
Oleksii Kliukin	ebe50abccb	Make sure we never modify informer cached manifest. (#290 ) `987b434` introduced a new function that modifies the cluster spec in memory before the cluster processes it. Unfortunately, the instance being modified appeared to be the one stored internally in the PostgresInformer, resulting in those modifications to be propagated with futher cluster events and producing update loops in some occasions. This commit makes sure we copy the spec before putting it into the clusterEventQueues.	2018-05-16 18:23:31 +02:00
Oleksii Kliukin	987b43456b	Deprecate old LB options, fix endpoint sync. (#287 ) * Depreate old LB options, fix endpoint sync. - deprecate useLoadBalancer, replicaLoadBalancer from the manifest and enable_load_balancer from the operator configuration. The old operator configuration options become no-op with this commit. For the old manifest options, `useLoadBalancer` and `replicaLoadBalancer` are still consulted, but only in the absense of the new ones (enableMasterLoadBalancer and enableReplicaLoadBalancer). - Make sure the endpoint being created during the sync receives proper addresses subset. This is more critical for the replicas, as for the masters Patroni will normally re-create the endpoint before the operator. - Avoid creating the replica endpoint, since it will be created automatically by the corresponding service. - Update the README and unit tests. Code review by @mgomezch and @zerg-junior	2018-05-15 15:19:18 +02:00
Oleksii Kliukin	f18bb6eaaa	Make errors in the cluster list function visible. Sometimes the operator does not pick up clusters right away when they are created. The change attempts to shed light on the reason behind that.	2018-02-22 16:45:10 +01:00
Sergey Dudoladov	ea84f9d577	Rename the configmap 'namespace' entry to avoid confusion with the map's owm namespace	2018-02-06 15:09:00 +01:00
Murat Kabilov	86803406db	use sync methods while updating the cluster	2017-11-03 12:00:43 +01:00
Oleksii Kliukin	eba23279c8	Kube cluster upgrade	2017-10-19 10:49:42 +02:00
Murat Kabilov	3b32265258	Set status of the cluster on sync fail/success	2017-10-12 15:10:42 +02:00
Murat Kabilov	83c8d6c419	Extend diagnostic api with worker status info	2017-10-11 12:26:09 +02:00
Murat Kabilov	a35e9c6119	move from tpr to crd	2017-10-06 15:12:08 +02:00
Murat Kabilov	9a66e09b88	cluster history api endpoint	2017-09-26 14:30:45 +02:00
Murat Kabilov	f77852a152	store time of the cluster event	2017-09-26 13:17:23 +02:00
Murat Kabilov	4db5bd13d1	delete cluster key from the clusters list only when delete procedure is finished	2017-09-04 18:48:03 +02:00
Murat Kabilov	83760ebbef	discard cluster events from the queue on cluster delete; delete cluster from the clusters map before deleting cluster itself	2017-08-17 12:24:23 +02:00
Murat Kabilov	dad8e2f49f	make cluster event queue consumption non-blocking	2017-08-15 16:03:19 +02:00
Murat Kabilov	51fdfb90f7	log cluster and controller events in the ringlog via logrus hook	2017-08-15 12:16:09 +02:00

1 2

83 Commits