postgres-operator

Commit Graph

Author	SHA1	Message	Date
Jan Mussler	4a88f00a3f	Full AWS gp3 support for iops and throughput config. (#1261 ) Support new AWS EBS volume type `gp3` with `iops` and `throughput` in the manifest. Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2021-01-25 10:07:18 +01:00
Felix Kunde	929075814a	diff SecurityContext of containers (#1255 ) * diff SecurityContext of containers * change log messages to use "does not" vs "doesn't"	2020-12-15 10:06:53 +01:00
Felix Kunde	6a97316a69	Support inherited annotations for all major objects (#1236 ) * add comments where inherited annotations could be added * add inheritedAnnotations feature * return nil if no annotations are set * minor changes * first downscaler then inherited annotations * add unit test for inherited annotations * add pvc to test + minor changes * missing comma * fix nil map assignment * set annotations in the same order it is done in other places * replace acidClientSet with acid getters in K8s client * more fixes on clientSet vs getters * minor changes * remove endpoints from annotation test * refine unit test - but deployment and sts are still empty * fix checkinng sts and deployment * make annotations setter one liners * no need for len check anymore Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>	2020-12-11 16:34:01 +01:00
Jan Mussler	549f71bb49	Support EBS gp2 to gp3 migration on sync for below 1tb volumes (#1242 ) * initial commit for gp3 migration. * Default volume migration done. * Added Gomock and one test case with mock. * Dep update. * more changes for code gen. * push fake package. * Rename var. * Changes to Makefile and return value. * Macke mocks phony due to overlap in foldername. * Learning as one goes. Initialize map. * Wrong toggle. * Expect modify call. * Fix mapping of ids in test. * Fix volume id. * volume ids. * Fixing test setup. Late night... * create all pvs. * Fix test case config. * store volumes and compare. * More logs. * Logging of migration action. * Ensure to log errors. * Log warning if modify failed, e.g. due to ebs volume state. * Add more output. * Skip local e2e tests. * Reflect k8s volume id in test data. Extract aws volume id from k8s value. * Finalizing ebs migration. * More logs. describe fails. * Fix non existing fields in gp2 discovery. * Remove nothing to do flag for migration. * Final commit for migration. * add new options to all places Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-12-11 15:52:32 +01:00
Sergey Dudoladov	6f5751fe55	raise log level for malformed secrets (#1235 ) Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>	2020-11-27 18:47:50 +01:00
Rafia Sabih	49158ecb68	Connection pooler for replica (#1127 ) * Enable connection pooler for replica * Refactor code for connection pooler - Move all the relevant code to a separate file - Move all the related tests to a separate file - Avoid using cluster where not required - Simplify the logic in sync and other methods - Cleanup of duplicated or unused code * Fix labels for the replica pods * Update deleteConnectionPooler to include role * Adding test cases and other changes - Fix unit test and delete secret when required only - Make sure we use empty fresh cluster for every test case. * enhance e2e test * Disable pooler in complete manifest as this is source for e2e too an creates unnecessary pooler setups. Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de> Co-authored-by: Jan Mussler <janm81@gmail.com>	2020-11-13 14:52:21 +01:00
Felix Kunde	3fed565328	check resize mode on update events (#1194 ) * check resize mode on update events * add unit test for PVC resizing * set resize mode to pvc in charts and manifests * add test for quantityToGigabyte * just one debug line for syncing volumes * extend test and update log msg	2020-11-11 13:22:43 +01:00
Sergey Dudoladov	4f3bb6aa8c	Remove operator checks that prevent PG major version upgrade (#1160 ) * remove checks that prevent major version upgrade Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>	2020-11-02 16:49:29 +01:00
Jan Mussler	3a86dfc8bb	End 2 End tests speedup (#1180 ) * Improving end 2 end tests, especially speed of execution and error, by implementing proper eventual asserts and timeouts. * Add documentation for running individual tests * Fixed String encoding in Patorni state check and error case * Printing config as multi log line entity, makes it readable and grepable on startup * Cosmetic changes to logs. Removed quotes from diff. Move all object diffs to text diff. Enabled padding for log level. * Mount script with tools for easy logaccess and watching objects. * Set proper update strategy for Postgres operator deployment. * Move long running test to end. Move pooler test to new functions. * Remove quote from valid K8s identifiers.	2020-10-28 10:04:33 +01:00
Dmitry Dolgov	1f5d0995a5	Lookup function installation (#1171 ) * Lookup function installation Due to reusing a previous database connection without closing it, lookup function installation process was skipping the first database in the list, installing twice into postgres db instead. To prevent that, make internal initDbConnWithName to overwrite a connection object, and return the same object only from initDbConn, which is sort of public interface. Another solution for this would be to modify initDbConnWithName to return a connection object and then generate one temporary connection for each db. It sound feasible but after one attempt it seems it requires a bit more changes around (init, close connections) and doesn't bring anything significantly better on the table. In case if some future changes will prove this wrong, do not hesitate to refactor. Change retry strategy to more insistive one, namely: * retry on the next sync even if we failed to process one database and install pooler appliance. * perform the whole installation unconditionally on update, since the list of target databases could be changed. And for the sake of making it even more robust, also log the case when operator decides to skip installation. Extend connection pooler e2e test with verification that all dbs have required schema installed.	2020-10-19 16:18:58 +02:00
neelasha-09	ab95eaa6ef	Fixes #1130 (#1139 ) * Fixes #1130 Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-09-22 17:16:05 +02:00
Felix Kunde	0508266219	Remove all secrets on delete incl. pooler (#1091 ) * fix syncSecrets and remove pooler secret * update log for deleteSecret * use c.credentialSecretName(username) * minor fix	2020-08-10 18:26:26 +02:00
Igor Yanchenko	88735a798a	Resize volume by changing pvc size if enabled in config. (#958 ) * Try to resize pvc if resizing pv has failed * added config option to switch between storage resize strategies * changes according to requests * Update pkg/controller/operator_config.go Co-authored-by: Felix Kunde <felix-kunde@gmx.de> * enable_storage_resize documented added examples to the default configuration and helm value files * enable_storage_resize renamed to volume_resize_mode, off by default * volume_resize_mode renamed to storage_resize_mode * Update pkg/apis/acid.zalan.do/v1/crds.go * pkg/cluster/volumes.go updated * Update docs/reference/operator_parameters.md * Update manifests/postgresql-operator-default-configuration.yaml * Update pkg/controller/operator_config.go * Update pkg/util/config/config.go * Update charts/postgres-operator/values-crd.yaml * Update charts/postgres-operator/values.yaml * Update docs/reference/operator_parameters.md * added logging if no changes required Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-07-03 10:53:37 +02:00
Felix Kunde	0c6655a22d	skip creation later to improve visibility of errors (#1013 ) * try to emit error for missing team name in cluster name * skip creation after new cluster object * move SetStatus to k8sclient and emit event when skipping creation and rename to SetPostgresCRDStatus Co-authored-by: Felix Kunde <felix.kunde@zalando.de>	2020-06-17 13:32:16 +02:00
Rafia Sabih	d52296c323	Propagate annotations to the StatefulSet (#932 ) * Initial commit * Corrections - set the type of the new configuration parameter to be array of strings - propagate the annotations to statefulset at sync * Enable regular expression matching * Improvements -handle rollingUpdate flag -modularize code -rename config parameter name * fix merge error * Pass annotations to connection pooler deployment * update code-gen * Add documentation and update manifests * add e2e test and introduce option in configmap * fix service annotations test * Add unit test * fix e2e tests * better key lookup of annotations tests * add debug message for annotation tests * Fix typos * minor fix for looping * Handle update path and renaming - handle the update path to update sts and connection pooler deployment. This way no need to wait for sync - rename the parameter to downscaler_annotations - handle other review comments * another try to fix python loops * Avoid unneccessary update events * Update manifests * some final polishing * fix cluster_test after polishing Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de> Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-05-04 14:46:56 +02:00
Felix Kunde	d76203b3f9	Bootstrapped databases with best practice role setup (#843 ) * PreparedDatabases with default role setup * merge changes from master * include preparedDatabases spec check when syncing databases * create a default preparedDB if not specified * add more default privileges for schemas * use empty brackets block for undefined objects * cover more default privilege scenarios and always define admin role * add DefaultUsers flag * support extensions and defaultUsers for preparedDatabases * remove exact version in deployment manifest * enable CRD validation for new field * update generated code * reflect code review * fix typo in SQL command * add documentation for preparedDatabases feature + minor changes * some datname should stay * add unit tests * reflect some feedback * init users for preparedDatabases also on update * only change DB default privileges on creation * add one more section in user docs * one more sentence	2020-04-29 10:56:06 +02:00
Sergey Dudoladov	cc635a02e3	Lazy upgrade of the Spilo image (#859 ) * initial implementation * describe forcing the rolling upgrade * make parameter name more descriptive * add missing pieces * address review * address review * fix bug in e2e tests * fix cluster name label in e2e test * raise test timeout * load spilo test image * use available spilo image * delete replica pod for lazy update test * fix e2e * fix e2e with a vengeance * lets wait for another 30m * print pod name in error msg * print pod name in error msg 2 * raise timeout, comment other tests * subsequent updates of config * add comma * fix e2e test * run unit tests before e2e * remove conflicting dependency * Revert "remove conflicting dependency" This reverts commit `65fc09054b`. * improve cdp build * dont run unit before e2e tests * Revert "improve cdp build" This reverts commit `e2a8fa12aa`. Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de> Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-04-29 10:07:14 +02:00
Christian Rohmann	21b9b6fcbe	Emit K8S events to the postgresql CR as feedback to the requestor / user (#896 ) * Add EventsGetter to KubeClient to enable sending K8S events * Add eventRecorder to the controller, initialize it and hand it down to cluster via its constructor to enable it to emit events this way * Add first set of events which then go to the postgresql custom resource the user interacts with to provide some feedback * Add right to "create" events to operator cluster role * Adapt cluster tests to new function signature with eventRecord (via NewFakeRecorder) * Get a proper reference before sending events to a resource Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>	2020-04-27 08:22:07 +02:00
ReSearchITEng	5014eebfb2	when kubernetes_use_configmaps -> skip further endpoints actions even delete (#921 ) * further compatibility with k8sUseConfigMaps - skip further endpoints related actions * Update pkg/cluster/cluster.go thanks! Co-Authored-By: Felix Kunde <felix-kunde@gmx.de> * Update pkg/cluster/cluster.go Co-Authored-By: Felix Kunde <felix-kunde@gmx.de> * Update pkg/cluster/cluster.go Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-04-16 16:47:59 +02:00
Dmitry Dolgov	6a689cdc1c	Prevent empty syncs (#922 ) There is a possibility to pass nil as one of the specs and an empty spec into syncConnectionPooler. In this case it will perform a synchronization because nil != empty struct. Avoid such cases and make it testable by returning a list of synchronization reasons on top together with the final error.	2020-04-16 15:14:31 +02:00
Felix Kunde	b43b22dfcc	Call me pooler, not pool (#883 ) * rename pooler parts and add example to manifest * update codegen * fix manifest and add more details to docs * reflect renaming also in e2e tests	2020-04-01 10:34:03 +02:00
Felix Kunde	66f2cda87f	Move operator to go 1.14 (#882 ) * update go modules march 2020 * update to GO 1.14 * reflect k8s client API changes	2020-03-30 15:50:17 +02:00
Dmitry Dolgov	9dfa433363	Connection pooler (#799 ) Connection pooler support Add support for a connection pooler. The idea is to make it generic enough to be able to switch between different implementations (e.g. pgbouncer or odyssey). Operator needs to create a deployment with pooler and a service for it to access. For connection pool to work properly, a database needs to be prepared by operator, namely a separate user have to be created with an access to an installed lookup function (to fetch credential for other users). This setups is supposed to be used only by robot/application users. Usually a connection pool implementation is more CPU bounded, so it makes sense to create several pods for connection pool with more emphasize on cpu resources. At the moment there are no special affinity or tolerations assigned to bring those pods closer to the database. For availability purposes minimal number of connection pool pods is 2, ideally they have to be distributed between different nodes/AZ, but it's not enforced in the operator itself. Available configuration supposed to be ergonomic and in the normal case require minimum changes to a manifest to enable connection pool. To have more control over the configuration and functionality on the pool side one can customize the corresponding docker image. Co-authored-by: Felix Kunde <felix-kunde@gmx.de>	2020-03-25 12:57:26 +01:00
Felix Kunde	b66734a0a9	omit PgVersion diff on sync (#860 ) * use PostgresParam.PgVersion everywhere * on sync compare pgVersion with SpiloConfiguration * update getNewPgVersion and added tests	2020-03-13 11:48:19 +01:00
Felix Kunde	3b10dc645d	patch/update services on type change (#824 ) * use Update when disabling LoadBalancer + added e2e test	2020-02-13 16:24:15 +01:00
Felix Kunde	1f0312a014	make minimum limits boundaries configurable (#808 ) * make minimum limits boundaries configurable * add e2e test	2020-02-03 11:43:18 +01:00
Felix Kunde	cd110aabf4	Enforce minimum cpu and memory limits (#731 ) * add validation for PG resources and volume size * check resource requests also on UPDATE and SYNC + update docs * if cluster was running don't error on sync	2019-12-12 16:43:55 +01:00
Rafia Sabih	540d58d5bd	Adding the support for standby cluster This will set up a continuous wal streaming cluster, by adding the corresponding section in postgres manifest. Instead of having a full-fledged standby cluster as in Patroni, here we use only the wal path of the source cluster and stream from there. Since, standby cluster is streaming from the master and does not require to create or use databases of it's own. Hence, it bypasses the creation of users or databases. There is a separate sample manifest added to set up a standby-cluster.	2019-06-21 10:11:39 +02:00
Felix Kunde	6918394562	Add PDB configuration toggle (#583 ) * Don't create an impossible disruption budget for smaller clusters. * sync PDB also on update	2019-06-18 10:48:21 +02:00
Sergey Dudoladov	f3e1e80aaf	Add logical backup (#442 ) * Add k8s cron job to spawn logical backups * Minor doc updates	2019-05-16 15:52:01 +02:00
Felix Kunde	0fbfbb23bb	Use /status subresource instead of plain manifest field (#534 ) * turns PostgresStatus type into a struct with field PostgresClusterStatus * setStatus patch target is now /status subresource * unmarshalling PostgresStatus takes care of previous status field convention * new simple bool functions status.Running(), status.Creating()	2019-05-07 12:01:45 +02:00
Felix Kunde	31e568157b	reflect change in github url (#496 ) Project was moved from the incubator to the Zalando main org, hence the rename	2019-02-25 11:26:55 +01:00
zerg-junior	7907f95d2f	Improve reporting about rolling updates (#391 )	2018-09-24 11:57:43 +02:00
Oleksii Kliukin	e1ed4b847d	Use code-generation for CRD API and deepcopy methods (#369 ) Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on. Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API. All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1 Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator. Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it. Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes. Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files. Update client-go dependencies. Minor reformatting and renaming.	2018-08-15 17:22:25 +02:00
Oleksii Kliukin	b06186eb41	Linter-induced code refactoring, run round 2. (#360 ) Run more linters in the gometalinter, i.e. deadcode, megacheck, nakedret, dup. More consistent code formatting, remove two dead functions, eliminate a bunch of naked returns, refactor a few functions to avoid code duplication.	2018-08-06 12:09:19 +02:00
Oleksii Kliukin	59f0c5551e	Allow configuring pod priority globally and per cluster. (#353 ) * Allow configuring pod priority globally and per cluster. Allow to specify pod priority class for all pods managed by the operator, as well as for those belonging to individual clusters. Controlled by the pod_priority_class_name operator configuration parameter and the podPriorityClassName manifest option. See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass for the explanation on how to define priority classes since Kubernetes 1.8. Some import order changes are due to go fmt. Removal of OrphanDependents deprecated field. Code review by @zerg-junior	2018-08-03 14:03:37 +02:00
Oleksii Kliukin	ac7b132314	Refactoring inspired by gometalinter. (#357 ) Among other things, fix a few issues with deepcopy implementation.	2018-08-03 11:09:45 +02:00
Oleksii Kliukin	d2d3f21dc2	Client go upgrade v6 (#352 ) There are shortcuts in this code, i.e. we created the deepcopy function by using the deepcopy package instead of the generated code, that will be addressed once migrated to client-go v8. Also, some objects, particularly statefulsets, are still taken from v1beta, this will also be addressed in further commits once the changes are stabilized.	2018-08-01 11:08:01 +02:00
zerg-junior	7394c15d0a	Make AWS region configurable in the operator config map (#333 )	2018-06-27 17:29:02 +02:00
Oleksii Kliukin	48a5744314	Use Patroni API to set bootstrap-only options. (#299 ) Call Patroni API /config in order to set special options that are ignored when set in the configuration file, such as max_connections. Per https://github.com/zalando-incubator/postgres-operator/issues/297 * Some minor refactoring: Rename Cluster ManualFailover to Switchover Rename Patroni Failover to Switchover Add more details to error messages and comments introduced in this PR. Review by @zerg-junior	2018-05-29 12:35:25 +02:00
Oleksii Kliukin	88d6a7be3f	Sync persistent volumes before statefulsets. (#293 ) Avoid the condition of waiting for the pod that cannot start PostgreSQL because it ran out of disk space.	2018-05-18 12:01:43 +02:00
Oleksii Kliukin	11d568bf65	Address code review by @zerg-junior - new info messages, rename the annotation flag.	2018-05-15 16:50:03 +02:00
Oleksii Kliukin	332dab5237	Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations	2018-05-08 14:51:10 +02:00
Oleksii Kliukin	f41a42f922	Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations	2018-05-07 10:16:30 +02:00
Oleksii Kliukin	ce0d4af91c	Initial implementation for the statefulset annotations indicating rolling updates.	2018-05-07 08:07:37 +02:00
Oleksii Kliukin	1a20362c5b	Initial implementation for the statefulset annotations indicating rolling updates.	2018-05-04 18:59:23 +02:00
Oleksii Kliukin	43a1db2128	Merge branch 'master' into pending_rolling_updates	2018-05-03 11:27:16 +02:00
Oleksii Kliukin	37caa3f60b	Fix a bug with syncing services Avoid showing "there is no service in the cluster" when syncing a service for the cluster if the operator has been restarted after the cluster had been created.	2018-04-27 12:35:25 +02:00
Oleksii Kliukin	0618723a61	Check rolling updates using controller revisions. Compare pods controller revisions with the one for the statefulset to determine whether the pod is running the latest revision and, therefore, no rolling update is necessary. This is performed only during the operator start, afterwards the rolling update status that is stored locally in the cluster structure is used for all rolling update decisions.	2018-04-09 18:07:24 +02:00
Sergey Dudoladov	fb21246fcd	Remove early stopping conditions that rely on the replica service being absent	2018-02-27 17:21:51 +01:00

102 Commits