postgres-operator

Commit Graph

Author	SHA1	Message	Date
Felix Kunde	9a11e85d57	disable PostgresTeam by default (#1186 ) * disable PostgresTeam by default * fix version in chart	2020-10-28 17:51:37 +01:00
Felix Kunde	d658b9672e	PostgresTeam CRD for advanced team management (#1165 ) * PostgresTeamCRD for advanced team management * rework internal structure to be closer to CRD * superusers instead of admin * add more util functions and unit tests * fix initHumanUsers * check for superusers when creating normal teams * polishing and fixes * adding the essential missing pieces * add documentation and update rbac * reflect some feedback * reflect more feedback * fixing debug logs and raise QueueResyncPeriodTPR * add two more flags to disable CRD and its superuser support * fix chart * update go modules * move to client 1.19.3 and update codegen	2020-10-28 10:40:10 +01:00
Jan Mussler	3a86dfc8bb	End 2 End tests speedup (#1180 ) * Improving end 2 end tests, especially speed of execution and error, by implementing proper eventual asserts and timeouts. * Add documentation for running individual tests * Fixed String encoding in Patorni state check and error case * Printing config as multi log line entity, makes it readable and grepable on startup * Cosmetic changes to logs. Removed quotes from diff. Move all object diffs to text diff. Enabled padding for log level. * Mount script with tools for easy logaccess and watching objects. * Set proper update strategy for Postgres operator deployment. * Move long running test to end. Move pooler test to new functions. * Remove quote from valid K8s identifiers.	2020-10-28 10:04:33 +01:00
刘新	a8bfe4eb87	Remove repeated initialization of Pod ServiceAccount (#1164 ) Co-authored-by: xin.liu <xin.liu@woqutech.com>	2020-10-20 14:18:22 +02:00
Alex Stockinger	692c721854	Introduce ENABLE_JSON_LOGGING env variable (#1158 )	2020-10-08 15:32:15 +02:00
Felix Kunde	3ddc56e5b9	allow delete only if annotations meet configured criteria (#1069 ) * define annotations for delete protection * change log level and reduce log lines for e2e tests * reduce wait_for_pod_start even further	2020-08-13 16:36:22 +02:00
Dmitry Dolgov	7cf2fae6df	[WIP] Extend infrastructure roles handling (#1064 ) Extend infrastructure roles handling Postgres Operator uses infrastructure roles to provide access to a database for external users e.g. for monitoring purposes. Such infrastructure roles are expected to be present in the form of k8s secrets with the following content: inrole1: some_encrypted_role password1: some_encrypted_password user1: some_entrypted_name inrole2: some_encrypted_role password2: some_encrypted_password user2: some_entrypted_name The format of this content is implied implicitly and not flexible enough. In case if we do not have possibility to change the format of a secret we want to use in the Operator, we need to recreate it in this format. To address this lets make the format of secret content explicitly. The idea is to introduce a new configuration option for the Operator. infrastructure_roles_secrets: - secretname: k8s_secret_name userkey: some_encrypted_name passwordkey: some_encrypted_password rolekey: some_encrypted_role - secretname: k8s_secret_name userkey: some_encrypted_name passwordkey: some_encrypted_password rolekey: some_encrypted_role This would allow Operator to use any avalable secrets to prepare infrastructure roles. To make it backward compatible simulate the old behaviour if the new option is not present. The new configuration option is intended be used mainly from CRD, but it's also available via Operator ConfigMap in a limited fashion. For ConfigMap one can put there only a string with one secret definition in the following format (as a string): infrastructure_roles_secrets: \| secretname: k8s_secret_name, userkey: some_encrypted_name, passwordkey: some_encrypted_password, rolekey: some_encrypted_role Note than only one secret could be specified this way, no multiple secrets are allowed. Eventually the resulting list of infrastructure roles would be a total sum of all supported ways to describe it, namely legacy via infrastructure_roles_secret_name and infrastructure_roles_secrets from both ConfigMap and CRD.	2020-08-05 14:18:56 +02:00
Felix Kunde	0c6655a22d	skip creation later to improve visibility of errors (#1013 ) * try to emit error for missing team name in cluster name * skip creation after new cluster object * move SetStatus to k8sclient and emit event when skipping creation and rename to SetPostgresCRDStatus Co-authored-by: Felix Kunde <felix.kunde@zalando.de>	2020-06-17 13:32:16 +02:00
Felix Kunde	865d5b41a7	set event broadcasting to Infof and update rbac (#952 )	2020-04-29 17:26:46 +02:00
Björn Fischer	168abfe37b	Fully speced global sidecars (#890 ) * implement fully speced global sidecars * fix issue #924	2020-04-27 17:40:22 +02:00
Christian Rohmann	21b9b6fcbe	Emit K8S events to the postgresql CR as feedback to the requestor / user (#896 ) * Add EventsGetter to KubeClient to enable to sending K8S events * Add eventRecorder to the controller, initialize it and hand it down to cluster via its constructor to enable it to emit events this way * Add first set of events which then go to the postgresql custom resource the user interacts with to provide some feedback * Add right to "create" events to operator cluster role * Adapt cluster tests to new function sigurature with eventRecord (via NewFakeRecorder) * Get a proper reference before sending events to a resource Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>	2020-04-27 08:22:07 +02:00
Felix Kunde	66f2cda87f	Move operator to go 1.14 (#882 ) * update go modules march 2020 * update to GO 1.14 * reflect k8s client API changes	2020-03-30 15:50:17 +02:00
Felix Kunde	cf829df1a4	define ownership between operator and clusters via annotation (#802 ) * define ownership between operator and postgres clusters * add documentation * add unit test	2020-03-17 16:34:31 +01:00
Felix Kunde	b24da3201c	bump version to 1.4.0 + some polishing (#839 ) * bump version to 1.4.0 + some polishing * align version for UI chart * update user docs to warn for standby replicas * minor log message changes for RBAC resources	2020-02-25 09:50:54 +01:00
Felix Kunde	aea9e9bd33	postgres-pod clusterrole (#832 ) * define postgres-pod clusterrole and align rbac in chart * align UI chart rbac with operator and update doc * operator RBAC needs podsecuritypolicy to grant it to postgres-pod	2020-02-19 12:32:54 +01:00
Felix Kunde	702a194c41	switch to rbac/v1 (#829 ) * switch to rbac/v1	2020-02-17 11:25:07 +01:00
Felix Kunde	cd110aabf4	Enforce minimum cpu and memory limits (#731 ) * add validation for PG resources and volume size * check resource requests also on UPDATE and SYNC + update docs * if cluster was running don't error on sync	2019-12-12 16:43:55 +01:00
Felix Kunde	a3b34f146f	Add CRD validation (#599 ) * add CRD manifests with validation * update documentation * patroni slots is not an array but a nested hash map * make deps call tools * cover validation in docs and export it in crds.go * add toggle to disable creation of CRD validation and document it * use templated service account also for CRD-configured helm deployment	2019-11-28 12:02:05 +01:00
Dmitry Dolgov	647a4d3023	Remove service accounts cache (#685 ) For optimization purposes operator was creating a cache map to remember if service accounts and role binding was deployed to a namespace. This could lead to a problem, when a namespace was deleted, since this cache was not synchronized. For the sake of correctness remove the cache, and check every time if required service account and rbac is present. In the normal case this introduces an overhead of two API calls per an event (one to get a service accounts, one to get a role binding), which should not be a problem, unless proven otherwise.	2019-10-11 11:06:14 +02:00
Erik Inge Bolsø	6fbfee3903	decouple clusterrole name and serviceaccount name (#581 ) Decouple clusterrole name and service account name.	2019-06-14 14:24:23 +02:00
Felix Kunde	313db7d10b	set default name also for RoleBinding and roleRef (#529 )	2019-04-02 17:16:47 +02:00
Felix Kunde	31e568157b	reflect change in github url (#496 ) Project was moved from the incubator to the Zalando main org, hence the rename	2019-02-25 11:26:55 +01:00
zerg-junior	45c89b3da4	[WIP] Add set_memory_request_to_limit option (#406 ) * Add set_memory_request_to_limit option	2018-11-15 14:00:08 +01:00
Noah Kantrowitz	688d252752	Some tweaks to ensure compat with newer Go. (#383 )	2018-09-17 10:13:07 +02:00
Noah Kantrowitz	0b75a89920	Fix the casing of github.com/Sirupsen/logrus to match what the project itself uses. (#380 ) Dep enforces this.	2018-09-06 10:26:48 +02:00
Oleksii Kliukin	e1ed4b847d	Use code-generation for CRD API and deepcopy methods (#369 ) Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on. Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API. All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1 Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator. Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it. Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes. Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files. Update client-go dependencies. Minor reformatting and renaming.	2018-08-15 17:22:25 +02:00
Oleksii Kliukin	199aa6508c	Populate list of clusters in the controller at startup. (#364 ) Assign the list of clusters in the controller with the up-to-date list of Postgres manifests on Kubernetes during the startup. Node migration routines launched asynchronously to the cluster processing rely on an up-to-date list of clusters in the controller to detect clusters affected by the migration of the node and lock them when doing migration of master pods. Without the initial list the operator was subject to race conditions like the one described at https://github.com/zalando-incubator/postgres-operator/issues/363 Restructure the code to decouple list cluster function required by the postgresql informer from the one that emits cluster sync events. No extra work is introduced, since cluster sync already runs in a separate goroutine (clusterResync). Introduce explicit initial cluster sync at the end of acquireInitialListOfClusters instead of relying on an implicit one coming from list function of the PostgreSQL informer. Some minor refactoring. Review by @zerg-junior	2018-08-08 11:00:56 +02:00
Oleksii Kliukin	b06186eb41	Linter-induced code refactoring, run round 2. (#360 ) Run more linters in the gometalinter, i.e. deadcode, megacheck, nakedret, dup. More consistent code formatting, remove two dead functions, eliminate naked a bunch of naked returns, refactor a few functions to avoid code duplication.	2018-08-06 12:09:19 +02:00
Oleksii Kliukin	59f0c5551e	Allow configuring pod priority globally and per cluster. (#353 ) * Allow configuring pod priority globally and per cluster. Allow to specify pod priority class for all pods managed by the operator, as well as for those belonging to individual clusters. Controlled by the pod_priority_class_name operator configuration parameter and the podPriorityClassName manifest option. See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass for the explanation on how to define priority classes since Kubernetes 1.8. Some import order changes are due to go fmt. Removal of OrphanDependents deprecated field. Code review by @zerg-junior	2018-08-03 14:03:37 +02:00
Oleksii Kliukin	ac7b132314	Refactoring inspired by gometalinter. (#357 ) Among other things, fix a few issues with deepcopy implementation.	2018-08-03 11:09:45 +02:00
Oleksii Kliukin	d2d3f21dc2	Client go upgrade v6 (#352 ) There are shortcuts in this code, i.e. we created the deepcopy function by using the deepcopy package instead of the generated code, that will be addressed once migrated to client-go v8. Also, some objects, particularly statefulsets, are still taken from v1beta, this will also be addressed in further commits once the changes are stabilized.	2018-08-01 11:08:01 +02:00
Oleksii Kliukin	f27833b5eb	Fix disabling database access and teams API via command-line options. (#351 )	2018-07-27 10:24:05 +02:00
Oleksii Kliukin	0181a1b5b1	Introduce a repair scan to fix failing clusters (#304 ) A repair is a sync scan that acts only on those clusters that indicate that the last add, update or sync operation on them has failed. It is supposed to kick in more frequently than the repair scan. The repair scan still remains to be useful to fix the consequences of external actions (i.e. someone deletes a postgres-related service by mistake) unbeknownst to the operator. The repair scan is controlled by the new repair_period parameter in the operator configuration. It has to be at least 2 times more frequent than a sync scan to have any effect (a normal sync scan will update both last synced and last repaired attributes of the controller, since repair is just a sync underneath). A repair scan could be queued for a cluster that is already being synced if the sync period exceeds the interval between repairs. In that case a repair event will be discarded once the corresponding worker finds out that the cluster is not failing anymore. Review by @zerg-junior	2018-07-24 11:21:45 +02:00
zerg-junior	417f13c0bd	Submit RBAC credentials during initial Event processing (#344 ) * During initial Event processing submit the service account for pods and bind it to a cluster role that allows Patroni to successfully start. The cluster role is assumed to be created by the k8s cluster administrator.	2018-07-19 16:40:40 +02:00
Oleksii Kliukin	3a9378d3b8	Allow configuring the operator via the YAML manifest. (#326 ) * Up until now, the operator read its own configuration from the configmap. That has a number of limitations, i.e. when the configuration value is not a scalar, but a map or a list. We use a custom code based on github.com/kelseyhightower/envconfig to decode non-scalar values out of plain text keys, but that breaks when the data inside the keys contains both YAML-special elememtns (i.e. commas) and complex quotes, one good example for that is search_path inside `team_api_role_configuration`. In addition, reliance on the configmap forced a flag structure on the configuration, making it hard to write and to read (see https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778). The changes allow to supply the operator configuration in a proper YAML file. That required registering a custom CRD to support the operator configuration and provide an example at manifests/postgresql-operator-default-configuration.yaml. At the moment, both old configmap and the new CRD configuration is supported, so no compatibility issues, however, in the future I'd like to deprecate the configmap-based configuration altogether. Contrary to the configmap-based configuration, the CRD one doesn't embed defaults into the operator code, however, one can use the manifests/postgresql-operator-default-configuration.yaml as a starting point in order to build a custom configuration. Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters used to create the CRD were taken from the operator configuration, which is not possible if the configuration itself is stored in the CRD object, I've added the ability to specify them as environment variables `CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively. Per review by @zerg-junior and @Jan-M.	2018-07-16 16:20:46 +02:00
Oleksii Kliukin	987b43456b	Deprecate old LB options, fix endpoint sync. (#287 ) * Depreate old LB options, fix endpoint sync. - deprecate useLoadBalancer, replicaLoadBalancer from the manifest and enable_load_balancer from the operator configuration. The old operator configuration options become no-op with this commit. For the old manifest options, `useLoadBalancer` and `replicaLoadBalancer` are still consulted, but only in the absense of the new ones (enableMasterLoadBalancer and enableReplicaLoadBalancer). - Make sure the endpoint being created during the sync receives proper addresses subset. This is more critical for the replicas, as for the masters Patroni will normally re-create the endpoint before the operator. - Avoid creating the replica endpoint, since it will be created automatically by the corresponding service. - Update the README and unit tests. Code review by @mgomezch and @zerg-junior	2018-05-15 15:19:18 +02:00
Sergey Dudoladov	4255e702bc	Always empty account's namespace after parsing	2018-04-25 13:57:24 +02:00
Sergey Dudoladov	d99b553ec1	Convert default account definiton into JSON	2018-04-25 12:35:16 +02:00
Sergey Dudoladov	3d0ab40d64	Explicitly warn on account name mismatch	2018-04-24 15:31:22 +02:00
Sergey Dudoladov	485ec4b8ea	Move service account to Controller	2018-04-24 15:13:08 +02:00
Sergey Dudoladov	bd51d2922b	Turn ServiceAccount into struct value to avoid race conditon during account creation	2018-04-20 13:05:05 +02:00
Sergey Dudoladov	a5a65e93f4	Name service account consistenly	2018-04-19 16:15:52 +02:00
Sergey Dudoladov	23f893647c	Remove sync of pod service accounts	2018-04-19 15:48:58 +02:00
Sergey Dudoladov	214ae04aa7	Deploy service account for pod creation on demand	2018-04-18 16:20:20 +02:00
Sergey Dudoladov	66a3b6830e	Call fatalf if namespace to watch does not exist	2018-02-20 16:13:48 +01:00
Sergey Dudoladov	dcfc9925f6	Respond to code review	2018-02-20 14:43:02 +01:00
Sergey Dudoladov	e3d2434420	Use '*' as an alias to denote all namespaces	2018-02-16 15:20:26 +01:00
Sergey Dudoladov	088bf70e7d	Merge branch 'master' into support-many-namespaces	2018-02-16 15:06:10 +01:00
Sergey Dudoladov	ec7de38f9b	Make operator watch its own namespace instead of controller's one	2018-02-16 14:22:38 +01:00
Sergey Dudoladov	5e9a21456e	Remove the incorrect service account check	2018-02-15 16:33:53 +01:00

1 2 3

106 Commits