postgres-operator

Commit Graph

Author	SHA1	Message	Date
Felix Kunde	3ce0b1e7fa	deprecate crd validation toggle and sync with manifests (#1781 ) * deprecate crd validation toggle and sync with manifests * fix description in pg crd manifests * change CRD creation strategy * affinity matchExpression has values * lower repair period in e2e tests	2022-02-18 15:04:31 +01:00
Felix Kunde	f7858ffb70	Initialize arrays of errors / error messages + minor refactoring (#1701 ) * init error arrays correctly * avoid nilPointer when syncing connectionPooler * getInfrastructureRoles should return error * fix unit tests and return type for getInfrastructureRoles	2021-11-29 12:49:12 +01:00
Felix Kunde	41858a702c	making pgTeamMap a pointer (#1349 ) * making pgTeamMap a pointer * init empty map * add e2e test for additional teams and members * update test_min_resource_limits * add more waiting in node_affinity_test * no need for pointers in map of postgresTeamMebership * another minor update on node affinity test * refactor and fix fetching additional members	2021-02-16 10:38:20 +01:00
Felix Kunde	6a97316a69	Support inherited annotations for all major objects (#1236 ) * add comments where inherited annotations could be added * add inheritedAnnotations feature * return nil if no annotations are set * minor changes * first downscaler then inherited annotations * add unit test for inherited annotations * add pvc to test + minor changes * missing comma * fix nil map assignment * set annotations in the same order it is done in other places * replace acidClientSet with acid getters in K8s client * more fixes on clientSet vs getters * minor changes * remove endpoints from annotation test * refine unit test - but deployment and sts are still empty * fix checkinng sts and deployment * make annotations setter one liners * no need for len check anymore Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>	2020-12-11 16:34:01 +01:00
Felix Kunde	b379db20ed	fix redundant appending of infrastructure roles (#1192 )	2020-11-05 12:04:51 +01:00
Felix Kunde	d76419565b	move to apiextensions from v1beta1 to v1 (#746 ) * move to apiextensions from v1beta1 to v1 * remove metadata from CRD validation * some forgotten change	2020-11-02 10:49:29 +01:00
Felix Kunde	d658b9672e	PostgresTeam CRD for advanced team management (#1165 ) * PostgresTeamCRD for advanced team management * rework internal structure to be closer to CRD * superusers instead of admin * add more util functions and unit tests * fix initHumanUsers * check for superusers when creating normal teams * polishing and fixes * adding the essential missing pieces * add documentation and update rbac * reflect some feedback * reflect more feedback * fixing debug logs and raise QueueResyncPeriodTPR * add two more flags to disable CRD and its superuser support * fix chart * update go modules * move to client 1.19.3 and update codegen	2020-10-28 10:40:10 +01:00
Felix Kunde	43163cf83b	allow using both infrastructure_roles_options (#1090 ) * allow using both infrastructure_roles_options * new default values for user and role definition * use robot_zmon as parent role * add operator log to debug * right name for old secret * only extract if rolesDefs is empty * set password1 in old infrastructure role * fix new infra rile secret * choose different role key for new secret * set memberof everywhere * reenable all tests * reflect feedback * remove condition for rolesDefs	2020-08-10 15:08:03 +02:00
Dmitry Dolgov	7cf2fae6df	[WIP] Extend infrastructure roles handling (#1064 ) Extend infrastructure roles handling Postgres Operator uses infrastructure roles to provide access to a database for external users e.g. for monitoring purposes. Such infrastructure roles are expected to be present in the form of k8s secrets with the following content: inrole1: some_encrypted_role password1: some_encrypted_password user1: some_entrypted_name inrole2: some_encrypted_role password2: some_encrypted_password user2: some_entrypted_name The format of this content is implied implicitly and not flexible enough. In case if we do not have possibility to change the format of a secret we want to use in the Operator, we need to recreate it in this format. To address this lets make the format of secret content explicitly. The idea is to introduce a new configuration option for the Operator. infrastructure_roles_secrets: - secretname: k8s_secret_name userkey: some_encrypted_name passwordkey: some_encrypted_password rolekey: some_encrypted_role - secretname: k8s_secret_name userkey: some_encrypted_name passwordkey: some_encrypted_password rolekey: some_encrypted_role This would allow Operator to use any avalable secrets to prepare infrastructure roles. To make it backward compatible simulate the old behaviour if the new option is not present. The new configuration option is intended be used mainly from CRD, but it's also available via Operator ConfigMap in a limited fashion. For ConfigMap one can put there only a string with one secret definition in the following format (as a string): infrastructure_roles_secrets: \| secretname: k8s_secret_name, userkey: some_encrypted_name, passwordkey: some_encrypted_password, rolekey: some_encrypted_role Note than only one secret could be specified this way, no multiple secrets are allowed. Eventually the resulting list of infrastructure roles would be a total sum of all supported ways to describe it, namely legacy via infrastructure_roles_secret_name and infrastructure_roles_secrets from both ConfigMap and CRD.	2020-08-05 14:18:56 +02:00
Felix Kunde	66f2cda87f	Move operator to go 1.14 (#882 ) * update go modules march 2020 * update to GO 1.14 * reflect k8s client API changes	2020-03-30 15:50:17 +02:00
Felix Kunde	a3b34f146f	Add CRD validation (#599 ) * add CRD manifests with validation * update documentation * patroni slots is not an array but a nested hash map * make deps call tools * cover validation in docs and export it in crds.go * add toggle to disable creation of CRD validation and document it * use templated service account also for CRD-configured helm deployment	2019-11-28 12:02:05 +01:00
Maxim Ivanov	44acd7e4db	Not being able to register CRD is not a fatal error (#444 ) Operator proceeds to checking if CRD is present and ready, and if not, only then it is a fatal error.	2019-06-14 16:08:29 +02:00
Felix Kunde	ad0b250b5b	patch CRD on operator update (#558 ) * patch existing CRD each time there is an operator update	2019-05-09 12:35:15 +02:00
Felix Kunde	31e568157b	reflect change in github url (#496 ) Project was moved from the incubator to the Zalando main org, hence the rename	2019-02-25 11:26:55 +01:00
Noah Kantrowitz	a4224f6063	Move CRD definitions into a formal API to allow access from other controllers. (#378 )	2018-08-31 11:20:02 +02:00
Oleksii Kliukin	e1ed4b847d	Use code-generation for CRD API and deepcopy methods (#369 ) Client-go provides a https://github.com/kubernetes/code-generator package in order to provide the API to work with CRDs similar to the one available for built-in types, i.e. Pods, Statefulsets and so on. Use this package to generate deepcopy methods (required for CRDs), instead of using an external deepcopy package; we also generate APIs used to manipulate both Postgres and OperatorConfiguration CRDs, as well as informers and listers for the Postgres CRD, instead of using generic informers and CRD REST API; by using generated code we can get rid of some custom and obscure CRD-related code and use a better API. All generated code resides in /pkg/generated, with an exception of zz_deepcopy.go in apis/acid.zalan.do/v1 Rename postgres-operator-configuration CRD to OperatorConfiguration, since the former broke naming convention in the code-generator. Moved Postgresql, PostgresqlList, OperatorConfiguration and OperatorConfigurationList and other types used by them into Change the type of the Error field in the Postgresql crd to a string, so that client-go could generate a deepcopy for it. Use generated code to set status of CRD objects as well. Right now this is done with patch, however, Kubernetes 1.11 introduces the /status subresources, allowing us to set the status with the special updateStatus call in the future. For now, we keep the code that is compatible with earlier versions of Kubernetes. Rename postgresql.go to database.go and status.go to logs_and_api.go to reflect the purpose of each of those files. Update client-go dependencies. Minor reformatting and renaming.	2018-08-15 17:22:25 +02:00
Oleksii Kliukin	59f0c5551e	Allow configuring pod priority globally and per cluster. (#353 ) * Allow configuring pod priority globally and per cluster. Allow to specify pod priority class for all pods managed by the operator, as well as for those belonging to individual clusters. Controlled by the pod_priority_class_name operator configuration parameter and the podPriorityClassName manifest option. See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass for the explanation on how to define priority classes since Kubernetes 1.8. Some import order changes are due to go fmt. Removal of OrphanDependents deprecated field. Code review by @zerg-junior	2018-08-03 14:03:37 +02:00
Oleksii Kliukin	ac7b132314	Refactoring inspired by gometalinter. (#357 ) Among other things, fix a few issues with deepcopy implementation.	2018-08-03 11:09:45 +02:00
Oleksii Kliukin	d2d3f21dc2	Client go upgrade v6 (#352 ) There are shortcuts in this code, i.e. we created the deepcopy function by using the deepcopy package instead of the generated code, that will be addressed once migrated to client-go v8. Also, some objects, particularly statefulsets, are still taken from v1beta, this will also be addressed in further commits once the changes are stabilized.	2018-08-01 11:08:01 +02:00
Oleksii Kliukin	3a9378d3b8	Allow configuring the operator via the YAML manifest. (#326 ) * Up until now, the operator read its own configuration from the configmap. That has a number of limitations, i.e. when the configuration value is not a scalar, but a map or a list. We use a custom code based on github.com/kelseyhightower/envconfig to decode non-scalar values out of plain text keys, but that breaks when the data inside the keys contains both YAML-special elememtns (i.e. commas) and complex quotes, one good example for that is search_path inside `team_api_role_configuration`. In addition, reliance on the configmap forced a flag structure on the configuration, making it hard to write and to read (see https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778). The changes allow to supply the operator configuration in a proper YAML file. That required registering a custom CRD to support the operator configuration and provide an example at manifests/postgresql-operator-default-configuration.yaml. At the moment, both old configmap and the new CRD configuration is supported, so no compatibility issues, however, in the future I'd like to deprecate the configmap-based configuration altogether. Contrary to the configmap-based configuration, the CRD one doesn't embed defaults into the operator code, however, one can use the manifests/postgresql-operator-default-configuration.yaml as a starting point in order to build a custom configuration. Since previously `ReadyWaitInterval` and `ReadyWaitTimeout` parameters used to create the CRD were taken from the operator configuration, which is not possible if the configuration itself is stored in the CRD object, I've added the ability to specify them as environment variables `CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively. Per review by @zerg-junior and @Jan-M.	2018-07-16 16:20:46 +02:00
Sergey Dudoladov	485ec4b8ea	Move service account to Controller	2018-04-24 15:13:08 +02:00
Oleksii Kliukin	26db91c53e	Improve infrastructure role definitions (#208 ) Enhance definitions of infrastructure roles by allowing membership in multiple roles, role options and per-role configuration to be specified in the infrastructure role configmap, which must have the same name as the infrastructure role secret. See manifests/infrastructure-roles-configmap.yaml for the examples and updated README for the description of different types of database roles supposed by the operator and their purposes. Change the logic of merging infrastructure roles with the manifest roles when they have the same name, to return the infrastructure role unchanged instead of merging. Previously, we used to propagate flags from the manifest role to the resulting infrastructure one, as there were no way to define flags for the infrastructure role; however, this is not the case anymore. Code review and tests by @erthalion	2018-04-04 17:21:36 +02:00
Oleksii Kliukin	2bb7e98268	update individual role secrets from infrastructure roles (#206 ) * Track origin of roles. * Propagate changes on infrastructure roles to corresponding secrets. When the password in the infrastructure role is updated, re-generate the secret for that role. Previously, the password for an infrastructure role was always fetched from the secret, making any updates to such role a no-op after the corresponding secret had been generated.	2018-02-23 17:24:04 +01:00
zerg-junior	3c178f68df	Warn on infrastructure-roles.yaml format violations (#177 ) Emit a warning if there are unprocessed entries in the infrastructure-roles secret.	2017-12-15 17:21:41 +01:00
Murat Kabilov	83c8d6c419	Extend diagnostic api with worker status info	2017-10-11 12:26:09 +02:00
Murat Kabilov	32aa7270e6	Use round-robin strategy while assigning workers	2017-10-09 16:56:27 +02:00
Murat Kabilov	a35e9c6119	move from tpr to crd	2017-10-06 15:12:08 +02:00
Murat Kabilov	f77852a152	store time of the cluster event	2017-09-26 13:17:23 +02:00
Murat Kabilov	899c0bef45	Use warningf instead of warnf	2017-08-30 14:35:56 +02:00
Murat Kabilov	e26db66cb5	start all the log messages with lowercase letters	2017-08-15 10:12:36 +02:00
Murat Kabilov	1f8b37f33d	Make use of kubernetes client-go v4 * client-go v4.0.0-beta0 * remove unnecessary methods for tpr object * rest client: use interface instead of structure pointer * proper names for constants; some clean up for log messages * remove teams api client from controller and make it per cluster	2017-07-25 15:25:17 +02:00
Oleksii Kliukin	4455f1b639	Feature/unit tests (#53 ) - Avoid relying on Clientset structure to call Kubernetes API functions. While Clientset is a convinient "catch-all" abstraction for calling REST API related to different Kubernetes objects, it's impossible to mock. Replacing it wih the kubernetes.Interface would be quite straightforward, but would require an exra level of mocked interfaces, because of the versioning. Instead, a new interface is defined, which contains only the objects we need of the pre-defined versions. - Move KubernetesClient to k8sutil package. - Add more tests.	2017-07-24 16:56:46 +02:00
Oleksii Kliukin	dc36c4ca12	Implement replicaLoadBalancer boolean flag. (#38 ) The flag adds a replica service with the name cluster_name-repl and a DNS name that defaults to {cluster}-repl.{team}.{hostedzone}. The implementation converted Service field of the cluster into a map with one or two elements and deals with the cases when the new flag is changed on a running cluster (the update and the sync should create or delete the replica service). In order to pick up master and replica service and master endpoint when listing cluster resources. * Update the spec when updating the cluster.	2017-06-07 13:54:17 +02:00
Oleksii Kliukin	7b0ca31bfb	Implements EBS volume resizing #35 . In order to support volumes different from EBS and filesystems other than EXT2/3/4 the respective code parts were implemented as interfaces. Adding the new resize for the volume or the filesystem will require implementing the interface, but no other changes in the cluster code itself. Volume resizing first changes the EBS and the filesystem, and only afterwards is reflected in the Kubernetes "PersistentVolume" object. This is done deliberately to be able to check if the volume needs resizing by peeking at the Size of the PersistentVolume structure. We recheck, nevertheless, in the EBSVolumeResizer, whether the actual EBS volume size doesn't match the spec, since call to the AWS ModifyVolume is counted against the resize limit of once every 6 hours, even for those calls that shouldn't result in an actual resize (i.e. when the size matches the one for the running volume). As a collateral, split the constants into multiple files, move the volume code into a separate file and fix minor issues related to the error reporting.	2017-06-06 13:53:27 +02:00
Murat Kabilov	1fb05212a9	Refactor teams API package	2017-05-30 10:14:30 +02:00
Oleksii Kliukin	afce38f6f0	Fix error messages (#27 ) Use lowercase for kubernetes objects Use %v instead of %s for errors Start error messages with a lowercase letter.	2017-05-22 14:12:06 +02:00
Murat Kabilov	4acaf27a5d	Remove etcd requests (#25 ) update glide	2017-05-19 17:18:37 +02:00
Murat Kabilov	d34273543e	Fix the golint, gosimple warnings	2017-05-18 17:38:54 +02:00
Murat Kabilov	92d7fbf372	replace github.bus.zalan.do with github.cm/zalando-incubator	2017-05-12 11:50:16 +02:00
Murat Kabilov	fd449342e5	Use Kubernetes API instead of API group	2017-05-12 11:41:36 +02:00
Oleksii Kliukin	6983f444ed	Periodically sync roles with the running clusters. (#102 ) The sync adds or alters database roles based on the roles defined in the cluster's TPR, Team API and operator's infrastructure roles. At the moment, roles are not deleted, as it would be dangerous for the robot roles in case TPR is misconfigured. In addition, ALTER ROLE does not remove role options, i.e. SUPERUSER or CREATEROLE, neither it removes role membership: only new options are added and new role membership is granted. So far, options like NOSUPERUSER and NOCREATEROLE won't be handed correctly, when mixed with the non-negative counterparts, also NOLOGIN should be processed correctly. The code assumes that only MD5 passwords are stored in the DB and will likely break with the new SCRAM auth in PostgreSQL 10. On the implementation side, create the new interface to abstract roles merge and creation, move most of the role-based functionality from cluster/pg into the new 'users' module, strip create user code of special cases related to human-based users (moving them to init instead) and fixed the password md5 generator to avoid processing already encrypted passwords. In addition, moved the system roles off the slice containing all other roles in order to avoid extra efforts to avoid creating them. Also, fix a leak in DB connections when the new connection is not considered healthy and discarded without being closed. Initialize the database during the sync phase before syncing users.	2017-05-12 11:41:35 +02:00
Murat Kabilov	2370659c69	Parallel cluster processing Run operations concerning multiple clusters in parallel. Each cluster gets its own worker in order to create, update, sync or delete clusters. Each worker acquires the lock on a cluster. Subsequent operations on the same cluster have to wait until the current one finishes. There is a pool of parallel workers, configurable with the `workers` parameter in the configmap and set by default to 4. The cluster-related tasks are assigned to the workers based on a cluster name: the tasks for the same cluster will be always assigned to the same worker. There is no blocking between workers, although there is a chance that a single worker will become a bottleneck if too many clusters are assigned to it; therefore, for large-scale deployments it might be necessary to bump up workers from the default value.	2017-05-12 11:41:35 +02:00
Murat Kabilov	da438aab3a	Use ConfigMap to store operator's config	2017-05-12 11:41:34 +02:00
Murat Kabilov	08c0e3b6dd	Use unified type for the namespaced object names	2017-05-12 11:41:34 +02:00
Oleksii Kliukin	71b93b4cc2	Feature/infrastructure roles (#91 ) * Add infrastructure roles configured globally. Those are the roles defined in the operator itself. The operator's configuration refers to the secret containing role names, passwords and membership information. While they are referred to as roles, in reality those are users. In addition, improve the regex to filter out invalid users and make sure user secret names are compatible with DNS name spec. Add an example manifest for the infrastructure roles.	2017-05-12 11:41:33 +02:00
Murat Kabilov	101dc06acb	Better logging for teams api calls	2017-05-12 11:41:32 +02:00
Murat Kabilov	c2d2a67ad5	Get config from environment variables; ignore pg major version change; get rid of resources package;	2017-05-12 11:41:29 +02:00
Murat Kabilov	6f7399b36f	Sync clusters states * move statefulset creation from cluster spec to the separate function * sync cluster state with desired state; * move out from arrays for cluster resources; * recreate pods instead of deleting them in case of statefulset change * check for master while creating cluster/updating pods * simplify retryutil * list pvc while listing resources * name kubernetes resources with capital letter * do rolling update in case of env variables change	2017-05-12 11:41:27 +02:00
Murat Kabilov	fc127069ab	remove unnecessary ControllerNamespace	2017-05-12 11:41:26 +02:00
Murat Kabilov	ae77fa15e8	Pod Rolling update introduce Pod events channel; add parsing of the MaintenanceWindows section; skip deleting Etcd key on cluster delete; use external etcd host; watch for tpr/pods in the namespace of the operator pod only;	2017-05-12 11:41:25 +02:00

50 Commits