postgres-operator

Commit Graph

Author	SHA1	Message	Date
Oleksii Kliukin	25a306244f	Support for per-cluster and operator global sidecars (#331) * Define sidecars in the operator configuration. Right now only the name and the docker image can be defined, but with the help of the pod_environment_configmap parameter arbitrary environment variables can be passed to the sidecars. * Refactoring around generatePodTemplate. Original implementation of per-cluster sidecars by @theRealWardo Per review by @zerg-junior and @Jan-M	2018-07-02 16:25:27 +02:00
zerg-junior	7394c15d0a	Make AWS region configurable in the operator config map (#333)	2018-06-27 17:29:02 +02:00
Oleksii Kliukin	9cb48e0889	Document operator configuration parameters. (#313)	2018-06-08 13:21:57 +02:00
Oleksii Kliukin	04b660519a	Fix exec into pods to resize volumes for multi-container pods. The original code assumed only one container per pod.	2018-06-04 14:51:39 +02:00
Oleksii Kliukin	48a5744314	Use Patroni API to set bootstrap-only options. (#299) Call Patroni API /config in order to set special options that are ignored when set in the configuration file, such as max_connections. Per https://github.com/zalando-incubator/postgres-operator/issues/297 * Some minor refactoring: Rename Cluster ManualFailover to Switchover Rename Patroni Failover to Switchover Add more details to error messages and comments introduced in this PR. Review by @zerg-junior	2018-05-29 12:35:25 +02:00
Sergey Dudoladov	2e041c50e6	Bump up default Spilo image	2018-05-28 16:54:27 +02:00
Manuel Gómez	32a1456a68	Update config.go	2018-05-24 16:58:46 +02:00
Sergey Dudoladov	749d723f55	Shorten the comment	2018-05-24 16:22:13 +02:00
Sergey Dudoladov	9824ddae5e	Fix etcd_host default	2018-05-24 16:05:45 +02:00
Oleksii Kliukin	11d568bf65	Address code review by @zerg-junior - new info messages, rename the annotation flag.	2018-05-15 16:50:03 +02:00
Oleksii Kliukin	0c616a802f	Merge branch 'master' into rolling_updates_with_statefulset_annotations # Conflicts: # pkg/cluster/k8sres.go	2018-05-15 15:33:34 +02:00
Oleksii Kliukin	987b43456b	Deprecate old LB options, fix endpoint sync. (#287) * Deprecate old LB options, fix endpoint sync. - deprecate useLoadBalancer, replicaLoadBalancer from the manifest and enable_load_balancer from the operator configuration. The old operator configuration options become no-op with this commit. For the old manifest options, `useLoadBalancer` and `replicaLoadBalancer` are still consulted, but only in the absence of the new ones (enableMasterLoadBalancer and enableReplicaLoadBalancer). - Make sure the endpoint being created during the sync receives proper addresses subset. This is more critical for the replicas, as for the masters Patroni will normally re-create the endpoint before the operator. - Avoid creating the replica endpoint, since it will be created automatically by the corresponding service. - Update the README and unit tests. Code review by @mgomezch and @zerg-junior	2018-05-15 15:19:18 +02:00
Oleksii Kliukin	332dab5237	Merge branch 'rolling_updates_with_statefulset_annotations' of github.com:zalando-incubator/postgres-operator into rolling_updates_with_statefulset_annotations	2018-05-08 14:51:10 +02:00
Sergey Dudoladov	59ded0c212	Shorten bucket name	2018-05-02 14:05:57 +02:00
Sergey Dudoladov	c45219bafa	Set up an S3 bucket for the postgres daily logs	2018-05-02 12:52:42 +02:00
Sergey Dudoladov	d99b553ec1	Convert default account definition into JSON	2018-04-25 12:35:16 +02:00
Sergey Dudoladov	e3f7fac443	Comment on the default value for pod service account name	2018-04-24 15:41:28 +02:00
Sergey Dudoladov	485ec4b8ea	Move service account to Controller	2018-04-24 15:13:08 +02:00
Sergey Dudoladov	c31c76281c	Make operator unaware of its own service account	2018-04-23 14:38:20 +02:00
Sergey Dudoladov	bd51d2922b	Turn ServiceAccount into struct value to avoid race condition during account creation	2018-04-20 13:05:05 +02:00
Sergey Dudoladov	214ae04aa7	Deploy service account for pod creation on demand	2018-04-18 16:20:20 +02:00
Sergey Dudoladov	96d46252f5	Change the default values to closer match previous behaviour	2018-03-26 11:43:46 +02:00
Sergey Dudoladov	a8862aeee1	Enable backward compatibility for enable_load_balancer setting from operator configmap	2018-03-19 17:19:50 +01:00
Sergey Dudoladov	145689c950	Disable load balancer for master service by default (it may cost money)	2018-03-16 13:18:13 +01:00
Sergey Dudoladov	0986e56226	Add separate params for master and replica load balancers to operator configuration	2018-03-14 12:12:28 +01:00
Dmitry Dolgov	bf4b0f0f33	Merge pull request #240 from zalando-incubator/feature/goreport-improvements Some improvements for golint, ineffassign and misspell	2018-02-22 11:31:08 +01:00
Oleksii Kliukin	cca73e30b7	Make code around recreating pods and creating objects in the database less brittle (#213) There used to be a masterLess flag that was supposed to indicate whether the cluster it belongs to runs without the acting master by design. At some point, as we didn't really have support for such clusters, the flag has been misused to indicate there is no master in the cluster. However, that was not done consistently (a cluster without all pods running would never be masterless, even when the master is not among the running pods) and it was based on the wrong assumption that the masterless cluster will remain masterless until the next attempt to change that flag, ignoring the possibility of master coming up or some node doing a successful promotion. Therefore, this PR gets rid of that flag completely. When the cluster is running with 0 instances, there is obviously no master and it makes no sense to create any database objects inside the non-existing master. Therefore, this PR introduces an additional check for that. recreatePods were assuming that the roles of the pods recorded when the function has started will not change; for instance, terminated replica pods should start as replicas. Revisit that assumption by looking at the actual role of the re-spawned pods; that avoids a failover if some replica has promoted to the master role while being re-spawned. In addition, if the failover from the old master was unsuccessful, we used to stop and leave the old master running on an old pod, without recording this fact anywhere. This PR makes the failover failure emit a warning, but not stop recreating the last master pod; in the worst case, the running master will be terminated, however, this case is rather unlikely. As a side effect, make waitForPodLabel return the pod definition it waited for, avoiding extra API calls in recreatePods and movePodFromEndOfLifeNode	2018-02-22 10:42:05 +01:00
Oleksii Kliukin	85f7c944c2	Improve the condition check.	2018-02-22 10:13:46 +01:00
Sergey Dudoladov	e048328d6a	Comment on special values for watched namespace	2018-02-20 17:26:17 +01:00
Sergey Dudoladov	dcfc9925f6	Respond to code review	2018-02-20 14:43:02 +01:00
Dmitrii Dolgov	a7cd859919	Some improvements for golint, ineffassign and misspell	2018-02-19 17:46:31 +01:00
Sergey Dudoladov	088bf70e7d	Merge branch 'master' into support-many-namespaces	2018-02-16 15:06:10 +01:00
Sergey Dudoladov	06fd9e33f5	Watch the namespace where operator deploys to unless told otherwise	2018-02-13 18:17:47 +01:00
Dmitrii Dolgov	4c1db33c27	Change the order of arguments	2018-02-08 10:43:27 +01:00
Sergey Dudoladov	de2a028592	Warn if the watched namespace does not exist	2018-02-07 17:43:05 +01:00
Dmitrii Dolgov	dd79fcd036	Tests for retry_utils One can argue about how necessary they are, but at least I remembered how to do golang.	2018-02-07 17:04:43 +01:00
Sergey Dudoladov	74fa7b9492	Restrict operator to single watched namespace via env var	2018-02-07 16:44:49 +01:00
Sergey Dudoladov	ea84f9d577	Rename the configmap 'namespace' entry to avoid confusion with the map's own namespace	2018-02-06 15:09:00 +01:00
Oleksii Kliukin	b90a36c909	Set node_readiness_label default to an empty value. (#204) Previously, it was set to the lifecycle-status:ready, breaking a lot of minikube deployments. Also it was not possible before to run with this label set to an empty value. Document the effect of the label in the new section of the documentation.	2018-01-16 15:43:03 +01:00
Oleksii Kliukin	8e99518eeb	Improve behavior on node decommissioning (#184) * Trigger the node migration on the lack of the readiness label. * Examine the node's readiness status on node add. Make sure we don't miss the not ready node, especially when the operator is killed during the migration.	2018-01-04 11:53:15 +01:00
Manuel Gómez	15c278d4e8	Scalyr agent sidecar for log shipping (#190) * Scalyr agent sidecar for log shipping * Remove the default for the Scalyr image Now the image needs to be specified explicitly to enable log shipping to Scalyr. This removes the problem of having to generate the config file or publish our agent image repository. * Add configuration variable for Scalyr server URL Defaults to the EU address. * Alter style Newlines are cheap and make code easier to edit/refactor, but ok. * Fix StatefulSet comparison logic I broke it when I made the comparison consider all containers in the PostgreSQL pod.	2017-12-21 15:34:26 +01:00
Oleksii Kliukin	bf80f5225e	Introduce higher and lower bounds for the number of instances (#178) * Introduce higher and lower bounds for the number of instances Reduce the number of instances to the min_instances if it is lower and to the max_instances if it is higher. -1 for either of those means there is no lower or upper bound. In addition, terminate the operator when there is nonsense in the configuration (i.e. max_instances < min_instances). Reviewed by Jan Mußler and Sergey Dudoladov.	2017-12-15 16:02:50 +01:00
Georg Kunz	e8d9c75949	Allow custom Postgres pod environment variables	2017-12-14 14:39:33 +01:00
Oleksii Kliukin	87bc47d8d0	Fixes for the case of re-creating the cluster after deletion. - make sure that the secrets for the system users (superuser, replication) are not deleted when the main cluster is. Therefore, we can re-create the cluster, potentially forcing Patroni to restore it from the backup and enable Patroni to connect, since it will use the old password, not the newly generated random one. - when syncing users, always check whether they are already in the DB. Previously, we did this only for the sync cluster case, but the new cluster could be actually the one restored from the backup by Patroni, having all or some of the users already in place. - delete endpoints last. Patroni uses the $clustername endpoint in order to store the leader related metadata. If we remove it before removing all pods, one of those pods running Patroni will re-create it and the next attempt to create the cluster with the same name will stumble on the existing endpoint. - Use db.Exec instead of db.Query for queries that expect no result. This also fixes the issue with the DB creation, since we didn't release an empty Row object it was not possible to create more than one database for a cluster.	2017-12-13 16:49:00 +01:00
Oleksii Kliukin	1fb8cf7ea0	Avoid overwriting critical users. (#172) * Avoid overwriting critical users. Disallow defining new users either in the cluster manifest, teams API or infrastructure roles with the names mentioned in the new protected_role_names parameter (list of comma-separated names) Additionally, forbid defining a user with the name matching either super_username or replication_username, so that we don't overwrite system roles required for correct working of the operator itself. Also, clear PostgreSQL roles on each sync first in order to avoid using the old definitions that are no longer present in the current manifest, infrastructure roles secret or the teams API.	2017-12-05 14:27:12 +01:00
Oleksii Kliukin	637921cdee	Tests for initHumanUsers and initRobotUsers. Change the Cluster class in the process to implement Teams API calls and Oauth token fetches as interfaces, so that we can mock them in the tests.	2017-12-04 10:49:25 +01:00
Oleksii Kliukin	dd0affc390	Tweak our reaction to the cluster upgrade process. Previously, the operator started to move the pods off the nodes to be decommissioned by watching the eol_node_label value. Every new postgres pod has been created with the anti-affinity to that label, making sure that the pods being moved won't land on another to be decommissioned node. The changes introduce another label that indicates the ready node. The new pod affinity will ensure that the pod is only scheduled to the node marked as ready, discarding the previous anti-affinity. That way the nodes can transition from the pending-decomission to the other statuses (drained, terminating) without having pods suddenly scaled to them. In addition, rename the label that triggers the start of the upgrade process to node_eol_label (for consistency with node_readiness_label) and set its default value to lifecycle-status:pending-decomission.	2017-11-30 14:11:49 +01:00
Oleksii Kliukin	1ffe98ba9f	Fix the connection leak and user options sync. - fix the lack of closing the cursor for the query that returned no rows. - fix syncing of the user options, as previously those were not fetched from the database.	2017-11-27 16:46:34 +01:00
Oleksii Kliukin	086ead03f5	Warn about attempts to use escape quotes.	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	975b21f633	Rename api roles configuration parameter. Change api_roles_configuration to team_api_role_configuration	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	6b2f5071f7	Special case for search_path in user options. - search_path accepts a list of values that cannot be quoted, as quoting would make PostgreSQL interpret the result as a single value. Since we require quoting of values with commas in the operator's configMap in order to avoid confusing them with the separate map entities, we need to strip those quotes before passing the value to PostgreSQL. - make fmt run	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	2079d811b4	Add tests for the string splitting function.	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	e95f80e351	Make configMap marshaling code aware of quotes. A value in a configMap that is a map itself (a key:value string separated by commas) may include commas inside quotes (i.e. search_path:"public,"$user"). The changes make marshaling code process such cases correctly.	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	2352fc9a39	go fmt run	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	71f57c9fe3	Fix escaping of parameter values and extra spaces. - document the newly introduced option (for now in the main README) - make query error output more readable.	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	415a7fdc4d	Allow global configuration options for API roles. Add options to the PgUser structure, potentially allowing to set per-role options in the cluster definition as well. Introduce api_roles_configuration operator option with the default of log_statement=all	2017-11-22 10:43:35 +01:00
Oleksii Kliukin	c25e849fe4	Fix a failure to create new statefulset at sync. Also do a fmt run.	2017-11-08 18:24:17 +01:00
Murat Kabilov	86803406db	use sync methods while updating the cluster	2017-11-03 12:00:43 +01:00
Georg Kunz	47dd766fa7	Add node toleration config to PodSpec (#151) * Add node toleration config to PodSpec This allows to taint nodes dedicated to Postgres and prevents other pods from running on these nodes. * Document taint and toleration setup And remove setting from default operator ConfigMap * Allow to overwrite tolerations with Postgres manifest	2017-11-02 19:10:44 +01:00
Oleksii Kliukin	eba23279c8	Kube cluster upgrade	2017-10-19 10:49:42 +02:00
Murat Kabilov	202f2de988	Retry connecting to pg	2017-10-17 17:03:50 +02:00
Murat Kabilov	6c4cb4e9da	Perform manual failover during the scale down	2017-10-16 17:41:23 +02:00
Murat Kabilov	5b29576a8e	Remove redundant constants	2017-10-16 15:52:48 +02:00
Jan Mussler	cec695d48e	Superuser toggle for team members Make superuser toggleable for team members. Add an "admin" role to team members if superuser is disabled.	2017-10-12 15:01:54 +02:00
Murat Kabilov	83c8d6c419	Extend diagnostic api with worker status info	2017-10-11 12:26:09 +02:00
Murat Kabilov	2f3bb1e265	set the proper name for the crd related constants file	2017-10-09 11:01:46 +02:00
Murat Kabilov	a35e9c6119	move from tpr to crd	2017-10-06 15:12:08 +02:00
Murat Kabilov	93d4bf2b55	Merge branch 'master' into api-improvements	2017-09-26 14:47:13 +02:00
Murat Kabilov	9a66e09b88	cluster history api endpoint	2017-09-26 14:30:45 +02:00
Murat Kabilov	ed476ae85d	add missing comment for the method	2017-09-26 13:39:13 +02:00
Murat Kabilov	c44cfff988	add Diff util method	2017-09-26 13:13:15 +02:00
Murat Kabilov	c67f06956e	fix comments for ringlogger	2017-09-26 13:12:38 +02:00
Murat Kabilov	d876f4d88e	set secret name template via config map	2017-09-18 14:25:09 +02:00
Oleksii Kliukin	7667847bfe	Feature/validate role options (#101) Be more rigorous about validating user flags. Only accept CREATE ROLE flags that don't have any params (i.e. not ADMIN or CONNECTION LIMIT). Check that both flag and NOflag are not used at the same time.	2017-09-15 13:57:48 +02:00
Oleksii Kliukin	8b85935a7a	Allow cloning clusters from the operator. (#90) Allow cloning clusters from the operator. The changes add a new JSON node `clone` with possible values `cluster` and `timestamp`. `cluster` is mandatory, and setting a non-empty `timestamp` triggers wal-e point in time recovery. Spilo and Patroni do the whole heavy-lifting, the operator just defines certain variables and gathers some data about how to connect to the host to clone or the target S3 bucket. As a minor change, set the image pull policy to IfNotPresent instead of Always to simplify local testing. Change the default replication username to standby.	2017-09-08 16:47:03 +02:00
Murat Kabilov	8aa11ecee2	Add patroni api client	2017-08-30 16:01:18 +02:00
Murat Kabilov	71dfb33b2b	make pod termination grace period configurable	2017-08-18 16:38:25 +02:00
Murat Kabilov	d2828e5ece	remove var shading; fix imports	2017-08-15 15:59:10 +02:00
Murat Kabilov	38e0ffecf7	make controllerinformer interface private; use named regexp groups	2017-08-15 14:07:16 +02:00
Murat Kabilov	82d5583809	add diagnostic api http server	2017-08-15 12:20:09 +02:00
Murat Kabilov	51fdfb90f7	log cluster and controller events in the ringlog via logrus hook	2017-08-15 12:16:09 +02:00
Murat Kabilov	4ee28e3818	add ringlog	2017-08-15 11:59:09 +02:00
Murat Kabilov	606d000022	fix test	2017-08-15 10:41:04 +02:00
Murat Kabilov	5470f20be4	always pass a cluster name as a logger field	2017-08-15 10:29:18 +02:00
Murat Kabilov	e26db66cb5	start all the log messages with lowercase letters	2017-08-15 10:12:36 +02:00
Oleksii Kliukin	8b58782a4a	fix pam_role_name parameter name.	2017-08-02 17:55:06 +02:00
Murat Kabilov	cf663cb841	Fix golint warnings	2017-08-01 16:08:56 +02:00
Murat Kabilov	1211220208	Skip running empty set of queries	2017-08-01 10:09:09 +02:00
Murat Kabilov	1f8b37f33d	Make use of kubernetes client-go v4 * client-go v4.0.0-beta0 * remove unnecessary methods for tpr object * rest client: use interface instead of structure pointer * proper names for constants; some clean up for log messages * remove teams api client from controller and make it per cluster	2017-07-25 15:25:17 +02:00
Oleksii Kliukin	4455f1b639	Feature/unit tests (#53) - Avoid relying on Clientset structure to call Kubernetes API functions. While Clientset is a convenient "catch-all" abstraction for calling REST API related to different Kubernetes objects, it's impossible to mock. Replacing it with the kubernetes.Interface would be quite straightforward, but would require an extra level of mocked interfaces, because of the versioning. Instead, a new interface is defined, which contains only the objects we need of the pre-defined versions. - Move KubernetesClient to k8sutil package. - Add more tests.	2017-07-24 16:56:46 +02:00
Murat Kabilov	4f36e447c3	Skip config params with no values (#62)	2017-07-14 17:22:25 +02:00
Oleksii Kliukin	00150711e4	Configure load balancer on a per-cluster and operator-wide level (#57) * Deny all requests to the load balancer by default. * Operator-wide toggle for the load-balancer. * Define per-cluster useLoadBalancer option. If useLoadBalancer is not set - then operator-wide defaults take place. If it is true - the load balancer is created, otherwise a service type clusterIP is created. Internally, we have to completely replace the service if the service type changes. We cannot patch, since some fields from the old service that will remain after patch are incompatible with the new one, and handling them explicitly when updating the service is ugly and error-prone. We cannot update the service because of the immutable fields, that leaves us the only option of deleting the old service and creating the new one. Unfortunately, there is still an issue of unnecessary removal of endpoints associated with the service, it will be addressed in future commits. * Revert the unintended effect of go fmt * Recreate endpoints on service update. When the service type is changed, the service is deleted and then the one with the new type is created. Unfortunately, endpoints are deleted as well. Re-create them afterwards, preserving the original addresses stored in them. * Improve error messages and comments. Use generate instead of gen in names.	2017-06-30 13:38:49 +02:00
Murat Kabilov	9a6b0b8c37	Tests for teams API (#46)	2017-06-12 17:29:32 +02:00
Oleksii Kliukin	987990fb0e	Move service annotation patch template into the constants.	2017-06-12 10:24:23 +02:00
Murat Kabilov	1540a2ba65	fix typos; remove unnecessary tests; go fmt -s	2017-06-08 15:52:01 +02:00
Murat Kabilov	e104a67260	Fix resync of the clusters	2017-06-08 11:51:48 +02:00
Murat Kabilov	bdc2db97ac	Tests for Specs and Teams API	2017-06-08 10:58:48 +02:00
Oleksii Kliukin	bc0e9ab4bc	Add error checks per report from errcheck-ng	2017-06-08 10:41:44 +02:00
Oleksii Kliukin	dc36c4ca12	Implement replicaLoadBalancer boolean flag. (#38) The flag adds a replica service with the name cluster_name-repl and a DNS name that defaults to {cluster}-repl.{team}.{hostedzone}. The implementation converted Service field of the cluster into a map with one or two elements and deals with the cases when the new flag is changed on a running cluster (the update and the sync should create or delete the replica service). In order to pick up master and replica service and master endpoint when listing cluster resources. * Update the spec when updating the cluster.	2017-06-07 13:54:17 +02:00
Oleksii Kliukin	7b0ca31bfb	Implements EBS volume resizing #35. In order to support volumes different from EBS and filesystems other than EXT2/3/4 the respective code parts were implemented as interfaces. Adding the new resize for the volume or the filesystem will require implementing the interface, but no other changes in the cluster code itself. Volume resizing first changes the EBS and the filesystem, and only afterwards is reflected in the Kubernetes "PersistentVolume" object. This is done deliberately to be able to check if the volume needs resizing by peeking at the Size of the PersistentVolume structure. We recheck, nevertheless, in the EBSVolumeResizer, whether the actual EBS volume size doesn't match the spec, since call to the AWS ModifyVolume is counted against the resize limit of once every 6 hours, even for those calls that shouldn't result in an actual resize (i.e. when the size matches the one for the running volume). As a collateral, split the constants into multiple files, move the volume code into a separate file and fix minor issues related to the error reporting.	2017-06-06 13:53:27 +02:00

206 Commits