Run more linters in gometalinter, e.g. deadcode, megacheck,
nakedret, dupl.
More consistent code formatting, remove two dead functions, eliminate
a bunch of naked returns, refactor a few functions to avoid code
duplication.
* Allow configuring pod priority globally and per cluster.
Allow specifying a pod priority class for all pods managed by the operator,
as well as for those belonging to individual clusters.
Controlled by the pod_priority_class_name operator configuration
parameter and the podPriorityClassName manifest option.
See https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
for an explanation of how to define priority classes since Kubernetes 1.8.
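A minimal sketch of both settings, assuming a PriorityClass named
postgres-pod-priority has already been created in the Kubernetes cluster:

    # operator configuration: applies to all pods managed by the operator
    pod_priority_class_name: postgres-pod-priority

    # cluster manifest: overrides the global setting for a single cluster
    spec:
      podPriorityClassName: postgres-pod-priority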
Some import order changes are due to go fmt.
Removal of the deprecated OrphanDependents field.
Code review by @zerg-junior
There are shortcuts in this code, e.g. we created the deepcopy function
using the deepcopy package instead of the generated code; that will
be addressed once we migrate to client-go v8. Also, some objects,
particularly statefulsets, are still taken from v1beta; this will also
be addressed in further commits once the changes stabilize.
A repair is a sync scan that acts only on those clusters that indicate
that the last add, update or sync operation on them has failed. It is
supposed to kick in more frequently than the sync scan. The sync
scan still remains useful to fix the consequences of external
actions (e.g. someone deleting a postgres-related service by mistake)
that happen unbeknownst to the operator.
The repair scan is controlled by the new repair_period parameter in the
operator configuration. It has to be at least 2 times more frequent than
the sync scan to have any effect (a normal sync scan will update both the
last synced and last repaired attributes of the controller, since a repair
is just a sync underneath).
A repair scan could be queued for a cluster that is already being synced
if the sync period exceeds the interval between repairs. In that case the
repair event will be discarded once the corresponding worker finds out
that the cluster is no longer failing.
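An illustrative configuration fragment, assuming the regular sync scan
is driven by the resync_period parameter (values are examples only):

    # operator configuration
    resync_period: 30m
    repair_period: 5m   # noticeably shorter than resync_period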
Review by @zerg-junior
* Improve generating of the Scalyr container environment.
Avoid duplicating POD_NAME and POD_NAMESPACE, which are already bundled
with every sidecar.
Do not complain about the lack of SCALYR_SERVER_HOST, since it is set to
https://upload.eu.scalyr.com in the container we use.
Do not mention SCALYR_SERVER_HOST in the error messages, since it is
derived from the cluster name automatically.
Do not show 'persistent volumes are not compatible' errors for
volumes that failed to be resized for other reasons (e.g.
the new size is smaller than the existing one).
* During initial Event processing, submit the service account for pods and bind it to a cluster role that allows Patroni to start successfully. The cluster role is assumed to be created by the k8s cluster administrator.
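A hedged sketch of the kind of binding involved (all names are
illustrative; the ClusterRole itself is pre-created by the administrator):

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: postgres-pod
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: postgres-pod        # created by the k8s cluster administrator
    subjects:
    - kind: ServiceAccount
      name: postgres-pod        # submitted by the operator for the pods
      namespace: default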
* Up until now, the operator read its own configuration from the
configmap. That has a number of limitations, e.g. when the
configuration value is not a scalar, but a map or a list. We use
custom code based on github.com/kelseyhightower/envconfig to decode
non-scalar values out of plain text keys, but that breaks when the data
inside the keys contains both YAML-special elements (e.g. commas) and
complex quotes; one good example of that is search_path inside
`team_api_role_configuration`. In addition, reliance on the configmap
forced a flat structure on the configuration, making it hard to write
and to read (see
https://github.com/zalando-incubator/postgres-operator/pull/308#issuecomment-395131778).
The changes allow supplying the operator configuration in a proper YAML
file. That required registering a custom CRD to support the operator
configuration and providing an example at
manifests/postgresql-operator-default-configuration.yaml. At the moment,
both the old configmap and the new CRD configuration are supported, so
there are no compatibility issues; however, in the future I'd like to
deprecate the configmap-based configuration altogether. Contrary to the
configmap-based configuration, the CRD one doesn't embed defaults into
the operator code; however, one can use
manifests/postgresql-operator-default-configuration.yaml as a starting
point in order to build a custom configuration.
Since previously the `ReadyWaitInterval` and `ReadyWaitTimeout` parameters
used to create the CRD were taken from the operator configuration, which
is not possible if the configuration itself is stored in the CRD object,
I've added the ability to specify them as environment variables
`CRD_READY_WAIT_INTERVAL` and `CRD_READY_WAIT_TIMEOUT` respectively.
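A rough sketch of the CRD-based configuration (structure abridged and
the API group assumed; manifests/postgresql-operator-default-configuration.yaml
is the authoritative example):

    apiVersion: acid.zalan.do/v1
    kind: OperatorConfiguration
    metadata:
      name: postgresql-operator-default-configuration
    configuration:
      workers: 4
      # non-scalar values are plain YAML here, not encoded plain-text keys

The new environment variables go into the operator deployment:

    env:
    - name: CRD_READY_WAIT_INTERVAL
      value: "4s"
    - name: CRD_READY_WAIT_TIMEOUT
      value: "30s"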
Per review by @zerg-junior and @Jan-M.
* Switchover must wait for the inner goroutine before it returns.
Otherwise, two corner cases may happen:
- waitForPodLabel writes to the podLabelErr channel that has already
been closed by the outer routine
- the outer routine exits and the caller subscribes to the pod
the inner goroutine has already subscribed to, resulting in a panic.
The previous commit fe47f9ebea
that touched that code added the cancellation channel, but didn't bother
to actually wait for the goroutine to be cancelled.
Per report and review from @valer-cara.
Original issue: https://github.com/zalando-incubator/postgres-operator/issues/342
The old way of specifying it with the annotation is deprecated and not
available in recent Kubernetes versions. We will keep it there anyway
until we upgrade to the new go-client, which is incompatible with those
versions.
Per report from @schmitch
* Define sidecars in the operator configuration.
Right now only the name and the Docker image can be defined, but with
the help of the pod_environment_configmap parameter, arbitrary
environment variables can be passed to the sidecars.
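An illustrative fragment of the operator configuration (the
sidecar_docker_images map parameter is assumed; the image is a
placeholder):

    sidecar_docker_images:
      log-shipper: registry.example.com/log-shipper:v1
    # extra environment for all pods, and hence for the sidecars
    pod_environment_configmap: postgres-pod-config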
* Refactoring around generatePodTemplate.
Original implementation of per-cluster sidecars by @theRealWardo
Per review by @zerg-junior and @Jan-M
Call the Patroni API /config endpoint in order to set special options
that are ignored when set in the configuration file, such as max_connections.
Per https://github.com/zalando-incubator/postgres-operator/issues/297
* Some minor refactoring:
Rename Cluster ManualFailover to Switchover
Rename Patroni Failover to Switchover
Add more details to error messages and comments introduced in this PR.
Review by @zerg-junior
After an unsuccessful initial cluster sync it may happen that the
cluster statefulset is empty. This has been made more likely since
88d6a7be3, which introduced syncing volumes before statefulsets, and
the volume sync may fail for different reasons (e.g. the volume has
been shrunk, or there were too many calls to Amazon).
Some special Patroni PostgreSQL parameters, like max_connections,
must reside in the bootstrap.dcs.postgresql.parameters section
to take effect.
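For illustration, the relevant fragment of the Patroni configuration:

    bootstrap:
      dcs:
        postgresql:
          parameters:
            max_connections: 100   # example value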
When an error happens upon deletion of a Kubernetes object
belonging to the cluster being removed, it makes no sense to abort the
deletion: the manifest will be removed anyway, so all the objects
after the one we aborted at would stay forever.
Do not use the statefulset's number of pods to figure out the running
ones for volume resizing, since the statefulset pointer could be nil.
Instead, look at the actual running pods.
* Deprecate old LB options, fix endpoint sync.
- deprecate useLoadBalancer and replicaLoadBalancer from the manifest
and enable_load_balancer from the operator configuration. The old
operator configuration options become no-op with this commit. For
the old manifest options, `useLoadBalancer` and `replicaLoadBalancer`
are still consulted, but only in the absence of the new ones
(enableMasterLoadBalancer and enableReplicaLoadBalancer); see the
sketch after this list.
- Make sure the endpoint being created during the sync receives a proper
address subset. This is more critical for the replicas, as for the
masters Patroni will normally re-create the endpoint before the
operator.
- Avoid creating the replica endpoint, since it will be created automatically
by the corresponding service.
- Update the README and unit tests.
Code review by @mgomezch and @zerg-junior
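For reference, a sketch of the new manifest options that replace the
deprecated ones:

    spec:
      enableMasterLoadBalancer: true    # replaces useLoadBalancer
      enableReplicaLoadBalancer: false  # replaces replicaLoadBalancer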
* Improve the pod moving behavior during the Kubernetes cluster upgrade.
Fix an issue of not waiting for at least one replica to become ready
(if the Statefulset indicates there are replicas) when moving the master
pod off the decommissioned node. Resolves the first part of #279.
Small fixes to error messages.
* Eliminate a race condition during the switchover.
When the operator initiates the failover (switchover) that fails and
then retries it a second time, it may happen that the previous
waitForPodLabel call is still active. As a result, the operator
subscribes to the former master pod two times, causing a panic.
The problem was that the original code didn't bother to cancel
waitForPodLabel for the new master pod in the case when the failover
fails. This commit fixes it by adding a stop channel to that function.
Code review by @zerg-junior
Avoid showing "there is no service in the cluster" when syncing a
service for the cluster if the operator has been restarted after
the cluster had been created.
Compare the pods' controller revisions with the one of the statefulset
to determine whether a pod is running the latest revision and,
therefore, no rolling update is necessary. This is performed only
during the operator start; afterwards, the rolling update status
that is stored locally in the cluster structure is used for all
rolling update decisions.
* Remove 'team' label from the statefulset selector.
It was never supposed to be there, but the statefulset implicitly
creates a selector out of the meta.labels field. That is a problem
with recent Kubernetes versions, since a statefulset cannot pick up
pods with non-matching label selectors, and we rely on the statefulset
picking up old pods after statefulset replacement.
Make sure selector changes trigger replacement of the statefulset.
When the new selector has more labels than the old one, nothing
should be done with the statefulset, since otherwise the new
statefulset won't see orphaned pods from the old one, as they won't
match the selector.
See https://github.com/kubernetes/kubernetes/issues/46901#issuecomment-356418393
Enhance definitions of infrastructure roles by allowing membership in
multiple roles, role options and per-role configuration to be specified
in the infrastructure role configmap, which must have the same name as
the infrastructure role secret. See manifests/infrastructure-roles-configmap.yaml
for the examples and the updated README for the description of the
different types of database roles supported by the operator and their
purposes.
Change the logic of merging infrastructure roles with the manifest roles
when they have the same name to return the infrastructure role unchanged
instead of merging. Previously, we used to propagate flags from the
manifest role to the resulting infrastructure one, as there was no way
to define flags for the infrastructure role; however, this is not the
case anymore.
Code review and tests by @erthalion
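A hedged sketch of such a configmap (key and role names are
illustrative; manifests/infrastructure-roles-configmap.yaml is the real
example):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      # must carry the same name as the infrastructure role secret
      name: postgresql-infrastructure-roles
    data:
      # key: role name from the secret; value: per-role settings,
      # e.g. membership in other roles
      monitoring_user: |
        inrole: robot_zmon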
By default, spilo sets WAL_BUCKET_SCOPE_PREFIX depending on the cluster
namespace, possibly to a non-empty string. However, we won't be able to
clone those clusters, as the clone prefix is always set to an empty string.
We could go the other way around and set both WAL_BUCKET_SCOPE_PREFIX
and CLONE_WAL_BUCKET_SCOPE_PREFIX to a non-default value that depends
on the cluster's namespace, but it seems that we don't need this
feature for now (no conflict will occur even for clusters with the
same name and different namespaces because of the SCOPE_SUFFIX) and
it requires some additional testing first.
* Track origin of roles.
* Propagate changes on infrastructure roles to corresponding secrets.
When the password in the infrastructure role is updated, re-generate the
secret for that role.
Previously, the password for an infrastructure role was always fetched from
the secret, making any updates to such a role a no-op after the corresponding
secret had been generated.
There used to be a masterLess flag that was supposed to indicate whether
the cluster it belongs to runs without an acting master by design. At
some point, as we didn't really have support for such clusters, the flag
was misused to indicate that there is no master in the cluster. However,
that was not done consistently (a cluster without all pods running would
never be masterless, even when the master is not among the running pods)
and it was based on the wrong assumption that a masterless cluster would
remain masterless until the next attempt to change that flag, ignoring
the possibility of the master coming up or some node doing a successful
promotion. Therefore, this PR gets rid of that flag completely.
When the cluster is running with 0 instances, there is obviously no master and it makes no sense to create any database objects inside the non-existing master. Therefore, this PR introduces an additional check for that.
recreatePods used to assume that the roles of the pods recorded when the
function started would not change; for instance, terminated replica pods
should start as replicas. Revisit that assumption by looking at the
actual role of the re-spawned pods; that avoids a failover if some
replica has been promoted to the master role while being re-spawned. In
addition, if the failover from the old master was unsuccessful, we used
to stop and leave the old master running on an old pod, without
recording this fact anywhere. This PR makes the failover failure emit a
warning, but not stop recreating the last master pod; in the worst case,
the running master will be terminated, however, this case is rather
unlikely.
As a side effect, make waitForPodLabel return the pod definition it waited for, avoiding extra API calls in recreatePods and movePodFromEndOfLifeNode.
This allows using the S3 API to simplify finding all folders that differ
only by a suffix, since the suffix delimiter will not occur in the
suffix itself (currently a UID).
Avoid reusing WAL S3 buckets of an older cluster with the same name as
the existing one.
For the new cluster, the S3 bucket name will include a suffix that is
equal to the UID of the PostgreSQL object describing the cluster. That
way, the bucket name will stay the same for all members iff they
correspond to the same PostgreSQL cluster object.
When the "uid" key is present in the "clone" section of the cluster
manifest and the cluster is cloned from an S3 bucket (currently that
happens if endTimestamp is present in the clone description), the S3
bucket to clone from is suffixed with the "-uid" value.
Previously, it was set to lifecycle-status:ready, breaking a
lot of minikube deployments. Also, it was not possible before to run
with this label set to an empty value.
Document the effect of the label in the new section of the
documentation.
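An illustrative configuration line, assuming the node_readiness_label
parameter (an empty value disables the readiness check):

    node_readiness_label: ""   # previously forced to lifecycle-status:ready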
Avoid migrating replica pods, since they will be handled by the
node draining anyway (the PDB specifies that only masters are to
be kept).
Allow migration of the single-pod clusters.
* Trigger the node migration on the lack of the readiness label.
* Examine the node's readiness status on node add.
Make sure we don't miss a node that is not ready, especially when the
operator is killed during the migration.
Introduce a new lock called specMu to protect the cluster spec.
This lock is held on update and sync, and when retrieving the spec in
the API code. There is no need to acquire it for cluster creation and
deletion: creation assigns the spec to the cluster before linking it to
the controller, and deletion just removes the cluster from the list in
the controller, both holding the global clustersMu Lock.
* Scalyr agent sidecar for log shipping
* Remove the default for the Scalyr image
Now the image needs to be specified explicitly to enable log shipping to
Scalyr. This removes the problem of having to generate the config file
or publish our agent image repository.
* Add configuration variable for Scalyr server URL
Defaults to the EU address.
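An illustrative fragment of the related operator configuration
(parameter names assumed from the Scalyr sidecar support; the image is
a placeholder):

    scalyr_image: registry.example.com/scalyr-agent:v1  # empty disables the sidecar
    scalyr_server_url: https://upload.eu.scalyr.com     # defaults to the EU address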
* Alter style
Newlines are cheap and make code easier to edit/refactor, but ok.
* Fix StatefulSet comparison logic
I broke it when I made the comparison consider all containers in the
PostgreSQL pod.
* Make sure the statefulset that is deleted manually gets re-created.
Per report and analysis by Manuel Gomez.
* Move the existence checks for other objects out of the Create functions.
create{Object} for services, endpoints and PDBs refused to continue if
there was a cached definition in the cluster; however, the only place
where it makes sense is when creating a new cluster. Note that contrary
to the statefulset case, this doesn't fix any issues, since those
definitions were nullified correspondingly when the sync code detected
there was no object present in the Kubernetes cluster.
* Introduce higher and lower bounds for the number of instances
Reduce the number of instances to min_instances if it is lower and
to max_instances if it is higher. -1 for either of those means there
is no lower or upper bound.
In addition, terminate the operator when there is nonsense in the
configuration (e.g. max_instances < min_instances).
Reviewed by Jan Mußler and Sergey Dudoladov.
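An illustrative configuration fragment (-1 disables the corresponding
bound):

    # operator configuration
    min_instances: 2
    max_instances: -1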
They are mentioned in the documentation, and the operator will emit a
warning each time a variable from the pod environment configmap is
ignored because the same variable is defined by the operator.
Some minor changes in the variable names to make the code more readable.
Per review from Sergey Dudoladov.