* add the possibility to create a standby cluster that streams from a remote primary
* extending unit tests
* add more docs and e2e test
Co-authored-by: machine424 <ayoubmrini424@gmail.com>
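For illustration, a minimal manifest sketch of such a standby cluster, assuming the `standby_host`/`standby_port` fields this change introduces (cluster name and host are hypothetical):

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-standby-cluster
spec:
  # stream WAL from a remote primary instead of restoring from an object store
  standby:
    standby_host: "acid-primary.default.svc.cluster.local"  # hypothetical endpoint
    standby_port: "5432"
```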
* return only warning if team can't be found
Co-authored-by: Jociele Padilha <jociele.padilha@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* do not create endpoints when using config maps
* delete cluster objects with 'leader' suffix
Co-authored-by: Евграфов Александр Александрович <aevgrafov@cmx.ru>
* feat: add ignored annotations when comparing during sync
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
Co-authored-by: Moshe Immerman <moshe@flanksource.com>
In Go, when a struct field is not set, it is serialized as a struct with
default (zero) values for all of its fields. This causes issues with
schema validation, where optional fields cannot be omitted because their
default values are considered invalid.
This patch addresses the issue for the `Resources` fields on several
types by using a pointer value.
* Add support for pooler load balancer
Signed-off-by: Sergey Shatunov <me@prok.pw>
* Rename to enable_master_pooler_load_balancer
Signed-off-by: Sergey Shatunov <me@prok.pw>
* target port should be an int value
* enhance pooler e2e test
* add new options to crds.go
Co-authored-by: Sergey Shatunov <me@prok.pw>
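A sketch of the resulting operator ConfigMap options; the replica toggle is assumed to exist alongside the renamed master one:

```yaml
# operator ConfigMap excerpt (sketch)
enable_master_pooler_load_balancer: "true"
enable_replica_pooler_load_balancer: "false"
```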
* Add optional logical backup retention time
* Set defaults for potentially unbound variables, so that the script will work with older operator versions
* Document retention time parameter for logical backups
* Add retention time parameter to resources and charts
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
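A sketch of the new retention option in the operator ConfigMap (the duration string follows the docs' example; the exact format is interpreted by the backup image):

```yaml
# operator ConfigMap excerpt (sketch)
logical_backup_s3_retention_time: "14 days"   # empty (default) keeps backups forever
```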
* provide event stream API
* check manifest settings for logical decoding before creating streams
* operator updates Postgres config and creates replication user
* name FES like the Postgres cluster
* add delete case and fix updating streams + update unit test
* check if fes CRD exists before syncing
* existing slot must use the same plugin
* make id and payload columns configurable
* sync streams only when they are defined in manifest
* introduce applicationId for separate stream CRDs
* add FES to RBAC in chart
* disable streams in chart
* switch to pgoutput plugin and let operator create publications
* reflect code review and additional refactoring
Co-authored-by: Paŭlo Ebermann <paul.ebermann@zalando.de>
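A hedged sketch of the resulting `streams` section in a postgres manifest; application id, database, and table names below are made up:

```yaml
spec:
  streams:
  - applicationId: test-app        # one FES resource is created per applicationId
    database: foo
    tables:
      data.outbox_table:
        eventType: test-app-event
        idColumn: id               # configurable id column
        payloadColumn: payload     # configurable payload column
    batchSize: 100
```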
* synchronous_node_count support
* notification about Patroni image version
* default synchronous_node_count to 1
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
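In the manifest this surfaces as a new Patroni option, roughly like this (values illustrative):

```yaml
spec:
  patroni:
    synchronous_mode: true
    synchronous_node_count: 2   # defaults to 1
```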
* do not recreate pods if previous Patroni API calls fail
* move retry reads against Patroni API to pod.go
* remove final failover check in node affinity test
* make test_min_resource_limits more robust
* password rotation in K8s secrets
* add db connection to syncSecrets
* add user retention
* add e2e test
* cleanup on username mismatch if rotation was switched off
* add unit test for syncSecrets + new updateSecret func
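A sketch of the rotation settings in the operator ConfigMap, assuming the option names from the docs (intervals in days):

```yaml
# operator ConfigMap excerpt (sketch)
enable_password_rotation: "true"
password_rotation_interval: "90"          # rotate secret passwords after 90 days
password_rotation_user_retention: "180"   # drop expired rotation users after 180 days
```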
* include tolerations in statefulset comparison
* provide alternative merge behavior of nodeSelectorTerms for node readiness label
* add config option to change affinity merge behavior
* reworked e2e tests around node affinity
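The new toggle might be used like this (a sketch; `OR` relaxes the default `AND` merge between the readiness label and the manifest's nodeAffinity):

```yaml
# operator ConfigMap excerpt (sketch)
node_readiness_label: "lifecycle-status:ready"
node_readiness_label_merge: "OR"
```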
* Make CRD registration configurable and drop RBAC permissions when CRD registration is disabled
* add generated deep copy functions
Co-authored-by: Damian Peckett <d.peckett_admin@mgmt.innovo-cloud.de>
* Add support for manual gs_wal_path in standby
* Remove separate standby version configuration
* Remove setting standby path via cluster/uid/version
Picking up the version doesn't work reliably without making changes to
Spilo. It's clearer to just specify the full S3/GS bucket path.
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* init error arrays correctly
* avoid nilPointer when syncing connectionPooler
* getInfrastructureRoles should return error
* fix unit tests and return type for getInfrastructureRoles
* restart master first in some edge cases
* edge case is when desired is lower than effective
* wait after config patch and restart on sync whenever we see pending_restart
* convert options to int to check decrease and add unit test
* minor update to e2e tests
* wait only after restart not every sync
* using spilo 14 e2e images
* improve Patroni config sync
* collect new and updated slots to patch patroni
* refactor httpGet in Patroni and extend unit tests
* GetMemberData should call the patroni endpoint
* add PATCH test
* remove role from installLookupFunction and run it on database sync, too
* fix condition to decide on syncing pooler
* trigger lookup from database sync only if pooler is set
* use empty spec everywhere and do not sync if one lookupfunction was passed
* do not sync pooler after being disabled
This commit adds support for using an Azure storage account as a backup
location.
It uses the existing GCS functionality as a reference for what to do,
and follows the example set by GCS as closely as possible.
The decision to name the cloud provider key "aws_or_gcp" is unfortunate
now that Azure support is being added, but I have left it alone to keep
this changeset backwards compatible.
* refactor restarting instances and reduce listPods calls
* only add parameters to the set if they differ from the effective config
* update e2e test for updating Postgres config
* patch config only once
* reorder e2e tests to follow alphabetical sorting
* e2e: finish waiting for pod failover only if all pods were replaced
* wait for sync in rolling update timeout test
* restart instances via rest api instead of recreating pods
* Ignore differences in bootstrap.dcs when comparing SPILO_CONFIGURATION
* isBootstrapOnlyParameter is rewritten; instead of a whitelist it now uses a blacklist
* added e2e test for max_connections decreasing
* documentation updated
* pending_restart flag added to restart api call, wait for ttl seconds after restart
* refactoring: /restart returns an error if pending_restart is set to true but Patroni is not pending a restart
* restart postgresql instances within pods only if a pod restart is not required
* patroni might need to restart postgresql after pods were recreated if values like max_connections decreased
* instancesRestart is not critical, try to restart pods if not successful
* cleanup
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* Create cross namespace secrets
* add test cases
* fixes
* Fixes
- include namespace in secret name only when a namespace is provided
- use username.namespace as the key to pgUsers only when a namespace is
provided
- avoid conflicts in role creation in the db by checking the namespace
along with the username
* Update unit tests
* Fix test case
* Fixes
- update regular expression for usernames
- add test to allow check for valid usernames
- create pg roles with namespace (if any) appended in rolename
* add more test cases for valid usernames
* update docs
* fixes as per review comments
* update e2e
* fixes
* Add toggle to allow namespaced secrets
* update docs
* comment update
* Update e2e/tests/test_e2e.py
* few minor fixes
* fix unit tests
* fix e2e
* fix e2e attempt 2
* fix e2e
Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
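A sketch of the feature, assuming the `enable_cross_namespace_secret` toggle and the `namespace.username` convention described above (names are illustrative):

```yaml
# operator ConfigMap excerpt (sketch)
enable_cross_namespace_secret: "true"
---
# postgresql manifest excerpt (sketch)
spec:
  users:
    appspace.db_user:   # secret is created in namespace "appspace"
    - createdb
```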
* Mount additional volumes to 'postgres' container when 'targetContainers' is an empty list
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* rename db roles that are removed from manifests
* extend PostgresTeam e2e test
* make suffix configurable and add deprecated field to pgUser struct
* deny LOGIN from deprecated roles
* update feature documentation
* replace statefulset on annotation diff
* remove update annotation function for statefulset
* add unit test for syncing annotations
* add inherited annotation to unit test
* helm chart remove 1.6.0 archive from 1.6.0 archive
* bump operator to v1.6.2
* fix pointer deref
* skip connection pooler sync when empty
* revert pooler change and minor update to version msg
* do not log query on error when creating or altering users
* Fix for AllowPrivilegeEscalation: issue-1403
* fixed syntax error
* Aligned the value for parameter
* Aligned the value for parameter
* Update crds.go
* Aligned the parameter spilo_allow_privilege_escalation
* Parameters sorted in Alphabetical order in manifests yaml
* Parameters sorted in Alphabetical order in manifests yaml
* Update pkg/controller/operator_config.go
* Update docs/reference/operator_parameters.md
Co-authored-by: Neelam Sharma <neelasha@amdocs.com>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* add TODOs for moving rolling update label on pods
* steer rolling update via pod annotation
* rename patch method and fix reading flag on sync
* pass only pods to recreatePods function
* do not take address of iterator if you use it later
* add e2e test and pass switchover targets to recreatePods
* add wait_for_pod_failover for e2e test
* add one more e2e test case
* helm chart remove 1.6.0 archive from 1.6.0 archive
* reflect code review feedback
Support major version upgrade triggered via manifest. There are three modes: `off`, `manual`, and `full`. `manual` is what you expect, and `full` will auto-upgrade clusters below a certain version threshold.
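A sketch of the corresponding operator configuration (option names as in the docs; version numbers illustrative):

```yaml
# operator ConfigMap excerpt (sketch)
major_version_upgrade_mode: "manual"   # "off", "manual" or "full"
minimal_major_version: "9.6"           # with "full", clusters below this get upgraded
target_major_version: "13"             # ...to this version
```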
* making pgTeamMap a pointer
* init empty map
* add e2e test for additional teams and members
* update test_min_resource_limits
* add more waiting in node_affinity_test
* no need for pointers in map of postgresTeamMembership
* another minor update on node affinity test
* refactor and fix fetching additional members
* pre-allocate cap for slice structure
* the if clause is not needed because of range, and kubelet also uses the range
method to get each capability, so there is no side effect
Signed-off-by: Jeff Zvier <zvier20@gmail.com>
Avoid extra syncing in case there are no changes in pooler requirements.
Add pooler specific labels to pooler secrets.
Add test case to check for pooler secret creation and deletion.
Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
* add default values to operatorconfiguration crd
* leave default for enable_master_load_balancer to true
* add missing bits for new logical backup option
* fix wrong lb tag and update chart package
* bump to v1.6.0
* update logical-backup image
* Using smaller image for e2e test.
* fix env var name in docs
* add postgresql-client-13 to logical backup image
Co-authored-by: Jan Mussler <janm81@gmail.com>
* Initial commit for new 1.6 release with Postgres 13 support.
* Updating maintainers, Go version, Codeowners.
* Use lazy upgrade image that contains pg13.
* fix typo for ownerReference
* fix clusterrole in helm chart
* reflect GCP logical backup in validation
* improve PostgresTeam docs
* change defaults for enable_pgversion_env_var and storage_resize_mode
* explain manual part of in-place upgrade
* remove gsoc docs
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* Adding nodeaffinity support alongside node_readiness_label
* add documentation for node affinity
* add node affinity e2e test
* add unit test for node affinity
Co-authored-by: Steffen Pøhner Henriksen <str3sses@gmail.com>
Co-authored-by: Adrian Astley <adrian.astley@activision.com>
* add comments where inherited annotations could be added
* add inheritedAnnotations feature
* return nil if no annotations are set
* minor changes
* first downscaler then inherited annotations
* add unit test for inherited annotations
* add pvc to test + minor changes
* missing comma
* fix nil map assignment
* set annotations in the same order it is done in other places
* replace acidClientSet with acid getters in K8s client
* more fixes on clientSet vs getters
* minor changes
* remove endpoints from annotation test
* refine unit test - but deployment and sts are still empty
* fix checking sts and deployment
* make annotations setter one liners
* no need for len check anymore
Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
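A sketch of the new option; annotations matching the listed keys are inherited from the cluster manifest by child objects (statefulset, pods, services, pvcs, ...):

```yaml
# operator ConfigMap excerpt (sketch)
inherited_annotations: "owned-by"   # comma-separated list of annotation keys
```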
* initial commit for gp3 migration.
* Default volume migration done.
* Added Gomock and one test case with mock.
* Dep update.
* more changes for code gen.
* push fake package.
* Rename var.
* Changes to Makefile and return value.
* Make mocks phony due to overlap in folder name.
* Learning as one goes. Initialize map.
* Wrong toggle.
* Expect modify call.
* Fix mapping of ids in test.
* Fix volume id.
* volume ids.
* Fixing test setup. Late night...
* create all pvs.
* Fix test case config.
* store volumes and compare.
* More logs.
* Logging of migration action.
* Ensure to log errors.
* Log warning if modify failed, e.g. due to ebs volume state.
* Add more output.
* Skip local e2e tests.
* Reflect k8s volume id in test data. Extract aws volume id from k8s value.
* Finalizing ebs migration.
* More logs. describe fails.
* Fix non existing fields in gp2 discovery.
* Remove nothing to do flag for migration.
* Final commit for migration.
* add new options to all places
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
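The new options end up in the operator configuration, roughly like this (names per the docs, values illustrative):

```yaml
# operator ConfigMap excerpt (sketch)
enable_ebs_gp3_migration: "true"
enable_ebs_gp3_migration_max_size: "1000"   # only migrate gp2 volumes up to this size in GiB
```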
* preserving fields when k8s specs are used with x-kubernetes-preserve-unknown-fields flag
* cleaning up merge errors in postgresql and operatorconfiguration CRD
* add operatorconfiguration CRD and sample manifests in setUpClass of e2e tests
* update generated code and go modules
Stick with the existing pooler deployment selector labels to make it compatible with existing deployments.
Make the use of additional labels clear and avoid them where not needed.
The Deployment selector and Service selector now do not use extra labels; the pod spec does.
* Enable connection pooler for replica
* Refactor code for connection pooler
- Move all the relevant code to a separate file
- Move all the related tests to a separate file
- Avoid using cluster where not required
- Simplify the logic in sync and other methods
- Cleanup of duplicated or unused code
* Fix labels for the replica pods
* Update deleteConnectionPooler to include role
* Adding test cases and other changes
- Fix unit test and delete secret when required only
- Make sure we use empty fresh cluster for every test case.
* enhance e2e test
* Disable pooler in complete manifest, as it is the source for e2e tests too and creates unnecessary pooler setups.
Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Jan Mussler <janm81@gmail.com>
* check resize mode on update events
* add unit test for PVC resizing
* set resize mode to pvc in charts and manifests
* add test for quantityToGigabyte
* just one debug line for syncing volumes
* extend test and update log msg
* clean up after test_multi_namespace test
* see the PR description for complete list of changes
Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
* Fix clone from gcs
* pass google credentials env var if using GS bucket
* remove requirement for timezone as GCS returns timestamp in local time to the region it is in
* Revert "remove requirement for timezone as GCS returns timestamp in local time to the region it is in"
This reverts commit ac4eb350d9.
* update GCS documentation
* remove sentence about logical backups
* reword pod environment configmap section
* fix documentation
* PostgresTeamCRD for advanced team management
* rework internal structure to be closer to CRD
* superusers instead of admin
* add more util functions and unit tests
* fix initHumanUsers
* check for superusers when creating normal teams
* polishing and fixes
* adding the essential missing pieces
* add documentation and update rbac
* reflect some feedback
* reflect more feedback
* fixing debug logs and raise QueueResyncPeriodTPR
* add two more flags to disable CRD and its superuser support
* fix chart
* update go modules
* move to client 1.19.3 and update codegen
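A minimal PostgresTeam manifest sketch, modeled on the docs' example (team and member names are placeholders):

```yaml
apiVersion: "acid.zalan.do/v1"
kind: PostgresTeam
metadata:
  name: custom-team-membership
spec:
  additionalSuperuserTeams:   # teams whose members get superuser access
    acid:
    - "postgres_superusers"
  additionalTeams:            # merge members of other teams
    acid: []
  additionalMembers:          # add individual members to a team
    acid:
    - "elephant"
```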
* Improving end-to-end tests, especially speed of execution and error reporting, by implementing proper eventual asserts and timeouts.
* Add documentation for running individual tests
* Fixed string encoding in Patroni state check and error case
* Printing config as a multi-line log entity makes it readable and grepable on startup
* Cosmetic changes to logs. Removed quotes from diff. Move all object diffs to text diff. Enabled padding for log level.
* Mount script with tools for easy log access and watching objects.
* Set proper update strategy for Postgres operator deployment.
* Move long running test to end. Move pooler test to new functions.
* Remove quote from valid K8s identifiers.
* Lookup function installation
Due to reusing a previous database connection without closing it, the lookup
function installation process was skipping the first database in the
list, installing twice into the postgres db instead. To prevent that, make
the internal initDbConnWithName overwrite the connection object, and return
that object only from initDbConn, which is sort of the public interface.
Another solution would be to modify initDbConnWithName to
return a connection object and then generate one temporary connection
for each db. It sounds feasible, but after one attempt it seems to
require a few more changes around (init, close connections) and
doesn't bring anything significantly better to the table. If
some future change proves this wrong, do not hesitate to refactor.
Change the retry strategy to a more insistent one, namely:
* retry on the next sync even if we failed to process one database and
install the pooler appliance.
* perform the whole installation unconditionally on update, since the
list of target databases could have changed.
And for the sake of making it even more robust, also log the case when the
operator decides to skip installation.
Extend the connection pooler e2e test with verification that all dbs have the
required schema installed.
Right now there are no readiness probes defined for the connection pooler,
which means after a pod restart there is a short time window (between the
container start and the connection pooler starting to listen on a socket)
when a service can send queries to the new pod, but the connection will be
refused. The pooler container is rather lightweight and starts listening
immediately, so the time window is small, but still.
To fix this, add a readiness probe for the tcp socket opened by the connection
pooler.
* post polishing for latest PRs
* update travis and go modules
* make deprecation comments in structs less confusing
* have separate pod priority classes for operator and database pods
* allow using both infrastructure_roles_options
* new default values for user and role definition
* use robot_zmon as parent role
* add operator log to debug
* right name for old secret
* only extract if rolesDefs is empty
* set password1 in old infrastructure role
* fix new infra role secret
* choose different role key for new secret
* set memberof everywhere
* reenable all tests
* reflect feedback
* remove condition for rolesDefs
Extend infrastructure roles handling
Postgres Operator uses infrastructure roles to provide access to a database for
external users, e.g. for monitoring purposes. Such infrastructure roles are
expected to be present in the form of k8s secrets with the following content:

    inrole1: some_encrypted_role
    password1: some_encrypted_password
    user1: some_encrypted_name
    inrole2: some_encrypted_role
    password2: some_encrypted_password
    user2: some_encrypted_name

The format of this content is implied implicitly and is not flexible enough. In
case we have no possibility to change the format of a secret we want to use in
the Operator, we need to recreate it in this format.
To address this, let's make the format of the secret content explicit. The idea
is to introduce a new configuration option for the Operator:

    infrastructure_roles_secrets:
    - secretname: k8s_secret_name
      userkey: some_encrypted_name
      passwordkey: some_encrypted_password
      rolekey: some_encrypted_role
    - secretname: k8s_secret_name
      userkey: some_encrypted_name
      passwordkey: some_encrypted_password
      rolekey: some_encrypted_role

This would allow the Operator to use any available secrets to prepare
infrastructure roles. To make it backward compatible, the old behaviour is
simulated if the new option is not present.
The new configuration option is intended to be used mainly from the CRD, but
it's also available via the Operator ConfigMap in a limited fashion. For the
ConfigMap one can put only a string with one secret definition in the
following format (as a string):

    infrastructure_roles_secrets: |
      secretname: k8s_secret_name,
      userkey: some_encrypted_name,
      passwordkey: some_encrypted_password,
      rolekey: some_encrypted_role

Note that only one secret can be specified this way; multiple secrets are not
allowed.
Eventually the resulting list of infrastructure roles is the total sum of all
supported ways to describe it, namely the legacy
infrastructure_roles_secret_name and infrastructure_roles_secrets from both
ConfigMap and CRD.
* change Clone attribute of PostgresSpec to *ConnectionPooler
* update go.mod from master
* fix TestConnectionPoolerSynchronization()
* Update pkg/apis/acid.zalan.do/v1/postgresql_type.go
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
Co-authored-by: Pavlo Golub <pavlo.golub@gmail.com>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* Extend operator configuration to allow for a pod_environment_secret just like pod_environment_configmap
* Add all keys from PodEnvironmentSecrets as ENV vars (using SecretKeyRef to protect the value)
* Apply envVars from pod_environment_configmap and pod_environment_secrets before doing the global settings from the operator config. This allows them to be overridden by the user (via configmap / secret)
* Add ability to use a Secret for custom pod envVars (via pod_environment_secret) to admin documentation
* Add pod_environment_secret to Helm chart values.yaml
* Add unit tests for PodEnvironmentConfigMap and PodEnvironmentSecret - highly inspired by @kupson and his very similar PR #481
* Added new parameter pod_environment_secret to operatorconfig CRD and configmap examples
* Add pod_environment_secret to the operationconfiguration CRD
Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
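A sketch of the new option next to the existing configmap one (names are hypothetical):

```yaml
# operator ConfigMap excerpt (sketch)
pod_environment_configmap: "default/my-custom-config"
pod_environment_secret: "my-custom-secret"   # keys become env vars via SecretKeyRef
```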
* delete secrets the right way
* make a one function
* continue deleting secrets even if one delete fails
Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
* Try to resize pvc if resizing pv has failed
* added config option to switch between storage resize strategies
* changes according to requests
* Update pkg/controller/operator_config.go
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* enable_storage_resize documented
added examples to the default configuration and helm value files
* enable_storage_resize renamed to volume_resize_mode, off by default
* volume_resize_mode renamed to storage_resize_mode
* Update pkg/apis/acid.zalan.do/v1/crds.go
* pkg/cluster/volumes.go updated
* Update docs/reference/operator_parameters.md
* Update manifests/postgresql-operator-default-configuration.yaml
* Update pkg/controller/operator_config.go
* Update pkg/util/config/config.go
* Update charts/postgres-operator/values-crd.yaml
* Update charts/postgres-operator/values.yaml
* Update docs/reference/operator_parameters.md
* added logging if no changes required
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
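The final knob, as a sketch:

```yaml
# operator ConfigMap excerpt (sketch)
storage_resize_mode: "pvc"   # one of "ebs", "pvc" or "off" (off was the default here)
```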
* try to emit error for missing team name in cluster name
* skip creation after new cluster object
* move SetStatus to k8sclient and emit event when skipping creation and rename to SetPostgresCRDStatus
Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
* do not block rolling updates with lazy spilo update enabled
* treat initContainers like Spilo image
Co-authored-by: Felix Kunde <felix.kunde@zalando.de>
* Support for WAL_GS_BUCKET and GOOGLE_APPLICATION_CREDENTIALS environment variables
* Fixed merge issue but also removed all changes to support macos.
* Updated test to new format
* Missed macos specific changes
* Added documentation and addressed comments
* Update docs/administrator.md
* Update docs/administrator.md
* Update e2e/run.sh
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
The deleteConnectionPooler function incorrectly checks that the delete API response is ResourceNotFound. It looks like the only consequence is a confusing log message, but it's obviously wrong. Remove the negation, since getting ResourceNotFound as the error is the good case.
Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
* Initial commit
* Corrections
- set the type of the new configuration parameter to be array of
strings
- propagate the annotations to statefulset at sync
* Enable regular expression matching
* Improvements
- handle rollingUpdate flag
- modularize code
- rename config parameter name
* fix merge error
* Pass annotations to connection pooler deployment
* update code-gen
* Add documentation and update manifests
* add e2e test and introduce option in configmap
* fix service annotations test
* Add unit test
* fix e2e tests
* better key lookup of annotations tests
* add debug message for annotation tests
* Fix typos
* minor fix for looping
* Handle update path and renaming
- handle the update path to update sts and connection pooler deployment.
This way there is no need to wait for sync
- rename the parameter to downscaler_annotations
- handle other review comments
* another try to fix python loops
* Avoid unnecessary update events
* Update manifests
* some final polishing
* fix cluster_test after polishing
Co-authored-by: Rafia Sabih <rafia.sabih@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
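A sketch of the new option, listing exact names or patterns of downscaler annotations to propagate:

```yaml
# operator ConfigMap excerpt (sketch)
downscaler_annotations: "deployment-time,downscaler/*"
```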
* PreparedDatabases with default role setup
* merge changes from master
* include preparedDatabases spec check when syncing databases
* create a default preparedDB if not specified
* add more default privileges for schemas
* use empty brackets block for undefined objects
* cover more default privilege scenarios and always define admin role
* add DefaultUsers flag
* support extensions and defaultUsers for preparedDatabases
* remove exact version in deployment manifest
* enable CRD validation for new field
* update generated code
* reflect code review
* fix typo in SQL command
* add documentation for preparedDatabases feature + minor changes
* some datname should stay
* add unit tests
* reflect some feedback
* init users for preparedDatabases also on update
* only change DB default privileges on creation
* add one more section in user docs
* one more sentence
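A manifest sketch pulling the pieces together, modeled on the docs' example (database and schema names are placeholders):

```yaml
spec:
  preparedDatabases:
    foo:
      defaultUsers: true      # also create LOGIN roles
      extensions:
        pg_partman: data      # install extension into the "data" schema
      schemas:
        data: {}              # empty block -> default role setup
        history:
          defaultRoles: true
          defaultUsers: false
```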
* initial implementation
* describe forcing the rolling upgrade
* make parameter name more descriptive
* add missing pieces
* address review
* address review
* fix bug in e2e tests
* fix cluster name label in e2e test
* raise test timeout
* load spilo test image
* use available spilo image
* delete replica pod for lazy update test
* fix e2e
* fix e2e with a vengeance
* lets wait for another 30m
* print pod name in error msg
* print pod name in error msg 2
* raise timeout, comment other tests
* subsequent updates of config
* add comma
* fix e2e test
* run unit tests before e2e
* remove conflicting dependency
* Revert "remove conflicting dependency"
This reverts commit 65fc09054b.
* improve cdp build
* dont run unit before e2e tests
* Revert "improve cdp build"
This reverts commit e2a8fa12aa.
Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* Add EventsGetter to KubeClient to enable sending K8S events
* Add eventRecorder to the controller, initialize it and hand it down to cluster via its constructor to enable it to emit events this way
* Add first set of events which then go to the postgresql custom resource the user interacts with to provide some feedback
* Add right to "create" events to operator cluster role
* Adapt cluster tests to new function signature with eventRecorder (via NewFakeRecorder)
* Get a proper reference before sending events to a resource
Co-authored-by: Christian Rohmann <christian.rohmann@inovex.de>
* adds a Get call to Patroni interface to fetch state of a Patroni member
* postpones re-creating pods if at least one replica is currently being created
Co-authored-by: Sergey Dudoladov <sergey.dudoladov@zalando.de>
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* further compatibility with k8sUseConfigMaps - skip further endpoints related actions
* Update pkg/cluster/cluster.go
thanks!
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Update pkg/cluster/cluster.go
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Update pkg/cluster/cluster.go
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
There is a possibility to pass nil as one of the specs and an empty spec
into syncConnectionPooler. In this case it will perform a synchronization
because nil != empty struct. Avoid such cases and make it testable by
returning the list of synchronization reasons together with the final
error.
* Allow additional Volumes to be mounted
* added TargetContainers option to determine whether an additional volume needs to be mounted or not
* fixed dependencies
* updated manifest additional volume example
* More validation
Check that there are no volume mount path clashes or "all" vs ["a", "b"]
mixtures. Also change the default behaviour to mount to "postgres"
container.
* More documentation / example about additional volumes
* Revert go.sum and go.mod from origin/master
* Declare additionalVolume specs in CRDs
* fixed k8sres after rebase
* resolve conflict
Co-authored-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Co-authored-by: Thierry <thierry@malt.com>
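A manifest sketch of an additional volume (volume name and path are placeholders):

```yaml
spec:
  additionalVolumes:
  - name: empty
    mountPath: /opt/empty
    targetContainers:
    - all               # omit or leave empty to mount only into "postgres"
    volumeSource:
      emptyDir: {}
```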
* Protected and system users can't be a connection pool user
It's not supported, nor is it a best practice. Also fix potential null
pointer access. For protected users it makes sense by the intent of protecting
these users (e.g. from being overridden or used for something other than
intended). For system users the reason is the same as for the superuser: it's
about the replication user, which is under Patroni's control.
This is implemented on both levels, operator config and postgresql manifest.
For the latter we just use the default name in this case, assuming that the
operator config is always correct. For the former, since it's a serious
misconfiguration, the operator panics.
* Add patroni parameters for `synchronous_mode`
* Update complete-postgres-manifest.yaml, removed quotation marks
* Update k8sres_test.go, adjust result for `Patroni configured`
* Update k8sres_test.go, adjust result for `Patroni configured`
* Update complete-postgres-manifest.yaml, set synchronous mode to false in this example
* Update pkg/cluster/k8sres.go
Does the same but is shorter. So we fix it like this if you like.
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Update docs/reference/cluster_manifest.md
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Add patroni's `synchronous_mode_strict`
* Extend `TestGenerateSpiloConfig` with `SynchronousModeStrict`
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
* kubernetes_use_configmap
* Update manifests/postgresql-operator-default-configuration.yaml
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Update manifests/configmap.yaml
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* Update charts/postgres-operator/values.yaml
Co-Authored-By: Felix Kunde <felix-kunde@gmx.de>
* go.fmt
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
Connection pooler support
Add support for a connection pooler. The idea is to make it generic enough to
be able to switch between different implementations (e.g. pgbouncer or
odyssey). The operator needs to create a deployment with the pooler and a
service for it to access.
For the connection pooler to work properly, a database needs to be prepared by
the operator, namely a separate user has to be created with access to an
installed lookup function (to fetch credentials for other users).
This setup is supposed to be used only by robot/application users. Usually a
connection pooler implementation is more CPU bound, so it makes sense to create
several pods for the connection pooler with more emphasis on cpu resources. At
the moment there are no special affinity or tolerations assigned to bring those
pods closer to the database. For availability purposes the minimal number of
connection pooler pods is 2; ideally they should be distributed between
different nodes/AZs, but this is not enforced in the operator itself. The
available configuration is supposed to be ergonomic and in the normal case
requires minimal changes to a manifest to enable the connection pooler. To have
more control over the configuration and functionality on the pooler side one
can customize the corresponding docker image.
Co-authored-by: Felix Kunde <felix-kunde@gmx.de>
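In the normal case enabling the pooler is a one-line manifest change; the optional section below shows fine-tuning (a sketch following the docs):

```yaml
spec:
  enableConnectionPooler: true
  connectionPooler:             # optional, defaults come from operator config
    numberOfInstances: 2        # minimum 2 for availability
    mode: "transaction"
```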
The current password generation algorithm is extremely deterministic, due to being based on the standard random number generator with a deterministic seed based on the current Unix timestamp (in seconds).
This can lead to a number of security issues, including:
The same passwords being used in different Kubernetes clusters if the operator is deployed in parallel. (This issue was discovered because of four deployments having the same generated passwords due to automatically being deployed in parallel.)
The passwords being easily guessable based on the time the operator pod started when the database was created. (This would typically be present in logs, metrics, etc., that may typically be accessible to more people than should have database access.)
Fix this issue by replacing the current randomness source with crypto/rand, which produces cryptographically secure random data that is virtually unguessable. This avoids both of the above problems, as each deployment is guaranteed to have unique, non-deterministic passwords.
* add json:omitempty option to ClusterDomain
* Add default value for ClusterDomain
Unfortunately, omitempty in the operator configuration CRD doesn't mean that
defaults from the operator config object will be picked up automatically.
Make sure that the ClusterDomain default is specified, so that even when
someone sets cluster_domain = "", it will be overwritten with a
default value.
Co-authored-by: mlu42 <mlu42pro@gmail.com>
* bump version to 1.4.0 + some polishing
* align version for UI chart
* update user docs to warn for standby replicas
* minor log message changes for RBAC resources
* define postgres-pod clusterrole and align rbac in chart
* align UI chart rbac with operator and update doc
* operator RBAC needs podsecuritypolicy to grant it to postgres-pod
The code added in #818 depends on map sorting to return a static reason
for service annotation changes. To avoid test flakiness due to map sorting,
the tests use `strings.HasPrefix` instead of comparing the whole
string. One of the test cases,
`service_removes_a_custom_annotation,_adds_a_new_one_and_change_another`,
was trying to test the whole reason string.
This commit replaces the test case's expected reason with only the reason
prefix. It removes the flakiness from the tests. As all the cases (annotation
adding, removing and value changing) are tested before, it's safe to
test only prefixes.
Also, it renames the test from `TestServiceAnnotations` to
`TestSameService` and introduces a better description in case of test
failure, stating that only prefixes are tested.
The current implementation of `pkg.util.k8sutil.SameService` considers
only service annotation changes on the default annotations created by the
operator. Custom annotations are not compared and consequently not
applied after the first service creation.
This commit introduces a complete annotations comparison between the
current service created by the operator and the new one generated based on
the configs. Also, it adds tests for the above-mentioned function.
The [operator parameters][1] already support the
`custom_service_annotations` config. With this parameter it is possible to
define custom annotations that will be used on the services created by the
operator. `custom_service_annotations`, like all the other
[operator parameters][1], is defined on the operator level and does not allow
customization on the cluster level. A cluster may require different service
annotations, for example to set up different cloud load balancer
timeouts, different ingress annotations, and/or enable more customizable
environments.
This commit introduces a new parameter on the cluster level, called
`serviceAnnotations`, responsible for defining custom annotations just for
the services created by the operator for that specific cluster. It
allows a mix of configuration between `custom_service_annotations` and
`serviceAnnotations`, where the latter has priority. In order to
allow custom service annotations to be used on services without
LoadBalancers (for example, service mesh annotations), both
`custom_service_annotations` and `serviceAnnotations` are applied
independently of the load-balancing configuration. For backward-compatibility
purposes, `custom_service_annotations` is still listed under
[Load balancer related options][2]. The two default annotations used for
LoadBalancer services, `external-dns.alpha.kubernetes.io/hostname` and
`service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout`, are
still defined by the operator.
`service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout` can
be overridden by `custom_service_annotations` or `serviceAnnotations`,
allowing a more customizable environment.
`external-dns.alpha.kubernetes.io/hostname` cannot be overridden since
there is no differentiation between custom service annotations for
replicas and masters.
It updates the documentation and creates the necessary unit and e2e
tests for the above-described feature too.
[1]: https://github.com/zalando/postgres-operator/blob/master/docs/reference/operator_parameters.md
[2]: https://github.com/zalando/postgres-operator/blob/master/docs/reference/operator_parameters.md#load-balancer-related-options
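A manifest sketch of the new cluster-level parameter (annotation value illustrative):

```yaml
spec:
  serviceAnnotations:
    # overrides custom_service_annotations where keys collide
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
```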
* Change error computation on JSON Unmarshal
The [Unmarshal function][1] in the default encoding/json library returns
different errors for different Go versions. On Go 1.12, the version
currently used on the CI system, it returns `json: cannot unmarshal number into
Go struct field PostgresSpec.teamId of type string`. On Go 1.13.5 it
returns `json: cannot unmarshal number into Go struct field
PostgresSpec.spec.teamId of type string`. The newer version includes more
details of the whole structure being unmarshalled.
This commit introduces the same error but one level deeper on the JSON
structure. It creates consistency across different Go versions.
[1]: https://godoc.org/encoding/json#Unmarshal
* Create subtests on table test scenarios
The Run method of T allows defining subtests creating hierarchical tests.
It provides better visibility of tests in case of failure. More
details on https://golang.org/pkg/testing/.
This commit converts each test scenario on
pkg/apis/acid.zalan.do/v1/util_test.go to subtests, providing a better
visibility and the debugging environment when working with tests. The
following code snippet shows an error during test execution with
subtests:
```
--- FAIL: TestUnmarshalMaintenanceWindow (0.00s)
--- FAIL: TestUnmarshalMaintenanceWindow/expect_error_as_'From'_is_later_than_'To' (0.00s)
```
It includes an `about` field on test scenarios describing the test
purpose and/or its expected output. When a description was provided in
comments, it was moved to the about field.
* add validation for PG resources and volume size
* check resource requests also on UPDATE and SYNC + update docs
* if cluster was running don't error on sync
* add CRD manifests with validation
* update documentation
* patroni slots is not an array but a nested hash map
* make deps call tools
* cover validation in docs and export it in crds.go
* add toggle to disable creation of CRD validation and document it
* use templated service account also for CRD-configured helm deployment
* Added possibility to add custom annotations to LoadBalancer service.
* Added parameters for custom endpoint, access and secret key for logical backup.
* Modified dump.sh so it knows how to handle new features. Configurable S3 SSE
For optimization purposes the operator was creating a cache map to remember
whether service accounts and role bindings were deployed to a namespace. This
could lead to a problem when a namespace was deleted, since this
cache was not synchronized. For the sake of correctness, remove the
cache and check every time if the required service account and rbac are
present. In the normal case this introduces an overhead of two API calls
per event (one to get the service account, one to get the role binding),
which should not be a problem, unless proven otherwise.
* An attempt to build with modules and remove Glide
* new tools.go file to get code-generator dependency + updated codegen + remove Glide files and update docs
* align config map, operator config, helm chart values and templates
* follow helm chart conventions also in CRD templates
* split up values files and add comments
* avoid yaml confusion in postgres manifests
* bump spilo version and use example for logical_backup_s3_bucket
* add ConfigTarget switch to values
This will set up a continuous WAL streaming cluster by adding the corresponding section to the postgres manifest. Instead of having a full-fledged standby cluster as in Patroni, here we only use the WAL path of the source cluster and stream from there.
Since the standby cluster streams from the master, it does not need to create or use databases of its own and hence bypasses the creation of users and databases.
There is a separate sample manifest added for setting up a standby cluster.
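A sample standby section as described above (bucket path is hypothetical; see the repo's standby manifest for the real example):

```yaml
spec:
  standby:
    s3_wal_path: "s3://mybucket/spilo/acid-minimal-cluster/abc123-uid/wal"
```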