fix(logical-backup): wait for PG connectivity before running backup (#3069)
* fix(logical-backup): wait for PG connectivity before running backup The backup script connects to the target PostgreSQL pod immediately after resolving its IP via the Kubernetes API. When NetworkPolicy is enforced via iptables, a newly-created pod's IP may not yet be present in the destination node's ingress allow lists, causing cross-node connections to be rejected until the next policy sync. This adds a pg_isready retry loop before the dump starts, with configurable retries and delay via LOGICAL_BACKUP_CONNECT_RETRIES (default: 10) and LOGICAL_BACKUP_CONNECT_RETRY_DELAY (default: 2s). Signed-off-by: Zadkiel AHARONIAN <zaharonian@ccl-consulting.fr> * docs: document LOGICAL_BACKUP_CONNECT_RETRIES and RETRY_DELAY env vars Document the new environment variables that control the pg_isready retry loop added in the previous commit. These are passed via the existing logical_backup_cronjob_environment_secret mechanism. Signed-off-by: Zadkiel AHARONIAN <zaharonian@ccl-consulting.fr> --------- Signed-off-by: Zadkiel AHARONIAN <zaharonian@ccl-consulting.fr> Co-authored-by: Ida Novindasari <idanovinda@gmail.com>
This commit is contained in:
parent
085a1a91e6
commit
0ba2147d73
|
|
@ -900,6 +900,19 @@ grouped under the `logical_backup` key.
|
|||
* **logical_backup_cronjob_environment_secret**
|
||||
Reference to a Kubernetes secret, which keys will be added as environment variables to the cronjob. Default: ""
|
||||
|
||||
The following environment variables can be passed to the logical backup
|
||||
cronjob via `logical_backup_cronjob_environment_secret` to control
|
||||
connectivity checks before the backup starts:
|
||||
|
||||
* **LOGICAL_BACKUP_CONNECT_RETRIES**
|
||||
Number of times to retry connecting to the target PostgreSQL pod before
|
||||
giving up. This is useful when NetworkPolicy enforcement introduces a
|
||||
short delay before a newly-created pod's IP is allowed through ingress
|
||||
rules on the destination node. Default: "10"
|
||||
|
||||
* **LOGICAL_BACKUP_CONNECT_RETRY_DELAY**
|
||||
Delay in seconds between connectivity retries. Default: "2"
|
||||
|
||||
## Debugging the operator
|
||||
|
||||
Options to aid debugging of the operator itself. Grouped under the `debug` key.
|
||||
|
|
|
|||
|
|
@ -183,6 +183,25 @@ function get_master_pod {
|
|||
get_pods "labelSelector=${CLUSTER_NAME_LABEL}%3D${SCOPE},spilo-role%3Dmaster" | tee | head -n 1
|
||||
}
|
||||
|
||||
# Wait for TCP connectivity to the target PostgreSQL pod.
|
||||
# When NetworkPolicy is enforced via iptables, a newly-created pod's IP may not
|
||||
# yet be present in the destination node's ingress allow lists, causing
|
||||
# cross-node connections to be rejected until the next policy sync.
|
||||
function wait_for_pg {
|
||||
local retries=${LOGICAL_BACKUP_CONNECT_RETRIES:-10}
|
||||
local delay=${LOGICAL_BACKUP_CONNECT_RETRY_DELAY:-2}
|
||||
local i
|
||||
for (( i=1; i<=retries; i++ )); do
|
||||
if "$PG_BIN"/pg_isready -h "$PGHOST" -p "${PGPORT:-5432}" -q 2>/dev/null; then
|
||||
return 0
|
||||
fi
|
||||
echo "waiting for $PGHOST:${PGPORT:-5432} to become reachable (attempt $i/$retries)..."
|
||||
sleep "$delay"
|
||||
done
|
||||
echo "ERROR: $PGHOST:${PGPORT:-5432} not reachable after $((retries * delay))s"
|
||||
return 1
|
||||
}
|
||||
|
||||
CURRENT_NODENAME=$(get_current_pod | jq .items[].spec.nodeName --raw-output)
|
||||
export CURRENT_NODENAME
|
||||
|
||||
|
|
@ -197,6 +216,8 @@ for search in "${search_strategy[@]}"; do
|
|||
|
||||
done
|
||||
|
||||
wait_for_pg
|
||||
|
||||
set -x
|
||||
if [ "$LOGICAL_BACKUP_PROVIDER" == "az" ]; then
|
||||
dump | compress > /tmp/azure-backup.sql.gz
|
||||
|
|
|
|||
Loading…
Reference in New Issue