Add GKE firewall issues to common errors (#1010)

A lot of people have issues with private GKE clusters and it seems they are all solved by setting up a firewall policy. I think it would be relevant to include this in a troubleshooting-section since so many people are searching around issues for it. I myself just spent most of my day trying to figure it out.

Issues where this is the solution:
* #293
* #335
* #68
This commit is contained in:
Jacob Lauritzen 2021-12-17 01:10:15 +01:00 committed by GitHub
parent c64000e11c
commit 83c8a9809e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 29 additions and 1 deletions

View File

@ -34,6 +34,7 @@ ToC:
- [Software Installed in the Runner Image](#software-installed-in-the-runner-image)
- [Using without cert-manager](#using-without-cert-manager)
- [Common Errors](#common-errors)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
## Motivation
@ -1274,7 +1275,7 @@ $ helm --upgrade install actions-runner-controller/actions-runner-controller \
### Common Errors
#### invalid header field value
#### Invalid header field value
```json
2020-11-12T22:17:30.693Z ERROR controller-runtime.controller Reconciler error
@ -1334,6 +1335,33 @@ spec:
value: "2" # Remember! env var values must be strings.
```
#### Deployment fails on GKE due to webhooks
Due to GKEs firewall settings you may run into the following errors when trying to deploy runners on a private GKE cluster:
```
Internal error occurred: failed calling webhook "mutate.runner.actions.summerwind.dev": Post https://webhook-service.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runner?timeout=10s: context deadline exceeded
```
**Solution**<br />
To fix this, you need to set up a firewall rule to allow the master node to connect to the webhook port.
The exact way to do this may wary, but the following script should point you in the right direction:
```
# 1) Retrieve the network tag automatically given to the worker nodes
# NOTE: this only works if you have only one cluster in your GCP project. You will have to manually inspect the result of this command to find the tag for the cluster you want to target
WORKER_NODES_TAG=$(gcloud compute instances list --format='text(tags.items[0])' --filter='metadata.kubelet-config:*' | grep tags | awk '{print $2}' | sort | uniq)
# 2) Take note of the VPC network in which you deployed your cluster
# NOTE this only works if you have only one network in which you deploy your clusters
NETWORK=$(gcloud compute instances list --format='text(networkInterfaces[0].network)' --filter='metadata.kubelet-config:*' | grep networks | awk -F'/' '{print $NF}' | sort | uniq)
# 3) Get the master source ip block
SOURCE=$(gcloud container clusters describe <cluster-name> --region <region> | grep masterIpv4CidrBlock| cut -d ':' -f 2 | tr -d ' ')
gcloud compute firewall-rules create k8s-cert-manager --source-ranges $SOURCE --target-tags $WORKER_NODES_TAG --allow TCP:9443 --network $NETWORK
```
# Contributing
For more details on contributing to the project (including requirements) please check out [Getting Started with Contributing](CONTRIBUTING.md).