As part of #282, I introduced a caching mechanism to avoid excessive GitHub API calls, because the autoscaling calculation, which involves GitHub API calls, is executed on every webhook event.
Apparently, it was saving the wrong value in the cache: the cached value was the one obtained after applying `HRA.Spec.{Max,Min}Replicas`, so manual changes to {Max,Min}Replicas didn't affect RunnerDeployment.Spec.Replicas until the cache expired. This isn't what I had intended.
This patch fixes that by changing the cached value to the one computed before applying {Min,Max}Replicas.
Additionally, I've updated logging so that you can observe which number was fetched from the cache, which number was suggested by either TotalNumberOfQueuedAndInProgressWorkflowRuns or PercentageRunnersBusy, and which number was finally used as the desired replicas (after applying {Min,Max}Replicas).
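Below is a minimal, self-contained sketch of the intended ordering; the types, the cache, and the `suggest` callback are simplified stand-ins for illustration, not the actual controller code:

```
package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for the real HRA spec and cache; illustrative only.
type hraSpec struct {
	MinReplicas *int
	MaxReplicas *int
}

type cacheEntry struct {
	suggested int
	expires   time.Time
}

var cache = map[string]cacheEntry{}

// desiredReplicas caches the raw suggestion (before clamping) and applies
// {Min,Max}Replicas on every call, so manual changes to those fields take
// effect immediately instead of waiting for the cache to expire.
func desiredReplicas(name string, spec hraSpec, suggest func() int, ttl time.Duration) int {
	entry, ok := cache[name]
	if !ok || time.Now().After(entry.expires) {
		entry = cacheEntry{suggested: suggest(), expires: time.Now().Add(ttl)}
		cache[name] = entry
	}

	desired := entry.suggested
	if spec.MinReplicas != nil && desired < *spec.MinReplicas {
		desired = *spec.MinReplicas
	}
	if spec.MaxReplicas != nil && desired > *spec.MaxReplicas {
		desired = *spec.MaxReplicas
	}
	return desired
}

func main() {
	min, max := 1, 3
	spec := hraSpec{MinReplicas: &min, MaxReplicas: &max}
	// The GitHub-API-backed suggestion (10) is cached; the clamp to
	// MaxReplicas (3) happens after the cache read.
	fmt.Println(desiredReplicas("example", spec, func() int { return 10 }, time.Minute))
}
```

With this ordering, editing MinReplicas or MaxReplicas takes effect on the very next reconciliation, even while the cached suggestion is still valid.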
Follow-up for #282
Since #392, the runner controller could take an unexpectedly long time to notice that the runner had been registered to GitHub. This patch fixes the issue so that the controller notices the successful registration in approximately 1 minute (hard-coded).
More concretely, if you had configured a long sync period like 10m, the runner controller could take approximately 10m to notice the successful registration. The original expectation was 1m, because #392 intended to recheck every 1m. It wasn't working that way due to my misunderstanding of how requeueing works.
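For context, here is a minimal sketch of the requeueing behavior with a recent controller-runtime API (not the actual reconciler code): returning an empty `ctrl.Result` defers the next reconciliation to the sync-period resync, whereas `RequeueAfter` schedules an explicit recheck.

```
package example

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// reconcileRegistration sketches the fix: when the runner is not yet
// registered, explicitly request a requeue after about a minute. Returning
// an empty ctrl.Result would instead leave the next reconciliation to the
// sync-period resync (e.g. 10m).
func reconcileRegistration(registered bool) (ctrl.Result, error) {
	if !registered {
		return ctrl.Result{RequeueAfter: time.Minute}, nil
	}
	return ctrl.Result{}, nil
}
```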
We occasionally encountered those errors while the underlying RunnerReplicaSet was being recreated/replaced on a RunnerDeployment.Spec.Template update. It turned out that the RunnerDeployment controller was waiting for the runner pod to become `Running`, instead of waiting for the new replacement runner to register with GitHub. This fixes that by setting Runner.Status.Phase to `Running` only after the runner in the runner pod appears to be registered.
A side-effect of this change is that the runner controller calls the "ListRunners" GitHub Actions API more often. I've reviewed and improved the runner controller code and the Runner CRD to keep the number of calls to a minimum. In most cases, ListRunners should be called only twice per runner creation.
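A rough sketch of the new ordering, with hypothetical names (this is not the actual controller code):

```
package example

// runnerPhase becomes "Running" only once the runner name shows up in the
// ListRunners API response, not merely when the pod itself reaches the
// Running state.
func runnerPhase(runnerName string, podRunning bool, registeredNames []string) string {
	if !podRunning {
		return "Pending"
	}
	for _, n := range registeredNames {
		if n == runnerName {
			// Registered with GitHub: safe to report Running so that the
			// RunnerDeployment rollout can proceed.
			return "Running"
		}
	}
	// The pod is up but the registration hasn't been observed yet.
	return "Pending"
}
```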
This allows you to trigger autoscaling depending on check_run names (i.e. Actions job names). If you want to differentiate the scale amount for a specific job, or want to scale only on a specific job, try this.
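A sketch of the matching logic, with hypothetical parameter names (the actual CRD field names may differ): when a name filter is configured, only check_run events whose job name matches one of the configured names trigger a scale-up.

```
package example

// scaleAmountForCheckRun returns the scale-up amount for a matching
// check_run event, or 0 when a filter is configured and the name doesn't
// match.
func scaleAmountForCheckRun(eventName string, configuredNames []string, amount int) int {
	if len(configuredNames) == 0 {
		return amount // no filter configured: every check_run event scales
	}
	for _, n := range configuredNames {
		if n == eventName {
			return amount
		}
	}
	return 0 // not one of the configured jobs: don't scale
}
```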
PercentageRunnersBusy seems to have regressed since #355, because RunnerDeployment.Spec.Selector is empty by default and the HRA controller was using that empty selector to query runners, which somehow returned 0 runners. This fixes that by using the newly added automatic `runner-deployment-name` label as the default runner label and selector, which avoids querying with an empty selector.
Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-795200205
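A rough sketch of the fixed query using the apimachinery labels package (not the exact controller code): the selector is always derived from the automatic label, so it can never be empty.

```
package example

import "k8s.io/apimachinery/pkg/labels"

// selectorFor builds a non-empty selector from the automatic
// `runner-deployment-name` label, instead of relying on the possibly-empty
// RunnerDeployment.Spec.Selector.
func selectorFor(deploymentName string) labels.Selector {
	return labels.SelectorFromSet(labels.Set{
		"runner-deployment-name": deploymentName,
	})
}
```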
* so far, the runner group parameter was only set for runners with org scope
* now the group is set for enterprise runners as well (sketched below)
* removed the null check for org scope, as either org or enterprise will be set
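In plain form, the change amounts to something like the following sketch (names are hypothetical, not the actual controller code):

```
package example

// registrationGroup passes the runner group for enterprise-scoped runners as
// well as organization-scoped ones; exactly one of enterprise/org is
// expected to be set, so no separate null check for org is needed.
func registrationGroup(enterprise, org, group string) string {
	if enterprise != "" || org != "" {
		return group
	}
	return "" // repository-scoped runners don't take a runner group
}
```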
We sometimes see the integration test fail because the runner replicas don't reach the expected number in a timely manner. After investigating a bit, this turned out to be because HRA updates from the webhook-based autoscaler and the HRA controller were conflicting. This changes the controllers to use Patch instead of Update, to make conflicts less likely to happen.
I have also updated the HRA controller to use Patch when updating RunnerDeployment.
Overall, these changes should make webhook-based autoscaling more reliable due to fewer conflicts.
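For illustration, here is a minimal sketch of the Update-to-Patch pattern with a recent controller-runtime API (not the exact controller code):

```
package example

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// patchStatus takes a deep copy before mutating, then sends only the diff as
// a merge patch. Unlike Update, this doesn't fail with "the object has been
// modified" just because another controller wrote the object in between, so
// the webhook-based autoscaler and the HRA controller are much less likely
// to conflict.
func patchStatus(ctx context.Context, c client.Client, obj client.Object, mutate func()) error {
	before := obj.DeepCopyObject().(client.Object)
	mutate()
	return c.Status().Patch(ctx, obj, client.MergeFrom(before))
}
```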
* add replicaCount
* Add authSecret.existingSecret
* set image.tag null by default
* implement ingress for the github webhook server
* fix deprecated and secretName template
* backward compat .authSecret.enabled
* existingSecret for github webhook secret
* use secretName template
* set default secret names
* do not use app version based image tag
* create and name variable for secrets
Similar to #348 for #346, but for HRA.Spec.CapacityReservations, which is usually modified by the webhook-based autoscaler controller.
This patch tries to fix that by improving the webhook-based autoscaler controller to omit expired reservations when updating the HRA spec.
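A minimal sketch of the filtering, using a simplified stand-in for the reservation type (not the actual CRD type):

```
package example

import "time"

// capacityReservation is a simplified stand-in for an entry of
// HRA.Spec.CapacityReservations.
type capacityReservation struct {
	Replicas       int
	ExpirationTime time.Time
}

// activeReservations drops reservations that have already expired, so the
// webhook-based autoscaler never writes stale entries back into the HRA spec
// when it appends a new reservation.
func activeReservations(now time.Time, reservations []capacityReservation) []capacityReservation {
	var active []capacityReservation
	for _, r := range reservations {
		if r.ExpirationTime.After(now) {
			active = append(active, r)
		}
	}
	return active
}
```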
`kubectl get horizontalrunnerautoscalers.actions.summerwind.dev` shows HRA.status.desiredReplicas as the DESIRED count. However, that value had not been taking capacityReservations into account, which resulted in an incorrect count being shown when you used the webhook-based autoscaler or the capacityReservations API directly. This fixes that.
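A simplified sketch of the corrected status value, under the assumption that non-expired reservations add their replicas on top of the computed desired count, capped by MaxReplicas:

```
package example

// statusDesiredReplicas includes the reservation part that was previously
// missing, so the DESIRED column no longer shows a lower number than the
// controller actually scales to.
func statusDesiredReplicas(computed, reservedReplicas int, maxReplicas *int) int {
	desired := computed + reservedReplicas
	if maxReplicas != nil && desired > *maxReplicas {
		desired = *maxReplicas
	}
	return desired
}
```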
The controller had been writing confusing messages like the one below when the scale target was missing:
```
Found too many scale targets: It must be exactly one to avoid ambiguity. Either set WatchNamespace for the webhook-based autoscaler to let it only find HRAs in the namespace, or update Repository or Organization fields in your RunnerDeployment resources to fix the ambiguity.{"scaleTargets": ""}
```
This fixes that, while also improving many other messages written during reconciliation, so that the errors are more actionable.
* if a new runner pod was scheduled to start up right before a registration token expired, it would not get a new registration token and would go into an infinite update loop (until #341 kicks in)
* if registration tokens get updated a little before they actually expire, pods that are just starting up are much more likely to get a working token (see the sketch below)
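A minimal sketch of the early-rotation idea; the buffer length is illustrative, not the value used by the controller:

```
package example

import "time"

// shouldRefreshToken reports whether a registration token should be rotated
// now, i.e. a little before its actual expiry, so that pods that are just
// starting up receive a token that remains valid long enough to complete
// registration.
func shouldRefreshToken(expiresAt, now time.Time, buffer time.Duration) bool {
	return now.Add(buffer).After(expiresAt)
}
```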
We occasionally see logs like the below:
```
2021-02-24T02:48:26.769Z  ERROR  Failed to update runner status  {"runnerreplicaset": "testns-244ol/example-runnerdeploy-j5wzf", "error": "Operation cannot be fulfilled on runnerreplicasets.actions.summerwind.dev \"example-runnerdeploy-j5wzf\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
/home/runner/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
github.com/summerwind/actions-runner-controller/controllers.(*RunnerReplicaSetReconciler).Reconcile
/home/runner/work/actions-runner-controller/actions-runner-controller/controllers/runnerreplicaset_controller.go:207
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88
2021-02-24T02:48:26.769Z  ERROR  controller-runtime.controller  Reconciler error  {"controller": "testns-244olrunnerreplicaset", "request": "testns-244ol/example-runnerdeploy-j5wzf", "error": "Operation cannot be fulfilled on runnerreplicasets.actions.summerwind.dev \"example-runnerdeploy-j5wzf\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
/home/runner/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88
```
These can be compacted into a one-liner, without the useless stack trace and without double-logging the same error from both the logger and the controller.
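One way to get that compact behavior, sketched with a recent controller-runtime API (not the exact controller code):

```
package example

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

// updateStatusCompact treats optimistic-concurrency conflicts as expected:
// they are logged as a single informational line and retried via requeue,
// instead of being returned to controller-runtime, which would log the same
// error a second time together with a stack trace.
func updateStatusCompact(ctx context.Context, c client.Client, obj client.Object) (ctrl.Result, error) {
	if err := c.Status().Update(ctx, obj); err != nil {
		if apierrors.IsConflict(err) {
			log.FromContext(ctx).Info("Retrying status update after conflict", "cause", err.Error())
			return ctrl.Result{Requeue: true}, nil
		}
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
```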