Commit Graph

46 Commits

Author SHA1 Message Date
Nikola Jokic 7da2d7f96a
Fix acquire jobs after session refresh ghalistener (#3307) 2024-02-27 17:37:42 +01:00
Nikola Jokic f7b6ad901d
Add listener graceful termination period and background context after the message is received (#3187) 2024-01-25 15:45:07 +01:00
Nikola Jokic 728f05c844
Delete message session when `listener.Listen` returns (#3240) 2024-01-25 15:12:19 +01:00
Nikola Jokic c00465973e
Publish metrics in the new ghalistener (#3193) 2024-01-25 14:46:42 +01:00
Nikola Jokic a029b705cd
Fix proxy issue in new listener client (#3181) 2023-12-21 15:35:36 +01:00
Nikola Jokic f7eb88ce9c
Change minRunners behavior and fix the new listener min runners (#3139) 2023-12-13 19:39:21 +01:00
Nikola Jokic 0fd8eac305
Update user agent for new ghalistener (#3138) 2023-12-08 14:01:22 +01:00
Nikola Jokic b78cadd901
Refactoring listener app with configurable fallback (#3096) 2023-12-08 13:41:06 +01:00
Nikola Jokic 202a97ab12
Modify user agent format with subsystem and is proxy configured information (#3116) 2023-12-08 13:16:29 +01:00
Nikola Jokic 65fd04540c
Bump go version and all direct dependencies to newest for k8s compatibility (#2947) 2023-11-14 16:19:43 +01:00
Nikola Jokic 16815230bb
Metrics: set max and min runners during startup time (#3032) 2023-11-07 14:20:10 +01:00
Nikola Jokic b511953df7
Trim down metrics cardinality (#3003) 2023-10-20 12:20:30 +02:00
Nikola Jokic 2117fd1892
Configure listener pod with the secret instead of env (#2965)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-10-19 12:29:32 +02:00
Nikola Jokic 07bff8aa1e
Extend the user agent and fix the build version for the listener app (#2892) 2023-09-14 20:10:49 +02:00
Nikola Jokic a0a3916c80
Provide scale-set listener metrics (#2559)
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-08-21 13:50:07 +02:00
Lukas Hauser 78271000c0
Logs - Add missing formatting (#2780) 2023-08-09 17:54:24 +09:00
Nikola Jokic 336e11a4e9
Fix scaling back to 0 after min runners were set to number > 0 (#2742) 2023-08-09 10:32:08 +02:00
Nikola Jokic 6fe8008640
Add configurable log format to values.yaml and propagate it to listener (#2686) 2023-07-05 21:06:42 +02:00
Tingluo Huang 08acb1b831
Get RunnerScaleSet based on both RunnerGroupId and Name. (#2413) 2023-03-15 11:10:09 -04:00
Francesco Renzi c569304271
Add support for self-signed CA certificates (#2268)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
Co-authored-by: Nikola Jokic <jokicnikola07@gmail.com>
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
2023-03-09 17:23:32 +00:00
Francesco Renzi e289fe43d4
Apply proxy settings from environment in listener (#2366)
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
2023-03-06 19:21:22 +00:00
Piotr Palka 91fddca3f7
Fix webhook server logging (#2320)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-03-06 14:20:46 -05:00
Dimitar 7d0918b6d5
Allow custom graceful termination and loadBalancerSourceRanges for the githubwebhook service (#2305)
Co-authored-by: Dimitar Hristov <dimitar.hristov@skyscanner.net>
2023-02-25 14:18:29 +09:00
Francesco Renzi df12e00c9e
Remove network requests from actions.NewClient (#2219)
Co-authored-by: Nikola Jokic <jokicnikola07@gmail.com>
2023-01-31 10:55:23 +00:00
Francesco Renzi 3327f620fb
Refactor actions.Client with options to help extensibility (#2193) 2023-01-23 11:50:14 +00:00
Tingluo Huang bb61bb1342
Include extra user-agent for runners created by actions-runner-controller. (#2177) 2023-01-18 07:38:59 +09:00
Tingluo Huang 622eaa34f8
Introduce new preview auto-scaling mode for ARC. (#2153)
Co-authored-by: Cory Miller <cory-miller@github.com>
Co-authored-by: Nikola Jokic <nikola-jokic@github.com>
Co-authored-by: Ava Stancu <AvaStancu@github.com>
Co-authored-by: Ferenc Hammerl <fhammerl@github.com>
Co-authored-by: Francesco Renzi <rentziass@github.com>
Co-authored-by: Bassem Dghaidi <Link-@github.com>
2023-01-17 12:06:20 -05:00
Tingluo Huang eaa451df32
Update controller package names to match the owning API group name (#2150)
* Update controller package names to match the owning API group name

* feedback.

Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-01-13 08:24:11 +09:00
Nikola Jokic aa6dab5a9a
Changes to folder structure to allow multigroups and changed go mod name (#2105)
* Changed folder structure to allow multi group registration

* included actions.github.com directory for resources and controllers

* updated go module to actions/actions-runner-controller

* publish arc packages under actions-runner-controller

* Update charts/actions-runner-controller/docs/UPGRADING.md

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-12-28 09:38:34 +09:00
Yusuke Kuoka 2407e4f6c6
fix: Add missing actions-metrics-server command (#2099)
Fixes #2090
2022-12-22 23:00:02 +09:00
Yusuke Kuoka 300e93c59d
Expose workflow job metrics via new actions-metrics-server (#2057)
* Add workflow job metrics to Github webhook server

* Fix handling of workflow_job.Conclusion

* Make the prometheus metrics exporter for the workflow jobs a dedicated application

* chart: Add support for deploying actions-metrics-server

* A few improvements to make it easy to cover in E2E

* chart: Add missing actionsmetrics.service.yaml

* chart: Do not modify actionsMetricsServer.replicaCount

* chart: Add documentation for actionsMetrics and actionsMetricsServer

Co-authored-by: Colin Heathman <cheathman@benchsci.com>
2022-12-10 08:24:28 +09:00
malachiobadeyi fbdfe0df8c
1770 update log format and add additional fields to webhook server logs (#1771)
* 1770 update log format and add runID and Id to worflow logs

update tests, change log format for controllers.HorizontalRunnerAutoscalerGitHubWebhook

use logging package

remove unused modules

add setup name to setuplog

add flag to change log format

change flag name to enableProdLogConfig

move log opts to logger package

remove empty else and reset timeEncoder

update flag description

use get function to handle nil

rename flag and update logger function

Update main.go

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

Update controllers/horizontal_runner_autoscaler_webhook.go

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

Update logging/logger.go

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>

copy log opt per each NewLogger call

revert to use autoscaler.log

update flag descript and remove unused imports

add logFormat to readme

 rename setupLog to logger

make fmt

* Fix E2E along the way

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-11-04 10:46:58 +09:00
Yusuke Kuoka 38644cf4e8
Remove redundant flags from webhook-based autoscaler (#1630)
* Remove redundant flags from webhook-based autoscaler

Ref #623

* fixup! Remove redundant flags from webhook-based autoscaler
2022-07-15 09:58:30 +09:00
Yusuke Kuoka e2c8163b8c
Make webhook-based scale race-free (#1477)
* Make webhook-based scale operation asynchronous

This prevents race condition in the webhook-based autoscaler when it received another webhook event while processing another webhook event and both ended up
scaling up the same horizontal runner autoscaler.

Ref #1321

* Fix typos

* Update rather than Patch HRA to avoid race among webhook-based autoscaler servers

* Batch capacity reservation updates for efficient use of apiserver

* Fix potential never-ending HRA update conflicts in batch update

* Extract batchScaler out of webhook-based autoscaler for testability

* Fix log levels and batch scaler hang on start

* Correlate webhook event with scale trigger amount in logs

* Fix log message
2022-06-27 18:31:48 +09:00
Yusuke Kuoka 4b557dc54c Add logging transport to log HTTP requests in log level -3
The log level -3 is the minimum log level that is supported today, smaller than debug(-1) and -2(used to log some HRA related logs).

This commit adds a logging HTTP transport to log HTTP requests and responses to that log level.

It implements http.RoundTripper so that it can log each HTTP request with useful metadata like `from_cache` and `ratelimit_remaining`.
The former is set to `true` only when the logged request's response was served from ARC's in-memory cache.
The latter is set to X-RateLimit-Remaining response header value if and only if the response was served by GitHub, not by ARC's cache.
2022-02-19 12:22:53 +00:00
Felipe Galindo Sanchez 9079c5d85f
fix: configure logger before trying to log (#1128)
Log about GitHub client not being initialized is not seen as logger is configured after adding the log
2022-02-19 20:56:58 +09:00
Yusuke Kuoka e22d981d58
githubwebhookserver: Tweak log levels of various messages (#1123)
Some of logs like `HRA keys indexed for HRA` were so excessive that it made testing and debugging the githubwebhookserver harder. This tries to fix that.
2022-02-17 09:15:26 +09:00
Felipe Galindo Sanchez d0d316252e
Option to consider runner group visibility on scale based on webhook (#1062)
This will work on GHES but GitHub Enterprise Cloud due to excessive GitHub API calls required.
More work is needed, like adding a cache layer to the GitHub client, to make it usable on GitHub Enterprise Cloud.

Fixes additional cases from https://github.com/actions-runner-controller/actions-runner-controller/pull/1012

If GitHub auth is provided in the webhooks controller then runner groups with custom visibility are supported. Otherwise, all runner groups will be assumed to be visible to all repositories

`getScaleUpTargetWithFunction()` will check if there is an HRA available with the following flow:

1. Search for **repository** HRAs - if so it ends here
2. Get available HRAs in k8s
3. Compute visible runner groups
  a. If GitHub auth is provided - get all the runner groups that are visible to the repository of the incoming webhook using GitHub API calls.  
  b. If GitHub auth is not provided - assume all runner groups are visible to all repositories
4. Search for **default organization** runners (a.k.a runners from organization's visible default runner group) with matching labels
5. Search for **default enterprise** runners (a.k.a runners from enterprise's visible default runner group) with matching labels
6. Search for **custom organization runner groups** with matching labels
7. Search for **custom enterprise runner groups** with matching labels

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2022-02-16 19:08:56 +09:00
Felipe Galindo Sanchez de1f48111a
feat: support routing GitHub API calls to custom proxy API (#1017)
GitHub currently has some limitations w.r.t permissions management on
runner groups as they all require org admin, however at our company
we're using runner groups to serve different internal teams (with
different permissions), thus we needed to deploy a custom proxy API with
our internal authentication to provide who has access to certain APIs
depending on the repository/runner group on a given org/enterprise

This change just allows to optionally send the GitHub API calls to an alternate custom
proxy URL instead of cloud github (github.com) or an enterprise URL with
basic authentication

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-23 09:24:10 +09:00
Felipe Galindo Sanchez 4ebec38208
Support runner groups with selected visibility in webhooks autoscaler (#1012)
The current implementation doesn't support yet runner groups with custom visibility (e.g selected repositories only). If there are multiple runner groups with selected visibility - not all runner groups may be a potential target to be scaled up. Thus this PR introduces support to allow having runner groups with selected visibility. This requires to query GitHub API to find what are the potential runner groups that are linked to a specific repository (whether using visibility all or selected).

This also improves resolving the `scaleTargetKey` that are used to match an HRA based on the inputs of the `RunnerSet`/`RunnerDeployment` spec to better support for runner groups.

This requires to configure github auth in the webhook server, to keep backwards compatibility if github auth is not provided to the webhook server, this will assume all runner groups have no selected visibility and it will target any available runner group as before
2021-12-19 18:29:44 +09:00
Bryan Peterson 961f01baed
allow providing webhook secret token via flag instead of environment variable (#876)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-12 17:00:32 +09:00
Yusuke Kuoka 8b90b0f0e3
Clean up import list (#645)
Resolves #644
2021-06-22 17:55:06 +09:00
Yusuke Kuoka 9e4dbf497c
feat: RunnerSet backed by StatefulSet (#629)
* feat: RunnerSet backed by StatefulSet

Unlike a runner deployment, a runner set can manage a set of stateful runners by combining a statefulset and an admission webhook that mutates statefulset-managed pods with required envvars and registration tokens.

Resolves #613
Ref #612

* Upgrade controller-runtime to 0.9.0

* Bump Go to 1.16.x following controller-runtime 0.9.0

* Upgrade kubebuilder to 2.3.2 for updated etcd and apiserver following local setup

* Fix startup failure due to missing LeaderElectionID

* Fix the issue that any pods become unable to start once actions-runner-controller got failed after the mutating webhook has been registered

* Allow force-updating statefulset

* Fix runner container missing work and certs-client volume mounts and DOCKER_HOST and DOCKER_TLS_VERIFY envvars when dockerdWithinRunner=false

* Fix runnerset-controller not applying statefulset.spec.template.spec changes when there were no changes in runnerset spec

* Enable running acceptance tests against arbitrary kind cluster

* RunnerSet supports non-ephemeral runners only today

* fix: docker-build from root Makefile on intel mac

* fix: arch check fixes for mac and ARM

* ci: aligning test data format and patching checks

* fix: removing namespace in test data

* chore: adding more ignores

* chore: removing leading space in shebang

* Re-add metrics to org hra testdata

* Bump cert-manager to v1.1.1 and fix deploy.sh

Co-authored-by: toast-gear <15716903+toast-gear@users.noreply.github.com>
Co-authored-by: Callum James Tait <callum.tait@photobox.com>
2021-06-22 17:10:09 +09:00
Yusuke Kuoka ae09e6ebb7
Make log level configurable (#541)
Resolves #425
2021-05-11 20:23:06 +09:00
Yusuke Kuoka 1b8a656051 Use --watch-namespace flag to restrict the namespace to watch
Ref https://github.com/summerwind/actions-runner-controller/issues/377#issuecomment-793172995
2021-03-09 09:46:21 +09:00
Yusuke Kuoka ab1c39de57
feat: HorizontalRunnerAutoscaler Webhook server (#282)
* feat: HorizontalRunnerAutoscaler Webhook server

This introduces a Webhook server that responds GitHub `check_run`, `pull_request`, and `push` events by scaling up matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting next sync period to add insufficient runners.

This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, where actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs.

On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for the single target to scale up. The webhook server tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun|Push|PullRequest] against the event and if it finds only one HRA, it is the scale target. If none or two or more targets are found for repository-wide runners, it does the same on organizational runners.

Changes:

* Fix integration test
* Update manifests
* chart: Add support for github webhook server
* dockerfile: Include github-webhook-server binary
* Do not import unversioned go-github
* Update README
2021-02-07 17:37:27 +09:00