Commit Graph

31 Commits

Author SHA1 Message Date
Felipe Galindo Sanchez de1f48111a
feat: support routing GitHub API calls to custom proxy API (#1017)
GitHub currently has some limitations w.r.t permissions management on
runner groups as they all require org admin, however at our company
we're using runner groups to serve different internal teams (with
different permissions), thus we needed to deploy a custom proxy API with
our internal authentication to provide who has access to certain APIs
depending on the repository/runner group on a given org/enterprise

This change just allows to optionally send the GitHub API calls to an alternate custom
proxy URL instead of cloud github (github.com) or an enterprise URL with
basic authentication

Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2021-12-23 09:24:10 +09:00
Felipe Galindo Sanchez 4ebec38208
Support runner groups with selected visibility in webhooks autoscaler (#1012)
The current implementation doesn't support yet runner groups with custom visibility (e.g selected repositories only). If there are multiple runner groups with selected visibility - not all runner groups may be a potential target to be scaled up. Thus this PR introduces support to allow having runner groups with selected visibility. This requires to query GitHub API to find what are the potential runner groups that are linked to a specific repository (whether using visibility all or selected).

This also improves resolving the `scaleTargetKey` that are used to match an HRA based on the inputs of the `RunnerSet`/`RunnerDeployment` spec to better support for runner groups.

This requires to configure github auth in the webhook server, to keep backwards compatibility if github auth is not provided to the webhook server, this will assume all runner groups have no selected visibility and it will target any available runner group as before
2021-12-19 18:29:44 +09:00
Gabriele Mambrini 7261d927fb
fix: report busy status for offline workers (#1009)
ref #911 

Fix #993 cannot work because the runner busy status is not reported when offline
2021-12-15 08:57:13 +09:00
Patrick Ellis ea2dbc2807
Update go-github from v37 -> v39 (#925) 2021-12-11 21:43:40 +09:00
Yusuke Kuoka f858e2e432
Add POC of GitHub Webhook Delivery Forwarder (#682)
* Add POC of GitHub Webhook Delivery Forwarder

* multi-forwarder and ctrl-c existing and fix for non-woring http post

* Rename source files

* Extract signal handling into a dedicated source file

* Faster ctrl-c handling

* Enable automatic creation of repo hook on startup

* Add support for forwarding org hook deliveries

* Set hook secret on hook creation via envvar (HOOK_SECRET)

* Fix org hook support

* Fix HOOK_SECRET for consistency

* Refactor to prepare for custom log position provider

* Refactor to extract inmemory log position provider

* Add configmap-based log position provider

* Rename githubwebhookdeliveryforwarder to hookdeliveryforwarder

* Refactor to rename LogPositionProvider to Checkpointer and extract ConfigMap checkpointer into a dedicated pkg

* Refactor to extract logger initialization

* Add hookdeliveryforwarder README and bump go-github to unreleased ver
2021-07-14 10:18:55 +09:00
Yusuke Kuoka f19e7ea8a8
chore: Upgrade go-github to v36 (#681) 2021-07-04 17:43:52 +09:00
Yusuke Kuoka 8b90b0f0e3
Clean up import list (#645)
Resolves #644
2021-06-22 17:55:06 +09:00
Yusuke Kuoka 3f23501b8e
Reduce "No runner matching the specified labels was found" errors while runner replacement (#392)
We occasionally encountered those errors while the underlying RunnerReplicaSet is being recreated/replaced on RunnerDeployment.Spec.Template update. It turned out to be due to that the RunnerDeployment controller was waiting for the runner pod becomes `Running`, intead of the new replacement runner to have registered to GitHub. This fixes that, by trying to Runner.Status.Phase to `Running` only after the runner in the runner pod appears to be registered.

A side-effect of this change is that runner controller would call more "ListRunners" GitHub Actions API. I've reviewed and improved the runner controller code and Runner CRD to make make the number of calls minimum. In most cases, ListRunners should be called only twice for each runner creation.
2021-03-16 10:52:30 +09:00
Yusuke Kuoka 81016154c0
GITHUB_APP_PRIVATE_KEY can now be the content of the key (#383)
Resolves #382
2021-03-10 09:37:15 +09:00
Yusuke Kuoka 9ae3551744
Remove unnecessary GitHub API calls (#363)
The controller had the 2 extra and redundant calls to List Workflow Runs API.

Ref #362
2021-03-02 10:55:30 +09:00
Johannes Nicolai 4d4137aa28
Avoid zombie runners that missed token expiration by a bit (#345)
* if a new runner pod was just scheduled to start up right before a 
registration expired, it will not get a new registration token and go in 
an infinite update loop (until #341) kicks in
* if registzration tokens got updated a little bit before they actually 
expired, just starting up pods will way more likely get a working token
2021-02-25 09:07:49 +09:00
Johannes Nicolai 2d7fbbfb68
Handle offline runners gracefully (#341)
* if a runner pod starts up with an invalid token, it will go in an 
infinite retry loop, appearing as RUNNING from the outside
* normally, this error situation is detected because no corresponding 
runner objects exists in GitHub and the pod will get removed after 
registration timeout
* if the GitHub runner object already existed before - e.g. because a 
finalizer was not properly run as part of a partial Kubernetes crash, 
the runner will always stay in a running mode, even updating the 
registration token will not kill the problematic pod
* introducing RunnerOffline exception that can be handled in runner 
controller and replicaset controller
* as runners are offline when a pod is completed and marked for restart, 
only do additional restart checks if no restart was already decided, 
making code a bit cleaner and saving GitHub API calls after each job 
completion
2021-02-22 10:08:04 +09:00
Yusuke Kuoka eb2eaf8130
Fix TotalNumberOfQueuedAndInProgressWorkflowRuns to work with a lot of remaining `completed` jobs (#316)
I have heard from some user that they have hundred thousands of `status=completed` workflow runs in their repository which effectively blocked TotalNumberOfQueuedAndInProgressWorkflowRuns from working because of GitHub API rate limit due to excessive paginated requests.

This fixes that by separating list-workflow-runs calls to two - one for `queued` and one for `in_progress`, which can make the minimum API call from 1 to 2, but allows it to work regardless of number of remaining `completed` workflow runs.
2021-02-16 18:55:55 +09:00
Yusuke Kuoka 35d047db01
Fix enterprise runners misusing cached token (#314)
Follow-up for #290
2021-02-16 12:56:52 +09:00
Hidetake Iwata 4f3f2fb60d
Add metrics for GitHub API rate limit (#312) 2021-02-16 09:58:09 +09:00
Johannes Nicolai bc8bc70f69
Fix rate limit and runner registration logic (#309)
* errors.Is compares all members of a struct to return true which never 
happened
* switched to type check instead of exact value check
* notRegistered was using double negation in if statement which lead to 
unregistering runners after the registration timeout
2021-02-15 09:36:49 +09:00
Yusuke Kuoka bbb036e732
feat: Prevent blocking on transient runner registration failure (#297)
This enhances the controller to recreate the runner pod if the corresponding runner has failed to register itself to GitHub within 10 minutes(currently hard-coded).

It should alleviate #288 in case the root cause is some kind of transient failures(network unreliability, GitHub down, temporarly compute resource shortage, etc).

Formerly you had to manually detect and delete such pods or even force-delete corresponding runners to unblock the controller.

Since this enhancement, the controller does the pod deletion automatically after 10 minutes after pod creation, which result in the controller create another pod that might work.

Ref #288
2021-02-09 10:17:52 +09:00
Yusuke Kuoka 9301409aec
fix: Paginate ListRepositoryWorkflowRuns (#295)
When we used `QueuedAndInProgressWorkflowRuns`-based autoscaling, it only fetched and considered only the first 30 workflow runs at the reconcilation time. This may have resulted in unreliable scaling behaviour, like scale-in/out not happening when it was expected.
2021-02-09 10:13:53 +09:00
Jesse Haka 28e80a2d28
Add support for enterprise runners (#290)
* Add support for enterprise runners

* update docs
2021-02-05 09:31:06 +09:00
Reinier Timmer 8d6f77e07c
Remove beta GitHub client implementations (#228) 2020-12-10 09:08:51 +09:00
Erik Nobel a2b335ad6a
Github pkg: Bump github package to version 33 (#222) 2020-12-06 10:01:47 +09:00
ZacharyBenamram df99f394b4
Remove 10 minute buffer to token expiration (#214)
Co-authored-by: Zachary Benamram <zacharybenamram@blend.com>
2020-11-30 09:03:27 +09:00
Yusuke Kuoka 4eb45d3c7f Fix build error 2020-11-10 17:09:16 +09:00
Juho Saarinen 1c30bdf35b
Add GHE URL to transport (#152)
Fixes #149
2020-11-10 17:05:09 +09:00
Yusuke Kuoka 3f335ca628
Fix panic on startup when misconfigured (#154)
Fixes #153
2020-11-10 17:03:33 +09:00
Juho Saarinen 40c5050978
Added support for other than public GitHub URL (#146)
Refactoring a bit
2020-10-28 22:15:53 +09:00
Helder Moreira 7a2fa7fbce
runner-controller: do not delete runner if it is busy (#103)
Currently, after refreshing the token, the controller re-creates the runner with the new token. This results in jobs being interrupted. This PR makes sure the pod is not restarted if it is busy.

Closes #74
2020-10-05 09:06:37 +09:00
KUOKA Yusuke 5bb2694349
feat: Repository-wide RunnerDeployment Autoscaling (#57)
* feat: Repository-wide RunnerDeployment Autoscaling

This adds `maxReplicas` and `minReplicas` to the RunnerDeploymentSpec. If and only if both fields are set, the controller computes and sets desired `replicas` automatically depending on the demand.

The number of demanded runner replicas is computed by `queued workflow runs + in_progress workflow runs` for the repository. The support for organizational runners is not included.

Ref https://github.com/summerwind/actions-runner-controller/issues/10
2020-06-27 17:26:46 +09:00
Reinier Timmer 9f57f52e36 organization and repository are now exclusive 2020-04-28 11:14:31 +02:00
Reinier Timmer fb35dd4131 support for organization runners 2020-04-28 11:14:31 +02:00
Moto Ishizawa 5f608058cd Add github package 2020-04-13 22:27:05 +09:00