I can't find any requests made by user agent `actions-runner-controller` in GitHub.com's telemetry in the past 7 days.
It turns out we only set the user agent `actions-runner-controller` when ARC is configured to use BasicAuth, which I think is not the case for most customers.
I've updated the code a bit so that it always sets `actions-runner-controller` as the UserAgent on the GitHub HttpClient in ARC.
A further step would be somehow baking the ARC release version into the UserAgent as well.
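For illustration, a minimal sketch of setting the user agent unconditionally on a go-github client; the module version in the import path and the token wiring are assumptions, not the actual ARC code:

```go
package arc

import (
	"context"

	"github.com/google/go-github/v39/github" // version path is illustrative
	"golang.org/x/oauth2"
)

func newGitHubClient(ctx context.Context, token string) *github.Client {
	httpClient := oauth2.NewClient(ctx, oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: token},
	))
	client := github.NewClient(httpClient)
	// Set the user agent unconditionally, not only for BasicAuth configurations.
	client.UserAgent = "actions-runner-controller"
	return client
}
```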
Log level -3 is the minimum log level supported today, below debug (-1) and -2 (used to log some HRA-related logs).
This commit adds a logging HTTP transport that logs HTTP requests and responses at that log level.
It implements http.RoundTripper so that it can log each HTTP request with useful metadata like `from_cache` and `ratelimit_remaining`.
The former is set to `true` only when the logged request's response was served from ARC's in-memory cache.
The latter is set to the value of the X-RateLimit-Remaining response header if and only if the response was served by GitHub, not by ARC's cache.
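A minimal sketch of such a transport, assuming logr for logging and httpcache's `X-From-Cache` marker header; the struct and field names are illustrative, not the actual ARC code:

```go
package arc

import (
	"net/http"

	"github.com/go-logr/logr"
	"github.com/gregjones/httpcache"
)

type loggingTransport struct {
	next http.RoundTripper
	log  logr.Logger
}

func (t *loggingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return resp, err
	}

	// httpcache sets the X-From-Cache header to "1" when the response was served from its cache.
	fromCache := resp.Header.Get(httpcache.XFromCache) == "1"

	kvs := []interface{}{"method", req.Method, "url", req.URL.String(), "from_cache", fromCache}
	if !fromCache {
		// Only report the rate limit when the response was actually served by GitHub.
		kvs = append(kvs, "ratelimit_remaining", resp.Header.Get("X-RateLimit-Remaining"))
	}

	// V(3) corresponds to log level -3 when logr is backed by zap via zapr.
	t.log.V(3).Info("HTTP request completed", kvs...)

	return resp, nil
}
```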
This caches any GitHub API response that carries an appropriate Cache-Control header.
`gregjones/httpcache` has been chosen as the library to implement this feature, as it is the one recommended in `go-github`'s documentation:
https://github.com/google/go-github#conditional-requests
`gregjones/httpcache` supports a number of cache backends like `diskcache`, `s3cache`, and so on:
https://github.com/gregjones/httpcache#cache-backends
We stick to the built-in in-memory cache as a starter. This will probably never become an issue, as long as the various HTTP responses for all the GitHub API calls that ARC makes (list-runners, list-workflow-jobs, list-runner-groups, etc.) don't overflow the in-memory cache.
`httpcache` has a known, unfixed issue where it doesn't update the cache on chunked responses, but we assume that the APIs we call don't use chunked responses. See #1503 for more information on that.
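A minimal sketch of wiring the in-memory cache into the client's transport chain, assuming httpcache's built-in `NewMemoryCacheTransport` helper; the function name is illustrative:

```go
package arc

import (
	"net/http"

	"github.com/gregjones/httpcache"
)

// newCachingTransport wraps base so that responses with a valid Cache-Control
// header are served from the built-in in-memory cache on subsequent requests.
func newCachingTransport(base http.RoundTripper) http.RoundTripper {
	cachingTransport := httpcache.NewMemoryCacheTransport()
	cachingTransport.Transport = base
	return cachingTransport
}
```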
Ref #920
This will work on GHES but not on GitHub Enterprise Cloud, due to the excessive number of GitHub API calls required.
More work is needed, like adding a cache layer to the GitHub client, to make it usable on GitHub Enterprise Cloud.
Fixes additional cases from https://github.com/actions-runner-controller/actions-runner-controller/pull/1012
If GitHub auth is provided in the webhook controller, runner groups with custom visibility are supported. Otherwise, all runner groups are assumed to be visible to all repositories.
`getScaleUpTargetWithFunction()` checks whether a matching HRA is available using the following flow:
1. Search for **repository** HRAs - if one matches, the search ends here
2. Get available HRAs in k8s
3. Compute visible runner groups
a. If GitHub auth is provided - get all the runner groups that are visible to the repository of the incoming webhook using GitHub API calls.
b. If GitHub auth is not provided - assume all runner groups are visible to all repositories
4. Search for **default organization** runners (a.k.a. runners from the organization's visible default runner group) with matching labels
5. Search for **default enterprise** runners (a.k.a. runners from the enterprise's visible default runner group) with matching labels
6. Search for **custom organization runner groups** with matching labels
7. Search for **custom enterprise runner groups** with matching labels
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
GitHub currently has some limitations w.r.t. permissions management on
runner groups, as they all require org admin. However, at our company
we're using runner groups to serve different internal teams (with
different permissions), so we needed to deploy a custom proxy API with
our internal authentication to control who has access to certain APIs
depending on the repository/runner group in a given org/enterprise.
This change just allows optionally sending the GitHub API calls to an alternate custom
proxy URL, with basic authentication, instead of cloud GitHub (github.com) or an
enterprise URL.
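A hedged sketch of pointing a go-github client at a custom proxy base URL with basic auth; the constructor choice, transport type, and names here are assumptions about the wiring, not ARC's actual configuration surface:

```go
package arc

import (
	"net/http"

	"github.com/google/go-github/v39/github" // version path is illustrative
)

// basicAuthTransport adds the proxy's basic-auth credentials to every request.
type basicAuthTransport struct {
	username, password string
	next               http.RoundTripper
}

func (t *basicAuthTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	req = req.Clone(req.Context()) // don't mutate the caller's request
	req.SetBasicAuth(t.username, t.password)
	return t.next.RoundTrip(req)
}

func newProxyClient(proxyURL, username, password string) (*github.Client, error) {
	httpClient := &http.Client{Transport: &basicAuthTransport{
		username: username,
		password: password,
		next:     http.DefaultTransport,
	}}
	// Reuse the enterprise constructor so the API base URL points at the
	// custom proxy instead of github.com or a GHES instance.
	return github.NewEnterpriseClient(proxyURL, proxyURL, httpClient)
}
```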
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
The current implementation doesn't yet support runner groups with custom visibility (e.g. selected repositories only). If there are multiple runner groups with selected visibility, not all runner groups may be a potential target to be scaled up. This PR therefore introduces support for runner groups with selected visibility. It requires querying the GitHub API to find which runner groups are linked to a specific repository (whether using visibility `all` or `selected`).
This also improves resolving the `scaleTargetKey` that is used to match an HRA based on the inputs of the `RunnerSet`/`RunnerDeployment` spec, to better support runner groups.
This requires configuring GitHub auth in the webhook server. To keep backwards compatibility, if GitHub auth is not provided to the webhook server, it assumes no runner groups have selected visibility and targets any available runner group as before.
* Add POC of GitHub Webhook Delivery Forwarder
* multi-forwarder and ctrl-c exiting, and fix for non-working http post
* Rename source files
* Extract signal handling into a dedicated source file
* Faster ctrl-c handling
* Enable automatic creation of repo hook on startup
* Add support for forwarding org hook deliveries
* Set hook secret on hook creation via envvar (HOOK_SECRET)
* Fix org hook support
* Fix HOOK_SECRET for consistency
* Refactor to prepare for custom log position provider
* Refactor to extract inmemory log position provider
* Add configmap-based log position provider
* Rename githubwebhookdeliveryforwarder to hookdeliveryforwarder
* Refactor to rename LogPositionProvider to Checkpointer and extract ConfigMap checkpointer into a dedicated pkg
* Refactor to extract logger initialization
* Add hookdeliveryforwarder README and bump go-github to unreleased ver
We occasionally encountered those errors while the underlying RunnerReplicaSet was being recreated/replaced on a RunnerDeployment.Spec.Template update. It turned out to be because the RunnerDeployment controller was waiting for the runner pod to become `Running`, instead of waiting for the new replacement runner to have registered to GitHub. This fixes that by setting Runner.Status.Phase to `Running` only after the runner in the runner pod appears to be registered.
A side-effect of this change is that the runner controller would call the "ListRunners" GitHub Actions API more often. I've reviewed and improved the runner controller code and the Runner CRD to keep the number of calls to a minimum. In most cases, ListRunners should be called only twice for each runner creation.
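A hedged sketch of the registration check, assuming go-github's `Actions.ListRunners` API for repository runners; the function name, parameters, and version path are illustrative, not the actual ARC code:

```go
package arc

import (
	"context"

	"github.com/google/go-github/v39/github" // version path is illustrative
)

// runnerRegistered reports whether a runner with the given name has shown up
// in GitHub's runner list for the repository.
func runnerRegistered(ctx context.Context, c *github.Client, owner, repo, name string) (bool, error) {
	opts := &github.ListOptions{PerPage: 100}
	for {
		runners, resp, err := c.Actions.ListRunners(ctx, owner, repo, opts)
		if err != nil {
			return false, err
		}
		for _, r := range runners.Runners {
			if r.GetName() == name {
				return true, nil
			}
		}
		if resp.NextPage == 0 {
			return false, nil
		}
		opts.Page = resp.NextPage
	}
}
```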
* if a new runner pod was just scheduled to start up right before a
registration token expired, it will not get a new registration token and goes into
an infinite update loop (until #341 kicks in)
* if registration tokens get updated a little bit before they actually
expire, pods that are just starting up are much more likely to get a working token
* if a runner pod starts up with an invalid token, it goes into an
infinite retry loop, appearing as RUNNING from the outside
* normally, this error situation is detected because no corresponding
runner object exists in GitHub and the pod gets removed after the
registration timeout
* if the GitHub runner object already existed before - e.g. because a
finalizer was not properly run as part of a partial Kubernetes crash -
the runner will always stay in a running mode; even updating the
registration token will not kill the problematic pod
* introduce a RunnerOffline error that can be handled in the runner
controller and replicaset controller
* as runners are offline when a pod has completed and is marked for restart,
only do additional restart checks if no restart was already decided,
making the code a bit cleaner and saving GitHub API calls after each job
completion
I have heard from some users that they have hundreds of thousands of `status=completed` workflow runs in their repository, which effectively blocked TotalNumberOfQueuedAndInProgressWorkflowRuns from working because the excessive paginated requests hit the GitHub API rate limit.
This fixes that by splitting the list-workflow-runs call into two - one for `queued` and one for `in_progress` - which raises the minimum number of API calls from 1 to 2, but lets it work regardless of the number of remaining `completed` workflow runs.
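A minimal sketch of the split, assuming go-github's `ListRepositoryWorkflowRuns` with a status filter; the helper name and version path are illustrative:

```go
package arc

import (
	"context"

	"github.com/google/go-github/v39/github" // version path is illustrative
)

// countQueuedAndInProgressRuns issues one request per status so that the large
// backlog of status=completed runs never needs to be paginated through.
func countQueuedAndInProgressRuns(ctx context.Context, c *github.Client, owner, repo string) (int, error) {
	total := 0
	for _, status := range []string{"queued", "in_progress"} {
		runs, _, err := c.Actions.ListRepositoryWorkflowRuns(ctx, owner, repo, &github.ListWorkflowRunsOptions{
			Status: status,
		})
		if err != nil {
			return 0, err
		}
		// TotalCount covers all pages, so the first page is enough for counting.
		total += runs.GetTotalCount()
	}
	return total, nil
}
```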
* errors.Is compares all members of a struct when checking for a match, which never
happened here
* switched to a type check instead of an exact value check (see the sketch after this list)
* notRegistered used double negation in an if statement, which led to
unregistering runners after the registration timeout
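To illustrate the difference, a small self-contained sketch; the error type name here is hypothetical, not ARC's actual error:

```go
package main

import (
	"errors"
	"fmt"
)

// notRegisteredError is a hypothetical stand-in for the controller's error type.
type notRegisteredError struct {
	runnerName string
}

func (e notRegisteredError) Error() string {
	return fmt.Sprintf("runner %s is not registered", e.runnerName)
}

func main() {
	err := fmt.Errorf("checking runner: %w", notRegisteredError{runnerName: "runner-abc"})

	// errors.Is compares every struct field against one specific target value,
	// so with per-runner data in the struct it practically never matches.
	fmt.Println(errors.Is(err, notRegisteredError{})) // false

	// A type check via errors.As matches regardless of the field values.
	var target notRegisteredError
	fmt.Println(errors.As(err, &target)) // true
}
```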
This enhances the controller to recreate the runner pod if the corresponding runner has failed to register itself to GitHub within 10 minutes(currently hard-coded).
It should alleviate #288 in case the root cause is some kind of transient failure (network unreliability, GitHub being down, temporary compute resource shortage, etc.).
Formerly you had to manually detect and delete such pods, or even force-delete the corresponding runners, to unblock the controller.
Since this enhancement, the controller deletes the pod automatically 10 minutes after pod creation, which results in the controller creating another pod that might work.
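A hedged sketch of the timeout check, assuming the deadline is measured from the pod's creation timestamp; the names are illustrative, not the controller's actual code:

```go
package arc

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// registrationTimeout mirrors the hard-coded 10 minute limit described above.
const registrationTimeout = 10 * time.Minute

// registrationTimedOut reports whether the runner pod has existed longer than
// the registration timeout without the runner showing up on GitHub.
func registrationTimedOut(pod *corev1.Pod, registered bool, now time.Time) bool {
	if registered {
		return false
	}
	return now.Sub(pod.CreationTimestamp.Time) > registrationTimeout
}
```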
Ref #288
When we used `QueuedAndInProgressWorkflowRuns`-based autoscaling, it fetched and considered only the first 30 workflow runs at reconciliation time. This may have resulted in unreliable scaling behaviour, like scale-in/out not happening when it was expected.
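A minimal sketch of paginating past the default page size of 30, assuming go-github's repository workflow runs API; the helper name and version path are illustrative:

```go
package arc

import (
	"context"

	"github.com/google/go-github/v39/github" // version path is illustrative
)

// listAllWorkflowRuns walks every page instead of stopping at the first 30 results.
func listAllWorkflowRuns(ctx context.Context, c *github.Client, owner, repo string) ([]*github.WorkflowRun, error) {
	opts := &github.ListWorkflowRunsOptions{ListOptions: github.ListOptions{PerPage: 100}}
	var all []*github.WorkflowRun
	for {
		runs, resp, err := c.Actions.ListRepositoryWorkflowRuns(ctx, owner, repo, opts)
		if err != nil {
			return nil, err
		}
		all = append(all, runs.WorkflowRuns...)
		if resp.NextPage == 0 {
			return all, nil
		}
		opts.Page = resp.NextPage
	}
}
```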
* feat: HorizontalRunnerAutoscaler Webhook server
This introduces a Webhook server that responds to GitHub `check_run`, `pull_request`, and `push` events by scaling up the matched HorizontalRunnerAutoscaler by 1 replica. This allows you to immediately add "resource slack" for future GitHub Actions job runs, without waiting for the next sync period to add runners when they are insufficient.
This feature is highly inspired by https://github.com/philips-labs/terraform-aws-github-runner. terraform-aws-github-runner can manage one set of runners per deployment, whereas actions-runner-controller with this feature can manage as many sets of runners as you declare with HorizontalRunnerAutoscaler and RunnerDeployment pairs.
On each GitHub event received, the webhook server queries repository-wide and organizational runners from the cluster and searches for a single target to scale up. The webhook server tries to match HorizontalRunnerAutoscaler.Spec.ScaleUpTriggers[].GitHubEvent.[CheckRun|Push|PullRequest] against the event, and if it finds exactly one HRA, that is the scale target. If none, or two or more, targets are found among repository-wide runners, it does the same for organizational runners.
Changes:
* Fix integration test
* Update manifests
* chart: Add support for github webhook server
* dockerfile: Include github-webhook-server binary
* Do not import unversioned go-github
* Update README
Currently, after refreshing the token, the controller re-creates the runner with the new token. This results in jobs being interrupted. This PR makes sure the pod is not restarted if it is busy.
Closes #74
* feat: Repository-wide RunnerDeployment Autoscaling
This adds `maxReplicas` and `minReplicas` to the RunnerDeploymentSpec. If and only if both fields are set, the controller computes and sets desired `replicas` automatically depending on the demand.
The number of demanded runner replicas is computed as `queued workflow runs + in_progress workflow runs` for the repository. Support for organizational runners is not included.
Ref https://github.com/summerwind/actions-runner-controller/issues/10
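A minimal sketch of the replica computation under those rules; purely illustrative, not the controller's actual code:

```go
package arc

// desiredReplicas computes the autoscaled replica count: the sum of queued and
// in-progress workflow runs, clamped to the [minReplicas, maxReplicas] range.
func desiredReplicas(queued, inProgress, minReplicas, maxReplicas int) int {
	desired := queued + inProgress
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}
```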