Commit Graph

370 Commits

Author SHA1 Message Date
Yusuke Kuoka a4876c5d03 Make EphemeralRunnerController MaxConcurrentReconciles configurable 2024-11-01 01:56:11 +00:00
Yusuke Kuoka f58dd76763 Make EphemeralRunnerReconciler create runner pods faster and earlier 2024-09-30 05:25:43 +00:00
Nikola Jokic b349ded2be
Increase test timeouts to avoid CI test failures (#3554) 2024-06-21 13:45:48 +02:00
Nikola Jokic 6276c84493
AutoscalingListener controller: Inspect listener container state instead of pod phase (#3548) 2024-06-21 13:40:08 +02:00
Nikola Jokic a62ca3d853
Exclude label prefix propagation (#3607) 2024-06-21 12:12:14 +02:00
Nikola Jokic 2cc793a835
Remove `.Named()` from the ephemeral runner controller (#3596) 2024-06-17 10:36:08 +02:00
Serge e45ac190e2
Customize work directory (#3477) 2024-06-04 15:16:45 +02:00
Katarzyna d0fb7206a4
Fix problem with ephemeralRunner Succeeded state before build executed (#3528) 2024-06-03 10:49:45 +02:00
Nikola Jokic 9afd93065f
Remove finalizers in one pass to speed up cleanups AutoscalingRunnerSet (#3536) 2024-05-27 09:21:31 +02:00
Nikola Jokic ab92e4edc3
Re-use the last desired patch on empty batch (#3453) 2024-05-17 15:12:16 +02:00
Nikola Jokic fa7a4f584e
Extract single place to set up indexers (#3454) 2024-05-17 14:42:46 +02:00
Nikola Jokic 9b51f25800
Rename imports in tests to remove double import and to improve readability (#3455) 2024-05-17 14:37:13 +02:00
Bryan Peterson 109750f816
propogate arbitrary labels from runnersets to all created resources (#3157) 2024-04-23 11:19:32 +02:00
Nikola Jokic 8075e5ee74
Refactor actions client error to include request id (#3430)
Co-authored-by: Francesco Renzi <rentziass@gmail.com>
2024-04-16 12:57:44 +02:00
Nikola Jokic 963ae48a3f
Include self correction on empty batch and avoid removing pending runners when cluster is busy (#3426) 2024-04-16 12:55:25 +02:00
Nikola Jokic b6a95ae879
Change duplicate message key in logs while updating ephemeral runner status (#3380) 2024-03-26 12:57:46 +01:00
Nikola Jokic 7a643a5107
Fix overscaling when the controller is much faster then the listener (#3371)
Co-authored-by: Francesco Renzi <rentziass@gmail.com>
2024-03-20 15:36:12 +01:00
Nikola Jokic c9099a5a56
Add annotation with values hash to re-create listener (#3195) 2024-03-19 14:29:49 +01:00
Hidehito Yabuuchi 48706584fd
Propagate runner scale set name annotation to EphemeralRunner (#3098) 2024-03-19 12:50:49 +01:00
Nikola Jokic c00465973e
Publish metrics in the new ghalistener (#3193) 2024-01-25 14:46:42 +01:00
Nikola Jokic fe8c3bb789
Change listener container name (#3167) 2023-12-19 12:22:52 +01:00
Serge d7d479172d
Fix override listener pod spec (#3139) (#3161)
Signed-off-by: Serge Logvinov <serge.logvinov@sinextra.dev>
2023-12-18 16:50:06 +01:00
Nikola Jokic b78cadd901
Refactoring listener app with configurable fallback (#3096) 2023-12-08 13:41:06 +01:00
Nikola Jokic 202a97ab12
Modify user agent format with subsystem and is proxy configured information (#3116) 2023-12-08 13:16:29 +01:00
Toru Komatsu b08d533105
Record the error when the creation pod fails (#3112)
Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-12-07 21:11:52 +01:00
Toru Komatsu 7793e1974a
Record a reason for pod failure in EphemeralRunner (#3074)
Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-11-21 08:26:29 +01:00
Nikola Jokic 2939640fa9
Add ResizePolicy and RestartPolicy on mergeListenerContainer (#3075) 2023-11-15 10:39:11 +01:00
Nikola Jokic 65fd04540c
Bump go version and all direct dependencies to newest for k8s compatibility (#2947) 2023-11-14 16:19:43 +01:00
Nikola Jokic 95487735a2
Remove inheritance of imagePullPolicy from manager to listeners (#3009) 2023-11-07 15:08:36 +01:00
Nikola Jokic 2117fd1892
Configure listener pod with the secret instead of env (#2965)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-10-19 12:29:32 +02:00
Nikola Jokic bffcb32b19
Fix role and rolebinding cleanup for the listener controller (#2970) 2023-10-16 12:40:38 +02:00
Nikola Jokic fdf7b6c525
Fix nil map when annotations are applied (#2916)
Co-authored-by: Hidetake Iwata <int128@gmail.com>
2023-09-26 11:21:16 +02:00
Nikola Jokic 07bff8aa1e
Extend the user agent and fix the build version for the listener app (#2892) 2023-09-14 20:10:49 +02:00
Nikola Jokic ea2fb32e20
Extend and generate crds allowing listener pod spec change (#2758)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-09-14 15:33:29 +02:00
Nikola Jokic a0a3916c80
Provide scale-set listener metrics (#2559)
Co-authored-by: Tingluo Huang <tingluohuang@github.com>
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-08-21 13:50:07 +02:00
Gavin Williams cd996e7c27
Fix `panic: slice bounds out of range` when runner spec contains `volumeMounts`. (#2720)
Signed-off-by: Gavin Williams <gavin.williams@machinemax.com>
2023-07-25 09:53:50 +09:00
Lars Lange 297442975e
fix: remove callbacks resulting in scales due to incomplete response (#2671)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2023-07-25 09:04:54 +09:00
Nikola Jokic 6fe8008640
Add configurable log format to values.yaml and propagate it to listener (#2686) 2023-07-05 21:06:42 +02:00
Nikola Jokic 94934819c4
Trim repo/org/enterprise to 63 characters in label values (#2657) 2023-06-09 20:57:20 +02:00
Nuru aac811f210
Update unconsumed HRA capacity reservation's expiration more frequently and consistently (#2502)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2023-05-30 09:04:57 +09:00
Thang Le e7ec736738
Use head_branch metric (#2549)
Co-authored-by: Yusuke Kuoka <ykuoka@gmail.com>
2023-05-28 16:36:55 +09:00
Daniel Hobley 90ea691e72
feat: allow for modifying `var-run` mount maximum size limit (#2624) 2023-05-27 11:47:23 +09:00
Bassem Dghaidi 8afef51c8b
Add DrainJobsMode (aka UpdateStrategy feature) (#2569) 2023-05-23 07:42:30 -04:00
argokasper a2d4b95b79
Fix GET validation for lowercase http methods (#2497)
Some requests send method in lowercase (verified with curl and as a default for AWS ALB health check requests), but Go HTTP library constant MethodGet is in upper.
2023-04-27 13:22:41 +09:00
Nuru 9bd4025e9c
Stricter filtering of check run completion events (#2520)
I observed that 100% of canceled jobs in my runner pool were not causing scale down events. This PR fixes that.

The problem was caused by #2119. 

#2119 ignores certain webhook events in order to fix #2118. However, #2119 overdoes it and filters out valid job cancellation events. This PR uses stricter filtering and add visibility for future troubleshooting.

<details><summary>Example cancellation event</summary>

This is the redacted top portion of a valid cancellation event my runner pool received and ignored.

```json
{
  "action": "completed",
  "workflow_job": {
    "id": 12848997134,
    "run_id": 4738060033,
    "workflow_name": "slack-notifier",
    "head_branch": "auto-update/slack-notifier-0.5.1",
    "run_url": "https://api.github.com/repos/nuru/<redacted>/actions/runs/4738060033",
    "run_attempt": 1,
    "node_id": "CR_kwDOB8Xtbc8AAAAC_dwjDg",
    "head_sha": "55bada8f3d0d3e12a510a1bf34d0c3e169b65f89",
    "url": "https://api.github.com/repos/nuru/<redacted>/actions/jobs/12848997134",
    "html_url": "https://github.com/nuru/<redacted>/actions/runs/4738060033/jobs/8411515430",
    "status": "completed",
    "conclusion": "cancelled",
    "created_at": "2023-04-19T00:03:12Z",
    "started_at": "2023-04-19T00:03:42Z",
    "completed_at": "2023-04-19T00:03:42Z",
    "name": "build (arm64)",
    "steps": [

    ],
    "check_run_url": "https://api.github.com/repos/nuru/<redacted>/check-runs/12848997134",
    "labels": [
      "self-hosted",
      "arm64"
    ],
    "runner_id": 0,
    "runner_name": "",
    "runner_group_id": 0,
    "runner_group_name": ""
  },
```

</details>
2023-04-27 13:15:23 +09:00
Yusuke Kuoka 94c089c407
Revert docker.sock path to /var/run/docker.sock (#2536)
Starting ARC v0.27.2, we've changed the `docker.sock` path from `/var/run/docker.sock` to `/var/run/docker/docker.sock`. That resulted in breaking some container-based actions due to the hard-coded `docker.sock` path in various places.

Even `actions/runner` seem to use `/var/run/docker.sock` for building container-based actions and for service containers?

Anyway, this fixes that by moving the sock file back to the previous location.

Once this gets merged, users stuck at ARC v0.27.1, previously upgraded to 0.27.2 or 0.27.3 and reverted back to v0.27.1 due to #2519, should be able to upgrade to the upcoming v0.27.4.

Resolves #2519
Resolves #2538
2023-04-27 13:06:35 +09:00
Nikola Jokic 9859bbc7f2
Use build.Version to check if resource version is a mismatch (#2521)
Co-authored-by: Bassem Dghaidi <568794+Link-@users.noreply.github.com>
2023-04-24 10:40:15 +02:00
Yusuke Kuoka 3e0bc3f7be
Fix docker.sock permission error for non-dind Ubuntu 20.04 runners since v0.27.2 (#2499)
#2490 has been happening since v0.27.2 for non-dind runners based on Ubuntu 20.04 runner images. It does not affect Ubuntu 22.04 non-dind runners(i.e. runners with dockerd sidecars) and Ubuntu 20.04/22.04 dind runners(i.e. runners without dockerd sidecars). However, presuming many folks are still using Ubuntu 20.04 runners and non-dind runners, we should fix it.

This change tries to fix it by defaulting to the docker group id 1001 used by Ubuntu 20.04 runners, and use gid 121 for Ubuntu 22.04 runners. We use the image tag to see which Ubuntu version the runner is based on. The algorithm is so simple- we assume it's Ubuntu-22.04-based if the image tag contains "22.04".

This might be a breaking change for folks who have already upgraded to Ubuntu 22.04 runners using their own custom runner images. Note again; we rely on the image tag to detect Ubuntu 22.04 runner images and use the proper docker gid- Folks using our official Ubuntu 22.04 runner images are not affected. It is a breaking change anyway, so I have added a remedy-

ARC got a new flag, `--docker-gid`, which defaults to `1001` but can be set to `121` or whatever gid the operator/admin likes. This can be set to `--docker-gid=121`, for example, if you are using your own custom runner image based on Ubuntu 22.04 and the image tag does not contain "22.04".

Fixes #2490
2023-04-17 21:30:41 +09:00
Nikola Jokic ba1ac0990b
Reordering methods and constants so it is easier to look it up (#2501) 2023-04-12 09:50:23 +02:00
Nikola Jokic b86af190f7
Extend manager roles to accept ephemeralrunnerset/finalizers (#2493) 2023-04-10 08:49:32 +02:00