Commit Graph

42 Commits

Author SHA1 Message Date
Fedor Korotkov 88506b1adb
Remove license tier validation (#428) 2026-04-12 15:27:27 -04:00
Nikolay Edigaryev 9092a9f172
Support Vetu virtualization on Linux in addition to Tart on macOS (#419)
* Support Vetu virtualization on Linux in addition to Tart on macOS

* api(portForward): ensure that rendezvousConn is closed

* Re-try SSH connections in integration tests

Because a VM might be still booting.
2026-03-16 11:12:28 +01:00
Nikolay Edigaryev 3fffe5fb74
Replace Prometheus with OpenTelemetry (#413) 2026-02-23 19:01:10 +01:00
Fedor Korotkov be869f10d4
Refactor listing VMs (#399)
* Removed unnesesary ListOptions

* Refactor genericList to accept string prefixes instead of byte slices

* Optimize VM listing logic with singleflight to deduplicate concurrent request

* Refactor VM listing logic: rename variables for clarity and update error messages

* fix: address PR review feedback

- use singleflight DoChan with context cancellation for list VMs

🤖 Generated with [Codex](https://chatgpt.com/codex)

Co-Authored-By: Codex <codex@openai.com>

---------

Co-authored-by: Codex <codex@openai.com>
2026-02-05 18:51:45 +01:00
Nikolay Edigaryev 7775515a73
Load testing: synthetic VMs, multiple worker support and Grafana k6 test (#389)
* Load testing: synthetic VMs, multiple worker support and Grafana k6 test

* echoserver: prevent fallthrough when Accept() fails

* Move default local-dev context logic to CreateDevController()

* Synthetic: add a random delay to startup script echoing
2026-01-28 10:54:55 +01:00
Nikolay Edigaryev 9cdfd75f79
Badger store: avoid code duplication by using generic methods (#369)
* Badger store: avoid code duplication by using generic methods

* No need to return PT, can return just *T
2025-11-17 18:34:59 +04:00
Nikolay Edigaryev af221cf3c1
Support for prefixed Orchard Controller API URLs (#355)
* Support for prefixed Orchard Controller API URLs

* Fix Swagger UI

* Remove spurious "fmt" import

* Use url.URL in order to correctly calculate API path for Swagger UI
2025-10-06 20:04:47 +04:00
Nikolay Edigaryev f5aa04e98b
orchard controller run: introduce configurable --worker-offline-timeout (#342) 2025-09-17 00:10:39 +04:00
Nikolay Edigaryev 26668f2cbd
orchard controller run: introduce --experimental-disable-db-compression (#336) 2025-08-19 17:31:18 +04:00
Nikolay Edigaryev 39fbbbc2a6
Disable Prometheus metrics by default (#331) 2025-07-17 00:58:13 +04:00
Nikolay Edigaryev a37a8914cd
orchard controller run: introduce --experimental-ping-interval (#316)
* orchard controller run: introduce --experimental-ping-interval

* Ensure that --experimental-ping-interval is always larger than 5s
2025-05-15 21:14:17 +04:00
Nikolay Edigaryev 9919117b9b
orchard controller run: create a default bootstrap context (#291)
* orchard controller run: create a default bootstrap context

* Dockerfile: correct AS casing

* Fix typo in BootstrapContextName
2025-03-27 18:48:04 +04:00
Nikolay Edigaryev 88fba8004d
Introduce WebSocket-based RPC v2 (#239)
* Introduce WebSocket-based RPC v2

* go test: add -ldflags="-B gobuildid"

* No need to change the "controller.workerNotifier.Notify()" error message

* No need to modify Protocol Buffers/gRPC generated code

* rpcWatch(): explain that connection shouldn't be normally be closed

* Avoid "port forwarding failed: " repetition in error messages

* Improve comments and avoid repetition in IP resolution errors
2025-01-30 17:33:32 +04:00
Nikolay Edigaryev 7fe0414981
"--scheduler-profile" option to allow different orchestration patterns (#224)
* "--scheduler-profile" option to allow different orchestration patterns

* API(cluster settings): provide a default value for scheduler profile
2024-11-28 20:07:46 +04:00
Nikolay Edigaryev 8aaf05c4f7
controller run: make bootstrap process more user-friendly (#201)
* controller run: make bootstrap process more user-friendly

* Badger: log to zap instead of standard error
2024-09-03 18:54:28 +04:00
Nikolay Edigaryev 76f192bdb0
API endpoint and associated RPC changes to resolve VMs IP's (#188)
* API endpoint and associated RPC changes to resolve VMs IP's

* Fix "Missing expected argument '<name>'" error when doing "tart set"

* Implement TestIPEndpoint() and IP() method in controller HTTP client
2024-07-03 22:56:43 +04:00
Nikolay Edigaryev 8119b22817
orchard controller run: introduce --insecure-ssh-no-client-auth (#187) 2024-06-28 23:55:18 +04:00
Nikolay Edigaryev ff0497b1d8
Produce OpenTelemetry metrics (#185)
* .golangci.yml: remove mentions of deprecated linters

* Fix "staticcheck" linter error by using grpc.NewClient

* Configure OpenTelemetry

Metrics only for now.

* Produce OpenTelemetry metrics

* Update DeploymentGuide.md

Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>

* Update DeploymentGuide.md

Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>

* Introduce "org.cirruslabs.orchard.controller.worker_status"

---------

Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>
2024-06-24 18:19:51 +04:00
Nikolay Edigaryev d59bc7f8a7
Orchard Controller: implement an SSH server that acts as a jump host (#179)
* proxy.Connections(): require io.ReadWriteCloser instead of net.Conn

* Orchard Controller: implement an SSH server that acts as a jump host

* Issue a warning if the name used will be invalid in the future

* Further restrict uppercase characters in names in the future

The rationale is similar to https://github.com/kubernetes/kubernetes/issues/71140.

We won't want to munge the user's input and introduce subtle bugs doing
lowercase comparisons.
2024-06-11 19:32:45 +04:00
Nikolay Edigaryev 40f58e4aee
More RPC-related logs (#136)
* More RPC-related logs

* Notifier should be set before we use it in the scheduler
2023-09-27 20:16:00 +04:00
Nikolay Edigaryev fd88ce5890
Introduce ORCHARD_LICENSE_TIER environment variable (#111)
* Introduce ORCHARD_LICENSE_TIER environment variable

* Only parse ORCHARD_LICENSE_TIER if it was provided
2023-07-26 17:28:38 +04:00
Nikolay Edigaryev d57d18d380
Support for sharing files with the host system (#103)
* Support for sharing files with the host system

* Integration tests

* Added back TestVMGarbageCollection comment
2023-07-04 18:10:53 +04:00
Nikolay Edigaryev 4eafec99a5
Fail VMs if the worker had crashed/is unhealthy (#70)
* Fail VMs if the worker had crashed/is unhealthy

* OnDiskName: properly handle cases when VM's name contains hyphens

* Worker: introduce Offline() method and check it before scheduling

* tart.List(): use Tart's JSON output

* OnDiskName: remove empty parts check

* Scheduler: move health-checking logic to a separate function

* Only fail "running" VMs

* Only fail orphaned VMs if they're in terminal state

* Integration tests

* Run healthCheckingLoopIteration() before schedulingLoopIteration()

* Worker: sync on-disk VMs only once at start
2023-04-03 16:47:49 +04:00
Fedor Korotkov f152043f19
Reactive Scheduling (#67)
Before we had two main loops: controller loop to assign VMs and worker loop to start VMs. Each of the loops was performed upon an interval every N seconds.

This change introduces a mechanism for reactively requesting loop execution:

 1. Controller loop will be executed upon VM creation to try to immediately schedule.
 2. A worker will be notified upon a VM assigment and worker loop will be requested to sync immediately.

 Fixes #31
2023-03-28 20:51:41 +04:00
Nikolay Edigaryev cb39836ee0
Resources support (#63)
* Resources support

* Ability to provide VM and worker resources via the CLI

* orchard dev: always listen on :6120

* orchard dev: support --resources

* REST API: provide resource defaults when creating VM

* OpenAPI: document "resources" field

* orchard dev: serve Swagger API documentation on /v1/

* Integration guide
2023-03-27 17:30:54 +04:00
Nikolay Edigaryev 7647ccdc10
Remove Generation field (#57) 2023-03-24 17:23:07 +00:00
Nikolay Edigaryev 49753ebf4c
Tests: use separate controller listening ports to prevent conflicts (#58) 2023-03-24 17:22:58 +00:00
Fedor Korotkov 63ba8b5532
Separate context for `orchard dev` (#56)
Fixes #51
2023-03-24 13:10:35 -04:00
Nikolay Edigaryev af074f499d
Remove UID for now and use machine ID to differentiate workers (#48)
* Remove UID for now and use machine ID to differentiate workers

* Rename MetadataWorkerKey back to MetadataWorkerNameKey
2023-03-23 23:38:54 +04:00
Fedor Korotkov cdf5c5eb00
Simplified bootstrapping of a cluster (#40)
* Simplified bootstrapping of a cluster

Introduced a new convention about a pre-defined `bootstrap-admin` account for `orchard controller run`. Providing `ORCHARD_BOOTSTRAP_ADMIN_TOKEN` will auto-create such user for easier configuration. `bootstrap-admin` can be used for creating other service accounts on the first run and after that can be disposed.

Also change `orchard worker run` to expect controller URL as the only parameter and a bootstrap token passed via an argument instead of using a context that might not be created.

* Missing error check
2023-03-22 23:43:37 +04:00
Fedor Korotkov 9b5ad09841
Consolidate controller bootstrap login in `run` command (#38) 2023-03-21 15:36:55 -04:00
Nikolay Edigaryev 10f56bb5e3
Introduce "orchard ssh" and "orchard vnc" commands (#36)
* proxy.Connections(): handle "use of closed network connection" error

* Controller: less strict timeouts that work nicely for WebSockets

* Worker: only attempt connect to the gRPC once our UID is known

* Introduce "orchard ssh" and "orchard vnc" commands

* Worker: prevent context leak by moving logic into a separate function

* Fix linter errors

* Port forwarding integration test

* Check for "uname -mo" output
2023-03-21 14:58:24 -04:00
Fedor Korotkov fb3056d3ae
Refactorings for simplify readability (#35) 2023-03-17 06:11:28 -04:00
Nikolay Edigaryev 47fef47d1c
Port forwarding support (#30)
* Port forwarding support

* .golangci.yml: remove and replace deprecated and archived linters

* Client: pass credentials when calling WebSocket API methods

* API: require ServiceAccountRoleComputeWrite role for port forwarding

* Use Buf

* Rename Poll() RPC method to Watch()

* Split Rendezvous into two parts: Watcher and Proxy (#32)

* Split Rendezvous into two parts: Watcher and Proxy

* Implement Proxy cancellation

* Use Protocol Buffers structure directly in Watcher

* Fix TestWatcher after switching to Protocol Buffers structure

* portForwardVM(): ensure we also check for gin's context
2023-03-14 11:31:13 -04:00
Fedor Korotkov 165662bb0a
Better state syncing and other improvements (#24) 2023-03-01 11:42:16 -05:00
Nikolay Edigaryev 8df31f7c2d
Introduce service accounts and bootstrap tokens (#22) 2023-02-21 11:34:12 -05:00
Fedor Korotkov edb9b3d693
Refactored working with a storage (#20) 2023-02-08 19:36:30 -05:00
Nikolay Edigaryev 0b9b96b8c9
Introduce "orchard context" (#18) 2023-02-07 19:48:31 +04:00
Nikolay Edigaryev a7264370f5
Introduce "controller init" and generate self-signed X.509 certificate (#17) 2023-02-04 11:40:07 +04:00
Nikolay Edigaryev ab3ffe82fa
Remove janitor (#16)
* Remove controller's janitor

* Properly handle err == nil in mapErr() function
2023-02-01 19:02:56 +04:00
Nikolay Edigaryev 6bcc02d815
Use golangci-lint (#15) 2023-01-31 22:22:28 +04:00
Nikolay Edigaryev 92e8732d46
Initial version of the Orchard orchestration system (#3)
* Initial version of the Orchard orchestration system

* Update README.md

Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>

Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>
2023-01-26 23:46:23 +04:00