Commit Graph

97 Commits

Author SHA1 Message Date
Fedor Korotkov 88506b1adb
Remove license tier validation (#428) 2026-04-12 15:27:27 -04:00
Nikolay Edigaryev 3cfa244550
create vm: introduce --{os,arch,runtime} command-line arguments (#422)
* create vm: introduce --{os,arch,runtime} command-line arguments

* v1.VM: prevent unsupported fields for "vetu" runtime
2026-03-17 19:46:00 +01:00
Nikolay Edigaryev 9092a9f172
Support Vetu virtualization on Linux in addition to Tart on macOS (#419)
* Support Vetu virtualization on Linux in addition to Tart on macOS

* api(portForward): ensure that rendezvousConn is closed

* Re-try SSH connections in integration tests

Because a VM might be still booting.
2026-03-16 11:12:28 +01:00
Matt e20a16ee8e
Fix race in port-forward (#418) 2026-03-05 15:56:13 +01:00
Nikolay Edigaryev 3fffe5fb74
Replace Prometheus with OpenTelemetry (#413) 2026-02-23 19:01:10 +01:00
Nikolay Edigaryev a64f76a934
controller(api): new "GET /vms/{name}/exec" WebSocket-based endpoint (#408)
* controller(api): new "GET /vms/{name}/exec" WebSocket-based endpoint

* Split SSH connection and execution to avoid standard input handoff

* execstream: make Exit field a pointer again

To support serializing exit codes equal to 0.
2026-02-12 13:35:06 +01:00
Nikolay Edigaryev 2b4e3b6b01
Enable Gin's context fallback (#409) 2026-02-11 18:04:15 +01:00
Nikolay Edigaryev c4b7378883
controller(listVMs): avoid copy of each element when filtering (#401)
* controller(listVMs): avoid copy of each element when filtering

* Explain the change
2026-02-06 18:16:58 +01:00
Nikolay Edigaryev bdc2af3d58
controller(listVMs): reduce allocations (#400)
* controller(listVMs): reduce allocations

* Declare an empty, non-nil slice to return [] when no objects are found
2026-02-05 22:02:21 +01:00
Fedor Korotkov be869f10d4
Refactor listing VMs (#399)
* Removed unnesesary ListOptions

* Refactor genericList to accept string prefixes instead of byte slices

* Optimize VM listing logic with singleflight to deduplicate concurrent request

* Refactor VM listing logic: rename variables for clarity and update error messages

* fix: address PR review feedback

- use singleflight DoChan with context cancellation for list VMs

🤖 Generated with [Codex](https://chatgpt.com/codex)

Co-Authored-By: Codex <codex@openai.com>

---------

Co-authored-by: Codex <codex@openai.com>
2026-02-05 18:51:45 +01:00
Nikolay Edigaryev 2c0629f52b
Introduce "compute:connect" role (#393)
* Introduce "compute:connect" role

* Fix message fixture in TestAuthorizeAuthenticatedNoRoles
2026-01-29 19:55:28 +01:00
Nikolay Edigaryev 688238837a
Implement server-side filtering for VMs by worker (#392)
* Implement server-side filtering for VMs by worker

* Parse more than one filter but error out when more than one is provided

* Fix off-by-one

* No need to use "\n" in Debugf()
2026-01-29 17:52:24 +01:00
Nikolay Edigaryev 7775515a73
Load testing: synthetic VMs, multiple worker support and Grafana k6 test (#389)
* Load testing: synthetic VMs, multiple worker support and Grafana k6 test

* echoserver: prevent fallthrough when Accept() fails

* Move default local-dev context logic to CreateDevController()

* Synthetic: add a random delay to startup script echoing
2026-01-28 10:54:55 +01:00
Fedor Korotkov 6fe523ef69
Add pagination support for listing VM events (#386)
* Add pagination support for listing VM events

Introduced a paginated event listing API, added support for pagination parameters in the request, and included cursor-based navigation using headers. Relevant tests and Badger store implementations were updated to support the new logic.

* Remove support for ordering VM events

Dropped `ListOrder` type, `order` query parameter, and related logic for ordering VM events. Updated tests, API schema, and Badger store to reflect the removal.

* Remove invalid VM events ordering test

Deleted a test case for invalid VM events ordering since the `order` query parameter and related functionality have been removed.

* Add support for ordering VM events

Implemented `order` query parameter for specifying sort order (ascending/descending) of VM events. Updated API schema, Badger store, and added related tests.

* Add support for limiting and ordering VM logs

Introduced `--limit` and `--order` flags for controlling the number of log lines and their sort order (ascending/descending). Updated API client to handle new options.

* Update internal/controller/store/badger/badger_events.go

Co-authored-by: Nikolay Edigaryev <edigaryev@gmail.com>

* fix: address PR review feedback

- switch logs CLI to --tail with desc ordering
- reuse ParseLogsOrder in controller with helpful errors
- always use ListEventsPage and scope event cursors
- move events pagination coverage to integration test

🤖 Generated with [Codex](https://chatgpt.com/codex)

Co-Authored-By: Codex <codex@openai.com>

* refactor: simplify prefix trimming and improve error formatting

- Replaced manual prefix check with `bytes.TrimPrefix` in Badger store.
- Enhanced error message formatting in VM logs controller.

* fix: address PR review feedback

- use suggested reverse seek in badger events pagination
- add events pagination client helper and use it in integration test

🤖 Generated with [Codex](https://chatgpt.com/codex)

Co-Authored-By: Codex <codex@openai.com>

---------

Co-authored-by: Nikolay Edigaryev <edigaryev@gmail.com>
Co-authored-by: Codex <codex@openai.com>
2026-01-22 09:22:53 -05:00
Nikolay Edigaryev 76a552bade
Ability to set VM's power state and retrieve backing Tart VM's name (#373)
* Ability to set VM's power state and retrieve backing Tart VM's name

* Validate user-provided "powerState" field

* Introduce TestSpecUpdatePowerStateSuspend

* Introduce TestSpecUpdatePowerStateStopped

* OpenAPI specification: add note about suspended VMs to "tartName" desc.

* Sometimes we need to wait more than 30 seconds
2025-12-02 16:43:17 -05:00
Nikolay Edigaryev 9cdfd75f79
Badger store: avoid code duplication by using generic methods (#369)
* Badger store: avoid code duplication by using generic methods

* No need to return PT, can return just *T
2025-11-17 18:34:59 +04:00
Nikolay Edigaryev 4e0dc749d0
BadgerDB: set logging level to INFO (#367) 2025-11-11 22:43:48 +04:00
Nikolay Edigaryev 60303d11dd
VM specification: allow suspendable VMs (#366) 2025-11-11 21:16:28 +04:00
Nikolay Edigaryev bafcf6fac2
Simplify state reconciliation and support changing Softnet settings (#364)
* Simplify state reconciliation and support changing Softnet settings

* Remove unused "updateFunc" parameter from syncOnDiskVMs()

* Don't take an address of a loop variable

* ensure → ensures

* updateVMState(): don't forget to update VMState

* Introduce TestSpecUpdateSoftnet integration test

* Update OpenAPI specification to include generation/observedGeneration
2025-11-06 20:56:31 +04:00
Nikolay Edigaryev 08e9dfbbfe
Support "tart run"'s --net-softnet-allow and --net-softnet-block (#361)
* Support "tart run"'s --net-softnet-allow and --net-softnet-block

* Use ghcr.io/cirruslabs/macos-tahoe-base:latest by default
2025-10-27 23:07:43 +04:00
Nikolay Edigaryev af221cf3c1
Support for prefixed Orchard Controller API URLs (#355)
* Support for prefixed Orchard Controller API URLs

* Fix Swagger UI

* Remove spurious "fmt" import

* Use url.URL in order to correctly calculate API path for Swagger UI
2025-10-06 20:04:47 +04:00
Nikolay Edigaryev c5e0d68a3d
API: introduce ability to watch a VM (#351)
* API: introduce ability to watch a VM

* Document ?watch=true for GET /vms/{name} in the OpenAPI specification

* WatchVM: ensure that goroutine is terminated on early return with error

* WatchVM: close channels on goroutine exit

* WatchVM: ensure that we wait for the goroutine after additional barriers

* WatchVM: ignore unexpected keys instead of throwing an error

* WatchVM: perform context-aware writes to a bounded channel

* WatchVM: don't forget to close errCh on goroutine exit too

* WatchVM: don't close readyCh in goroutine to avoid ambiguity

* WatchVM: filter out spurious KVs that signify VM deletion
2025-10-03 21:34:53 +04:00
Nikolay Edigaryev 873efb24e7
ghcr.io/cirruslabs/macos-sequoia-base:latest for everything (#344) 2025-09-25 20:43:53 +04:00
Nikolay Edigaryev f5aa04e98b
orchard controller run: introduce configurable --worker-offline-timeout (#342) 2025-09-17 00:10:39 +04:00
Nikolay Edigaryev 26668f2cbd
orchard controller run: introduce --experimental-disable-db-compression (#336) 2025-08-19 17:31:18 +04:00
Nikolay Edigaryev 39fbbbc2a6
Disable Prometheus metrics by default (#331) 2025-07-17 00:58:13 +04:00
Nikolay Edigaryev ed7921ce16
Fix websocket.(*Conn).timeoutLoop goroutine leak (#329) 2025-07-11 15:23:50 +04:00
Nikolay Edigaryev a37a8914cd
orchard controller run: introduce --experimental-ping-interval (#316)
* orchard controller run: introduce --experimental-ping-interval

* Ensure that --experimental-ping-interval is always larger than 5s
2025-05-15 21:14:17 +04:00
Nikolay Edigaryev d52aa91927
Controller: periodically send PINGs on all WebSocket connections (#315) 2025-05-15 18:43:52 +04:00
Nikolay Edigaryev 0a3d9c6d1c
BadgerDB: periodically perform garbage collection (#307)
* BadgerDB: periodically perform garbage collection

* GC every hour
2025-04-16 00:44:04 +04:00
Nikolay Edigaryev 9919117b9b
orchard controller run: create a default bootstrap context (#291)
* orchard controller run: create a default bootstrap context

* Dockerfile: correct AS casing

* Fix typo in BootstrapContextName
2025-03-27 18:48:04 +04:00
Nikolay Edigaryev 818f4288c2
Controller API: correctly detect WebSocket closure in Watch RPC (#259) 2025-02-20 02:00:57 +04:00
Nikolay Edigaryev 61d7d34ea4
RPC v2: fix Ping() hanging due to PONG not being processed (#247) 2025-02-07 22:05:09 +04:00
Nikolay Edigaryev 8dd74db446
Worker notification improvements (#246)
* OpenAPI: document all default "wait" values

* Re-use waitContext instead of instantiating it anew
2025-02-07 00:38:04 +04:00
Fedor Korotkov 86f0afb5a3
Small timout for worker notification (#242)
* Small timout for worker notification

It seems at the moment if a worker re-establishes notify stream (for example, if network flips or proxy breaks the connection) then we can see "no worker registered with this name" errors.

This change makes Notifier to wait for 30 seconds before failing, at the time of calling `Notifier#Notify` we know such worker exists.

PS not sure if we need to make the timeout configurable.

* Wait via context

* Make sure all `context`s for `Notify` is time bounded

* Lint issues
2025-02-06 17:30:09 +00:00
Nikolay Edigaryev 26c8808506
Support scheduling by labels (#244) 2025-02-06 18:05:36 +04:00
Nikolay Edigaryev 581de320b9
Allow creating VMs with implicit CPU and memory (#243)
* Allow creating VMs with implicit CPU and memory

* Clarify why cpu/memory can be 0 a bit better

* Controller(API): don't forget to update DefaultCPU and DefaultMemory

* Add an integration test for implicit CPU and memory
2025-02-06 00:50:01 +04:00
Nikolay Edigaryev 88fba8004d
Introduce WebSocket-based RPC v2 (#239)
* Introduce WebSocket-based RPC v2

* go test: add -ldflags="-B gobuildid"

* No need to change the "controller.workerNotifier.Notify()" error message

* No need to modify Protocol Buffers/gRPC generated code

* rpcWatch(): explain that connection shouldn't be normally be closed

* Avoid "port forwarding failed: " repetition in error messages

* Improve comments and avoid repetition in IP resolution errors
2025-01-30 17:33:32 +04:00
Nikolay Edigaryev 077252f6d4
Prevent goroutine leak when Close()'ing *grpc_net_conn.Conn (#237) 2025-01-23 18:17:14 +04:00
Nikolay Edigaryev 1fce915d67
API: only overwrite specific worker fields when worker already exists (#236)
* API: only overwrite specific worker fields when worker already exists

* Don't forget to return when creating new worker

* Return updated worker when updating the worker
2025-01-16 16:42:17 +04:00
Nikolay Edigaryev d7b6f477e1
Never list workers in Update()/storeUpdate() transactions (#228)
* POST /v1/workers: do not list workers in a single update txn

* schedulingLoopIteration(): do not list workers in a single update txn

* .golangci.yml: remove mentions of fully deprecated linters
2024-12-05 16:59:50 +04:00
Nikolay Edigaryev d94690176e
Schedule opportunistically and more granularly (#225)
* Schedule opportunistically and more granularly

To avoid transaction conflicts.

* Measure scheduling loop iteration duration and log it at debugging level

* Use "continue NextWorker" instead of just "continue" for clarity
2024-12-03 14:11:48 +00:00
Nikolay Edigaryev 7fe0414981
"--scheduler-profile" option to allow different orchestration patterns (#224)
* "--scheduler-profile" option to allow different orchestration patterns

* API(cluster settings): provide a default value for scheduler profile
2024-11-28 20:07:46 +04:00
Nikolay Edigaryev 772336a7bd
Scheduler: stop iterating over workers when candidate worker is found (#220) 2024-11-13 17:59:08 +04:00
Nikolay Edigaryev 60948e14fe
Rendezvous: use a buffered channel of size 1 (#219)
* Rendezvous: use a buffered channel of size 1

* Fix spelling of "absence" in comment
2024-11-08 11:19:54 +04:00
Nikolay Edigaryev 2a2ddea62a
Controller: emit lifecycle events when the VM gets restarted or deleted (#208)
* Controller: emit lifecycle events when the VM gets restarted or deleted

* vm_{scheduling,run}_time → vm_{scheduling,run}_duration for clarity

* Update VM endpoint: only update VM started time when zero
2024-09-24 17:53:10 +04:00
Nikolay Edigaryev 1730eaf67c
orchard controller: make sure that output goes through the logger (#207)
...which emits JSON on the production for easier processing.
2024-09-17 22:54:43 +04:00
Mark McWhirter 979af1f699
Expose 2 new metrics about worker health (#203)
* Expose more metrics about worker health

* PR feedback

* PR feedback
2024-09-10 10:13:41 -04:00
Nikolay Edigaryev 8aaf05c4f7
controller run: make bootstrap process more user-friendly (#201)
* controller run: make bootstrap process more user-friendly

* Badger: log to zap instead of standard error
2024-09-03 18:54:28 +04:00
Nikolay Edigaryev cd9794197b
API: update service account fields on PUT (#198)
* API: update service account fields on PUT

* Disable G115 integer overflow linter of gosec
2024-08-21 20:03:52 +04:00