orchard

Commit Graph

Author	SHA1	Message	Date
Nikolay Edigaryev	9092a9f172	Support Vetu virtualization on Linux in addition to Tart on macOS (#419) * Support Vetu virtualization on Linux in addition to Tart on macOS * api(portForward): ensure that rendezvousConn is closed * Re-try SSH connections in integration tests Because a VM might be still booting.	2026-03-16 11:12:28 +01:00
Nikolay Edigaryev	3fffe5fb74	Replace Prometheus with OpenTelemetry (#413)	2026-02-23 19:01:10 +01:00
Nikolay Edigaryev	76a552bade	Ability to set VM's power state and retrieve backing Tart VM's name (#373) * Ability to set VM's power state and retrieve backing Tart VM's name * Validate user-provided "powerState" field * Introduce TestSpecUpdatePowerStateSuspend * Introduce TestSpecUpdatePowerStateStopped * OpenAPI specification: add note about suspended VMs to "tartName" desc. * Sometimes we need to wait more than 30 seconds	2025-12-02 16:43:17 -05:00
Nikolay Edigaryev	26668f2cbd	orchard controller run: introduce --experimental-disable-db-compression (#336)	2025-08-19 17:31:18 +04:00
Nikolay Edigaryev	39fbbbc2a6	Disable Prometheus metrics by default (#331)	2025-07-17 00:58:13 +04:00
Fedor Korotkov	86f0afb5a3	Small timout for worker notification (#242) * Small timout for worker notification It seems at the moment if a worker re-establishes notify stream (for example, if network flips or proxy breaks the connection) then we can see "no worker registered with this name" errors. This change makes Notifier to wait for 30 seconds before failing, at the time of calling `Notifier#Notify` we know such worker exists. PS not sure if we need to make the timeout configurable. * Wait via context * Make sure all `context`s for `Notify` is time bounded * Lint issues	2025-02-06 17:30:09 +00:00
Nikolay Edigaryev	26c8808506	Support scheduling by labels (#244)	2025-02-06 18:05:36 +04:00
Nikolay Edigaryev	581de320b9	Allow creating VMs with implicit CPU and memory (#243) * Allow creating VMs with implicit CPU and memory * Clarify why cpu/memory can be 0 a bit better * Controller(API): don't forget to update DefaultCPU and DefaultMemory * Add an integration test for implicit CPU and memory	2025-02-06 00:50:01 +04:00
Nikolay Edigaryev	d7b6f477e1	Never list workers in Update()/storeUpdate() transactions (#228) * POST /v1/workers: do not list workers in a single update txn * schedulingLoopIteration(): do not list workers in a single update txn * .golangci.yml: remove mentions of fully deprecated linters	2024-12-05 16:59:50 +04:00
Nikolay Edigaryev	d94690176e	Schedule opportunistically and more granularly (#225) * Schedule opportunistically and more granularly To avoid transaction conflicts. * Measure scheduling loop iteration duration and log it at debugging level * Use "continue NextWorker" instead of just "continue" for clarity	2024-12-03 14:11:48 +00:00
Nikolay Edigaryev	7fe0414981	"--scheduler-profile" option to allow different orchestration patterns (#224) * "--scheduler-profile" option to allow different orchestration patterns * API(cluster settings): provide a default value for scheduler profile	2024-11-28 20:07:46 +04:00
Nikolay Edigaryev	772336a7bd	Scheduler: stop iterating over workers when candidate worker is found (#220)	2024-11-13 17:59:08 +04:00
Nikolay Edigaryev	2a2ddea62a	Controller: emit lifecycle events when the VM gets restarted or deleted (#208) * Controller: emit lifecycle events when the VM gets restarted or deleted * vm_{scheduling,run}_time → vm_{scheduling,run}_duration for clarity * Update VM endpoint: only update VM started time when zero	2024-09-24 17:53:10 +04:00
Mark McWhirter	979af1f699	Expose 2 new metrics about worker health (#203) * Expose more metrics about worker health * PR feedback * PR feedback	2024-09-10 10:13:41 -04:00
Nikolay Edigaryev	ff0497b1d8	Produce OpenTelemetry metrics (#185) * .golangci.yml: remove mentions of deprecated linters * Fix "staticcheck" linter error by using grpc.NewClient * Configure OpenTelemetry Metrics only for now. * Produce OpenTelemetry metrics * Update DeploymentGuide.md Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com> * Update DeploymentGuide.md Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com> * Introduce "org.cirruslabs.orchard.controller.worker_status" --------- Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>	2024-06-24 18:19:51 +04:00
Nikolay Edigaryev	60e564da88	Implement restart policy for VMs (#83) * Implement restart policy for VMs * Do not update VM.Resource, we only use it as a read-only specification * Err()/setErr(): use atomic.Pointer instead of sync.Mutex	2023-04-24 19:30:08 +04:00
Fedor Korotkov	010df300a3	Add basic Prometheus metrics (#82) Fixes #71	2023-04-21 10:05:01 +04:00
Nikolay Edigaryev	84633d0e45	Introduce "orchard pause" and "orchard resume" commands (#73)	2023-04-07 22:59:41 +04:00
Nikolay Edigaryev	4eafec99a5	Fail VMs if the worker had crashed/is unhealthy (#70) * Fail VMs if the worker had crashed/is unhealthy * OnDiskName: properly handle cases when VM's name contains hyphens * Worker: introduce Offline() method and check it before scheduling * tart.List(): use Tart's JSON output * OnDiskName: remove empty parts check * Scheduler: move health-checking logic to a separate function * Only fail "running" VMs * Only fail orphaned VMs if they're in terminal state * Integration tests * Run healthCheckingLoopIteration() before schedulingLoopIteration() * Worker: sync on-disk VMs only once at start	2023-04-03 16:47:49 +04:00
Fedor Korotkov	f152043f19	Reactive Scheduling (#67) Before we had two main loops: controller loop to assign VMs and worker loop to start VMs. Each of the loops was performed upon an interval every N seconds. This change introduces a mechanism for reactively requesting loop execution: 1. Controller loop will be executed upon VM creation to try to immediately schedule. 2. A worker will be notified upon a VM assigment and worker loop will be requested to sync immediately. Fixes #31	2023-03-28 20:51:41 +04:00
Nikolay Edigaryev	cb39836ee0	Resources support (#63) * Resources support * Ability to provide VM and worker resources via the CLI * orchard dev: always listen on :6120 * orchard dev: support --resources * REST API: provide resource defaults when creating VM * OpenAPI: document "resources" field * orchard dev: serve Swagger API documentation on /v1/ * Integration guide	2023-03-27 17:30:54 +04:00

21 Commits