orchard

Commit Graph

Author	SHA1	Message	Date
Nikolay Edigaryev	772336a7bd	Scheduler: stop iterating over workers when candidate worker is found (#220)	2024-11-13 17:59:08 +04:00
Nikolay Edigaryev	2a2ddea62a	Controller: emit lifecycle events when the VM gets restarted or deleted (#208) * Controller: emit lifecycle events when the VM gets restarted or deleted * vm_{scheduling,run}_time → vm_{scheduling,run}_duration for clarity * Update VM endpoint: only update VM started time when zero	2024-09-24 17:53:10 +04:00
Mark McWhirter	979af1f699	Expose 2 new metrics about worker health (#203) * Expose more metrics about worker health * PR feedback * PR feedback	2024-09-10 10:13:41 -04:00
Nikolay Edigaryev	ff0497b1d8	Produce OpenTelemetry metrics (#185) * .golangci.yml: remove mentions of deprecated linters * Fix "staticcheck" linter error by using grpc.NewClient * Configure OpenTelemetry Metrics only for now. * Produce OpenTelemetry metrics * Update DeploymentGuide.md Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com> * Update DeploymentGuide.md Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com> * Introduce "org.cirruslabs.orchard.controller.worker_status" --------- Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>	2024-06-24 18:19:51 +04:00
Nikolay Edigaryev	60e564da88	Implement restart policy for VMs (#83) * Implement restart policy for VMs * Do not update VM.Resource, we only use it as a read-only specification * Err()/setErr(): use atomic.Pointer instead of sync.Mutex	2023-04-24 19:30:08 +04:00
Fedor Korotkov	010df300a3	Add basic Prometheus metrics (#82) Fixes #71	2023-04-21 10:05:01 +04:00
Nikolay Edigaryev	84633d0e45	Introduce "orchard pause" and "orchard resume" commands (#73)	2023-04-07 22:59:41 +04:00
Nikolay Edigaryev	4eafec99a5	Fail VMs if the worker had crashed/is unhealthy (#70) * Fail VMs if the worker had crashed/is unhealthy * OnDiskName: properly handle cases when VM's name contains hyphens * Worker: introduce Offline() method and check it before scheduling * tart.List(): use Tart's JSON output * OnDiskName: remove empty parts check * Scheduler: move health-checking logic to a separate function * Only fail "running" VMs * Only fail orphaned VMs if they're in terminal state * Integration tests * Run healthCheckingLoopIteration() before schedulingLoopIteration() * Worker: sync on-disk VMs only once at start	2023-04-03 16:47:49 +04:00
Fedor Korotkov	f152043f19	Reactive Scheduling (#67) Before we had two main loops: controller loop to assign VMs and worker loop to start VMs. Each of the loops was performed upon an interval every N seconds. This change introduces a mechanism for reactively requesting loop execution: 1. Controller loop will be executed upon VM creation to try to immediately schedule. 2. A worker will be notified upon a VM assignment and worker loop will be requested to sync immediately. Fixes #31	2023-03-28 20:51:41 +04:00
Nikolay Edigaryev	cb39836ee0	Resources support (#63) * Resources support * Ability to provide VM and worker resources via the CLI * orchard dev: always listen on :6120 * orchard dev: support --resources * REST API: provide resource defaults when creating VM * OpenAPI: document "resources" field * orchard dev: serve Swagger API documentation on /v1/ * Integration guide	2023-03-27 17:30:54 +04:00

10 Commits