* Small timout for worker notification
It seems at the moment if a worker re-establishes notify stream (for example, if network flips or proxy breaks the connection) then we can see "no worker registered with this name" errors.
This change makes Notifier to wait for 30 seconds before failing, at the time of calling `Notifier#Notify` we know such worker exists.
PS not sure if we need to make the timeout configurable.
* Wait via context
* Make sure all `context`s for `Notify` is time bounded
* Lint issues
* Allow creating VMs with implicit CPU and memory
* Clarify why cpu/memory can be 0 a bit better
* Controller(API): don't forget to update DefaultCPU and DefaultMemory
* Add an integration test for implicit CPU and memory
* Introduce WebSocket-based RPC v2
* go test: add -ldflags="-B gobuildid"
* No need to change the "controller.workerNotifier.Notify()" error message
* No need to modify Protocol Buffers/gRPC generated code
* rpcWatch(): explain that connection shouldn't be normally be closed
* Avoid "port forwarding failed: " repetition in error messages
* Improve comments and avoid repetition in IP resolution errors
* API: only overwrite specific worker fields when worker already exists
* Don't forget to return when creating new worker
* Return updated worker when updating the worker
* POST /v1/workers: do not list workers in a single update txn
* schedulingLoopIteration(): do not list workers in a single update txn
* .golangci.yml: remove mentions of fully deprecated linters
* Schedule opportunistically and more granularly
To avoid transaction conflicts.
* Measure scheduling loop iteration duration and log it at debugging level
* Use "continue NextWorker" instead of just "continue" for clarity
* Controller: emit lifecycle events when the VM gets restarted or deleted
* vm_{scheduling,run}_time → vm_{scheduling,run}_duration for clarity
* Update VM endpoint: only update VM started time when zero
* proxy.Connections(): require io.ReadWriteCloser instead of net.Conn
* Orchard Controller: implement an SSH server that acts as a jump host
* Issue a warning if the name used will be invalid in the future
* Further restrict uppercase characters in names in the future
The rationale is similar to https://github.com/kubernetes/kubernetes/issues/71140.
We won't want to munge the user's input and introduce subtle bugs doing
lowercase comparisons.
* Switch from golang.org/x/net/websocket to nhooyr.io/websocket
* Do not attach errors that we can handle to the Gin's context
* Add missing newline to "no credentials specified or found, ..." message
* Fix potential NPE in ChooseUsernameAndPassword()
* Fix type in PortForward() error message in "orchard ssh vm"
* Fix potential NPE in Connections()
* Use header.Set() for consistency's sake for Authorization header
* Change event prefix to preserve order under load
When there are a lot of events streamed from a worker, it's possible to have two batches coming for the same timestamp (which is a timestamp of the event on the worker). This way the existing logic would mess up the order because `index` and the random number doesn't guarantee the order.
To fix this I've changed the format of the prefix for the event to include tro things:
1. Timestamp in nanoseconds of the injection time on the controller so two sequential batches will have guaranteed order unless they are processed within a nanosecond.
2. Made the `index` being fixed length with trailing zeros, so they are properly lexicographically sorted (`000001`, `000002`, ...).
* No need to disable linting
* Implement restart policy for VMs
* Do not update VM.Resource, we only use it as a read-only specification
* Err()/setErr(): use atomic.Pointer instead of sync.Mutex
* Controller API: introduce controller's information endpoint
* Prevent generation of empty events after channel closure
* Allow events to be buffered in the events channel
* Controller API: introduce controller's information endpoint[1]
* IntegrationGuide.md: a couple of Python and Golang examples
* Rephrase a sentence
Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>
---------
Co-authored-by: Fedor Korotkov <fedor.korotkov@gmail.com>
* Fail VMs if the worker had crashed/is unhealthy
* OnDiskName: properly handle cases when VM's name contains hyphens
* Worker: introduce Offline() method and check it before scheduling
* tart.List(): use Tart's JSON output
* OnDiskName: remove empty parts check
* Scheduler: move health-checking logic to a separate function
* Only fail "running" VMs
* Only fail orphaned VMs if they're in terminal state
* Integration tests
* Run healthCheckingLoopIteration() before schedulingLoopIteration()
* Worker: sync on-disk VMs only once at start
Before we had two main loops: controller loop to assign VMs and worker loop to start VMs. Each of the loops was performed upon an interval every N seconds.
This change introduces a mechanism for reactively requesting loop execution:
1. Controller loop will be executed upon VM creation to try to immediately schedule.
2. A worker will be notified upon a VM assigment and worker loop will be requested to sync immediately.
Fixes#31
* Resources support
* Ability to provide VM and worker resources via the CLI
* orchard dev: always listen on :6120
* orchard dev: support --resources
* REST API: provide resource defaults when creating VM
* OpenAPI: document "resources" field
* orchard dev: serve Swagger API documentation on /v1/
* Integration guide